
Don’t Index This Page

Official Google Blog: Controlling how search engines access and index your website

Controlling how search engines access and index your website

1/26/2007 11:36:00 AM
Posted by Dan Crow, Product Manager

I’m often asked about how Google and search engines work. One key question is: how does Google know what parts of a website the site owner wants to have show up in search results? Can publishers specify that some parts of the site should be private and non-searchable? The good news is that those who publish on the web have a lot of control over which pages should appear in search results.

The key is a simple file called robots.txt that has been an industry standard for many years. It lets a site owner control how search engines access their web site. With robots.txt you can control access at multiple levels: the entire site, individual directories, pages of a specific type, down to individual pages. Effective use of robots.txt gives you a lot of control over how your site is searched, but it's not always obvious how to achieve exactly what you want. This is the first of a series of posts on how to use robots.txt to control access to your content.
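To make the "multiple levels" idea concrete, here is a minimal sketch (mine, not from the Google post) that checks a few URLs against a small robots.txt using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` user agent, the `example.com` URLs and the `is_allowed` helper are all made up for illustration. The standard-library parser only does simple path-prefix matching, so the sketch covers the whole-site, directory and single-page levels; blocking "pages of a specific type" relies on pattern matching that this parser does not implement, so it is left out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one directory and one individual page for all crawlers.
DIRECTORY_AND_PAGE_RULES = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/unfinished.html
"""

# Hypothetical rules: block the entire site for all crawlers.
WHOLE_SITE_RULES = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_text, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/index.html",
        "https://example.com/private/notes.html",
        "https://example.com/drafts/unfinished.html",
        "https://example.com/drafts/finished.html",
    ):
        print(url, is_allowed(DIRECTORY_AND_PAGE_RULES, "ExampleBot", url))
```

Keep in mind that robots.txt is a request to well-behaved crawlers, not access control: a crawler that ignores the file can still fetch anything the web server is willing to serve.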

It’s interesting that this is still such a big problem that it deserves a place in the Google Blog. The robots.txt file has been around for web aeons. Yet apparently people are still complaining that Google indexes too much of their stuff.


