robots.txt: user-agent: Googlebot disallow: / Google still indexing

Anders · Jan 22, 2011 · Viewed 10.7k times

Look at the robots.txt of this site:

fr2.dk/robots.txt

The content is:

User-Agent: Googlebot
Disallow: /

That ought to tell Google not to index the site, right?

If so, why does the site still appear in Google searches?

Answer

earl · Jan 22, 2011

Part of the answer is simply waiting, because Google's index takes some time to update. Beyond that, note that if other sites link to yours, robots.txt alone won't be enough to remove your site from the search results.

Quoting Google's support page "Remove a page or site from Google's search results":

If the page still exists but you don't want it to appear in search results, use robots.txt to prevent Google from crawling it. Note that in general, even if a URL is disallowed by robots.txt we may still index the page if we find its URL on another site. However, Google won't index the page if it's blocked in robots.txt and there's an active removal request for the page.
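The distinction in the quote is that robots.txt governs crawling, not indexing. As a rough illustration, the sketch below feeds the two rules from the question to Python's standard `urllib.robotparser` and asks whether a given crawler may fetch the site root. The helper name `can_crawl` and the second user agent `SomeOtherBot` are made up for the example, and this is the standard library's interpretation of the rules, not necessarily Google's own parser.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the question.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules allow user_agent to fetch url.

    can_crawl is a hypothetical helper for this example only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Googlebot is told not to fetch anything on the site.
googlebot_allowed = can_crawl(ROBOTS_TXT, "Googlebot", "http://fr2.dk/")

# "SomeOtherBot" is an invented user agent; no rule mentions it.
other_allowed = can_crawl(ROBOTS_TXT, "SomeOtherBot", "http://fr2.dk/")

print("Googlebot may crawl:", googlebot_allowed)
print("SomeOtherBot may crawl:", other_allowed)
```

The result only says whether a crawler may fetch the page. It says nothing about whether the URL can be listed in search results, which is why a URL discovered through links on other sites can still show up.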

The same document also mentions an alternative:

Alternatively, you can use a noindex meta tag. When we see this tag on a page, Google will completely drop the page from our search results, even if other pages link to it. This is a good solution if you don't have direct access to the site server. (You will need to be able to edit the HTML source of the page).
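The tag referred to in the quote is normally written as `<meta name="robots" content="noindex">` inside the page's `<head>`. To check that a page actually carries it, something along these lines could work. It is a minimal sketch using only Python's standard `html.parser`. The class `RobotsMetaParser`, the helper `has_noindex`, and the sample pages are invented for the example, and treating `name="googlebot"` the same as `name="robots"` is an assumption of this sketch.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a robots meta tag with a noindex directive was seen."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        # Assumption: accept both the generic and the Googlebot-specific name.
        if attributes.get("name", "").lower() not in ("robots", "googlebot"):
            return
        directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """Hypothetical helper: True if the HTML contains a noindex meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


# Invented sample pages for illustration.
TAGGED_PAGE = '<html><head><meta name="robots" content="noindex"></head><body>Hi</body></html>'
PLAIN_PAGE = "<html><head><title>Hi</title></head><body>Hi</body></html>"

tagged_result = has_noindex(TAGGED_PAGE)
plain_result = has_noindex(PLAIN_PAGE)
print(tagged_result, plain_result)
```

Keep in mind that a crawler can only see the tag if it is allowed to fetch the page, so a `Disallow: /` rule for the same crawler would keep it from ever reading the noindex directive.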