I want to stop search engines from crawling my whole website.
I have a web application that is only for the employees of a company. It is hosted on a web server so those employees can access it; no one else (the public) needs it or would find it useful.
So I want to add another layer of security (in theory) to help prevent unauthorized access by keeping the site away from all search engine bots/crawlers. Having Google index the site is pointless from a business perspective, and it just gives an attacker one more way to find the website in the first place.
I know that in robots.txt you can tell search engines not to crawl certain directories.
Is it possible to tell bots not to crawl the whole site without having to list every directory?
Is this best done with robots.txt, or is it better handled with .htaccess or something else?
This is best handled with a robots.txt file, but only for bots that respect the file.
To block the whole site, add this to a robots.txt file in the root directory of your site:
User-agent: *
Disallow: /
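Keep in mind that robots.txt only asks crawlers not to crawl; URLs that are linked from elsewhere can still show up in search results. If you also want to tell compliant search engines not to index anything, one option is to send an X-Robots-Tag header. This is just a sketch and assumes Apache with mod_headers enabled:
<IfModule mod_headers.c>
    # Tell compliant crawlers not to index or follow anything on this site
    Header set X-Robots-Tag "noindex, nofollow"
</IfModule>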
To limit access to your site for everyone else, .htaccess is better, but you would need to define access rules, by IP address for example.
Below are .htaccess rules that restrict everyone except people coming from your company's IP:
Order deny,allow
Deny from all
# Enter your company's IP address here
Allow from 255.1.1.1
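Order, Allow and Deny are the old Apache 2.2 directives (they still work on 2.4 through mod_access_compat). If your server runs Apache 2.4, the equivalent rule, assuming the same placeholder company IP, would be:
# Apache 2.4+ (mod_authz_core): allow only the company IP, deny everyone else
Require ip 255.1.1.1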