I have www.domainname.com and origin.domainname.com pointing to the same codebase. Is there a way I can prevent all URLs under the hostname origin.domainname.com from being indexed?
Is there a rule in robots.txt to do this? Both hostnames point to the same folder. I also tried redirecting origin.domainname.com to www.domainname.com in the .htaccess file, but it doesn't seem to work.
If anyone has had a similar problem and can help, I would be grateful.
Thanks
You can rewrite robots.txt to another file (let's name it 'robots_no.txt') containing:
User-Agent: *
Disallow: /
(source: http://www.robotstxt.org/robotstxt.html)
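For the canonical host you would keep a normal robots.txt; per the same robotstxt.org page, an empty Disallow value allows everything to be crawled:

```
User-Agent: *
Disallow:
```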
The .htaccess file would look like this:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^robots\.txt$ robots_no.txt [L]
Use customized robots.txt for each (sub)domain:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^sub\.example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^example\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.example\.org$ [OR]
RewriteCond %{HTTP_HOST} ^example\.org$
# For the (sub)domains above, rewrite requests for robots.txt to robots_<host>.txt
# e.g. example.org -> robots_example.org.txt
RewriteRule ^robots\.txt$ robots_%{HTTP_HOST}.txt [L]
# in all other cases, serve the default 'robots.txt' unchanged
RewriteRule ^robots\.txt$ - [L]
Instead of asking search engines to block all pages on hosts other than www.example.com, you can also use <link rel="canonical">.
If http://example.com/page.html and http://example.org/~example/page.html both point to http://www.example.com/page.html, put the following tag in the <head>:
<link rel="canonical" href="http://www.example.com/page.html">
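As for the redirect attempt mentioned in the question, a host-based 301 redirect in .htaccess would look roughly like this. This is a sketch assuming the hostnames from the question and that mod_rewrite is enabled:

```apache
RewriteEngine On
# Match requests whose Host header is origin.domainname.com (case-insensitive)
RewriteCond %{HTTP_HOST} ^origin\.domainname\.com$ [NC]
# Permanently redirect to the same path on www.domainname.com
RewriteRule ^(.*)$ http://www.domainname.com/$1 [R=301,L]
```

A permanent (301) redirect also tells search engines to index the www URLs instead of the origin ones.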