Sitemap for a site with a large number of dynamic subdomains

bartekb · Oct 7, 2010

I'm running a site which allows users to create subdomains. I'd like to submit these user subdomains to search engines via sitemaps. However, according to the sitemaps protocol (and Google Webmaster Tools), a single sitemap can include URLs from a single host only.
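The single-host restriction can be written as a small check. The Python sketch below is illustrative only. `is_url_allowed` is a hypothetical helper, not part of any library. It encodes the default sitemaps.org location rule (same scheme and host, and a path at or below the sitemap's directory) and deliberately ignores the robots.txt cross-submission exception.

```python
from urllib.parse import urlsplit


def is_url_allowed(sitemap_url: str, page_url: str) -> bool:
    """Return True if page_url may appear in the sitemap at sitemap_url
    under the default location rule (no robots.txt cross-submission)."""
    sm = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Scheme and host (including any port) must match exactly;
    # a subdomain counts as a different host.
    if (sm.scheme, sm.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False
    # The page must live at or below the directory containing the sitemap.
    sitemap_dir = sm.path.rsplit("/", 1)[0] + "/"
    page_path = page.path or "/"
    return page_path.startswith(sitemap_dir)


print(is_url_allowed("http://example.com/sitemap-foo.xml",
                     "http://example.com/about"))      # True
print(is_url_allowed("http://example.com/sitemap-foo.xml",
                     "http://foo.example.com/about"))  # False
```

Under this rule alone, a sitemap on example.com cannot list subdomain URLs, which is why the question relies on robots.txt on each subdomain.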

What is the best approach?

At the moment I have the following structure:

  1. A sitemap index located at example.com/sitemap-index.xml that lists the sitemap for each subdomain (all of these sitemaps are also hosted on example.com).
  2. Each subdomain has its own sitemap located at example.com/sitemap-subdomain.xml (this way the sitemap index includes URLs from a single host only).
  3. A sitemap for a subdomain contains URLs from the subdomain only, i.e., subdomain.example.com/*
  4. Each subdomain has a robots.txt file at subdomain.example.com/robots.txt:

--

User-agent: *
Allow: /

Sitemap: http://example.com/sitemap-subdomain.xml

--
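The layout above can be generated from a list of subdomain names. This Python sketch assumes the host `example.com` and the `sitemap-<subdomain>.xml` naming from the question; the function names are hypothetical, and a real site would load the subdomains and their paths from its own data store.

```python
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAIN_HOST = "example.com"  # assumption: the main host used in the question
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>'


def sitemap_index(subdomains):
    """Sitemap index on the main host, one <sitemap> entry per subdomain."""
    entries = []
    for sub in subdomains:
        loc = f"http://{MAIN_HOST}/sitemap-{sub}.xml"
        entries.append(f"<sitemap><loc>{escape(loc)}</loc></sitemap>")
    body = "".join(entries)
    return f'{XML_DECL}<sitemapindex xmlns="{NS}">{body}</sitemapindex>'


def subdomain_sitemap(sub, paths):
    """Sitemap served from the main host but listing only <sub>.example.com URLs."""
    entries = []
    for path in paths:
        loc = f"http://{sub}.{MAIN_HOST}{path}"
        # escape() turns & < > into entities, as the protocol requires.
        entries.append(f"<url><loc>{escape(loc)}</loc></url>")
    body = "".join(entries)
    return f'{XML_DECL}<urlset xmlns="{NS}">{body}</urlset>'


def subdomain_robots_txt(sub):
    """Body of http://<sub>.example.com/robots.txt, pointing at that
    subdomain's sitemap on the main host (item 4 above)."""
    return (
        "User-agent: *\n"
        "Allow: /\n"
        "\n"
        f"Sitemap: http://{MAIN_HOST}/sitemap-{sub}.xml\n"
    )


if __name__ == "__main__":
    print(sitemap_index(["alice", "bob"]))
    print(subdomain_robots_txt("alice"))
```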

I think this approach complies with the sitemaps protocol; however, Google Webmaster Tools gives errors for the subdomain sitemaps: "URL not allowed. This url is not allowed for a Sitemap at this location."

I've also checked how other sites do it. Eventbrite, for instance, produces sitemaps that contain URLs from multiple subdomains (e.g., see http://www.eventbrite.com/events01.xml.gz). This, however, does not comply with the sitemaps protocol.

What approach do you recommend for sitemaps?

Answer

Brian Armstrong · Dec 19, 2010

I recently struggled through this and finally got it working. See this thread for more details:

http://www.google.com/support/forum/p/Webmasters/thread?tid=53c3e4b3ab8d9503&hl=en&fid=53c3e4b3ab8d9503000497bd04ba63cf

Summary:

  • Use DNS verification to verify your site and all its subdomains in one fell swoop.
  • Make the robots.txt on each of your subdomains point to the main sitemap on your www domain.
  • You may need to wait several days for Google to update its cached copies of robots.txt on all your subdomains; it will still show errors until then.
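A minimal Python sketch of the setup described in this answer, assuming a main host of `www.example.com` with a single sitemap at `/sitemap.xml` (both hypothetical): every subdomain serves the same robots.txt, and the one main sitemap lists URLs from all subdomains. Under the sitemaps.org cross-submission rule, the `Sitemap:` line in each subdomain's robots.txt is what permits a sitemap on another host to contain that subdomain's URLs.

```python
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE_DOMAIN = "example.com"                         # assumption
MAIN_SITEMAP = "http://www.example.com/sitemap.xml"  # assumption
MAX_URLS = 50_000  # per-file URL limit in the sitemaps.org protocol


def robots_txt() -> str:
    """Identical robots.txt body served on every subdomain; it points at
    the main sitemap on the www host (the cross-submission reference)."""
    return f"User-agent: *\nAllow: /\n\nSitemap: {MAIN_SITEMAP}\n"


def main_sitemap(urls_by_subdomain: dict) -> str:
    """One sitemap on the www host listing URLs from many subdomains.

    urls_by_subdomain maps a subdomain label to a list of paths.
    """
    locs = [
        f"http://{sub}.{BASE_DOMAIN}{path}"
        for sub, paths in sorted(urls_by_subdomain.items())
        for path in paths
    ]
    if len(locs) > MAX_URLS:
        # Too many URLs for one file: split into several sitemaps
        # referenced from a sitemap index instead.
        raise ValueError("too many URLs for a single sitemap file")
    body = "".join(f"<url><loc>{escape(loc)}</loc></url>" for loc in locs)
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            f'<urlset xmlns="{NS}">{body}</urlset>')
```

The protocol also caps each sitemap file at 50 MB uncompressed; the sketch only checks the URL count, so a large site would still need to split its output and list the parts in a sitemap index.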