Google bot crawling on AngularJS site with HTML5 Mode routes

Iraklis Alexopoulos · Jun 27, 2014

We have an AngularJS site using HTML5 routes. I just did some test "Fetch as Google" runs. The results are a bit confusing:

However, we were already prepared for Google not being able to crawl our site, so we have added <meta name="fragment" content="!"> to our pages so that the Google bot revisits them with "?_escaped_fragment_=". We followed https://developers.google.com/webmasters/ajax-crawling/docs/getting-started (section "3. Handle pages without hash fragments"). In our Nginx config we have something like this:

if ($args ~ "_escaped_fragment_=") {
    # serve the static HTML snapshots
}

This works fine if we pass _escaped_fragment_= ourselves. However, the Google bot never tried to crawl our site with this param, so it never crawled the snapshot. Are we missing something? Should we also add agent detection for the Google bot in our Nginx conf? Something like this?

if ($http_user_agent ~* "googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com") {
    # serve from snapshots
}
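
To make the combined rule explicit, here is a minimal sketch in JavaScript of the decision the two Nginx conditions above would make together. It is an illustration only, not an Nginx replacement: the name shouldServeSnapshot is hypothetical, and the crawler list is simply copied from the snippet above.

```javascript
// Crawler pattern copied from the Nginx snippet above; the "i" flag mirrors Nginx's "~*".
const BOT_PATTERN = /googlebot|yahoo|bingbot|baiduspider|yandex|yeti|yodaobot|gigabot|ia_archiver|facebookexternalhit|twitterbot|developers\.google\.com/i;

// Hypothetical helper: decide whether a request should get the static snapshot.
// `queryString` is the raw query without the leading "?", like Nginx's $args.
function shouldServeSnapshot(queryString, userAgent) {
  const hasEscapedFragment = (queryString || "").includes("_escaped_fragment_=");
  const isKnownBot = BOT_PATTERN.test(userAgent || "");
  return hasEscapedFragment || isKnownBot;
}
```

With this rule, a request carrying _escaped_fragment_= gets the snapshot regardless of its user agent, and a listed crawler gets it even without the parameter.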

It would be great if we can understand this better, thank you so much in advance!

UPDATE:
I just read this, http://scotch.io/tutorials/javascript/angularjs-seo-with-prerender-io?_escaped_fragment_=tag#caveats. So, it seems that when using the manual tools (Fetch as Google), we should pass ourselves either #! or ?_escaped_fragment_= in the right place. Indeed, if I pass ?_escaped_fragment_= in our case, I do see the HTML snapshot that we have created.

Is that true? Is this how it works indeed?

UPDATE 2:
At the bottom of this thread, a Google employee confirms that for "Fetch as Google" in Google Webmasters you need to pass the _escaped_fragment_= param manually yourself: https://productforums.google.com/forum/#!msg/webmasters/fZjdyjq0n98/PZ-nlq_2RjcJ

Cheers,
Iraklis

Answer

Katalyst · Dec 18, 2014

I will try to answer your questions based on our experiences in the last month of developing a SPA with HTML5 mode.

How do I get Googlebot to use ?_escaped_fragment_= instead of the direct links?

This is actually quite simple but easy to overlook. In fact, there are two different ways to get Googlebot to try the escaped_fragment. The first method is to run your site in non-html5 mode. This means that your URLs will be of the form:

http://my.domain.com/base/#!some/path/on/website

Googlebot recognizes the #! and makes a second call to your server with an altered URL:

http://my.domain.com/base/?_escaped_fragment_=some/path/on/website
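
The rewrite above can be sketched as a small function. This is only an illustration of the mapping, assuming the escaping rules of Google's AJAX crawling scheme as I understand them (control characters and space, "#", "%", "&", "+", and everything from %7F upward are percent-escaped in the fragment). The name toEscapedFragmentUrl is hypothetical.

```javascript
// Characters escaped in the fragment under the assumed rules:
// control chars and space (%00-%20), "#", "%", "&", "+", and everything from %7F up.
const NEEDS_ESCAPE = /[\u0000-\u0020#%&+\u007f-\u{10ffff}]/gu;

// Hypothetical helper: rewrite a "#!" URL into its "_escaped_fragment_" form.
// Returns the URL unchanged when it contains no "#!".
function toEscapedFragmentUrl(url) {
  const marker = url.indexOf("#!");
  if (marker === -1) return url;
  const base = url.slice(0, marker);
  const fragment = url.slice(marker + 2);
  // Percent-encode only the characters in the assumed escape set.
  const escaped = fragment.replace(NEEDS_ESCAPE, (ch) => encodeURIComponent(ch));
  // Append to an existing query string if the base already has one.
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_escaped_fragment_=" + escaped;
}
```

Calling it with the hashbang URL above should reproduce the second URL; a server can apply the reverse mapping to work out which page the bot is asking for.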

You can then handle this request as you wish. The second way to get Googlebot to try _escaped_fragment_ mode is to include the following meta tag on the index page you supply to the bot:

<meta name="fragment" content="!">

This will make Googlebot request the escaped version of the page every time it sees the tag. You can use both techniques together, or you can do what we ended up doing, which is running in HTML5 mode with the meta tag. In that case your URLs will be escaped as follows:

http://my.domain.com/base/some/path/on/website?_escaped_fragment_=

Interestingly, the bot leaves the value of _escaped_fragment_ empty in this case. Depending on which web server you are running, you can easily map such requests to your alternate bot page with a pattern that matches the "_escaped_fragment_" text. For more information on the escaped fragment, see Google's AJAX crawling documentation.
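
As a rough sketch of that mapping, the function below turns a bot request into the path of a pre-rendered page. The /snapshots/<path>.html layout and the name snapshotPathFor are assumptions made for the example, not something our setup or Google prescribes.

```javascript
// Hypothetical snapshot layout: /snapshots/<path>.html (an assumption for this sketch).
const SNAPSHOT_ROOT = "/snapshots";

// Return the snapshot path for a crawler request, or null for a normal request.
function snapshotPathFor(requestUrl) {
  const url = new URL(requestUrl);
  // In HTML5 mode the parameter is present but empty, so test for presence only.
  if (!url.searchParams.has("_escaped_fragment_")) return null;
  // Strip trailing slashes so "/base/" and "/base" map to the same file.
  const path = url.pathname.replace(/\/+$/, "") || "/index";
  return SNAPSHOT_ROOT + path + ".html";
}
```

The important detail is testing for the presence of the parameter rather than its value, since in HTML5 mode the value is empty.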

"Fetch as Googlebot" returns two different versions of my page, the source with {{}} and the rendered page looking correct. What does that mean?

Googlebot has been able to interpret JavaScript to a limited extent since early 2014. For more information, read the official entry on the Google Webmasters blog. However, as the blog entry makes clear, this comes with a lot of caveats. For instance:

  1. Googlebot is not guaranteed to execute all JavaScript code.
  2. Googlebot will attempt to find links in the JavaScript to follow and use them to help find more pages.
  3. Googlebot will render the preview in Webmaster Tools by executing as much of the JavaScript as it can (thus the lack of {{}} in the rendered version).
  4. Googlebot will not necessarily use the rendered version to build the meta information about your site for its index.

As of 18/12/2014, we are still unsure whether Googlebot can extract any information from an SPA in rendered mode for its index, beyond finding links to follow in the JavaScript. In our experience, Googlebot will include {{}} in its index listing, so that when you use {{}} to fill in meta information (description, keywords, title, etc.) your site looks like this in Google Search results:

{{meta.siteTitle}}
http://my.domain.com/base/some/path/on/website
{{meta.description}}

rather than what you expect, which might look like this:

Domain
http://my.domain.com/base/some/path/on/website
This is a random page on my domain. An excellent example page to be sure!
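
One way to catch this before a crawler does is to scan the HTML you serve to bots for leftover bindings. The following is a minimal sketch, assuming AngularJS's default {{ }} interpolation symbols; the name findUnrenderedBindings is hypothetical.

```javascript
// Hypothetical check: list AngularJS-style {{ ... }} bindings left in an HTML string.
// Assumes the default interpolation symbols "{{" and "}}".
function findUnrenderedBindings(html) {
  return html.match(/\{\{[^{}]*\}\}/g) || [];
}
```

An empty result for a snapshot suggests the title and meta tags were filled in before the page was saved; any entries returned are bindings a crawler might index literally.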