How to improve SEO for a single-page application

Maciej Uruski · Nov 18, 2013 · Viewed 9.7k times

We have built a search engine for vacancies. For speed and a good user experience, we chose a "Single Page Application" (SPA) architecture. We know that enabling SEO is a challenge for a SPA, so we did quite a lot of optimization for it. Although Google is indexing our pages, our ranking in Google is very poor, and we are asking for suggestions to improve this. We followed Google's recommendations, but without satisfactory results.

A SPA cannot be indexed by Googlebot directly, since Googlebot will not execute client-side JavaScript. Without JavaScript our site contains hardly any content, because data is fetched asynchronously as JSON and most of the HTML is rendered on the client. The rendering is done by Knockout, a framework that data-binds HTML templates to JavaScript objects.

Different pages in the SPA are addressed with client-side URLs. To make those pages readable by Google, our client-side URLs contain a '#' followed by a '!'. This 'hashbang' syntax triggers Googlebot to rewrite the URL to a special server-side URL. When that special URL is requested, our server runs a headless browser to render the page, and the complete HTML (after JavaScript execution) is sent back. This so-called HTML snapshot is what Googlebot indexes. To tell Google which pages exist in our SPA, we also provide a sitemap.xml listing the URLs that can be visited.

When we ask Google to show the pages indexed from our site, we see that Googlebot did visit and index our pages. So our conclusion is that technically we did our work well, but none of those pages seems to rank high enough to appear in normal Google searches. We are not sure whether this is related to the SPA architecture, but the result is that our pages cannot be found.
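
For concreteness, a minimal sketch of the hashbang handshake described above, written as Express middleware (Express, the example hostname, and the renderSnapshot helper are illustrative assumptions, not part of our actual setup):

    // Sketch of the '_escaped_fragment_' handshake.
    // Googlebot rewrites  http://example.com/#!/vacancies/123
    // into                http://example.com/?_escaped_fragment_=/vacancies/123
    // and expects the server to answer that URL with rendered HTML.
    var express = require('express');
    var app = express();

    app.use(function (req, res, next) {
        var fragment = req.query._escaped_fragment_;
        if (fragment === undefined) return next();  // ordinary visitor: serve the SPA

        // renderSnapshot is a hypothetical helper that drives a headless
        // browser and calls back with the fully rendered HTML.
        renderSnapshot('http://example.com/#!' + fragment, function (err, html) {
            if (err) return next(err);
            res.send(html);                         // HTML snapshot for the crawler
        });
    });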

We are wondering whether anybody has had the same experience with this technique in relation to Google ranking, and whether anybody has additional suggestions that could help us improve the SEO ranking of our SPA (without converting the site entirely to a conventional server-side-rendered architecture).

Answer

Brigand · Nov 25, 2013

To tackle the problem you need a few things:

  1. Real URLs, and real <a> tags with href attributes pointing to them.
  2. You need to have the server generate the pages, prefilled with the JSON, on explicit request.
    • This is most easily accomplished using PhantomJS or similar (see the sketch after this list).
    • If you assume content changes less frequently than it's read (true for most successful sites), you can use a queue to build these pages into static files.
    • Then tell your web server to send index.html if the requested file doesn't exist.
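
As a sketch of step 2, this is roughly what a PhantomJS snapshot script could look like (the file names and the fixed delay are assumptions; a production version would poll for a "rendered" signal set by the app instead of sleeping):

    // snapshot.js -- a minimal PhantomJS sketch; names are illustrative.
    // Usage: phantomjs snapshot.js "http://example.com/#!/vacancies/123" out.html
    var fs = require('fs');
    var page = require('webpage').create();
    var args = require('system').args;   // args[0] is the script name

    page.open(args[1], function (status) {
        if (status !== 'success') {
            console.log('Failed to load ' + args[1]);
            phantom.exit(1);
            return;
        }
        // Give the SPA time to fetch its JSON and let Knockout render the
        // templates. A fixed delay is crude but simple.
        setTimeout(function () {
            fs.write(args[2], page.content, 'w');  // the DOM after JS execution
            phantom.exit(0);
        }, 2000);
    });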

For "soft requests" (they click a link, which you use JSON/AJAX to cover), it'll work as it does currently.

For hard requests (they follow a link from another site, press F5, or it's Googlebot crawling your URLs), you send them the precompiled version (served as sketched below the list), which:

  1. improves SEO
  2. increases page-load performance (which is also an SEO bonus)
  3. doesn't require any difficult server processing, because the page is already built
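
A hedged sketch of that serving logic, assuming Express and a snapshots/ directory of prebuilt pages (both assumptions, not prescribed above): if a prebuilt file exists for the requested path, send it; otherwise fall back to the SPA's index.html and let the client render.

    // server.js -- illustrative only; the snapshots/ layout and the
    // path-to-filename mapping are assumptions.
    var express = require('express');
    var fs = require('fs');
    var path = require('path');
    var app = express();

    app.use(express.static(path.join(__dirname, 'public')));  // JS, CSS, images

    app.get('*', function (req, res) {
        // e.g. /vacancies/123 -> snapshots/vacancies_123.html
        var name = req.path.replace(/^\//, '').replace(/\//g, '_') || 'index';
        var snapshot = path.join(__dirname, 'snapshots', name + '.html');

        fs.readFile(snapshot, 'utf8', function (err, html) {
            if (!err) return res.send(html);   // hard request: prebuilt page
            // No snapshot yet: serve the SPA shell instead.
            fs.readFile(path.join(__dirname, 'public', 'index.html'), 'utf8',
                function (err2, shell) {
                    if (err2) return res.status(500).send('index.html missing');
                    res.send(shell);
                });
        });
    });

    app.listen(3000);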