Google Analytics proxy

Tony Gustafsson · Mar 18, 2015

I have a special situation where the site's visitors can access pages on one particular domain but no others. HTML and assets are no problem as long as they are stored on the server. Google Analytics, on the other hand, requires downloading analytics.js from Google's servers, which is impossible.

So I'm looking for a way to proxy this. The web server itself has internet access and could relay the traffic. To report a page view to Google, a single-pixel GIF is downloaded from Google, as described here: https://developers.google.com/analytics/resources/concepts/gaConceptsTrackingOverview

I think it would be fairly easy to take all the parameters from the GIF request and use the Measurement Protocol to report to Google from the server; the hard part is getting all this information to the server. Downloading analytics.js and modifying it to send to my own server seems like a hack that isn't future-proof at all. Just getting the current page from the user to the server is not a big deal, but we would also like the user ID, browser version and everything else you get with Analytics.

How would you do it? Have you found a solution for this?

Answer

Eike Pierstorff · Mar 18, 2015

As pointed out in my comment, __utm.gif is no longer used. Google Analytics has completely switched to the Measurement Protocol, and data is now sent to the Measurement Protocol endpoint at google-analytics.com/collect. That endpoint still returns a transparent pixel, since requesting an image with parameters is a tried-and-tested way of transmitting information across domain boundaries.

Now, you could just use the Measurement Protocol to implement your own Google Analytics tracker.

To quote myself:

Each call includes at least the ID of the account you want to send data to, a client ID that allows interactions to be grouped into sessions (so it should be unique per visitor, but it must not identify a user personally), an interaction type (pageview, event, timing, etc.; some interaction types require additional parameters) and the version of the protocol you are using (at the moment there is only one version).

So the most basic example to record a pageview would look like this:

www.google-analytics.com/collect?v=1&tid=UA-XXXXY&cid=555&t=pageview&dp=%2Fmypage
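The same request can be assembled programmatically. The following is a minimal sketch. The helper name `buildHit` and its argument order are made up for illustration. `URLSearchParams` is the standard URL API available in browsers and Node.js, and it percent-encodes `/` as `%2F` exactly as in the URL above.

```javascript
// Sketch: assemble a Measurement Protocol (v1) hit as a query string.
// buildHit is a hypothetical helper, not part of any Google library.
function buildHit(trackingId, clientId, hitType, extra = {}) {
  if (!trackingId || !clientId || !hitType) {
    throw new Error("tid, cid and t are required for every hit");
  }
  const params = new URLSearchParams({
    v: "1",          // protocol version
    tid: trackingId, // property ID, e.g. UA-XXXXX-Y
    cid: clientId,   // anonymous per-visitor client ID
    t: hitType,      // pageview, event, timing, ...
  });
  // Type-specific parameters, e.g. dp (document path) for a pageview.
  for (const [key, value] of Object.entries(extra)) {
    params.set(key, String(value));
  }
  return params.toString();
}

console.log(buildHit("UA-XXXXX-Y", "555", "pageview", { dp: "/mypage" }));
// → v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2Fmypage
```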

You would probably want to add the user's IP (it will be anonymized automatically) and the user agent.
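If you go the do-it-yourself route, a server-side call that includes both could look like the sketch below. It assumes Node.js 18+ for the global `fetch` and sends the hit as a POST body, which the Measurement Protocol accepts as an alternative to GET. `uip` (IP override) and `ua` (user agent override) are Measurement Protocol parameters. `sendServerPageview` and its argument names are hypothetical. The `fetchImpl` argument exists only so the function can be exercised without network access.

```javascript
// Sketch: send a pageview from the server with the visitor's IP and user agent.
// Assumes Node.js 18+ (global fetch); sendServerPageview is a hypothetical helper.
const COLLECT_URL = "https://www.google-analytics.com/collect";

async function sendServerPageview({ tid, cid, path, ip, userAgent }, fetchImpl = fetch) {
  const body = new URLSearchParams({
    v: "1",
    tid,
    cid,
    t: "pageview",
    dp: path,
    uip: ip,       // IP override: geo reports use this instead of the server's IP
    ua: userAgent, // user agent override: browser/OS reports use this
  });
  const res = await fetchImpl(COLLECT_URL, { method: "POST", body: body.toString() });
  return res.status;
}
```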

However, it sounds like you'd prefer to use the standard Analytics code to collect the data and relay the tracking call through your own server. While I haven't used the following in production, I don't see any reason why it wouldn't work.

First, you need the analytics.js file. Self-hosting the file is discouraged, but the stated reason is that Google updates the code from time to time, and if you host it yourself you might miss those updates. You can remedy this with a cron job that regularly downloads the file to your server, so you always have a current version.

Next you'd adapt the GA bootstrap function to load the code from your own server:

  // Standard analytics.js snippet; only the script URL points to your own server.
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','//www.myserver.com/analytics.js','ga');

Now you have the code, but the tracking call will still be sent to the Google Analytics servers (which, in your case, means it won't arrive at all). So you need to reroute the call through your own server.

To make this possible, the Universal Analytics code (analytics.js) has a feature called "tasks". Tasks are functions within the tracking code that run in sequence for every hit, assembling and finally sending the tracking call.

You can modify a task by calling the tracker object's "set" method with the task name as the first argument and, as the second, a function that overrides or wraps the original task function.
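If you want to overload a task (keep the default behaviour and add to it) rather than replace it, the usual pattern is to read the original task with `tracker.get` before calling `tracker.set`. This is a sketch. `wrapTask` is a hypothetical helper; `tracker.get`, `tracker.set` and `model.get` are the analytics.js tracker and model methods.

```javascript
// Sketch: overload (rather than replace) an analytics.js task.
// wrapTask is a hypothetical helper; tracker.get/tracker.set are analytics.js methods.
function wrapTask(tracker, taskName, extra) {
  const original = tracker.get(taskName);
  tracker.set(taskName, function (model) {
    original(model); // keep the default behaviour
    extra(model);    // then run your own code with the same model
  });
}

// In the page, once analytics.js has loaded:
// ga(function (tracker) {
//   wrapTask(tracker, 'sendHitTask', function (model) {
//     console.log(model.get('hitPayload'));
//   });
// });
```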

The following is pretty much the example from the Google documentation, except that I omitted the part where data is still being sent to Google (you don't need that here):

  ga('create', 'UA-XXXXX-Y', 'auto');

  ga(function(tracker) {
    // Replace the default sendHitTask so hits go to your own server, not to Google.
    tracker.set('sendHitTask', function(model) {
      // hitPayload is the finished Measurement Protocol query string.
      var payLoad = model.get('hitPayload');
      var gifRequest = new XMLHttpRequest();
      var gifPath = "/__ua.gif";
      gifRequest.open('GET', gifPath + '?' + payLoad, true);
      gifRequest.send();
    });
  });

  ga('send', 'pageview');

This sends the data to a file called __ua.gif on your own server. (If you need to send data across domains, you can instead create an image request with var ua = new Image; ua.src = gifPath + '?' + payLoad;)
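The image-request variant can be wrapped in a small helper and used inside the same `sendHitTask`. This is a sketch. `sendHitViaImage` and the proxy URL are placeholders. The `ImageCtor` argument defaults to the browser's `Image` and is only there so the function can run outside a browser.

```javascript
// Sketch: send the hit as an image request, which works across domains without CORS.
// sendHitViaImage and the proxy URL are placeholders.
function sendHitViaImage(proxyUrl, payload, ImageCtor = Image) {
  const img = new ImageCtor(1, 1);
  img.src = proxyUrl + "?" + payload; // setting src triggers the GET request
  return img;
}

// ga(function (tracker) {
//   tracker.set('sendHitTask', function (model) {
//     sendHitViaImage('https://www.myserver.com/__ua.gif', model.get('hitPayload'));
//   });
// });
```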

The model parameter passed to the sendHitTask function contains (apart from a lot of overhead) the payload, that is, the assembled query string containing the analytics data. You can then make your __ua.gif a script that proxies the request to google-analytics.com/collect.
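A minimal Node.js version of that proxy script might look like the sketch below. It assumes Node.js 18+. The `/__ua.gif` path matches the client code. `createProxyServer` and the port are placeholders. The script forwards the query string unchanged and answers with a 1x1 transparent GIF, so the browser still receives a valid image. As written it does not add the IP and user agent overrides, so every hit would look as if it came from the server.

```javascript
// Sketch: answer /__ua.gif and relay the payload to the Measurement Protocol endpoint.
// Assumes Node.js 18+ (global fetch); createProxyServer and the port are placeholders.
const http = require("http");

const COLLECT_URL = "https://www.google-analytics.com/collect";
// 1x1 transparent GIF, so the browser still gets a valid image back.
const PIXEL = Buffer.from("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7", "base64");

function createProxyServer(
  forward = (payload) => fetch(COLLECT_URL, { method: "POST", body: payload })
) {
  return http.createServer((req, res) => {
    const url = new URL(req.url, "http://localhost");
    if (url.pathname !== "/__ua.gif") {
      res.writeHead(404);
      res.end();
      return;
    }
    // The query string is the hitPayload assembled by analytics.js.
    Promise.resolve(forward(url.search.slice(1)))
      .catch((err) => console.error("relay failed:", err.message));
    res.writeHead(200, { "Content-Type": "image/gif", "Cache-Control": "no-store" });
    res.end(PIXEL);
  });
}

// createProxyServer().listen(8080);
```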

At this point, the user agent will be your script and the IP address will be that of your server, so you need to include the &uip (IP override) and &ua (user agent override) parameters ( https://groups.google.com/forum/#!msg/google-analytics-measurement-protocol/8TAp7_I1uTk/KNjI5IGwT58J) to get geographic and technical information.
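Inside the proxy, adding the two overrides could look like this sketch. `addOverrides` is a hypothetical helper. It appends to the payload instead of re-parsing it, so the encoding produced by analytics.js is left untouched. If the proxy sits behind a load balancer or reverse proxy, `req.socket.remoteAddress` is the balancer's address, and you would need to read the client IP from a header you trust instead.

```javascript
// Sketch: append the visitor's IP and user agent to the relayed payload.
// addOverrides is a hypothetical helper; uip and ua are Measurement Protocol parameters.
function addOverrides(payload, clientIp, userAgent) {
  let out = payload;
  if (clientIp) out += "&uip=" + encodeURIComponent(clientIp);  // IP override (geo)
  if (userAgent) out += "&ua=" + encodeURIComponent(userAgent); // user agent override (browser/OS)
  return out;
}

// In a Node.js http request handler:
// const forwarded = addOverrides(query, req.socket.remoteAddress, req.headers["user-agent"]);
```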

If you are feeling more adventurous, you can override the buildHitTask instead and add the additional parameters there (probably more hassle, since you'd need to get the IP address from somewhere).
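A rough sketch of the buildHitTask route is shown below. It lets the original task build the payload and then appends extra parameters before sendHitTask reads it. Only the user agent is readily available in the browser, and the IP address is not, which is why the server-side route is usually simpler. `appendToPayload` is a hypothetical helper; `tracker.get`, `tracker.set`, `model.get` and `model.set` are analytics.js methods.

```javascript
// Sketch: wrap buildHitTask and append extra parameters to the finished payload.
// appendToPayload is a hypothetical helper.
function appendToPayload(tracker, extra) {
  const originalBuildHitTask = tracker.get("buildHitTask");
  tracker.set("buildHitTask", function (model) {
    originalBuildHitTask(model); // let analytics.js assemble hitPayload first
    let payload = model.get("hitPayload");
    for (const key of Object.keys(extra)) {
      payload += "&" + key + "=" + encodeURIComponent(extra[key]);
    }
    // Store the extended payload so the (custom) sendHitTask picks it up.
    model.set("hitPayload", payload);
  });
}

// ga(function (tracker) { appendToPayload(tracker, { ua: navigator.userAgent }); });
```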

For additional parameters, see the analytics.js and Measurement Protocol references.