Which method of caching is the fastest/lightest for Node/Mongo/NginX?

Maverick · Mar 21, 2013 · Viewed 7.5k times

I've been tasked with a project for a client whose site he estimates will receive 1-2M hits per day. He has an existing database of 58M users that need to be seeded into the new brand on a per-registration basis. Most of the site's content is served from externally supplied API data, with most of what's stored in our Mongo setup being profile information and saved API parameters.

NginX will be on port 80 and load balancing to a Node cluster on ports 8000 - 8010.

My question is what to do about caching. I come from a LAMP background, so I'm used to either writing static HTML files with PHP and serving those up to minimize MySQL load, or using Memcached for sites that required a higher level of caching. This setup is a bit foreign to me.

Which of the following is ideal in terms of minimizing response time and CPU load?

1: Page-level caching with NginX

Reference: http://andytson.com/blog/2010/04/page-level-caching-with-nginx/

# the "anonymous" cache zone must be declared once in the http block, e.g.:
# proxy_cache_path /var/cache/nginx keys_zone=anonymous:10m;

server {
    listen            80;
    server_name       mysite.com;

    proxy_set_header  X-Real-IP  $remote_addr;
    proxy_set_header  Host       $host;

    location / {
        # the blog example proxies a single backend; my cluster would sit
        # behind an upstream block spanning ports 8000 - 8010
        proxy_pass    http://localhost:8080/;
        proxy_cache   anonymous;
    }

    # don't cache the admin folder, send all requests through to the proxy
    location /admin {
        proxy_pass    http://localhost:8080/;
    }

    # handle static files directly. Set their expiry time to max, so they'll
    # always use the browser cache after the first request
    location ~* \.(css|js|png|jpe?g|gif|ico)$ {
        root          /var/www/${host}/http;
        expires       max;
    }
}


2: Redis as a cache bucket

The hash() function is the numbers() function on this page: http://jsperf.com/hashing-strings

// Multiplicative string hash (the numbers() variant from the jsPerf page).
// Note that res is a plain JavaScript number (an IEEE-754 double), so it
// loses integer precision once it grows past 2^53 -- see below.
function hash(str) {
    var res = 0,
        len = str.length;
    for (var i = 0; i < len; i++) {
        res = res * 31 + str.charCodeAt(i);
    }
    return res;
}
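
One caveat with this hash (a demonstration I've added; it's not on the jsPerf page): because the double runs out of precision for long inputs, the final characters eventually stop affecting the result, so distinct long keys can collide.

// For long inputs the last characters no longer change the hash,
// so two different strings produce the same key:
var base = new Array(100).join('x'); // 99-character string
console.log(hash(base + 'a') === hash(base + 'b')); // true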

var apiUrl = 'https://www.myexternalapi.com/rest/someparam/someotherparam/?auth=3dfssd6s98d7f09s8df98sdef';
var key    = hash(apiUrl).toString(); // 1.8006908172911553e+136

myRedisClient.set(key, theJSONresponse, function(err) {...});
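
set() stores the value forever; since API responses go stale, I'd presumably want setex() with a short TTL instead. A minimal read-through sketch under that assumption (fetchFromApi() is a hypothetical stand-in for the external API call, not a real helper):

var redis  = require('redis');
var client = redis.createClient();

// Check Redis first; on a miss, call the external API and cache the raw
// JSON string with a 60-second expiry so stale entries age out on their own.
function getCached(apiUrl, fetchFromApi, callback) {
    var key = hash(apiUrl).toString();
    client.get(key, function(err, cached) {
        if (!err && cached) {
            return callback(null, cached);
        }
        fetchFromApi(apiUrl, function(err, theJSONresponse) {
            if (err) { return callback(err); }
            // SETEX sets the value and its expiry in one atomic command
            client.setex(key, 60, theJSONresponse, function() {
                callback(null, theJSONresponse);
            });
        });
    });
}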


3: Node writing JSON files

The hash() function is the same numbers() function used in option 2 above.

var fs     = require('fs');
var apiUrl = 'https://www.myexternalapi.com/rest/someparam/someotherparam/?auth=3dfssd6s98d7f09s8df98sdef';
var key    = hash(apiUrl).toString(); // 1.8006908172911553e+136

fs.writeFile('/var/www/_cache/' + key + '.json', theJSONresponse, function(err) {...});
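
The write alone never expires anything; here's a sketch of the matching read path, assuming freshness is judged by the file's mtime with a 60-second window (again, fetchFromApi() is a hypothetical stand-in):

var fs = require('fs');

// Serve the cached file if it's less than 60 seconds old; otherwise hit
// the API again and rewrite the file.
function getCachedFile(apiUrl, fetchFromApi, callback) {
    var file = '/var/www/_cache/' + hash(apiUrl).toString() + '.json';
    fs.stat(file, function(err, stats) {
        if (!err && Date.now() - stats.mtime.getTime() < 60000) {
            return fs.readFile(file, 'utf8', callback);
        }
        fetchFromApi(apiUrl, function(err, theJSONresponse) {
            if (err) { return callback(err); }
            fs.writeFile(file, theJSONresponse, function() {
                callback(null, theJSONresponse);
            });
        });
    });
}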


4: Varnish in front

I did some research, and benchmarks like the ones shown on this site are steering me away from this solution, but I'm still open to considering it if it makes the most sense: http://todsul.com/nginx-varnish

Answer

AlienWebguy · Mar 21, 2013

I would do a combination: use Redis to cache session users' API calls with a short TTL, and use Nginx to cache long-term RESTless data and static assets. I wouldn't write JSON files, as I imagine the filesystem I/O would be the slowest and most CPU-intensive of the options listed.