I've been tasked with a project for a client whose site he estimates will receive 1-2M hits per day. He has an existing database of 58M users who need to be seeded on a per-registration basis for the new brand. Most of the site's content is served from external API data; what we store in our Mongo setup is mostly profile information and saved API parameters.
Nginx will listen on port 80 and load-balance to a Node cluster on ports 8000-8010.
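For reference, the Node side might be stood up roughly like this; a minimal sketch with one listener per port, where the placeholder response and the single-process loop are my assumptions (in practice you'd likely run one process per port, or use the cluster module):

var http = require('http');

// One HTTP listener per port in the 8000-8010 range that Nginx balances
// across. The JSON body is a placeholder, not the client's actual app.
for (var port = 8000; port <= 8010; port++) {
  http.createServer(function(req, res) {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ pid: process.pid }));
  }).listen(port);
}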
My question is what to do about caching. I come from a LAMP background, so I'm used to either writing static HTML files with PHP and serving those up to minimize MySQL load, or using Memcached for sites that required a higher level of caching. This setup is a bit foreign to me.
Which approach is best for minimizing response time and CPU load?
Reference: http://andytson.com/blog/2010/04/page-level-caching-with-nginx/
# The "anonymous" cache zone must be declared once in the http block, e.g.:
# proxy_cache_path /var/cache/nginx keys_zone=anonymous:10m;

server {
    listen 80;
    server_name mysite.com;

    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header Host $host;

    # cache proxied pages in the "anonymous" zone (the blog example proxies
    # to a single backend on 8080; with the Node cluster above this would
    # point at an upstream block covering ports 8000-8010)
    location / {
        proxy_pass http://localhost:8080/;
        proxy_cache anonymous;
    }

    # don't cache the admin folder; send all requests through to the proxy
    location /admin {
        proxy_pass http://localhost:8080/;
    }

    # serve static files directly and set their expiry to max, so browsers
    # always use their cache after the first request
    location ~* \.(css|js|png|jpe?g|gif|ico)$ {
        root /var/www/${host}/http;
        expires max;
    }
}
The hash() function is the numbers() function on this page: http://jsperf.com/hashing-strings
// Simple multiplicative string hash. Note that for long strings the result
// exceeds Number.MAX_SAFE_INTEGER, so it degrades to an imprecise float.
function hash(str) {
  var res = 0,
      len = str.length;
  for (var i = 0; i < len; i++) {
    res = res * 31 + str.charCodeAt(i);
  }
  return res;
}
var redis = require('redis');
var myRedisClient = redis.createClient(); // defaults to localhost:6379

var apiUrl = 'https://www.myexternalapi.com/rest/someparam/someotherparam/?auth=3dfssd6s98d7f09s8df98sdef';
var key = hash(apiUrl).toString(); // 1.8006908172911553e+136
myRedisClient.set(key, theJSONresponse, function(err) {...});
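For context, the read side would look something like this; a minimal sketch assuming the same hash() and myRedisClient as above, where fetchFromApi is a hypothetical helper that performs the actual HTTP request:

// Read-through cache: try Redis first, fall back to the external API on a
// miss, then store the response. fetchFromApi is a hypothetical helper.
function getCached(apiUrl, fetchFromApi, callback) {
  var key = hash(apiUrl).toString();
  myRedisClient.get(key, function(err, cached) {
    if (!err && cached) {
      return callback(null, cached); // cache hit
    }
    fetchFromApi(apiUrl, function(apiErr, theJSONresponse) {
      if (apiErr) return callback(apiErr);
      myRedisClient.set(key, theJSONresponse, function() {});
      callback(null, theJSONresponse);
    });
  });
}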
This uses the same hash() function as above.
var fs = require('fs');
var apiUrl = 'https://www.myexternalapi.com/rest/someparam/someotherparam/?auth=3dfssd6s98d7f09s8df98sdef';
var key = hash(apiUrl).toString(); // 1.8006908172911553e+136
fs.writeFile('/var/www/_cache/' + key + '.json', theJSONresponse, function(err) {...});
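And the corresponding read path; a minimal sketch where the freshness check is my own assumption, since files written this way carry no built-in TTL:

// Check the file cache before hitting the API. Freshness is judged by the
// file's mtime; maxAgeMs is an assumed parameter, not part of the setup above.
function readFileCache(key, maxAgeMs, callback) {
  var file = '/var/www/_cache/' + key + '.json';
  fs.stat(file, function(err, stats) {
    if (err || Date.now() - stats.mtime.getTime() > maxAgeMs) {
      return callback(null, null); // miss or stale: caller fetches from the API
    }
    fs.readFile(file, 'utf8', callback);
  });
}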
I did some research, and benchmarks like the ones on this site are steering me away from this solution, but I'm still open to considering it if it makes the most sense: http://todsul.com/nginx-varnish
I would do a combination: use Redis to cache session user API calls with a short TTL, and use Nginx to cache long-lived REST data and static assets. I wouldn't write JSON files, as I imagine the filesystem I/O would be the slowest and most CPU-intensive of the options listed.
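For the short-TTL piece, something like this; a minimal sketch reusing the key and client from the question, with 60 seconds as an arbitrary example TTL:

// SETEX stores the value and lets Redis expire it on its own, so stale
// session/user API data ages out without any manual cleanup.
myRedisClient.setex(key, 60, theJSONresponse, function(err) {
  if (err) console.error('redis setex failed:', err);
});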