I am running a Ruby on Rails app on a virtual Linux server that is capped at 1GB RAM. Currently, I am constantly hitting the limit and would like to optimize memory utilization. One option I am looking at is reducing the number of unicorn workers.
So what is the best way to determine the number of unicorn workers to use?
The current setting is 10 workers, but the maximum number of requests per second I have seen on Google Analytics Real-Time is 3 (seen only once, at a peak time; 99% of the time it does not go above 1 request per second).
So is it a safe assumption that I can, for now, go with 4 workers, leaving room for unexpected bursts of requests? Which metrics should I look at to determine the number of workers, and which tools can I use for that on my Ubuntu machine?
The number of workers you should use depends greatly on what your app itself is doing and how often it is doing those things. Unfortunately, there is no perfect ratio or formula that works in every case. This is even more true when you have a finite amount of RAM to fit your server into.
Many will suggest CPU core count + 1, but that isn't always correct either. You will have to test with different numbers of workers and see how things go. Be sure to check the logs regularly.
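Since RAM is your hard limit, one rough way to get a starting number for those tests is to measure how much memory a single worker uses and work out how many fit. Here is a minimal sketch in Ruby. The helper name max_workers_for and all the numbers are made up for illustration, so plug in your own measurements (for example the RSS column from ps or top):

```ruby
# Hypothetical helper: estimate how many workers fit into a memory budget.
# All figures are in megabytes and must come from your own measurements.
def max_workers_for(total_mb:, reserved_mb:, per_worker_mb:, master_mb: 0)
  raise ArgumentError, "per_worker_mb must be positive" unless per_worker_mb.positive?

  # Memory left after the OS, database, cron jobs, and the Unicorn master.
  usable_mb = total_mb - reserved_mb - master_mb
  return 0 if usable_mb <= 0

  # Round down: a partially fitting worker would push the box into swap.
  (usable_mb / per_worker_mb).floor
end

# Example with assumed numbers: 1 GB box, 300 MB kept free for everything
# else, and about 150 MB per worker.
puts max_workers_for(total_mb: 1024, reserved_mb: 300, per_worker_mb: 150)
```

With those assumed figures the result is 4. Treat it only as an upper bound from the memory side: set worker_processes in your Unicorn config to that value or lower, then watch the logs and response times as described above.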
In our team we use a program called Nagios: http://www.nagios.org
It works well and can monitor many aspects of your server, such as memory and CPU usage, and alert you when something goes wrong. This can help you a lot when trying to find the right balance.
Also, your server may be doing things besides running your Rails instances; it might be running scripts or processing information that isn't necessary. Make sure that your server isn't doing things it doesn't need to be doing, so that you save as many CPU cycles and as much RAM as possible.
Also, make sure that you are using this Unicorn feature; we do in our projects and it is invaluable:
"Memory Growth: When a worker is using too much memory, god or monit can send it a QUIT signal. This tells the worker to die after finishing the current request. As soon as the worker dies, the master forks a new one which is instantly able to serve requests. In this way we don’t have to kill your connection mid-request or take a startup penalty." -- Taken from: https://github.com/blog/517-unicorn
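If you don't have god or monit set up yet, the same idea can be sketched in a few lines of Ruby run from cron. This is only an illustration, not what GitHub uses: the method names, the 200 MB limit, and the process-title pattern are my assumptions, so check what your workers look like in ps before relying on it.

```ruby
# Assumed process title of a Unicorn worker, e.g. "unicorn worker[0] -c ...".
# Verify this against your own `ps` output.
WORKER_PATTERN = /unicorn.*worker\[\d+\]/

# Given `ps -eo pid=,rss=,args=` output (RSS is in kilobytes), return the
# PIDs of Unicorn workers whose resident memory exceeds limit_kb.
def oversized_workers(ps_output, limit_kb)
  ps_output.each_line.map do |line|
    pid, rss_kb, args = line.strip.split(/\s+/, 3)
    next unless args && args.match?(WORKER_PATTERN)

    pid.to_i if rss_kb.to_i > limit_kb
  end.compact
end

# Ask each oversized worker to exit after its current request (QUIT).
# The master then forks a fresh replacement.
def quit_oversized_workers(limit_kb: 200 * 1024)
  ps_output = `ps -eo pid=,rss=,args=`
  oversized_workers(ps_output, limit_kb).each do |pid|
    begin
      Process.kill("QUIT", pid)
    rescue Errno::ESRCH
      # The worker exited on its own in the meantime; nothing to do.
    end
  end
end
```

Calling quit_oversized_workers every minute or so gives roughly the behaviour described in the quote. The parsing is kept in its own method so you can try it on sample ps output without signalling anything.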
I also found this similar question that could give you some insight:
https://serverfault.com/q/369811/110682
I hope that helps.