What do multi-process vs. multi-threaded servers benefit from most?

CodeOverload · Sep 2, 2013

Can anyone explain what the bottleneck of each concurrency method is?

For example, servers like Unicorn (process-based) and Puma (thread-based).

Does each method favor more CPU cores, more threads, or simply higher clock speed? Or some particular combination?

How do I determine the optimal CPU characteristics when using dedicated servers?

And how do I determine the best number of workers for Unicorn, or the best number of threads for Puma?

Answer

Stuart Nelson · Sep 2, 2013

Unicorn is process-based, which means that each Ruby instance has to run in its own process. That can be in the area of 500 MB per process, which will quickly drain system resources. Puma, being thread-based, won't use the same amount of memory to THEORETICALLY attain the same amount of concurrency.
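To make the memory argument concrete, here is a back-of-envelope sketch. The helper names and the figures (500 MB per process, matching the rough estimate in this answer, and an assumed 1 MB of overhead per extra thread) are illustrative assumptions, not measurements, and the sketch ignores copy-on-write sharing, so the process-model number is an upper bound.

```ruby
# Back-of-envelope memory estimates. All inputs are hypothetical, and
# copy-on-write sharing between forked workers is deliberately ignored.

# Process model (Unicorn-style): every worker is a full Ruby process.
def process_model_mb(workers:, rss_per_process_mb:)
  workers * rss_per_process_mb
end

# Thread model (Puma-style): one process, plus an assumed per-thread overhead.
def thread_model_mb(threads:, rss_per_process_mb:, per_thread_mb:)
  rss_per_process_mb + threads * per_thread_mb
end

# How many full worker processes fit in a given amount of RAM.
def max_workers(ram_mb:, rss_per_process_mb:)
  ram_mb / rss_per_process_mb # integer division rounds down
end

puts process_model_mb(workers: 8, rss_per_process_mb: 500)                  # 4000
puts thread_model_mb(threads: 8, rss_per_process_mb: 500, per_thread_mb: 1) # 508
puts max_workers(ram_mb: 4096, rss_per_process_mb: 500)                     # 8
```

Measure the real resident size of one worker before trusting any of these numbers; the point is only how differently the two models scale with concurrency.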

Because Unicorn runs multiple processes, it gets real parallelism between them. That parallelism is limited by your CPU cores (more cores can run more processes at the same time), but the kernel switches between active processes, so you can run more processes than you have cores (more than 4 or 8, or however many cores you have). The practical limit is your machine's memory. Until recently, Ruby was not copy-on-write friendly: Unicorn is a preforking server, so its workers start out sharing the parent's memory, but Ruby's garbage collector wrote a mark into every object, which forced EVERY process to end up with its own copy of the inherited memory. Ruby 2.0's garbage collector is copy-on-write friendly (it keeps mark bits in a separate bitmap), which could mean that Unicorn's child processes won't each need a full copy of the parent's memory. I'm not 100% clear on this. Read about copy-on-write, and check out Jesse Storimer's awesome book 'Working With Unix Processes'; I'm pretty sure he covers it there.
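A minimal preforking sketch, assuming a POSIX system (`Process.fork` is not available on Windows). The parent builds a large array once, standing in for a loaded application, then forks workers that inherit it and compute partial sums in parallel, reporting back over pipes. `prefork` and `APP_DATA` are hypothetical names, and this is not Unicorn's actual code; whether the inherited pages stay shared depends on the kernel's copy-on-write behavior and on the Ruby version's garbage collector.

```ruby
# Stands in for an application the parent loads once before forking.
APP_DATA = (1..1_000_000).to_a.freeze

# Forks `worker_count` children; each one reads its slice of the inherited
# APP_DATA (no reloading) and sends a partial sum back through a pipe.
def prefork(worker_count)
  chunk = APP_DATA.size / worker_count
  readers = Array.new(worker_count) do |i|
    reader, writer = IO.pipe
    Process.fork do
      reader.close
      # The last worker also takes any remainder left by integer division.
      len = i == worker_count - 1 ? APP_DATA.size - i * chunk : chunk
      writer.write(APP_DATA[i * chunk, len].sum.to_s)
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # parent keeps only the read end
    reader
  end
  results = readers.map { |r| r.read.to_i.tap { r.close } }
  Process.waitall # reap the children
  results
end

partials = prefork(4)
puts partials.sum # 500000500000
```

Each child runs on whichever core the kernel gives it, so CPU-bound slices genuinely execute in parallel, which is the property the threaded MRI case lacks.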

Puma is a threaded server. MRI Ruby, because of the global interpreter lock (GIL), can only run a single CPU-bound task at a time (cf. RubyTapas episode 127, parallel fib). It will context switch between threads, but as long as the work is CPU-bound (e.g. data processing), only one thread executes Ruby code at any moment. This gets interesting if you run your server on a different Ruby implementation, like JRuby or Rubinius. They do not have a GIL and can process a great deal of data in parallel. JRuby is pretty speedy, and while Rubinius is slow compared to MRI, multithreaded Rubinius processes data faster than MRI. During blocking IO, however (e.g. writing to a database or making a web request), MRI releases the lock and switches to another thread to do work there, then switches back to the previous thread once the data has come back.
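A rough way to see the GIL on MRI is to time threads doing CPU-bound work against threads that mostly wait. The sketch below uses a naive `fib` as a stand-in for CPU-bound work and `sleep` as a stand-in for blocking IO; both workloads and the `run_in_threads` helper are hypothetical. On MRI, expect the four CPU-bound threads to take roughly as long as calling `fib` four times in a row, while the four sleeps overlap; on JRuby the CPU-bound case can run in parallel too. Exact timings depend on the machine.

```ruby
require "benchmark"

# Naive recursive Fibonacci: pure CPU work, no IO.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Starts `count` threads running the same block and waits for all of them.
def run_in_threads(count, &work)
  count.times.map { Thread.new(&work) }.each(&:join)
end

# CPU-bound: on MRI only one thread runs Ruby code at a time.
cpu = Benchmark.realtime { run_in_threads(4) { fib(24) } }

# "IO-bound": a sleeping thread releases the lock, so the waits overlap
# and the total is close to 0.2 s rather than 0.8 s.
io = Benchmark.realtime { run_in_threads(4) { sleep 0.2 } }

puts format("CPU-bound: %.2fs, IO-bound: %.2fs", cpu, io)
```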

For Unicorn, I would say the bottleneck is memory and clock speed. For Puma, I would say the bottleneck is your choice of interpreter (MRI vs. Rubinius or JRuby) and the type of work your server is doing (lots of CPU-bound tasks vs. mostly waiting on IO).

There are tons of great resources on this debate. Check out Jesse Storimer's books on these topics, 'Working With Ruby Threads' and 'Working With Unix Processes'; read Ryan Tomayko's quick summary of preforking servers; and search around for more info.

I don't know what the best number of workers is for Unicorn or Puma in your case. The best thing to do is run performance tests and do what is right for you; there is no one-size-fits-all answer. (Although I think the Puma standard is to use a pool of 16 threads and cap it there.)
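As a starting point for that kind of performance test, here is a sketch that pushes a fixed batch of simulated requests through thread pools of different sizes and reports throughput. The `fake_request` workload (a 10 ms sleep plus a little arithmetic), the `throughput` helper, and the pool sizes are all assumptions; 16 just mirrors the figure mentioned in this answer. For real numbers, load-test your actual application, and for Unicorn vary the worker process count instead of the thread count.

```ruby
require "benchmark"

# Hypothetical request: mostly waiting on IO, plus a little CPU work.
def fake_request
  sleep 0.01     # stands in for a database or HTTP call
  (1..2_000).sum # stands in for rendering a response
end

# Pushes `jobs` requests through a fixed pool of `size` threads and returns
# the measured throughput in requests per second.
def throughput(size, jobs: 100)
  queue = Queue.new
  jobs.times { queue << :job }
  size.times { queue << :stop } # one "poison pill" per worker thread
  elapsed = Benchmark.realtime do
    workers = Array.new(size) do
      Thread.new do
        fake_request until queue.pop == :stop
      end
    end
    workers.each(&:join)
  end
  jobs / elapsed
end

[1, 4, 16].each do |size|
  puts format("%2d threads: %7.1f req/s", size, throughput(size))
end
```

With a mostly-IO workload like this one, throughput should climb with pool size until something else saturates (CPU, database connections); if you make `fake_request` CPU-heavy and run it on MRI, the curve should flatten out early.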