What is the difference between Workers and Threads in Puma

Nick Ginanto · Jun 18, 2014

What is the difference between a Puma worker and a Puma thread in the context of a Heroku dyno?

What I know (please correct me if I am wrong):

  • Thin is not concurrent, so a web process can only do one request at a time

  • In unicorn, I know I can have several unicorn workers in one process to add concurrency.

But in Puma there are threads and workers. Isn't a worker a thread inside the Puma process?

Can I use more workers/threads to add web concurrency in Heroku?

Answer

robert_murray · Apr 21, 2015

As the other answer states, this Heroku article is pretty good with explanations of certain configuration items.

However, if you need to tune your application on Heroku, or anywhere else, it pays to know how things work.

I think you are almost correct when you say "a worker is a thread inside the puma process". As I understand it, a worker is an operating-system-level process forked from the Puma parent process, and each worker can then use threads internally.

As far as I understand, Puma forks its operating system process as many times as you set via the workers configuration, and each forked worker responds to HTTP requests. This gives you parallelism when handling multiple requests, but it usually takes up more memory because each worker holds its own 'copy' of your application code.
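To make the forking idea concrete, here is a minimal sketch in plain Ruby. It is not Puma's implementation; it only imitates what a forking server does on a Unix-like system. The names `worker_count` and `counter` are made up for illustration.

```ruby
# Sketch only: imitates a forking server, this is not Puma's own code.
# Requires a platform that supports fork (Linux, macOS).
worker_count = 2 # hypothetical value, analogous to setting two workers

counter = 0 # lives in the parent; each child gets its own copy after fork

readers = worker_count.times.map do
  reader, writer = IO.pipe
  fork do
    reader.close
    counter += 1 # changes only this child's copy of the variable
    writer.puts "#{Process.pid} #{counter}"
    writer.close
    exit!(0) # leave the child immediately, skipping at_exit hooks
  end
  writer.close
  reader
end

# Each child reports one line: "<pid> <counter as seen by the child>"
reports = readers.map do |reader|
  line = reader.gets
  reader.close
  line.split.map(&:to_i)
end
Process.waitall

worker_pids = reports.map(&:first)
child_counters = reports.map(&:last)

puts "parent pid: #{Process.pid}, worker pids: #{worker_pids.inspect}"
puts "counter in children: #{child_counters.inspect}, in parent: #{counter}"
```

Each child has its own PID and its own copy of `counter`, so the parent's value is unchanged. That separation is why workers can run in parallel and also why each one adds to memory use.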

Each Puma worker then runs multiple threads within its OS process, depending on the threads configuration. Threads add concurrency within a worker: if one thread is blocked, i.e. busy processing a request, the worker can handle a new request on another thread. As stated, this requires your entire application to be thread-safe so that, for example, global configuration set during one request does not 'leak' into another.
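The 'leak' can be illustrated with a small sketch that uses plain Ruby threads in place of Puma's request threads. `shared_config` is a hypothetical stand-in for any process-wide setting, and the queue is only there to make the ordering deterministic.

```ruby
# Sketch only: plain Ruby threads standing in for Puma's request threads.
# `shared_config` is a hypothetical stand-in for any process-wide setting.
shared_config = Struct.new(:locale).new("en")

a_has_written = Queue.new # used only to force "A writes, then B reads"
seen_by_b = {}

request_a = Thread.new do
  shared_config.locale = "fr"                        # visible to every thread
  Thread.current.thread_variable_set(:locale, "fr")  # private to this thread
  a_has_written << true
end

request_b = Thread.new do
  a_has_written.pop # wait until request A has written
  seen_by_b[:shared] = shared_config.locale
  seen_by_b[:thread_local] = Thread.current.thread_variable_get(:locale)
end

[request_a, request_b].each(&:join)

puts "request B sees shared locale:       #{seen_by_b[:shared].inspect}"
puts "request B sees thread-local locale: #{seen_by_b[:thread_local].inspect}"
```

Request B never set a locale, yet it reads "fr" from the shared object, while the thread-local value stays private to request A. In a server that reuses pooled threads, thread-local values would presumably still need resetting between requests, so this is an illustration of the problem rather than a complete fix.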

You would tune Puma so that the number of workers suits the number of CPUs and the memory available, and then tune the threads depending on how much you want to saturate the host running your application and on how your application behaves. More does not always mean faster or higher request throughput.
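As a rough sketch of that sizing arithmetic, the snippet below caps the worker count by both CPU count and memory, then prints the resulting `workers` and `threads` settings. The helper `suggested_workers` and every number in it are assumptions for illustration, not recommendations; measure your own application's memory per worker before relying on anything like this.

```ruby
require "etc"

# Hypothetical helper: cap workers by CPUs and by how many fit in memory.
# Always returns at least 1 so the server can still start.
def suggested_workers(cpu_count:, memory_mb:, worker_memory_mb:)
  by_memory = memory_mb / worker_memory_mb # integer division
  [[cpu_count, by_memory].min, 1].max
end

cpu_count = Etc.nprocessors # CPUs visible to this Ruby process

# Placeholder figures: a 512 MB host and roughly 200 MB per worker.
workers = suggested_workers(cpu_count: cpu_count,
                            memory_mb: 512,
                            worker_memory_mb: 200)
threads_per_worker = 5 # assumed starting point, tune by measuring

# Lines in the shape you would put in a Puma config file
puts "workers #{workers}"
puts "threads #{threads_per_worker}, #{threads_per_worker}"
puts "max requests in flight: #{workers * threads_per_worker}"
```

The product of workers and threads is only an upper bound on requests handled at once. Whether raising either number helps depends on how much time your requests spend blocked versus using the CPU.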