Docker/Kubernetes + Gunicorn/Celery - Multiple Workers vs Replicas?

rtindru · Jul 31, 2018

I was wondering what the correct approach is for deploying a containerized Django app using gunicorn & celery.

Specifically, each of these processes has a built-in way of scaling vertically: workers for gunicorn and concurrency for celery. And then there is the Kubernetes approach to scaling using replicas.

There is also this notion of setting workers equal to some function of the CPUs. Gunicorn recommends

2-4 workers per core

However, I am confused about what this translates to on K8s, where CPU is a divisible shared resource - unless I use resourceQuotas.
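For concreteness, here is roughly how I would encode that guideline in a gunicorn.conf.py (a sketch only; the multiplier is just the lower end of the quoted range):

    # gunicorn.conf.py - sketch of the "workers per core" guideline
    import multiprocessing

    # lower end of the "2-4 workers per core" range quoted above
    workers = multiprocessing.cpu_count() * 2

    # Note: inside a pod, cpu_count() reports the node's cores, not the
    # container's CPU limit, which is exactly the source of my confusion.
    worker_class = "sync"  # we use the default sync worker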

I want to understand what the Best Practice is. There are three options I can think of:

  • Have a single worker for gunicorn and a concurrency of 1 for celery, and scale them using replicas (horizontal scaling)?
  • Have gunicorn & celery run in a single replica deployment with internal scaling (vertical scaling). This would mean setting fairly high values of workers & concurrency respectively.
  • A mixed approach between 1 and 2, where we run gunicorn and celery with a small value for workers & concurrency (say 2), and then use K8s Deployment replicas to scale horizontally.

There are some questions on SO around this, but none offer an in-depth/thoughtful answer. Would appreciate if someone can share their experience.

Note: We use the default worker_class sync for Gunicorn

Answer

stacksonstacks · Aug 16, 2018

These technologies aren't as similar as they initially seem. They address different portions of the application stack and are actually complementary.

Gunicorn is for scaling web request concurrency, while celery should be thought of as a worker queue. We'll get to kubernetes soon.


Gunicorn

Web request concurrency is primarily limited by network I/O, i.e. it is "I/O bound". This kind of work can be scaled using the cooperative scheduling provided by threads. If you find request concurrency is limiting your application, increasing the number of gunicorn worker threads may well be the place to start.


Celery

Heavy-lifting tasks, e.g. compressing an image or running an ML algorithm, are "CPU bound". They can't benefit from threading the way they can from more CPUs, so these tasks should be offloaded to celery workers and parallelized there.
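As a rough sketch (the app name, broker URL and task are made up for illustration), offloading such a task looks like:

    # tasks.py - sketch of pushing CPU-bound work onto a celery worker
    from celery import Celery

    # "myproject" and the redis broker URL are placeholder values
    app = Celery("myproject", broker="redis://redis:6379/0")

    @app.task
    def compress_image(path):
        # the heavy, CPU-bound work runs inside a celery worker process,
        # not inside the gunicorn web worker handling the request
        ...

    # from a Django view you would enqueue it instead of running it inline:
    # compress_image.delay("/tmp/photo.jpg")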


Kubernetes

Where Kubernetes comes in handy is by providing out-of-the-box horizontal scalability and fault tolerance.

Architecturally, I'd use two separate k8s deployments to represent the different scalability concerns of your application: one deployment for the Django app and another for the celery workers. This allows you to scale request throughput and processing power independently.

I run celery workers pinned to a single core per container (-c 1); this vastly simplifies debugging and adheres to Docker's "one process per container" mantra. It also gives you the added benefit of predictability, as you can scale processing power on a per-core basis by incrementing the replica count.
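If you'd rather pin that in code than on the command line, the same thing can be expressed via celery's worker_concurrency setting (a sketch; the project name is a placeholder):

    # celery_app.py - sketch: one worker process per container,
    # equivalent to passing -c 1 / --concurrency=1 on the command line
    from celery import Celery

    app = Celery("myproject")          # placeholder project name
    app.conf.worker_concurrency = 1    # single process; scale via k8s replicas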

Scaling the Django app deployment is where you'll need to DYOR to find the best settings for your particular application. Stick to using --workers 1 so there is a single process per container, but experiment with --threads to find the best setting. As before, leave horizontal scaling to Kubernetes by simply changing the replica count.
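As a sketch, the corresponding gunicorn.conf.py might look like this (the thread count is a placeholder you'd tune per application):

    # gunicorn.conf.py - sketch: single process per container, concurrency via threads
    workers = 1               # one process per container; scale out with replicas
    worker_class = "gthread"  # gunicorn's threaded worker type
    threads = 4               # placeholder value - experiment to find what fits your app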

HTH. It's definitely something I had to wrap my head around when working on similar projects.