I've been working on getting some distributed tasks working via RabbitMQ.
I spent some time trying to get Celery to do what I wanted and couldn't make it work.
Then I tried using Pika and things just worked, flawlessly, and within minutes.
Is there anything I'm missing out on by using Pika instead of Celery?
What pika provides is just a small piece of what Celery does. Pika is a Python library for interacting with RabbitMQ. RabbitMQ is a message broker; at its core, it just delivers messages to and from queues. It can be used as a task queue, but it can also be used simply to pass messages between processes, without actually distributing "work".
Celery implements a distributed task queue, optionally using RabbitMQ as a broker for IPC. Rather than just providing a way of sending messages between processes, it provides a system for distributing actual tasks/jobs between processes. Here's how Celery's site describes it:
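To make the "just messages" point concrete, here is a rough sketch of sending and handling one job with pika. The queue name add_requests, the JSON message format, and the helper names are all made up for illustration; only the channel methods (queue_declare, basic_publish, basic_consume, basic_ack, start_consuming) and the BlockingConnection setup are pika's own. run_worker assumes pika is installed and a RabbitMQ broker is reachable on localhost.

```python
import json

QUEUE_NAME = "add_requests"  # hypothetical queue name, not from the question


def send_add_request(channel, x, y):
    """Publish one 'add' job as a JSON message on QUEUE_NAME."""
    channel.queue_declare(queue=QUEUE_NAME)
    channel.basic_publish(
        exchange="",             # default exchange routes by queue name
        routing_key=QUEUE_NAME,
        body=json.dumps({"x": x, "y": y}).encode("utf-8"),
    )


def handle_add_request(body):
    """Decode one message body and do the work it describes."""
    args = json.loads(body)
    return args["x"] + args["y"]


def run_worker(host="localhost"):
    """Consume messages forever; needs pika and a running RabbitMQ broker."""
    import pika  # third-party; imported here so the helpers above work without it

    connection = pika.BlockingConnection(pika.ConnectionParameters(host))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE_NAME)

    def on_message(ch, method, properties, body):
        print("result:", handle_add_request(body))
        # Acknowledge only after the work is done.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=QUEUE_NAME, on_message_callback=on_message)
    channel.start_consuming()
```

Note what is missing: nothing sends the result back to the caller, retries a failed job, or tracks which jobs are finished. With pika alone, those pieces are yours to design and write.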
Task queues are used as a mechanism to distribute work across threads or machines.
A task queue’s input is a unit of work, called a task, dedicated worker processes then constantly monitor the queue for new work to perform.
Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task a client puts a message on the queue, the broker then delivers the message to a worker.
A Celery system can consist of multiple workers and brokers, giving way to high availability and horizontal scaling.
Celery has a whole bunch of built-in functionality that is outside of pika's scope. You can take a look at the Celery docs to get an idea of the sort of things it can do, but here's an example:
>>> from proj.tasks import add
>>> res = add.chunks(zip(range(100), range(100)), 10)()
>>> res.get()
[[0, 2, 4, 6, 8, 10, 12, 14, 16, 18],
[20, 22, 24, 26, 28, 30, 32, 34, 36, 38],
[40, 42, 44, 46, 48, 50, 52, 54, 56, 58],
[60, 62, 64, 66, 68, 70, 72, 74, 76, 78],
[80, 82, 84, 86, 88, 90, 92, 94, 96, 98],
[100, 102, 104, 106, 108, 110, 112, 114, 116, 118],
[120, 122, 124, 126, 128, 130, 132, 134, 136, 138],
[140, 142, 144, 146, 148, 150, 152, 154, 156, 158],
[160, 162, 164, 166, 168, 170, 172, 174, 176, 178],
[180, 182, 184, 186, 188, 190, 192, 194, 196, 198]]
This code adds x+y for each pair (x, y) in zip(range(100), range(100)), that is, 0+0, 1+1, 2+2, and so on up to 99+99. It does this by taking a task called add, which adds two numbers, splitting the 100 additions into chunks of 10, and distributing the chunks across however many Celery workers are available. Each worker runs add on every item in its chunk until all the work is complete. Then the results are gathered up by the res.get() call. I'm sure you can imagine a way to do this using pika, but I'm sure you can also imagine how much work would be required. You're getting that functionality out of the box with Celery.
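For a sense of what a hand-rolled version has to cover, here is a minimal single-machine sketch of the same chunk, distribute, and gather pattern. It is not how Celery implements chunks: a ThreadPoolExecutor stands in for the remote workers, and add, chunked, run_chunk, and add_in_chunks are illustrative names. A pika-based version would additionally need message encoding, result queues, and failure handling on top of this.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def add(x, y):
    """Stand-in for the add task: adds two numbers."""
    return x + y


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def run_chunk(chunk):
    """What one worker does: apply add to every (x, y) pair in its chunk."""
    return [add(x, y) for x, y in chunk]


def add_in_chunks(pairs, chunk_size, max_workers=4):
    """Split `pairs` into chunks, run each chunk on the pool, gather in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so chunk i comes back
        # at position i no matter which thread finished first.
        return list(pool.map(run_chunk, chunked(pairs, chunk_size)))


results = add_in_chunks(zip(range(100), range(100)), 10)
print(results[0])   # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(results[-1])  # [180, 182, 184, 186, 188, 190, 192, 194, 196, 198]
```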
You can certainly use pika to implement a distributed task queue if you want, especially if you have a fairly simple use case. Celery just provides a "batteries included" solution for task scheduling, management, and so on, all of which you would have to implement yourself if you wanted them in a pika-based solution.