Simple Pull Message Queue

Daniel · Aug 29, 2013 · Viewed 11.5k times

I'm trying to find the right tool for the job. I've explored a few different message queues, such as Kafka and Kestrel, and I'm looking for something that has pull functionality.

I have a distributed API that pushes incoming messages into the queue. Workers (on separate machines) would then pull messages from the queue. This ensures that the workers don't get flooded with more messages than they can handle.
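
The pull model described above can be sketched in a single process with Python's standard library. In this toy version a bounded `queue.Queue` stands in for the message queue, a loop stands in for the API pushing messages in, and each worker thread takes a new message only when it has finished the previous one. It is an illustration under those assumptions, not Kafka or Kestrel code; `run_pull_workers` and `handle` are hypothetical names.

```python
import queue
import threading

def run_pull_workers(messages, num_workers=3, max_queue_size=10):
    """Toy pull model: messages go into a bounded queue and each worker
    pulls one message at a time, at its own pace."""
    q = queue.Queue(maxsize=max_queue_size)  # put() blocks while the queue is full
    results = []
    results_lock = threading.Lock()
    stop = object()  # sentinel that tells a worker to exit

    def handle(msg):
        # Hypothetical processing step; real work would go here.
        return msg.upper()

    def worker():
        while True:
            msg = q.get()  # the worker pulls only when it is ready for more
            if msg is stop:
                break
            out = handle(msg)
            with results_lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # The "API" side pushes messages into the queue.
    for m in messages:
        q.put(m)
    for _ in threads:
        q.put(stop)  # one sentinel per worker

    for t in threads:
        t.join()
    return sorted(results)

if __name__ == "__main__":
    print(run_pull_workers(["a", "b", "c", "d"]))  # ['A', 'B', 'C', 'D']
```

In this sketch the bound on the queue also makes the pushing side block when the workers fall behind. A durable log such as Kafka behaves differently: messages accumulate on the brokers while consumers catch up.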

I'm wondering whether Kafka or Kestrel supports this type of functionality.

Answer

user2720864 · Aug 29, 2013

Kafka does work on a push-pull basis (producers push messages to the brokers and consumers pull them) and is capable of handling large-scale real-time streams. Also, as mentioned in its documentation, Kafka's performance is effectively constant with respect to data size, so retaining lots of data will not be a problem.
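
To illustrate the push-pull split, here is a toy in-memory model of a log. Producers append (push) messages, and each consumer pulls a batch starting from an offset it tracks itself, so a slow consumer simply lags behind instead of being flooded. `ToyLog` and `ToyConsumer` are hypothetical classes, not Kafka's API; real Kafka stores partitions on disk and serves fetch requests from brokers.

```python
class ToyLog:
    """Append-only log: producers push, consumers pull by offset."""

    def __init__(self):
        self._entries = []

    def append(self, message):
        # Producer side (push): returns the offset assigned to the message.
        self._entries.append(message)
        return len(self._entries) - 1

    def fetch(self, offset, max_messages):
        # Consumer side (pull): read up to max_messages starting at offset.
        # Reading does not remove anything from the log.
        return self._entries[offset:offset + max_messages]


class ToyConsumer:
    """Tracks its own position and pulls only as much as it asks for."""

    def __init__(self, log, batch_size):
        self.log = log
        self.batch_size = batch_size
        self.offset = 0

    def poll(self):
        batch = self.log.fetch(self.offset, self.batch_size)
        self.offset += len(batch)  # advance past what was actually read
        return batch


if __name__ == "__main__":
    log = ToyLog()
    for i in range(5):
        log.append(f"msg-{i}")
    slow = ToyConsumer(log, batch_size=2)
    print(slow.poll())  # ['msg-0', 'msg-1']
    print(slow.poll())  # ['msg-2', 'msg-3']
    print(slow.poll())  # ['msg-4']
    print(slow.poll())  # []
```

Because reads leave the log untouched, several consumers can read the same log independently, each at its own offset.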

For stream processing, check out Storm. It's a free, fault-tolerant, distributed real-time computation system that is very easy to scale. It does exactly what you've described (running workers on separate machines), and it also supports transactional topologies. On top of that, it integrates nicely with Apache Kafka.

For more on Storm, see the Storm documentation.

A typical setup is to retrieve messages from the Kafka queue using its consumer API and then feed them to a Storm cluster, which does the rest in a distributed manner. Kafka 0.8 provides two types of consumer API:

  • High-level (consumer group) API
  • Low-level (SimpleConsumer) API

The former provides a high-level abstraction for consuming data and takes care of a lot of details such as threading and error handling, while the latter allows much greater control over message handling, such as reading a message multiple times or handling transactions.
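
The practical difference between the two APIs can be sketched with a toy model. With the high-level API, the consumer group decides which partitions each member reads. The sketch assumes a simple range-style split (sorted partitions divided into contiguous chunks), an illustrative choice rather than Kafka's exact rebalancing code. With the low-level API, the caller picks the partition and offset directly, so reading a message again just means fetching the same offset again. `assign_partitions` and `read_from` are hypothetical helpers, not Kafka functions.

```python
def assign_partitions(partitions, consumers):
    """Range-style split of sorted partitions across sorted group members.
    Earlier members get one extra partition when the split is uneven."""
    parts = sorted(partitions)
    members = sorted(consumers)
    if not members:
        return {}
    per_member, extra = divmod(len(parts), len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


def read_from(partition_log, offset, max_messages):
    """Low-level style read: the caller chooses the exact offset, so
    re-reading a message is just a repeat call with the same offset."""
    return partition_log[offset:offset + max_messages]


if __name__ == "__main__":
    print(assign_partitions([0, 1, 2, 3, 4], ["c1", "c2"]))
    # {'c1': [0, 1, 2], 'c2': [3, 4]}
    partition = ["m0", "m1", "m2"]
    print(read_from(partition, 1, 2))  # ['m1', 'm2']
    print(read_from(partition, 1, 2))  # same messages again: ['m1', 'm2']
```

In the high-level case the group does the bookkeeping for you. In the low-level case your own code has to remember which offsets it has processed.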

High-level consumer API example

SimpleConsumer example