How to achieve a delayed queue with Apache Kafka?

Marconi · Nov 12, 2014

How do I add delayed jobs to Kafka? As I understand it, Kafka doesn't deal with individual messages but with topics. My jobs have varying schedules on which I would like them to be consumed: say one should run within the next 4 hours, another on Dec. 1, etc.

Does Kafka have native support for this, or are there third-party ways to achieve the same?

I'm thinking of using Redis for the delayed queue instead, and pushing each job to Kafka once its schedule has arrived, but if possible I'd like to use only one dependency.
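For reference, here is a minimal sketch of that Redis-plus-Kafka idea, using Jedis and the Java Kafka producer. The sorted-set key `delayed-jobs`, the target topic `jobs`, and the polling interval are all placeholders, not anything Kafka or Redis prescribes:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import redis.clients.jedis.Jedis;

import java.util.Properties;

public class DelayedJobRelay {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Jedis redis = new Jedis("localhost", 6379);
             KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {

            while (true) {
                long now = System.currentTimeMillis();
                // Jobs live in a sorted set scored by their due timestamp (millis).
                for (String job : redis.zrangeByScore("delayed-jobs", 0, now)) {
                    // Push the job to Kafka once its schedule has arrived...
                    producer.send(new ProducerRecord<>("jobs", job));
                    // ...and remove it from the delay queue.
                    redis.zrem("delayed-jobs", job);
                }
                Thread.sleep(1000);
            }
        }
    }
}
```

Scheduling a job would then just be `redis.zadd("delayed-jobs", dueTimestampMillis, payload)`, so the delay logic stays entirely in Redis and Kafka only sees jobs that are already due.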

Answer

mjuarez · Apr 20, 2017

A bit of a delayed answer here. It's now possible, as of Kafka 0.10+, to consume from a delayed stream using the new per-message timestamp. I'm using this right now to implement a continuously aggregating dataset without resorting to external dependencies.
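To illustrate the idea, here is a rough sketch of a consumer that holds back each record until it is at least `DELAY_MS` old, based on the per-message timestamp. The topic and group names are placeholders, it's written against a newer Java client (`poll(Duration)`), and a production version would `pause()` the partitions rather than sleep, to avoid a group rebalance:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class DelayedConsumer {
    private static final long DELAY_MS = 90 * 60 * 1000L; // e.g. 90 minutes behind

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "delayed-consumer");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    long age = System.currentTimeMillis() - record.timestamp();
                    if (age < DELAY_MS) {
                        // Not old enough yet: wait out the remaining delay before processing.
                        // (Sketch only; a real consumer would pause() the partition instead.)
                        Thread.sleep(DELAY_MS - age);
                    }
                    process(record);
                }
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("processing %s (ts=%d)%n", record.value(), record.timestamp());
    }
}
```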

These records come through, and updates/deletes for them may arrive within the next 60 minutes after the first event, so I can't declare a record "final" until I have seen all of its updates.

So, to handle this case, I consume the topic with all the CREATEs/UPDATEs/DELETEs twice: the first time in realtime (or as fast as possible), the second time delayed by 90 minutes to ensure I don't miss anything. On the realtime consumer, I store locally all the updates needed for each create. Then on the delayed consumer, when I receive a particular CREATE, I look up any updates/deletes in my local storage, update the record so it reflects its final status, and produce it into a final topic in Kafka.
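A very rough sketch of the delayed side of that pipeline might look like the class below. The local store, the merge logic, and the topic name `records-final` are all invented for illustration (the answer doesn't say what storage or merge rules are actually used); the assumption is that the realtime consumer has already filled `localStore` with any UPDATEs/DELETEs keyed by record id:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DelayedMerger {
    // key -> updates/deletes the realtime consumer has seen for that key
    private final Map<String, List<String>> localStore = new ConcurrentHashMap<>();
    private final KafkaProducer<String, String> producer;

    public DelayedMerger(KafkaProducer<String, String> producer) {
        this.producer = producer;
    }

    // Called by the delayed consumer (running ~90 minutes behind) for each CREATE it sees.
    public void onDelayedCreate(String key, String createPayload) {
        // Look up everything the realtime consumer stored for this key...
        List<String> updates = localStore.getOrDefault(key, Collections.emptyList());
        // ...fold it into the record so it reflects its final status...
        String finalRecord = applyUpdates(createPayload, updates);
        // ...and publish it to the "final" topic.
        producer.send(new ProducerRecord<>("records-final", key, finalRecord));
        localStore.remove(key);
    }

    private String applyUpdates(String create, List<String> updates) {
        // Domain-specific merge logic would go here; this just takes the latest version.
        return updates.isEmpty() ? create : updates.get(updates.size() - 1);
    }
}
```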

To ensure I don't run out of disk space, I'm also continuously truncating the local storage so it holds at most two hours of updates/deletes.
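That truncation could be as simple as a periodic sweep over the local store, assuming each entry records when it was inserted (again, just a sketch of one way to bound the store, not what the answer necessarily does):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TimestampedStore<V> {
    private static final long RETENTION_MS = 2 * 60 * 60 * 1000L; // keep at most two hours

    private static final class Entry<V> {
        final long insertedAt = System.currentTimeMillis();
        final V value;
        Entry(V value) { this.value = value; }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();

    public void put(String key, V value) { entries.put(key, new Entry<>(value)); }

    public V get(String key) {
        Entry<V> e = entries.get(key);
        return e == null ? null : e.value;
    }

    // Run periodically (e.g. from a scheduled executor) to drop anything older than
    // the retention window, so the local storage never grows unbounded.
    public void purgeOldEntries() {
        long cutoff = System.currentTimeMillis() - RETENTION_MS;
        entries.values().removeIf(e -> e.insertedAt < cutoff);
    }
}
```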