I am using Kafka 0.8.1 and kafka-python 0.9.0. My setup has two Kafka brokers. When I run my Kafka consumer, I can see it retrieving messages from the queue and keeping track of offsets for both brokers. Everything works great!
My issue is that when I restart the consumer, it starts consuming messages from the beginning. What I was expecting was that upon restart, the consumer would start consuming messages from where it left off before it died.
I did try keeping track of the message offsets in Redis and then calling consumer.seek before reading a message from the queue, to ensure that I was only getting messages I hadn't seen before. While this worked, before deploying this solution I wanted to check with y'all ... perhaps there is something I am misunderstanding about Kafka or the kafka-python client. It seems like a consumer being able to resume reading from where it left off is pretty basic functionality.
Thanks!
Take care with the kafka-python library. It has a few minor issues.
If speed is not really a problem for your consumer, you can set it to auto-commit after every message. That should work.
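Here is a minimal sketch of what committing after every message might look like with SimpleConsumer. The keyword names (auto_commit, auto_commit_every_n) are as I remember them from the kafka-python 0.9.x constructor, so treat them as an assumption and check them against your installed version. The consume helper and its handle callback are made up for illustration.

```python
# Assumed keyword arguments for kafka-python 0.9.x SimpleConsumer;
# verify the names against the version you have installed.
COMMIT_EVERY_MESSAGE = {
    "auto_commit": True,       # let the consumer commit offsets itself
    "auto_commit_every_n": 1,  # request a commit after each message
}


def consume(hosts, group, topic, handle):
    """Iterate over a topic, passing each message to `handle`.

    `consume` and `handle` are hypothetical names for this sketch.
    """
    # Imported lazily so this module loads even without kafka-python.
    from kafka import KafkaClient, SimpleConsumer

    client = KafkaClient(hosts)
    # Offsets are committed under `group`, so reuse the same name on restart.
    consumer = SimpleConsumer(client, group, topic, **COMMIT_EVERY_MESSAGE)
    for message in consumer:
        handle(message)
```

You would call it as consume("localhost:9092", "my-group", "my-topic", my_handler), where the address, group and topic are placeholders. Committing on every message costs a round trip to the broker each time, which is why this only makes sense when throughput is not a concern.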
SimpleConsumer provides a seek method (https://github.com/mumrah/kafka-python/blob/master/kafka/consumer/simple.py#L174-L185) that allows you to start consuming messages from whatever point you want.
The most usual calls are:
consumer.seek(0, 0) to start reading from the beginning of the queue.
consumer.seek(0, 1) to start reading from the current offset.
consumer.seek(0, 2) to skip all the pending messages and start reading only new messages.
The first argument is an offset relative to those positions. For example, if you call consumer.seek(5, 0) you will skip the first 5 messages in the queue.
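To make the (offset, whence) arithmetic concrete, here is a small pure-Python model of how those calls resolve to an absolute position. It is not the library's code and makes no Kafka calls. resolve_seek and its earliest/current/latest parameters are names I made up for the sketch, and it describes a single partition.

```python
def resolve_seek(offset, whence, earliest, current, latest):
    """Return the absolute offset that seek(offset, whence) would point at.

    earliest: first offset still available in the partition
    current:  the consumer's current offset
    latest:   offset the next produced message will get
    """
    if whence == 0:      # relative to the start of the partition
        base = earliest
    elif whence == 1:    # relative to the current position
        base = current
    elif whence == 2:    # relative to the end of the partition
        base = latest
    else:
        raise ValueError("whence must be 0, 1 or 2")
    return base + offset


# A partition holding offsets 0..99, with the consumer sitting at 42.
start = resolve_seek(0, 0, earliest=0, current=42, latest=100)  # beginning
skip5 = resolve_seek(5, 0, earliest=0, current=42, latest=100)  # skip first 5
here = resolve_seek(0, 1, earliest=0, current=42, latest=100)   # stay put
tail = resolve_seek(0, 2, earliest=0, current=42, latest=100)   # new messages only
```

The whence values follow the same 0/1/2 convention as file.seek in Python, which makes the calls easy to remember.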
Also, don't forget that offsets are stored per consumer group. Be sure you use the same group name every time, otherwise the consumer will not find the offsets it committed earlier.