While polling Kafka, I have subscribed to multiple topics using the subscribe()
function. Now, I want to set the offset from which I want to read from each topic, without resubscribing after every seek()
and poll()
from a topic. Will calling seek()
iteratively over each of the topic names, before polling for data achieve the result?
How are the offsets exactly stored in Kafka?
I have one partition per topic and just one consumer to read from all topics.
How does Kafka store offsets for each topic?
Kafka has moved the offset storage from zookeeper to kafka brokers. The reason is below:
Zookeeper is not a good way to service a high-write load such as offset updates because zookeeper routes each write though every node and hence has no ability to partition or otherwise scale writes. We have always known this, but chose this implementation as a kind of "marriage of convenience" since we already depended on zk.
Kafka store the offset commits in a topic, when consumer commit the offset, kafka publish an commit offset message to an "commit-log" topic and keep an in-memory structure that mapped group/topic/partition to the latest offset for fast retrieval. More design infomation could be found in this page about offset management.
Now, I want to set the offset from which I want to read from each topic, without resubscribing after every seek() and poll() from a topic.
There is a new feature about kafka admin tools to reset offset.
kafka-consumer-group.sh --bootstrap-server 127.0.0.1:9092 --group
your-consumer-group **--reset-offsets** --to-offset 1 --all-topics --execute
There are more options you can use.