I'm a bit confused about where offsets are stored when using Kafka and Zookeeper. It seems like offsets in some cases are stored in Zookeeper, in other cases they are stored in Kafka.
What determines whether the offset is stored in Kafka or in Zookeeper? And what are the pros and cons?
NB: Of course I could also store the offset on my own in some different data store but that is not part of the picture for this post.
Some more details about my setup: I'm running Kafka brokers 0.10.1.0, and my consumers are written in NodeJS using kafka-node.
Older versions of Kafka (pre 0.9) store offsets in ZK only, while newer versions of Kafka store offsets by default in an internal Kafka topic called __consumer_offsets (newer versions can still commit to ZK, though).
The advantage of committing offsets to the broker is that the consumer does not depend on ZK; clients only need to talk to brokers, which simplifies the overall architecture. Also, for large deployments with a lot of consumers, ZK can become a bottleneck, while Kafka can handle this load easily (committing offsets is the same thing as writing to a topic, and Kafka scales very well here -- in fact, by default __consumer_offsets is created with 50 partitions IIRC).
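To make the scaling point concrete, here is a minimal sketch (not from the original answer) of how a consumer group's committed offsets map onto one of those 50 partitions: the broker hashes the group id over the partition count configured via offsets.topic.num.partitions, roughly like this:

```java
public class OffsetsPartitionSketch {
    public static void main(String[] args) {
        String groupId = "my-group";      // hypothetical consumer group id
        int offsetsTopicPartitions = 50;  // default offsets.topic.num.partitions

        // Non-negative hash of the group id modulo the partition count,
        // mirroring how the broker picks the __consumer_offsets partition
        // (and thus the group coordinator) for a given group.
        int partition = (groupId.hashCode() & 0x7fffffff) % offsetsTopicPartitions;

        System.out.println("Group '" + groupId + "' commits to __consumer_offsets-" + partition);
    }
}
```

Because each group lands on exactly one partition, commits from many groups spread across the 50 partitions and across the brokers hosting them.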
I am not familiar with NodeJS or kafka-node -- it depends on the client implementation how offsets are committed.
Long story short: if you use brokers 0.10.1.0, you can commit offsets to the __consumer_offsets topic. But it depends on your client whether it implements this protocol.
In more detail, it depends on your broker and client version (and which consumer API you are using), because older clients can talk to newer brokers. First, you need broker and client version 0.9 or later to be able to write offsets into the Kafka topic. But if an older client is connecting to a 0.9 broker, it will still commit offsets to ZK.
For Java consumers:
It depends on which consumer you are using: before 0.9 there were two "old consumers", namely the "high level consumer" and the "low level consumer". Both commit offsets directly to ZK. Since 0.9, both consumers got merged into a single consumer, called the "new consumer" (it basically unifies the low level and high level APIs of both old consumers -- this means that in 0.9 there are three types of consumers). The new consumer commits offsets to the brokers (i.e., the internal Kafka topic).
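As a rough illustration (not taken from the original answer), this is what committing offsets to the brokers looks like with the new Java consumer; the broker address, group id, and topic name below are placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("group.id", "my-group");                // hypothetical group id
        props.put("enable.auto.commit", "false");         // commit explicitly below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // hypothetical topic
            ConsumerRecords<String, String> records = consumer.poll(1000); // timeout in ms
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
            // Offsets are written to the internal __consumer_offsets topic on the brokers, not to ZK.
            consumer.commitSync();
        }
    }
}
```

Note that the new consumer only needs bootstrap.servers; there is no zookeeper.connect setting at all.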
To make upgrading easier, there is also the possibility to "double commit" offsets using the old consumer (as of 0.9). If you enable this via dual.commit.enabled, offsets are committed to ZK and to the __consumer_offsets topic. This allows you to switch from the old consumer API to the new consumer API while moving your offsets from ZK to the __consumer_offsets topic.
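A minimal sketch of what that migration setup could look like with the old high level consumer, assuming the kafka.consumer/kafka.javaapi classes shipped with 0.9 and placeholder connection values:

```java
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.javaapi.consumer.ConsumerConnector;

public class DualCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // hypothetical ZK address
        props.put("group.id", "my-group");                // hypothetical group id
        props.put("offsets.storage", "kafka");            // commit to __consumer_offsets on the brokers
        props.put("dual.commit.enabled", "true");         // additionally commit to ZK during migration

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        // ... create message streams and consume as usual ...
        connector.shutdown();
    }
}
```

Once all consumers in the group commit to Kafka, dual.commit.enabled can be turned off, and switching to the new consumer API then picks the offsets up from __consumer_offsets.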