What does "Rebalancing" mean in Apache Kafka context?

Jeff Gong · Jun 22, 2015

I am a new Kafka user and have been trialling it for about 2-3 weeks. I believe I now have a good understanding of how Kafka works for the most part. However, after attempting to adapt the API for my own Kafka consumer (this is obscure, but I'm following the guidelines for the new KafkaConsumer that is supposed to be available in v0.9 and is currently on the repo's 'trunk'), I've had latency issues consuming from a topic when I have multiple consumers with the same groupID.

In this setup, my console consistently logs messages about a 'rebalance triggering'. Do rebalances occur when I add new consumers to a consumer group? Are they triggered to work out which consumer instance in the same groupID gets which partitions, or are rebalances used for something else entirely?

I also came across this passage from https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+Rewrite+Design and I just can't seem to understand it, so if someone could help me make sense of it, that would be much appreciated:

Rebalancing is the process where a group of consumer instances (belonging to the same group) co-ordinate to own a mutually exclusive set of partitions of topics that the group is subscribed to. At the end of a successful rebalance operation for a consumer group, every partition for all subscribed topics will be owned by a single consumer instance within the group. The way rebalancing works is as follows. Every broker is elected as the coordinator for a subset of the consumer groups. The co-ordinator broker for a group is responsible for orchestrating a rebalance operation on consumer group membership changes or partition changes for the subscribed topics. It is also responsible for communicating the resulting partition ownership configuration to all consumers of the group undergoing a rebalance operation.
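To make the quoted definition concrete, here is a minimal Python sketch of what a completed rebalance produces: every partition of every subscribed topic owned by exactly one consumer in the group. It assumes a simple range-style split (each topic's partitions divided into contiguous blocks over the sorted consumer ids). The function name range_assign and the topic name "orders" are hypothetical, and the sketch models only the outcome, not the coordinator protocol or Kafka's actual assignor code.

```python
# Hypothetical model of the OUTCOME of a rebalance: each partition of each
# subscribed topic ends up owned by exactly one consumer in the group.
# The range-style split below is an illustration, not Kafka's own code.

def range_assign(consumers, topic_partitions):
    """Assign partitions to consumers, topic by topic, in contiguous ranges.

    consumers: list of consumer ids in the group
    topic_partitions: dict mapping topic name -> number of partitions
    Returns a dict mapping consumer id -> list of (topic, partition).
    """
    members = sorted(consumers)
    if not members:
        return {}
    assignment = {c: [] for c in members}
    for topic, n in sorted(topic_partitions.items()):
        per_consumer, extra = divmod(n, len(members))
        start = 0
        for i, consumer in enumerate(members):
            # The first `extra` consumers take one additional partition.
            count = per_consumer + (1 if i < extra else 0)
            assignment[consumer].extend(
                (topic, p) for p in range(start, start + count)
            )
            start += count
    return assignment


if __name__ == "__main__":
    topics = {"orders": 6}
    before = range_assign(["c1", "c2"], topics)
    # A third consumer joins: membership changed, so partitions are reassigned.
    after = range_assign(["c1", "c2", "c3"], topics)
    print(before)
    print(after)
```

With two consumers each one owns three of the six partitions; once c3 joins, recomputing the assignment gives each consumer two. That recomputation on a membership change is the situation the console's "rebalance triggering" messages refer to.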

Answer

George Davis · Jun 22, 2015

When a new consumer joins a consumer group, the consumers in the group attempt to "rebalance" the load by assigning partitions to each consumer. If the set of consumers changes while this assignment is taking place, the rebalance fails and is retried, up to a maximum number of attempts before giving up.

That maximum is controlled by the rebalance.max.retries setting, which defaults to 4.
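The fail-and-retry behaviour can be sketched as a small, self-contained Python simulation. It assumes an attempt is valid only if group membership is the same before and after the assignment is computed, and it uses 4 as the retry limit to mirror the default quoted above. The names rebalance_with_retries and RebalanceFailed are hypothetical; this is not the consumer's real rebalance code.

```python
# Hypothetical simulation: a rebalance attempt is discarded if the set of
# consumers changes while the assignment is being computed, and the group
# retries up to max_retries times (4 mirrors rebalance.max.retries' default).

class RebalanceFailed(Exception):
    """Raised when no attempt saw a stable membership."""


def rebalance_with_retries(read_members, assign, max_retries=4):
    """Return (attempt_number, assignment) for the first stable attempt.

    read_members: callable returning the current list of consumer ids
    assign: callable(members) -> assignment
    """
    for attempt in range(1, max_retries + 1):
        members_before = read_members()
        assignment = assign(members_before)
        # Membership changed mid-assignment: the result is stale, retry.
        if read_members() == members_before:
            return attempt, assignment
    raise RebalanceFailed(f"gave up after {max_retries} attempts")


if __name__ == "__main__":
    # Membership is still changing during the first attempt, then settles.
    snapshots = iter([["c1"], ["c1", "c2"], ["c1", "c2"], ["c1", "c2"]])
    attempt, result = rebalance_with_retries(
        lambda: next(snapshots),
        # Round-robin four partitions over the current members.
        lambda members: {p: members[p % len(members)] for p in range(4)},
    )
    print(attempt, result)
```

Here the first attempt is thrown away because c2 joined mid-assignment, and the second attempt succeeds. If membership kept changing on every attempt, RebalanceFailed would be raised after the fourth.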

It might also be happening because of a ZooKeeper session timeout (zookeeper.session.timeout.ms):

If the consumer fails to heartbeat to ZooKeeper within this period of time, it is considered dead and a rebalance will occur.
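The timeout rule can be illustrated with a deterministic Python sketch. A consumer whose last heartbeat is older than the session timeout is dropped from the group, and that membership change is what triggers a rebalance. This is a generic model, not how ZooKeeper actually tracks sessions. The function names, the millisecond timestamps, and the 6,000 ms timeout are illustrative assumptions, and treating "exactly equal to the timeout" as still alive is also an assumption.

```python
# Hypothetical model of session-timeout failure detection: a consumer whose
# last heartbeat is older than the session timeout is considered dead, which
# changes group membership and therefore requires a rebalance.
# Timestamps are plain millisecond numbers so the example is deterministic.

def find_expired(last_heartbeat_ms, now_ms, session_timeout_ms):
    """Return the sorted ids of consumers whose session has expired."""
    return sorted(
        consumer
        for consumer, last_seen in last_heartbeat_ms.items()
        # Assumption: strictly older than the timeout means expired.
        if now_ms - last_seen > session_timeout_ms
    )


def check_group(last_heartbeat_ms, now_ms, session_timeout_ms):
    """Drop expired consumers and report whether a rebalance is needed."""
    expired = find_expired(last_heartbeat_ms, now_ms, session_timeout_ms)
    survivors = {
        c: t for c, t in last_heartbeat_ms.items() if c not in expired
    }
    return survivors, bool(expired)


if __name__ == "__main__":
    heartbeats = {"c1": 10_000, "c2": 3_000, "c3": 9_500}
    survivors, needs_rebalance = check_group(
        heartbeats, now_ms=11_000, session_timeout_ms=6_000
    )
    print(sorted(survivors), needs_rebalance)
```

In this run c2 last heartbeated 8,000 ms ago, exceeding the 6,000 ms timeout, so it is removed and the remaining consumers (c1 and c3) would need to rebalance its partitions between them.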

Hope this helps!