Balancing Kafka consumers

Roger Johansson · Oct 30, 2016

Let's say that I have 10 partitions for a given topic in Kafka. What would my options be to automatically load balance these 10 partitions between consumers?

I have read this post https://stackoverflow.com/a/28580363/317384 but I'm not sure it covers what I'm looking for, or maybe I'm just not getting it.

If I spin up a worker with one consumer for each partition, all work would be consumed by that worker.

But what happens if I spin up another instance of the same worker elsewhere? Will the client libraries/Kafka somehow detect this and rebalance the load between the two workers, so that some of the active consumers on worker1 become idle and the corresponding consumers on worker2 become active?

I would like to be able to add and remove workers on demand, and spread the load across those, is that possible?

e.g. from this: [image not available]

to this: [image not available]

Answer

ashic · Oct 30, 2016

Kafka consumers belong to consumer groups, and a group has one or more consumers in it. Within a group, each partition is assigned to exactly one consumer, and partitions are how Kafka scales out. If you have more consumers than partitions, some of your consumers will be idle. If you have more partitions than consumers, some consumers will be assigned more than one partition.
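To make the two cases concrete, here is a minimal sketch that models range-style assignment for a single topic: partitions are split into contiguous ranges over the sorted consumer list, and the first few consumers get one extra partition when the split is uneven. This is a simplified illustration, not Kafka's client code; the function name assign_range and the consumer names are hypothetical.

```python
from __future__ import annotations


def assign_range(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Split partition ids 0..partitions-1 into contiguous ranges, one per consumer.

    Hypothetical, simplified model of range-style assignment for one topic;
    it is not Kafka's actual assignor implementation.
    """
    members = sorted(consumers)  # deterministic order so every member agrees
    if not members:
        return {}
    base, extra = divmod(partitions, len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` consumers absorb the remainder, one partition each.
        count = base + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    # More partitions than consumers: each consumer owns several partitions.
    print(assign_range(10, ["c1", "c2", "c3"]))
    # More consumers than partitions: the surplus consumer gets nothing (idle).
    print(assign_range(2, ["c1", "c2", "c3"]))
```

With 10 partitions and three consumers, c1 ends up with four partitions and the others with three each; with 2 partitions and three consumers, c3 receives an empty list, which is the "idle consumer" case.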

When a new consumer joins, a rebalance occurs, and the new consumer is assigned some partitions previously assigned to other consumers. In your case, if there were 10 partitions all being consumed by one consumer, and another consumer joins, there'll be a rebalance, and afterwards, there'll be (typically) five partitions per consumer.
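The join scenario can be sketched by recomputing the assignment for the old and new group membership and diffing the two. This sketch assumes a round-robin style strategy (Kafka does ship a RoundRobinAssignor, but this is a simplified stand-in, not its code); assign_round_robin, rebalance, worker1 and worker2 are hypothetical names. The diff shows only the net movement: with the classic (eager) rebalance protocol, every member first gives up all of its partitions and then receives its new set.

```python
from __future__ import annotations


def assign_round_robin(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Deal partition ids out to the sorted consumers like cards (simplified model)."""
    members = sorted(consumers)
    if not members:
        return {}
    assignment: dict[str, list[int]] = {m: [] for m in members}
    for p in range(partitions):
        assignment[members[p % len(members)]].append(p)
    return assignment


def rebalance(
    partitions: int, old_members: list[str], new_members: list[str]
) -> tuple[dict[str, list[int]], dict[str, dict[str, list[int]]]]:
    """Return the new assignment plus the partitions each consumer loses or gains."""
    before = assign_round_robin(partitions, old_members)
    after = assign_round_robin(partitions, new_members)
    changes: dict[str, dict[str, list[int]]] = {}
    for member in sorted(set(before) | set(after)):
        old = set(before.get(member, []))
        new = set(after.get(member, []))
        changes[member] = {
            "revoked": sorted(old - new),   # partitions this consumer hands off
            "assigned": sorted(new - old),  # partitions this consumer picks up
        }
    return after, changes


if __name__ == "__main__":
    # 10 partitions, all on worker1; then worker2 joins the same group.
    after, changes = rebalance(10, ["worker1"], ["worker1", "worker2"])
    print(after)    # worker1 keeps the even partitions, worker2 takes the odd ones
    print(changes)
```

After the join each worker owns five partitions, matching the answer above. Calling rebalance(10, ["worker1", "worker2"], ["worker1"]) models the reverse case, where a worker leaves and its partitions go back to the remaining consumer.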

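It's worth noting that during a rebalance, the consumer group "pauses": consumers stop processing until the partitions have been reassigned. The same thing happens when a consumer leaves gracefully, or when the group coordinator detects that a consumer has left (for example, because it stopped sending heartbeats before its session timed out).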