What exactly causes a heartbeat failure for a group because it's rebalancing? And why does a rebalance happen when all the consumers in the group are up?
Thank you.
Heartbeats are the basic mechanism to check whether all consumers are still up and running. If you get a heartbeat failure because the group is rebalancing, it indicates that your consumer instance took too long to send the next heartbeat, was considered dead, and thus triggered a rebalance.
If you want to prevent this from happening, you can either increase the timeout (session.timeout.ms), or make sure your consumer sends heartbeats more often (heartbeat.interval.ms). Heartbeats are basically embedded in poll(); thus, you need to make sure you call poll() frequently enough. This can usually be achieved by limiting the number of records a single poll returns via max.poll.records (to shorten the time it takes to process all data that got fetched).
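As a rough illustration of the three settings mentioned above, here is a minimal sketch that builds the consumer properties with plain java.util.Properties. The class name ConsumerHeartbeatConfig, the build helper, and the numeric values are made up for this example and are not recommendations; only the configuration keys are real Kafka consumer settings. The one-third check reflects the guidance in Kafka's configuration docs that heartbeat.interval.ms should typically be no higher than a third of session.timeout.ms.

```java
import java.util.Properties;

public class ConsumerHeartbeatConfig {

    // Hypothetical helper: collects the timeout-related consumer settings.
    // The returned Properties could be passed to a KafkaConsumer constructor
    // together with the usual bootstrap/deserializer settings (not shown).
    public static Properties build(int sessionTimeoutMs,
                                   int heartbeatIntervalMs,
                                   int maxPollRecords) {
        // Reject a heartbeat interval above one third of the session timeout,
        // so that several heartbeats fit into one session window.
        if (heartbeatIntervalMs * 3L > sessionTimeoutMs) {
            throw new IllegalArgumentException(
                "heartbeat.interval.ms should be at most 1/3 of session.timeout.ms");
        }
        Properties props = new Properties();
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        props.setProperty("heartbeat.interval.ms", Integer.toString(heartbeatIntervalMs));
        props.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only: 30 s session timeout, 10 s heartbeat
        // interval, and at most 100 records per poll().
        Properties props = build(30_000, 10_000, 100);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Which knob to turn depends on where the time goes: a larger session.timeout.ms tolerates longer pauses but delays the detection of consumers that really died, while a smaller max.poll.records shortens each processing round.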
Update
Since Kafka 0.10.1, heartbeats are sent in a background thread, and not when poll() is called (cf. https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread). In this new design, the configurations session.timeout.ms and heartbeat.interval.ms still have the same meaning. Additionally, there is max.poll.interval.ms, which sets the maximum time allowed between two consecutive calls to poll(); if it is exceeded, the consumer is considered failed and a rebalance is triggered.
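To get a feeling for how max.poll.records and max.poll.interval.ms interact, here is a back-of-the-envelope sketch. It assumes every record takes the same fixed time to process, which is a simplification; the class PollBudget and its methods are hypothetical names for this example and not part of the Kafka API.

```java
public class PollBudget {

    // Worst-case time spent on one batch if poll() returns a full batch
    // and every record takes perRecordMs to process (an assumption).
    public static long worstCaseBatchMs(int maxPollRecords, long perRecordMs) {
        return Math.multiplyExact((long) maxPollRecords, perRecordMs);
    }

    // True if a full batch can be processed before the next poll() is due.
    public static boolean fitsWithin(int maxPollRecords,
                                     long perRecordMs,
                                     long maxPollIntervalMs) {
        return worstCaseBatchMs(maxPollRecords, perRecordMs) < maxPollIntervalMs;
    }

    public static void main(String[] args) {
        long maxPollIntervalMs = 300_000L; // illustrative value: 5 minutes

        // 500 records at 1 s each need 500 s, more than the 300 s budget.
        System.out.println(fitsWithin(500, 1_000L, maxPollIntervalMs)); // false

        // 100 records at 1 s each need 100 s and fit into the budget.
        System.out.println(fitsWithin(100, 1_000L, maxPollIntervalMs)); // true
    }
}
```

If the estimate does not fit, you can either lower max.poll.records or raise max.poll.interval.ms; real processing times vary, so leave some headroom.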
For more details, cf. Difference between session.timeout.ms and max.poll.interval.ms for Kafka 0.10.0.0 and later versions