Kafka console consumer ERROR "Offset commit failed on partition"

Yeming Huang picture Yeming Huang · May 2, 2018 · Viewed 27.2k times · Source

I am using a kafka-console-consumer to probe a kafka topic.

Intermittently, I am getting this error message, followed by 2 warnings:

[2018-05-01 18:14:38,888] ERROR [Consumer clientId=consumer-1, groupId=console-consumer-56648] Offset commit failed on partition my-topic-0 at offset 444: The coordinator is not aware of this member. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

[2018-05-01 18:14:38,888] WARN [Consumer clientId=consumer-1, groupId=console-consumer-56648] Asynchronous auto-commit of offsets {my-topic-0=OffsetAndMetadata{offset=444, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

[2018-05-01 18:14:38,888] WARN [Consumer clientId=consumer-1, groupId=console-consumer-56648] Synchronous auto-commit of offsets {my-topic-0=OffsetAndMetadata{offset=447, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

It suggested in the warn logs that:

This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.

So, I either need to increase max.poll.interval.ms or decrease max.poll.records.

Please advise what would be the implication of each method, and which one is preferred on a different situation?

Answer

Nathan Walther picture Nathan Walther · Jul 6, 2018

If you increase max.poll.interval.ms that says “it’s ok to spend time processing a large batch of records” and you’ll gain throughput if you can process larger batches more efficiently than smaller ones.

To decrease max.poll.records says ”take fewer records so there’s enough time to process them” and would favor latency over throughput.

Also consider that both are configured fine, but something else is causing performance issues within your poll loop. I would explore that first before changing configuration so you don’t mask a bigger problem.