Dead letter queue (DLQ) for Kafka with spring-kafka

Evgeniy Khyst · Mar 27, 2018 · Viewed 20.3k times

What is the best way to implement the dead letter queue (DLQ) concept in a Spring Boot 2.0 application using spring-kafka 2.1.x, so that every message that failed to be processed by a @KafkaListener method of some bean is sent to some predefined Kafka DLQ topic and not a single message is lost?

So each consumed Kafka record is either:

  1. successfully processed,
  2. failed to be processed and is sent to the DLQ topic,
  3. failed to be processed and not sent to the DLQ topic (due to an unexpected problem), so it will be consumed by the listener again.
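
For context, a minimal sketch of the kind of listener involved (the topic name and processing logic are placeholders, not from the actual application) could look like this:

@Component
public class ExampleListener {

    @KafkaListener(topics = "exampleTopic")
    public void listen(String message) {
        // business processing; any exception thrown here should route the record to the DLQ topic
        process(message);
    }

    private void process(String message) {
        // ... may throw an exception for a record that can't be processed
    }
}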

I tried to create a listener container with a custom implementation of ErrorHandler that sends records that failed to be processed to the DLQ topic using KafkaTemplate, with auto-commit disabled and the RECORD AckMode:

spring.kafka.consumer.enable-auto-commit=false
spring.kafka.listener.ack-mode=RECORD

@Configuration
public class KafkaConfig {
    @Bean
    ConcurrentKafkaListenerContainerFactory<Integer, String> kafkaListenerContainerFactory(DlqErrorHandler dlqErrorHandler) {
        ConcurrentKafkaListenerContainerFactory<Integer, String> factory = ...
        ...
        factory.getContainerProperties().setErrorHandler(dlqErrorHandler);
        return factory;
    }
}

@Component
public class DlqErrorHandler implements ErrorHandler {

    @Autowired
    private KafkaTemplate<Object, Object> kafkaTemplate;

    @Value("${dlqTopic}")
    private String dlqTopic;

    @Override
    public void handle(Exception thrownException, ConsumerRecord<?, ?> record) {
        log.error("Error, sending to DLQ...");
        kafkaTemplate.send(dlqTopic, record.key(), record.value());
    }
}

It seems that this implementation doesn't guarantee item #3: if an exception is thrown in DlqErrorHandler, the record will not be consumed by the listener again.

Would using a transactional listener container help?

factory.getContainerProperties().setTransactionManager(kafkaTransactionManager);
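
For reference, the kafkaTransactionManager referenced above would typically be a bean like the following sketch; the producerFactory bean and its transaction id prefix are assumptions about the surrounding configuration, not something the question specifies:

@Bean
public KafkaTransactionManager<Object, Object> kafkaTransactionManager(
        ProducerFactory<Object, Object> producerFactory) {
    // the producer factory must be configured with a transactionIdPrefix
    // (DefaultKafkaProducerFactory#setTransactionIdPrefix) for transactions to work
    return new KafkaTransactionManager<>(producerFactory);
}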

Is there any convenient way to implement DLQ concept using Spring Kafka?

UPDATE 28/03/2018

Thanks to Gary Russell's answer, I was able to achieve the desired behavior by implementing DlqErrorHandler as follows:

@Configuration
public class KafkaConfig {
    @Bean
    ConcurrentKafkaListenerContainerFactory<Integer, String> kafkaListenerContainerFactory(DlqErrorHandler dlqErrorHandler) {
        ConcurrentKafkaListenerContainerFactory<Integer, String> factory = ...
        ...
        factory.getContainerProperties().setAckOnError(false);
        factory.getContainerProperties().setErrorHandler(dlqErrorHandler);
        return factory;
    }
}

@Component
public class DlqErrorHandler implements ContainerAwareErrorHandler {
    ...
    @Override
    public void handle(Exception thrownException, List<ConsumerRecord<?, ?>> records, Consumer<?, ?> consumer, MessageListenerContainer container) {
        ConsumerRecord<?, ?> record = records.get(0);
        try {
            kafkaTemplate.send(dlqTopic, record.key(), record.value());
            consumer.seek(new TopicPartition(record.topic(), record.partition()), record.offset() + 1);
            // Other records may be from other partitions, so seek to current offset for other partitions too
            // ...
        } catch (Exception e) {
            consumer.seek(new TopicPartition(record.topic(), record.partition()), record.offset());
            // Other records may be from other partitions, so seek to current offset for other partitions too
            // ...
            throw new KafkaException("Seek to current after exception", thrownException);
        }
    }
}

This way, if a consumer poll returns 3 records (1, 2, 3) and the 2nd one can't be processed:

  • 1 will be processed
  • 2 will fail to be processed and will be sent to the DLQ
  • 3, thanks to the consumer seeking to record.offset() + 1, will still be delivered to the listener

If sending to the DLQ fails, the consumer seeks to record.offset() and the record will be re-delivered to the listener (and sending to the DLQ will probably be retried).
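
For completeness, the elided "seek to current offset for other partitions" step could be implemented roughly as follows (a sketch of one possible helper inside the handler, not necessarily the original code). It is meant to be called with the records that still need to be redelivered and seeks every partition seen in them back to its first unprocessed offset:

private void seekOtherPartitions(List<ConsumerRecord<?, ?>> records, Consumer<?, ?> consumer) {
    // remember the first (lowest) unprocessed offset seen for each partition
    Map<TopicPartition, Long> firstOffsetPerPartition = new LinkedHashMap<>();
    for (ConsumerRecord<?, ?> r : records) {
        firstOffsetPerPartition.putIfAbsent(new TopicPartition(r.topic(), r.partition()), r.offset());
    }
    // seek each partition so its unprocessed records are redelivered on the next poll
    firstOffsetPerPartition.forEach(consumer::seek);
}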

Answer

Gary Russell · Mar 27, 2018

See the SeekToCurrentErrorHandler.

When an exception occurs, it seeks the consumer so that all unprocessed records are redelivered on the next poll.
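
Registering the stock handler is a one-liner on the container factory (factory here being the same ConcurrentKafkaListenerContainerFactory as in the question):

factory.getContainerProperties().setErrorHandler(new SeekToCurrentErrorHandler());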

You can use the same technique (e.g. a subclass) to write to the DLQ: seek the current offset (and the other unprocessed offsets) if the DLQ write fails, and seek just the remaining records if the DLQ write succeeds.
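
A rough sketch of such a subclass is shown below; the dlqTopic name, the KafkaTemplate wiring and the synchronous-send timeout are assumptions for illustration, not part of the framework:

import java.util.List;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;

public class DlqSeekToCurrentErrorHandler extends SeekToCurrentErrorHandler {

    private final KafkaTemplate<Object, Object> kafkaTemplate;
    private final String dlqTopic;

    public DlqSeekToCurrentErrorHandler(KafkaTemplate<Object, Object> kafkaTemplate, String dlqTopic) {
        this.kafkaTemplate = kafkaTemplate;
        this.dlqTopic = dlqTopic;
    }

    @Override
    public void handle(Exception thrownException, List<ConsumerRecord<?, ?>> records,
            Consumer<?, ?> consumer, MessageListenerContainer container) {
        ConsumerRecord<?, ?> failed = records.get(0);
        boolean sentToDlq;
        try {
            // block on the send so a DLQ write failure is detected here and not lost asynchronously
            kafkaTemplate.send(dlqTopic, failed.key(), failed.value()).get(10, TimeUnit.SECONDS);
            sentToDlq = true;
        }
        catch (Exception e) {
            sentToDlq = false;
        }
        if (sentToDlq) {
            // DLQ write succeeded: seek/redeliver only the records after the failed one
            super.handle(thrownException, records.subList(1, records.size()), consumer, container);
        }
        else {
            // DLQ write failed: seek all records, including the failed one, so it is retried
            super.handle(thrownException, records, consumer, container);
        }
    }
}

It would then be registered the same way as the DlqErrorHandler above, e.g. factory.getContainerProperties().setErrorHandler(new DlqSeekToCurrentErrorHandler(kafkaTemplate, dlqTopic)), together with setAckOnError(false).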