Kafka optimal retention and deletion policy

Mohammad Ahmad · Feb 18, 2017 · Viewed 14.7k times

I am fairly new to Kafka, so forgive me if this question is trivial. I have a very simple setup for the purposes of timing tests, as follows:

Machine A -> writes to topic 1 (Broker) -> Machine B reads from topic 1
Machine B -> writes the message just read to topic 2 (Broker) -> Machine A reads from topic 2

Now I am sending messages of roughly 1400 bytes in an infinite loop, which fills up the space on my small broker very quickly. I'm experimenting with different values for log.retention.ms, log.retention.bytes, log.segment.bytes and log.segment.delete.delay.ms. First I set all of the values to the minimum allowed, but that seemed to degrade performance; then I set them to the maximum my broker could hold before being completely full, but performance again degrades whenever a deletion occurs. Is there a best practice for setting these values to get the absolute minimum delay?

Thanks for the help!

Answer

Abhimanyu · Feb 19, 2017

Apache Kafka uses a Log data structure to manage its messages. A Log is basically an ordered sequence of Segments, and a Segment is a collection of messages. Kafka applies retention at the Segment level rather than at the individual message level, so it keeps removing the oldest Segments once they violate the configured retention policy.
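You can see this structure on disk: each Topic partition has its own directory under log.dirs, with one pair of files per Segment. A minimal illustration (assuming the default log directory /tmp/kafka-logs and a topic named topic1 with partition 0; the file names and offsets will differ on your machine):

ls /tmp/kafka-logs/topic1-0/
# 00000000000000000000.log    <- oldest Segment, removed first when retention is violated
# 00000000000000000000.index
# 00000000000001230000.log    <- newer Segment, still being written to
# 00000000000001230000.index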

Apache Kafka provides us with the following retention policies -

  1. Time Based Retention

Under this policy, we configure the maximum time a Segment (and hence its messages) can live. Once a Segment has exceeded the configured retention time, it is marked for deletion or compaction, depending on the configured cleanup policy. The default retention time for Segments is 7 days.

Here are the parameters (in decreasing order of priority) that you can set in your Kafka broker properties file:

# Configures retention time in milliseconds
log.retention.ms=1680000

# Used if log.retention.ms is not set
log.retention.minutes=1680

# Used if log.retention.minutes is not set
log.retention.hours=168
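These broker-level settings are defaults for all topics; retention can also be overridden per topic. As a minimal sketch (assuming a broker whose ZooKeeper is at localhost:2181 and a topic named topic1; adjust both for your setup), the topic-level retention.ms config can be changed with the kafka-configs tool:

# Override time-based retention for a single topic (values are illustrative)
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name topic1 \
  --add-config retention.ms=1680000

This only changes how long data for that topic is kept; the broker-wide defaults above still apply to every other topic.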

  2. Size Based Retention

In this policy, we configure the maximum size of the Log data structure for a Topic partition. Once the Log reaches this size, Kafka starts removing Segments from its tail (oldest first). This policy is not as popular because it does not provide good visibility into when messages expire. However, it can come in handy when we need to bound the size of a Log due to limited disk space.

Here is the parameter that you can set in your Kafka broker properties file:

# Configures maximum size of a Log
log.retention.bytes=104857600

So, for your use case, you should configure log.retention.bytes so that your disk does not get full.
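As a rough sketch for your timing test (the numbers below are illustrative assumptions, not recommendations; derive them from your actual disk size and message rate), you could cap each partition well below the available disk space and keep Segments small enough that deletion happens in small increments instead of one large, performance-degrading burst:

# Cap each partition's Log well below the available disk space (example: 100 MB)
log.retention.bytes=104857600

# Roll a new Segment every ~10 MB so old data is removed in small chunks
log.segment.bytes=10485760

# Short grace period before the files of a removed Segment are deleted
log.segment.delete.delay.ms=60000

In general, extremely small Segments mean very frequent file rolls and deletions (the slowdown you saw at the minimum settings), while very large Segments mean each deletion removes a big chunk at once (the slowdown you saw near the maximum), so a middle ground is usually the best starting point.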