Backup/restore Kafka and ZooKeeper

starttter · Dec 13, 2017 · Viewed 12.9k times

I am running a simple 3-node Kafka cluster backed by a 5-node ZooKeeper ensemble. I would like to know what a good way to back up Kafka is, and likewise for ZooKeeper.

For the moment I just export my data directory to an S3 bucket...

Thanks.

Answer

krzychu · Jan 19, 2018

Zalando has recently published a pretty good article on how to back up Kafka and ZooKeeper. Generally, there are two paths for Kafka backup:

  • Maintain a second Kafka cluster to which all topics are replicated. I haven't verified this setup, but if the offset topics are also replicated, then switching to the other cluster shouldn't harm consumers' processing state.
  • Dump topics to cloud storage, e.g. using an S3 connector (as described by Zalando). To restore, you recreate the topics and feed them with data from your cloud storage. This allows a point-in-time restore, but consumers would have to start reading each topic from the beginning.
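To make the second option more concrete, here is a minimal Python sketch of a point-in-time replay. It is not Zalando's tooling or any connector's API: `ArchivedRecord`, `replay_until`, and the `send` callback are hypothetical names, and the archive is assumed to be already downloaded and listed in its original per-partition order.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class ArchivedRecord(NamedTuple):
    """One message as it might be stored in the cloud archive (assumed layout)."""
    topic: str
    partition: int
    timestamp_ms: int
    key: Optional[bytes]
    value: bytes


def replay_until(
    records: Iterable[ArchivedRecord],
    cutoff_ms: int,
    send: Callable[[str, int, Optional[bytes], bytes], None],
) -> int:
    """Re-send every archived record with a timestamp at or before cutoff_ms.

    `send` stands in for a real producer call. Records are replayed in the
    order given, so the archive order is what the restored topic will see.
    Returns the number of records replayed.
    """
    replayed = 0
    for rec in records:
        if rec.timestamp_ms <= cutoff_ms:
            send(rec.topic, rec.partition, rec.key, rec.value)
            replayed += 1
    return replayed
```

Because the records are produced again into freshly created topics, they get new offsets, which is why previously committed consumer offsets are not meaningful after this kind of restore.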

The preferred backup solution depends on your use case. For example, for streaming applications the first solution may give you less pain, while if you use Kafka for event sourcing, the second may be more desirable.

Regarding ZooKeeper, Kafka keeps information about topics there (persistent store) and also uses it for broker discovery and leader election (ephemeral). Zalando settled on Burry, which simply iterates over the ZooKeeper tree, dumps it to a file structure, then zips it and pushes it to cloud storage. It suffers from a small problem, but most probably this does not affect the backup of Kafka's persistent data (I have not verified this). Zalando explains that, when restoring, it is better to first create the ZooKeeper cluster, then connect a new Kafka cluster to it (with new, unique broker IDs), and only then restore Burry's backup. Burry will not overwrite existing nodes, so it does not put back the ephemeral information about old brokers that is stored in the backup.
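The "do not overwrite existing nodes" behaviour can be illustrated with a small Python sketch. This is not Burry's code: a plain dict mapping znode paths to bytes stands in for the ZooKeeper tree, and `restore_missing` plus the example paths and payloads are made up for illustration.

```python
from typing import Dict, List, Tuple


def restore_missing(
    live: Dict[str, bytes], snapshot: Dict[str, bytes]
) -> Tuple[List[str], List[str]]:
    """Copy znodes from snapshot into live, leaving existing paths untouched.

    Returns (created, skipped) path lists so the caller can see what happened.
    """
    created: List[str] = []
    skipped: List[str] = []
    for path in sorted(snapshot):  # a parent path sorts before its children
        if path in live:
            skipped.append(path)   # the new cluster's value wins
        else:
            live[path] = snapshot[path]
            created.append(path)
    return created, skipped


# The new cluster has already elected a controller before the restore runs.
live = {"/controller": b"new-broker-101"}
snapshot = {
    "/controller": b"old-broker-1",             # stale data from the old cluster
    "/brokers/topics/orders": b"topic-config",  # persistent topic metadata
}
created, skipped = restore_missing(live, snapshot)
```

In this toy run the topic metadata is recreated while the controller entry written by the new cluster is kept, which matches the restore order described above: new brokers first, backup second.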

Note: Although they mention using Exhibitor, it is not really needed when backing up with Burry.