I am running a simple 3-node Kafka cluster, with a 5-node Zookeeper ensemble to run Kafka. I would like to know what is a good way to back up my Kafka, and the same for my Zookeeper.
For the moment I just export my data directory to an S3 bucket...
Thanks.
Zalando has recently published a pretty good article on how to back up Kafka and Zookeeper. Generally there are 2 paths for Kafka backup:
The preferred backup solution will depend on your use case. E.g. for streaming applications, the first solution may give you less pain, while when using Kafka for event sourcing, the second solution may be more desirable.
Regarding Zookeeper, Kafka keeps information about topics there (persistent store), and also uses it for broker discovery and leader election (ephemeral). Zalando settled on Burry, which simply iterates over the Zookeeper tree structure, dumps it to a file structure, then zips it and pushes it to cloud storage. It suffers from a little problem, but most probably this does not impact the backup of Kafka's persistent data (TODO verify). Zalando describes that, when restoring, it is better to first create the Zookeeper cluster, then connect a new Kafka cluster to it (with new, unique broker IDs), and only then restore Burry's backup. Burry will not overwrite existing nodes, and so will not put back the ephemeral information about old brokers that is stored in the backup.
Note: Although they mention using Exhibitor, it is not really needed when backing up with Burry.
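To make the dump-and-restore idea concrete, here is a minimal Python sketch of the same approach: walk a tree, write every node into a zip archive, and on restore skip any node that already exists. It is not Burry's code or its on-disk format. The function names `dump_tree` and `restore_tree`, the `content` file name, and the callback parameters are all made up for this illustration. The callbacks are meant to be backed by a real client (for example kazoo's `get_children`, `get`, `exists` and `create`), but the sketch itself only assumes plain callables and does not distinguish ephemeral from persistent nodes.

```python
import io
import zipfile
from typing import Callable, List, Optional


def dump_tree(
    get_children: Callable[[str], List[str]],
    get_data: Callable[[str], Optional[bytes]],
    root: str = "/",
) -> bytes:
    """Walk the tree from `root` and return a zip archive as bytes.

    Each node's payload is stored as <node path>/content (a layout chosen
    for this sketch), so a node can hold data and have children.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        stack = [root]
        while stack:
            path = stack.pop()
            name = "content" if path == "/" else path.strip("/") + "/content"
            zf.writestr(name, get_data(path) or b"")
            for child in get_children(path):
                stack.append(path.rstrip("/") + "/" + child)
    return buf.getvalue()


def restore_tree(
    archive: bytes,
    exists: Callable[[str], bool],
    create: Callable[[str, bytes], None],
) -> List[str]:
    """Recreate nodes from the archive, never overwriting existing ones.

    Returns the list of paths that were actually created.
    """
    created = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        entries = {}
        for name in zf.namelist():
            # Strip the trailing "content" file name to recover the node path.
            path = "/" + name[: -len("content")].rstrip("/")
            entries[path] = zf.read(name)
    # Sorting by path puts every parent before its children.
    for path in sorted(entries):
        if exists(path):
            continue  # keep whatever the live cluster already has
        create(path, entries[path])
        created.append(path)
    return created
```

With a dict standing in for the tree, `restore_tree(archive, target.__contains__, target.__setitem__)` fills in only the missing paths. That is the property the restore order above relies on: nodes registered by the new brokers are left untouched.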