Is Zookeeper a must for Kafka?

Paaji picture Paaji · May 20, 2014 · Viewed 95.3k times · Source

In Kafka, I would like to use only a single broker, single topic and a single partition having one producer and multiple consumers (each consumer getting its own copy of data from the broker). Given this, I do not want the overhead of using Zookeeper; Can I not just use the broker only? Why is a Zookeeper must?

Answer

John Petrone picture John Petrone · May 23, 2014

Yes, Zookeeper is required for running Kafka. From the Kafka Getting Started documentation:

Step 2: Start the server

Kafka uses zookeeper so you need to first start a zookeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node zookeeper instance.

As to why, well people long ago discovered that you need to have some way to coordinating tasks, state management, configuration, etc across a distributed system. Some projects have built their own mechanisms (think of the configuration server in a MongoDB sharded cluster, or a Master node in an Elasticsearch cluster). Others have chosen to take advantage of Zookeeper as a general purpose distributed process coordination system. So Kafka, Storm, HBase, SolrCloud to just name a few all use Zookeeper to help manage and coordinate.

Kafka is a distributed system and is built to use Zookeeper. The fact that you are not using any of the distributed features of Kafka does not change how it was built. In any event there should not be much overhead from using Zookeeper. A bigger question is why you would use this particular design pattern -- a single broker implementation of Kafka misses out on all of the reliability features of a multi-broker cluster along with it's ability to scale.