I am new to Kafka and don't really understand what the Kafka configuration settings mean. Can anyone explain them in a more understandable way?
Here is my code:
val kafkaParams = Map[String, Object](
"bootstrap.servers" -> "master:9092,slave1:9092",
"key.deserializer" -> classOf[StringDeserializer],
"value.deserializer" -> classOf[StringDeserializer],
"group.id" -> "GROUP_2017",
"auto.offset.reset" -> "latest", //earliest or latest
"enable.auto.commit" -> (true: java.lang.Boolean)
)
What does each setting mean in my code?
I will explain what each setting means, but I highly suggest reading the Configuration section of the Kafka website.
"bootstrap.servers" -> "master:9092,slave1:9092"
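This is the list of brokers (host:port pairs) that the client uses for its initial connection to the Kafka cluster. It does not have to list every broker, because the client discovers the rest of the cluster from the brokers it reaches.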
"key.deserializer" -> classOf[StringDeserializer]
"value.deserializer" -> classOf[StringDeserializer]
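This SO answer explains their purpose. In short, Kafka stores keys and values as bytes, and the deserializers tell the consumer how to convert those bytes back into objects (here, Strings).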
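As a rough illustration of what a deserializer does, the sketch below calls StringDeserializer directly on some bytes. It assumes the kafka-clients library is on the classpath. The object name DeserializerDemo and the topic name "demo" are placeholders made up for this example.

```scala
import java.nio.charset.StandardCharsets
import org.apache.kafka.common.serialization.StringDeserializer

object DeserializerDemo {
  // Kafka hands the consumer raw bytes; the deserializer turns them back into an object.
  def decode(bytes: Array[Byte]): String = {
    val deserializer = new StringDeserializer()
    // "demo" is a placeholder topic name (assumption, not from the question).
    deserializer.deserialize("demo", bytes)
  }

  def main(args: Array[String]): Unit = {
    val raw = "hello".getBytes(StandardCharsets.UTF_8)
    println(decode(raw))
  }
}
```

The consumer does the same thing internally for every record, using the classes named in key.deserializer and value.deserializer.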
"group.id" -> "GROUP_2017"
A consumer process belongs to a consumer group, identified by group.id. A group can contain multiple consumers, and Kafka assigns each partition to exactly one consumer within the group (a single consumer may be assigned several partitions). If the group has more consumers than there are partitions, the extra consumers will be idle.
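The idle-consumer case can be sketched with a small simulation in plain Scala. This is only an illustration: the round-robin rule and the names AssignmentSketch and assign are assumptions for this example, not Kafka's actual assignment logic.

```scala
object AssignmentSketch {
  // Spread partitions 0 until `partitions` over the consumers, round-robin.
  // Illustration only: this is not Kafka's real partition assignor.
  def assign(partitions: Int, consumers: Seq[String]): Map[String, Seq[Int]] = {
    require(consumers.nonEmpty, "need at least one consumer")
    val empty = consumers.map(c => c -> Seq.empty[Int]).toMap
    (0 until partitions).foldLeft(empty) { (acc, p) =>
      val owner = consumers(p % consumers.size)
      acc.updated(owner, acc(owner) :+ p)
    }
  }

  def main(args: Array[String]): Unit = {
    // Two partitions, three consumers in the same group: one consumer gets nothing.
    assign(2, Seq("c1", "c2", "c3")).toSeq.sortBy(_._1).foreach {
      case (consumer, parts) => println(s"$consumer -> [${parts.mkString(",")}]")
    }
  }
}
```

With two partitions and three consumers, the third consumer ends up with an empty list, which is the idle case described above.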
"enable.auto.commit" -> (true: java.lang.Boolean)
If this flag is true, the consumer periodically commits the offsets of the messages it has fetched, so the last offset read is persisted. Note that with the consumer API used here (the one configured through bootstrap.servers), committed offsets are stored in Kafka itself; only the older Zookeeper-based consumer persisted them in Zookeeper. Auto-commit is not the best approach for a robust production system, because it does not ensure that the records you fetched were processed correctly by the logic in your code. If this flag is false and you do not commit offsets yourself, Kafka will not know the last offset read, so when you restart the process it will start from the 'earliest' or the 'latest' offset, depending on the value of the next flag (auto.offset.reset). Finally, this Cloudera article explains in detail how to manage offsets properly.
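One way to commit only after processing is to turn auto-commit off and call commitSync yourself. The following is a minimal sketch with the plain KafkaConsumer, assuming kafka-clients 2.0 or later (for poll(Duration)). The topic name "my-topic", the process function, and the object name are hypothetical and not taken from the question.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

object ManualCommitSketch {
  // Same settings as in the question, except that auto-commit is disabled.
  def consumerProps(): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", "master:9092,slave1:9092")
    props.setProperty("key.deserializer", classOf[StringDeserializer].getName)
    props.setProperty("value.deserializer", classOf[StringDeserializer].getName)
    props.setProperty("group.id", "GROUP_2017")
    props.setProperty("auto.offset.reset", "latest")
    props.setProperty("enable.auto.commit", "false")
    props
  }

  // Hypothetical placeholder for your own processing logic.
  def process(value: String): Unit = println(value)

  def main(args: Array[String]): Unit = {
    val consumer = new KafkaConsumer[String, String](consumerProps())
    // "my-topic" is a made-up topic name for this sketch.
    consumer.subscribe(Collections.singletonList("my-topic"))
    try {
      while (true) {
        val records = consumer.poll(Duration.ofMillis(500))
        val it = records.iterator()
        while (it.hasNext) process(it.next().value())
        // Commit only after every record of this batch has been processed.
        if (!records.isEmpty) consumer.commitSync()
      }
    } finally consumer.close()
  }
}
```

If processing throws before commitSync is reached, the batch is not committed and will be read again after a restart, so this gives at-least-once behaviour rather than exactly-once.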
"auto.offset.reset" -> "latest"
This flag tells Kafka where to start reading when there is no committed offset yet. In other words, the consumer starts from either the 'earliest' or the 'latest' offset if no offset has been persisted for the group (manually or via the enable.auto.commit flag).
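The decision can be written down as a small function. This is a simplified model, not Kafka's code: the names OffsetResetSketch and startOffset are hypothetical, and the earliest and latest offsets are passed in as plain numbers. Besides 'earliest' and 'latest', the setting also accepts 'none', which makes the consumer fail when no committed offset exists.

```scala
object OffsetResetSketch {
  // Hypothetical helper: pick the offset a consumer would start reading from.
  def startOffset(committed: Option[Long], reset: String, earliest: Long, latest: Long): Long =
    committed.getOrElse {
      // No committed offset for this group yet, so auto.offset.reset decides.
      reset match {
        case "earliest" => earliest
        case "latest"   => latest
        case "none"     => throw new IllegalStateException("no committed offset found")
        case other      => throw new IllegalArgumentException(s"unsupported value: $other")
      }
    }

  def main(args: Array[String]): Unit = {
    println(startOffset(None, "latest", 0L, 42L))     // falls back to the latest offset
    println(startOffset(Some(7L), "latest", 0L, 42L)) // a committed offset takes priority
  }
}
```

Once a commit exists for the group, auto.offset.reset no longer affects where reading resumes.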