Kafka on Kubernetes multi-node

NegatioN picture NegatioN · Aug 21, 2015 · Viewed 10.5k times · Source

So my objective here is to set up a cluster of several kafka-brokers in a distributed fashion. But I can't see the way to make the brokers aware of each other.

As far as i understand, every broker needs a separate ID in their config, which I cannot guarantee or configure if I launch the containers from kubernetes?

They also need to have the same advertised_host?

Are there any parameters I'm missing that would need to be changed for the nodes to discover each other?

Would it be viable to do such a configuration at the end of the Dockerfile with a script? And/or a shared volume?

I'm currently trying to do this with the spotify/kafka-image which has a preconfigured zookeeper+kafka combination, on vanilla Kubernetes.

Answer

MrE picture MrE · Aug 26, 2015

My solution for this has been to use the IP as the ID: trim the dots and you get a unique ID that is also available outside of the container to other containers.

With a Service you can get access to the multiple containers's IPs (see my answer here on how to do this: what's the best way to let kubenetes pods communicate with each other?

so you can get their IDs too if you use IPs as the unique ID. The only issue is that IDs are not continuous or start at 0, but zookeeper / kafka don't seem to mind.

EDIT 1:

The follow up concerns configuring Zookeeper:

Each ZK node needs to know of the other nodes. The Kubernetes discovery service knowns of nodes that are within a Service so the idea is to start a Service with the ZK nodes.

This Service needs to be started BEFORE creating the ReplicationController (RC) of the Zookeeper pods.

The start-up script of the ZK container will then need to:

  • wait for the discovery service to populate the ZK Service with its nodes (that takes a few seconds, for now I just add a sleep 10 at the beginning of my startup script but more reliably you should look for the service to have at least 3 nodes in it.)
  • look up the containers forming the Service in the discovery service: this is done by querying the API. the KUBERNETES_SERVICE_HOST environment variable is available in each container. The endpoint to find service description is then

URL="http(s)://$USERNAME:$PASSWORD@${KUBERNETES_SERVICE_HOST/api/v1/namespaces/${NAMESPACE}/endpoints/${SERVICE_NAME}"

where NAMESPACE is default unless you changed it, and SERVICE_NAME would be zookeeper if you named your service zookeeper.

there you get the description of the containers forming the Service, with their ip in a "ip" field. You can do:

curl -s $URL | grep '\"ip\"' | awk '{print $2}' | awk -F\" '{print $2}' 

to get the list of IPs in the Service. With that, populate the zoo.cfg on the node using the ID defined above

You might need the USERNAME and PASSWORD to reach the endpoint on services like google container engine. These need to be put in a Secret volume (see doc here: http://kubernetes.io/v1.0/docs/user-guide/secrets.html )

You would also need to use curl -s --insecure on Google Container Engine unless you go through the trouble of adding the CA cert to your pods

Basically add the volume to the container, and look up the values from file. (contrary to what the doc says, DO NOT put the \n at the end of the username or password when base64 encoding: it just make your life more complicated when reading those)

EDIT 2:

Another thing you'll need to do on the Kafka nodes is get the IP and hostnames, and put them in the /etc/hosts file. Kafka seems to need to know the nodes by hostnames, and these are not set within service nodes by default

EDIT 3:

After much trial and thoughts using IP as an ID may not be so great: it depends on how you configure storage. for any kind of distributed service like zookeeper, kafka, mongo, hdfs, you might want to use the emptyDir type of storage, so it is just on that node (mounting a remote storage kind of defeats the purpose of distributing these services!) emptyDir will relaod with the data on the same node, so it seems more logical to use the NODE ID (node IP) as the ID, because then a pod that restarts on the same node will have the data. That avoid potential corruption of the data (if a new node starts writing in the same dir that is not actually empty, who knows what can happen) and also with Kafka, the topics being assigned a broker.id, if the broker id changes, zookeeper does not update the topic broker.id and the topic looks like it is available BUT points to the wrong broker.id and it's a mess.

So far I have yet to find how to get the node IP though, but I think it's possible to lookup in the API by looking up the service pods names and then the node they are deployed on.

EDIT 4

To get the node IP, you can get the pod hostname == name from the endpoints API /api/v1/namespaces/default/endpoints/ as explained above. then you can get the node IP from the pod name with /api/v1/namespaces/default/pods/

PS: this is inspired by the example in the Kubernetes repo (example for rethinkdb here: https://github.com/kubernetes/kubernetes/tree/master/examples/rethinkdb