How to expose a headless service for a StatefulSet externally in Kubernetes

Nadir Muzaffar · Sep 27, 2017

I'm using kubernetes-kafka as a starting point, running on minikube.

This uses a StatefulSet and a headless service for service discovery within the cluster.

The goal is to expose the individual Kafka brokers externally; internally they are addressed as:

kafka-0.broker.kafka.svc.cluster.local:9092
kafka-1.broker.kafka.svc.cluster.local:9092 
kafka-2.broker.kafka.svc.cluster.local:9092
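Those per-pod DNS names come from the headless Service that the StatefulSet references via serviceName: broker. For context, a minimal sketch of such a headless Service (the selector label here is an assumption; kubernetes-kafka's actual labels may differ):

apiVersion: v1
kind: Service
metadata:
  name: broker
  namespace: kafka
spec:
  clusterIP: None      # headless: gives each pod its own kafka-N.broker.kafka.svc.cluster.local record
  ports:
  - port: 9092
    targetPort: 9092
  selector:
    app: kafka         # assumption: must match the StatefulSet's pod labels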

The constraint is that this external service be able to address the brokers specifically.

What's the right (or one possible) way of going about this? Is it possible to expose an external service per kafka-x.broker.kafka.svc.cluster.local:9092?

Answer

Jason Kincl · Sep 28, 2017

We solved this in Kubernetes 1.7 by changing the headless service to type NodePort and setting externalTrafficPolicy: Local. This bypasses the Service's internal load balancing, so traffic sent to a specific node on that node port only connects if a Kafka pod is running on that node.

apiVersion: v1
kind: Service
metadata:
  name: broker
spec:
  externalTrafficPolicy: Local   # only route to endpoints on the receiving node
  ports:
  - nodePort: 30000              # port opened on every node
    port: 30000
    protocol: TCP
    targetPort: 9092             # Kafka listener port inside the pod
  selector:
    app: broker
  type: NodePort

For example, if we have two nodes, nodeA and nodeB, and nodeB is running a Kafka pod, then nodeA:30000 will not connect, but nodeB:30000 will connect to the Kafka pod running on nodeB.

https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-typenodeport

Note that this was also available in 1.5 and 1.6 as a beta annotation; more on feature availability can be found here: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip
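For 1.5/1.6, if memory serves, the same behavior was requested with a beta annotation on the Service rather than the spec field; a rough sketch (verify the annotation name and value against the linked docs):

apiVersion: v1
kind: Service
metadata:
  name: broker
  annotations:
    service.beta.kubernetes.io/external-traffic: OnlyLocal   # beta-era equivalent of externalTrafficPolicy: Local
spec:
  ports:
  - nodePort: 30000
    port: 30000
    protocol: TCP
    targetPort: 9092
  selector:
    app: broker
  type: NodePort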

Note also that while this ties a Kafka pod to a specific external network identity, it does not guarantee that the pod's storage volume is tied to that identity as well. If you are using volumeClaimTemplates in a StatefulSet, each volume is tied to the pod, whereas Kafka expects the volume to be tied to the network identity.

For example, if the kafka-0 pod restarts and comes up on nodeC instead of nodeA, kafka-0's PVC (if using volumeClaimTemplates) still contains data meant for nodeA, and the broker running in kafka-0 starts rejecting requests because it thinks it is nodeA, not nodeC.

To fix this, we are looking forward to Local Persistent Volumes, but for now we have a single PVC for our Kafka StatefulSet, and data is stored under $NODENAME on that PVC to tie volume data to a particular node.
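A rough sketch of how that can be wired up: expose the node name to the container through the downward API and point Kafka's log.dirs under it. The image, paths, and the --override usage below are illustrative assumptions, not copied from kubernetes-kafka:

# Fragment of the StatefulSet pod template (illustrative only)
containers:
- name: broker
  image: solsson/kafka               # or whatever Kafka image the StatefulSet already uses
  env:
  - name: NODENAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName     # downward API: the node this pod was scheduled onto
  command:
  - sh
  - -c
  - "mkdir -p /data/$NODENAME && exec bin/kafka-server-start.sh config/server.properties --override log.dirs=/data/$NODENAME"
  volumeMounts:
  - name: data                       # the single PVC shared across the StatefulSet
    mountPath: /data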

https://github.com/kubernetes/features/issues/121
https://kubernetes.io/docs/concepts/storage/volumes/#local