After deploying a Kubernetes cluster using kargo, I found that the kubedns pod is not working properly:
$ kcsys get pods -o wide
NAME            READY   STATUS             RESTARTS   AGE   IP            NODE
dnsmasq-alv8k   1/1     Running            2          1d    10.233.86.2   kubemaster
dnsmasq-c9y52   1/1     Running            2          1d    10.233.82.2   kubeminion1
dnsmasq-sjouh   1/1     Running            2          1d    10.233.76.6   kubeminion2
kubedns-hxaj7   2/3     CrashLoopBackOff   339        22h   10.233.76.3   kubeminion2
PS: kcsys is an alias for kubectl --namespace=kube-system.
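In shell terms:

$ alias kcsys='kubectl --namespace=kube-system'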
The logs for the kubedns and dnsmasq containers look OK; only the healthz container reports an error:
2017/03/01 07:24:32 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local' error exit status 1
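The failing check can be reproduced by hand from inside the healthz container (nslookup is certainly available there, since the probe itself runs it):

$ kcsys exec kubedns-hxaj7 -c healthz -- nslookup kubernetes.default.svc.cluster.local 127.0.0.1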
Update

kubedns RC description:
apiVersion: v1
kind: ReplicationController
metadata:
  creationTimestamp: 2017-02-28T08:31:57Z
  generation: 1
  labels:
    k8s-app: kubedns
    kubernetes.io/cluster-service: "true"
    version: v19
  name: kubedns
  namespace: kube-system
  resourceVersion: "130982"
  selfLink: /api/v1/namespaces/kube-system/replicationcontrollers/kubedns
  uid: 5dc9f9f2-fd90-11e6-850d-005056a020b4
spec:
  replicas: 1
  selector:
    k8s-app: kubedns
    version: v19
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: kubedns
        kubernetes.io/cluster-service: "true"
        version: v19
    spec:
      containers:
      - args:
        - --domain=cluster.local.
        - --dns-port=10053
        - --v=2
        image: gcr.io/google_containers/kubedns-amd64:1.9
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kubedns
        ports:
        - containerPort: 10053
          name: dns-local
          protocol: UDP
        - containerPort: 10053
          name: dns-tcp-local
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: 100m
            memory: 170Mi
          requests:
            cpu: 70m
            memory: 70Mi
        terminationMessagePath: /dev/termination-log
      - args:
        - --log-facility=-
        - --cache-size=1000
        - --no-resolv
        - --server=127.0.0.1#10053
        image: gcr.io/google_containers/kube-dnsmasq-amd64:1.3
        imagePullPolicy: IfNotPresent
        name: dnsmasq
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 170Mi
          requests:
            cpu: 70m
            memory: 70Mi
        terminationMessagePath: /dev/termination-log
      - args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
          && nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
        - -port=8080
        - -quiet
        image: gcr.io/google_containers/exechealthz-amd64:1.1
        imagePullPolicy: IfNotPresent
        name: healthz
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          limits:
            cpu: 10m
            memory: 50Mi
          requests:
            cpu: 10m
            memory: 50Mi
        terminationMessagePath: /dev/termination-log
      dnsPolicy: Default
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  fullyLabeledReplicas: 1
  observedGeneration: 1
  replicas: 1
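A dump like the one above can be regenerated at any time with:

$ kcsys get rc kubedns -o yaml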
kubedns svc description:
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2017-02-28T08:31:58Z
  labels:
    k8s-app: kubedns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: kubedns
  name: kubedns
  namespace: kube-system
  resourceVersion: "10736"
  selfLink: /api/v1/namespaces/kube-system/services/kubedns
  uid: 5ed4dd78-fd90-11e6-850d-005056a020b4
spec:
  clusterIP: 10.233.0.3
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  selector:
    k8s-app: kubedns
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
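Since the Service exposes DNS on ClusterIP 10.233.0.3, resolution can also be tested against it directly from one of the nodes (assuming nslookup is installed there):

$ nslookup kubernetes.default.svc.cluster.local 10.233.0.3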
I also caught some errors in the kubedns container:
1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://10.233.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.233.0.1:443: i/o timeout
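For what it's worth, the same endpoint can be probed directly from kubeminion2 to see whether the API server Service IP is reachable at all (-k skips certificate verification; even an authentication error in the response would prove network connectivity, whereas the i/o timeout above suggests packets never arrive):

$ curl -k https://10.233.0.1:443/version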
According to the error you posted, kubedns cannot communicate with the API server:

dial tcp 10.233.0.1:443: i/o timeout

This can mean three things. First, the network fabric for your containers is not configured properly. Second, you have a problem with your kube-proxy and the network traffic is not forwarded to the API server when using the kubernetes internal Service (10.233.0.1); check your kube-proxy logs on your nodes (kubeminion{1,2}) and update your question with any error you may find.
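One way to pull those logs, assuming kargo runs kube-proxy as a static pod on each node (the pod name below is illustrative):

$ kcsys get pods -o wide | grep kube-proxy
$ kcsys logs kube-proxy-kubeminion1

If kube-proxy runs as a systemd service instead, journalctl -u kube-proxy gives the same information.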
Third, if you are also seeing authentication errors, kube-controller-manager does not produce valid Service Account tokens. Check that the --service-account-private-key-file and --root-ca-file flags of kube-controller-manager are set to a valid key/cert and restart the service.
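For illustration, the two flags typically end up looking like this (the paths are hypothetical and depend on where kargo stores the certificates; the private key must match the --service-account-key-file that the API server uses):

kube-controller-manager \
  --service-account-private-key-file=/etc/kubernetes/ssl/apiserver-key.pem \
  --root-ca-file=/etc/kubernetes/ssl/ca.pem \
  ...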
Then delete the default-token-xxxx secret in the kube-system namespace and recreate the kube-dns Deployment (in this cluster, deleting the kubedns pod will make the ReplicationController recreate it with the fresh token).
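A sketch of those last two steps (look up the real secret suffix first; deleting the pod is enough to get it recreated, since the ReplicationController keeps one replica running):

$ kcsys get secrets | grep default-token
$ kcsys delete secret default-token-xxxx
$ kcsys delete pod kubedns-hxaj7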