How to fix 'container runtime is down,PLEG is not healthy'

Ask · Dec 20, 2018 · Viewed 11.9k times

I have an AKS Kubernetes cluster with 2 nodes. Each node runs about 6-7 pods with 2 containers per pod: one container is my Docker image and the other is the sidecar that Istio injects for its service mesh. After about 10 hours the nodes become 'NotReady', and describing the node shows 2 errors:

1. container runtime is down,PLEG is not healthy: pleg was lastseen active 1h32m35.942907195s ago; threshold is 3m0s
2. rpc error: code = DeadlineExceeded desc = context deadline exceeded, Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

When I restart the node, it works fine, but it goes back to 'NotReady' after a while. I started facing this issue after adding Istio, but I could not find any documentation relating the two. My next step is to try upgrading Kubernetes.

Output of describing the node:

Name:               aks-agentpool-22124581-0
Roles:              agent
Labels:             agentpool=agentpool
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_B2s
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=eastus
                    failure-domain.beta.kubernetes.io/zone=1
                    kubernetes.azure.com/cluster=MC_XXXXXXXXX
                    kubernetes.io/hostname=aks-XXXXXXXXX
                    kubernetes.io/role=agent
                    node-role.kubernetes.io/agent=
                    storageprofile=managed
                    storagetier=Premium_LRS
Annotations:        aks.microsoft.com/remediated=3
                    node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Thu, 25 Oct 2018 14:46:53 +0000
Taints:             <none>
Unschedulable:      false
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Thu, 25 Oct 2018 14:49:06 +0000   Thu, 25 Oct 2018 14:49:06 +0000   RouteCreated                 RouteController created a route
  OutOfDisk            False   Wed, 19 Dec 2018 19:28:55 +0000   Wed, 19 Dec 2018 19:27:24 +0000   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure       False   Wed, 19 Dec 2018 19:28:55 +0000   Wed, 19 Dec 2018 19:27:24 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Wed, 19 Dec 2018 19:28:55 +0000   Wed, 19 Dec 2018 19:27:24 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Wed, 19 Dec 2018 19:28:55 +0000   Thu, 25 Oct 2018 14:46:53 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                False   Wed, 19 Dec 2018 19:28:55 +0000   Wed, 19 Dec 2018 19:27:24 +0000   KubeletNotReady              container runtime is down,PLEG is not healthy: pleg was lastseen active 1h32m35.942907195s ago; threshold is 3m0s
Addresses:
  Hostname:  aks-XXXXXXXXX
Capacity:
 cpu:                2
 ephemeral-storage:  30428648Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             4040536Ki
 pods:               110
Allocatable:
 cpu:                1940m
 ephemeral-storage:  28043041951
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             3099480Ki
 pods:               110
System Info:
 Machine ID:                 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 System UUID:                XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 Boot ID:                    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 Kernel Version:             4.15.0-1035-azure
 OS Image:                   Ubuntu 16.04.5 LTS
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://Unknown
 Kubelet Version:            v1.11.3
 Kube-Proxy Version:         v1.11.3
PodCIDR:                     10.244.0.0/24
ProviderID:                  azure:///subscriptions/9XXXXXXXXXXX/resourceGroups/MC_XXXXXXXXXXXXXXXXXXXXXXXXXXXX/providers/Microsoft.Compute/virtualMachines/aks-XXXXXXXXXXXX
Non-terminated Pods:         (42 in total)
  Namespace                  Name                                                               CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                                                               ------------  ----------  ---------------  -------------
  default                    emailgistics-graph-monitor-6477568564-q98p2                        10m (0%)      0 (0%)      0 (0%)           0 (0%)
  default                    emailgistics-message-handler-7df4566b6f-mh255                      10m (0%)      0 (0%)      0 (0%)           0 (0%)
  default                    emailgistics-reports-aggregator-5fd96b94cb-b5vbn                   10m (0%)      0 (0%)      0 (0%)           0 (0%)
  default                    emailgistics-rules-844b77f46-5lrkw                                 10m (0%)      0 (0%)      0 (0%)           0 (0%)
  default                    emailgistics-scheduler-754884b566-mwgvp                            10m (0%)      0 (0%)      0 (0%)           0 (0%)
  default                    emailgistics-subscription-token-manager-7974558985-f2t49           10m (0%)      0 (0%)      0 (0%)           0 (0%)
  default                    mollified-kiwi-cert-manager-665c5d9c8c-2ld59                       0 (0%)        0 (0%)      0 (0%)           0 (0%)
  istio-system               grafana-59b787b9b-dzdtc                                            10m (0%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-citadel-5d8956cc6-x55vk                                      10m (0%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-egressgateway-f48fc7fbb-szpwp                                10m (0%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-galley-6975b6bd45-g7lsc                                      10m (0%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-ingressgateway-c6c4bcdbf-bbgcw                               10m (0%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-pilot-d9b5b9b7c-ln75n                                        510m (26%)    0 (0%)      2Gi (67%)        0 (0%)
  istio-system               istio-policy-6b465cd4bf-92l57                                      20m (1%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-policy-6b465cd4bf-b2z85                                      20m (1%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-policy-6b465cd4bf-j59r4                                      20m (1%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-policy-6b465cd4bf-s9pdm                                      20m (1%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-sidecar-injector-575597f5cf-npkcz                            10m (0%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-telemetry-6944cd768-9794j                                    20m (1%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-telemetry-6944cd768-g7gh5                                    20m (1%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-telemetry-6944cd768-gd88n                                    20m (1%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-telemetry-6944cd768-px8qb                                    20m (1%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-telemetry-6944cd768-xzslh                                    20m (1%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               istio-tracing-7596597bd7-hjtq2                                     10m (0%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               prometheus-76db5fddd5-d6dxs                                        10m (0%)      0 (0%)      0 (0%)           0 (0%)
  istio-system               servicegraph-758f96bf5b-c9sqk                                      10m (0%)      0 (0%)      0 (0%)           0 (0%)
  kube-system                addon-http-application-routing-default-http-backend-5ccb95zgfm8    10m (0%)      10m (0%)    20Mi (0%)        20Mi (0%)
  kube-system                addon-http-application-routing-external-dns-59d8698886-h8xds       0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                addon-http-application-routing-nginx-ingress-controller-ff49qc7    0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                heapster-5d6f9b846c-m4kfp                                          130m (6%)     130m (6%)   230Mi (7%)       230Mi (7%)
  kube-system                kube-dns-v20-7c7d7d4c66-qqkfm                                      120m (6%)     0 (0%)      140Mi (4%)       220Mi (7%)
  kube-system                kube-dns-v20-7c7d7d4c66-wrxjm                                      120m (6%)     0 (0%)      140Mi (4%)       220Mi (7%)
  kube-system                kube-proxy-2tb68                                                   100m (5%)     0 (0%)      0 (0%)           0 (0%)
  kube-system                kube-svc-redirect-d6gqm                                            10m (0%)      0 (0%)      34Mi (1%)        0 (0%)
  kube-system                kubernetes-dashboard-68f468887f-l9x46                              100m (5%)     100m (5%)   50Mi (1%)        300Mi (9%)
  kube-system                metrics-server-5cbc77f79f-x55cs                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                omsagent-mhrqm                                                     50m (2%)      150m (7%)   150Mi (4%)       300Mi (9%)
  kube-system                omsagent-rs-d688cdf68-pjpmj                                        50m (2%)      150m (7%)   100Mi (3%)       500Mi (16%)
  kube-system                tiller-deploy-7f4974b9c8-flkjm                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                tunnelfront-7f766dd857-kgqps                                       10m (0%)      0 (0%)      64Mi (2%)        0 (0%)
  kube-systems-dev           nginx-ingress-dev-controller-7f78f6c8f9-csct4                      0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-systems-dev           nginx-ingress-dev-default-backend-95fbc75b7-lq9tw                  0 (0%)        0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests      Limits
  --------  --------      ------
  cpu       1540m (79%)   540m (27%)
  memory    2976Mi (98%)  1790Mi (59%)
Events:
  Type     Reason             Age                 From                               Message
  ----     ------             ----                ----                               -------
  Warning  ContainerGCFailed  48m (x43 over 19h)  kubelet, aks-agentpool-22124581-0  rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  ImageGCFailed      29m (x57 over 18h)  kubelet, aks-agentpool-22124581-0  failed to get image stats: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
  Warning  ContainerGCFailed  2m (x237 over 18h)  kubelet, aks-agentpool-22124581-0  rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
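
The Ready condition message in the output above carries two Go-style durations: how long ago PLEG was last seen active, and the threshold. A minimal sketch of pulling both out of the message so the lag can be compared or alerted on might look like this. The function names `go_duration_seconds` and `pleg_lag` are made up for illustration, and the message pattern is taken only from the text shown in this question, so other kubelet versions may word it differently.

```python
import re

# Seconds per unit for the duration units that can appear in a Go-style duration string.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}

# Two-letter units are listed first so "ms" is not read as "m" followed by "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|h|m|s)")

# Accepts both "lastseen" (as pasted above) and "last seen".
_PLEG = re.compile(r"pleg was last\s?seen active (\S+) ago; threshold is (\S+)")


def go_duration_seconds(text):
    """Convert a duration such as '1h32m35.9s' or '3m0s' to seconds."""
    parts = _PART.findall(text)
    # Reject input that has anything left over besides number+unit pairs.
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"not a Go-style duration: {text!r}")
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def pleg_lag(message):
    """Return (seconds_since_active, threshold_seconds), or None if the message does not match."""
    match = _PLEG.search(message)
    if match is None:
        return None
    return go_duration_seconds(match.group(1)), go_duration_seconds(match.group(2))
```

Applied to the message on this node, `pleg_lag` gives roughly 5556 seconds against a threshold of 180 seconds, so the kubelet had been unable to relist containers for about 30 times the allowed interval.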

General deployment file:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  name: emailgistics-pod
spec:
  minReadySeconds: 10
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        sidecar.istio.io/status: '{"version":"ebf16d3ea0236e4b5cb4d3fc0f01da62e2e6265d005e58f8f6bd43a4fb672fdd","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}'
      creationTimestamp: null
      labels:
        app: emailgistics-pod
    spec:
      containers:
      - image: xxxxxxxxxxxxxxxxxxxxx/emailgistics_pod:xxxxxx
        imagePullPolicy: Always
        name: emailgistics-pod
        ports:
        - containerPort: 80
        resources: {}
      - args:
        - proxy
        - sidecar
        - --configPath
        - /etc/istio/proxy
        - --binaryPath
        - /usr/local/bin/envoy
        - --serviceCluster
        - emailgistics-pod
        - --drainDuration
        - 45s
        - --parentShutdownDuration
        - 1m0s
        - --discoveryAddress
        - istio-pilot.istio-system:15005
        - --discoveryRefreshDelay
        - 1s
        - --zipkinAddress
        - zipkin.istio-system:9411
        - --connectTimeout
        - 10s
        - --proxyAdminPort
        - "15000"
        - --controlPlaneAuthPolicy
        - MUTUAL_TLS
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: INSTANCE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: ISTIO_META_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ISTIO_META_INTERCEPTION_MODE
          value: REDIRECT
        - name: ISTIO_METAJSON_LABELS
          value: |
            {"app":"emailgistics-pod"}
        image: docker.io/istio/proxyv2:1.0.4
        imagePullPolicy: IfNotPresent
        name: istio-proxy
        ports:
        - containerPort: 15090
          name: http-envoy-prom
          protocol: TCP
        resources:
          requests:
            cpu: 10m
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 1337
        volumeMounts:
        - mountPath: /etc/istio/proxy
          name: istio-envoy
        - mountPath: /etc/certs/
          name: istio-certs
          readOnly: true
      imagePullSecrets:
      - name: ga.secretname
      initContainers:
      - args:
        - -p
        - "15001"
        - -u
        - "1337"
        - -m
        - REDIRECT
        - -i
        - '*'
        - -x
        - ""
        - -b
        - "80"
        - -d
        - ""
        image: docker.io/istio/proxy_init:1.0.4
        imagePullPolicy: IfNotPresent
        name: istio-init
        resources: {}
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
          privileged: true
      volumes:
      - emptyDir:
          medium: Memory
        name: istio-envoy
      - name: istio-certs
        secret:
          optional: true
          secretName: istio.default
status: {}
---

Answer

Vitalii · Dec 21, 2018

This is currently a known bug, and no real fix has been created yet to restore normal node behavior. See the following issues:

https://github.com/kubernetes/kubernetes/issues/45419

https://github.com/kubernetes/kubernetes/issues/61117

https://github.com/Azure/AKS/issues/102

Hopefully we will have a solution soon.
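
Since restarting the node is the only workaround mentioned in this thread, it may help to at least notice quickly when a node drops out. The sketch below is one possible way to do that, not a fix: it takes the parsed JSON of a node list (the shape produced by `kubectl get nodes -o json`) and reports every node whose Ready condition is not "True". The function name `not_ready_nodes` and the placeholder reason `NoReadyCondition` are assumptions introduced here for illustration.

```python
import json


def not_ready_nodes(node_list):
    """Return (name, reason, message) for each node whose Ready condition is not "True".

    `node_list` is the parsed JSON of a Kubernetes NodeList.
    """
    results = []
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None:
            # Hypothetical label for a node that reports no Ready condition at all.
            results.append((name, "NoReadyCondition", ""))
        elif ready.get("status") != "True":
            # Covers both "False" and "Unknown" (kubelet stopped reporting).
            results.append((name, ready.get("reason", ""), ready.get("message", "")))
    return results


def not_ready_nodes_from_text(text):
    """Convenience wrapper for raw JSON text, e.g. captured command output."""
    return not_ready_nodes(json.loads(text))
```

For the node in the question, this would be expected to report the reason `KubeletNotReady` together with the PLEG message, which could then be sent to whatever alerting is already in place.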