I have an AKS Kubernetes cluster with 2 nodes. Each node runs about 6-7 of my pods, with 2 containers per pod: one container is my Docker image, and the other is injected by Istio for its service mesh. After about 10 hours the nodes go 'NotReady', and describing a node shows two errors:
1. container runtime is down, PLEG is not healthy: pleg was last seen active 1h32m35.942907195s ago; threshold is 3m0s
2. rpc error: code = DeadlineExceeded desc = context deadline exceeded, Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
When I restart the node it works fine, but it falls back to 'NotReady' after a while. I started hitting this after adding Istio, but could not find any documentation relating the two. My next step is to try upgrading Kubernetes.
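For completeness, this is the upgrade path I intend to try (a sketch using the az CLI; the resource group and cluster names are placeholders, and the target version is whatever az reports as available):

az aks get-upgrades --resource-group XXXXXXXXX --name myAKSCluster --output table
az aks upgrade --resource-group XXXXXXXXX --name myAKSCluster --kubernetes-version <version-from-above>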
The kubectl describe node output:
Name: aks-agentpool-22124581-0
Roles: agent
Labels: agentpool=agentpool
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=Standard_B2s
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=eastus
failure-domain.beta.kubernetes.io/zone=1
kubernetes.azure.com/cluster=MC_XXXXXXXXX
kubernetes.io/hostname=aks-XXXXXXXXX
kubernetes.io/role=agent
node-role.kubernetes.io/agent=
storageprofile=managed
storagetier=Premium_LRS
Annotations: aks.microsoft.com/remediated=3
node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Thu, 25 Oct 2018 14:46:53 +0000
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Thu, 25 Oct 2018 14:49:06 +0000 Thu, 25 Oct 2018 14:49:06 +0000 RouteCreated RouteController created a route
OutOfDisk False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 19 Dec 2018 19:28:55 +0000 Thu, 25 Oct 2018 14:46:53 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready False Wed, 19 Dec 2018 19:28:55 +0000 Wed, 19 Dec 2018 19:27:24 +0000 KubeletNotReady container runtime is down, PLEG is not healthy: pleg was last seen active 1h32m35.942907195s ago; threshold is 3m0s
Addresses:
Hostname: aks-XXXXXXXXX
Capacity:
cpu: 2
ephemeral-storage: 30428648Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 4040536Ki
pods: 110
Allocatable:
cpu: 1940m
ephemeral-storage: 28043041951
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3099480Ki
pods: 110
System Info:
Machine ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
System UUID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Boot ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Kernel Version: 4.15.0-1035-azure
OS Image: Ubuntu 16.04.5 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://Unknown
Kubelet Version: v1.11.3
Kube-Proxy Version: v1.11.3
PodCIDR: 10.244.0.0/24
ProviderID: azure:///subscriptions/9XXXXXXXXXXX/resourceGroups/MC_XXXXXXXXXXXXXXXXXXXXXXXXXXXX/providers/Microsoft.Compute/virtualMachines/aks-XXXXXXXXXXXX
Non-terminated Pods: (42 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default emailgistics-graph-monitor-6477568564-q98p2 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-message-handler-7df4566b6f-mh255 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-reports-aggregator-5fd96b94cb-b5vbn 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-rules-844b77f46-5lrkw 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-scheduler-754884b566-mwgvp 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default emailgistics-subscription-token-manager-7974558985-f2t49 10m (0%) 0 (0%) 0 (0%) 0 (0%)
default mollified-kiwi-cert-manager-665c5d9c8c-2ld59 0 (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system grafana-59b787b9b-dzdtc 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-citadel-5d8956cc6-x55vk 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-egressgateway-f48fc7fbb-szpwp 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-galley-6975b6bd45-g7lsc 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-ingressgateway-c6c4bcdbf-bbgcw 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-pilot-d9b5b9b7c-ln75n 510m (26%) 0 (0%) 2Gi (67%) 0 (0%)
istio-system istio-policy-6b465cd4bf-92l57 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-policy-6b465cd4bf-b2z85 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-policy-6b465cd4bf-j59r4 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-policy-6b465cd4bf-s9pdm 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-sidecar-injector-575597f5cf-npkcz 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-9794j 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-g7gh5 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-gd88n 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-px8qb 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-telemetry-6944cd768-xzslh 20m (1%) 0 (0%) 0 (0%) 0 (0%)
istio-system istio-tracing-7596597bd7-hjtq2 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system prometheus-76db5fddd5-d6dxs 10m (0%) 0 (0%) 0 (0%) 0 (0%)
istio-system servicegraph-758f96bf5b-c9sqk 10m (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system addon-http-application-routing-default-http-backend-5ccb95zgfm8 10m (0%) 10m (0%) 20Mi (0%) 20Mi (0%)
kube-system addon-http-application-routing-external-dns-59d8698886-h8xds 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system addon-http-application-routing-nginx-ingress-controller-ff49qc7 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system heapster-5d6f9b846c-m4kfp 130m (6%) 130m (6%) 230Mi (7%) 230Mi (7%)
kube-system kube-dns-v20-7c7d7d4c66-qqkfm 120m (6%) 0 (0%) 140Mi (4%) 220Mi (7%)
kube-system kube-dns-v20-7c7d7d4c66-wrxjm 120m (6%) 0 (0%) 140Mi (4%) 220Mi (7%)
kube-system kube-proxy-2tb68 100m (5%) 0 (0%) 0 (0%) 0 (0%)
kube-system kube-svc-redirect-d6gqm 10m (0%) 0 (0%) 34Mi (1%) 0 (0%)
kube-system kubernetes-dashboard-68f468887f-l9x46 100m (5%) 100m (5%) 50Mi (1%) 300Mi (9%)
kube-system metrics-server-5cbc77f79f-x55cs 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system omsagent-mhrqm 50m (2%) 150m (7%) 150Mi (4%) 300Mi (9%)
kube-system omsagent-rs-d688cdf68-pjpmj 50m (2%) 150m (7%) 100Mi (3%) 500Mi (16%)
kube-system tiller-deploy-7f4974b9c8-flkjm 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system tunnelfront-7f766dd857-kgqps 10m (0%) 0 (0%) 64Mi (2%) 0 (0%)
kube-systems-dev nginx-ingress-dev-controller-7f78f6c8f9-csct4 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-systems-dev nginx-ingress-dev-default-backend-95fbc75b7-lq9tw 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1540m (79%) 540m (27%)
memory 2976Mi (98%) 1790Mi (59%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ContainerGCFailed 48m (x43 over 19h) kubelet, aks-agentpool-22124581-0 rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning ImageGCFailed 29m (x57 over 18h) kubelet, aks-agentpool-22124581-0 failed to get image stats: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Warning ContainerGCFailed 2m (x237 over 18h) kubelet, aks-agentpool-22124581-0 rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
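The ImageGCFailed/ContainerGCFailed events above suggest dockerd itself is hung rather than the kubelet. On the node this can be confirmed with checks like these (a sketch, assuming SSH access to the agent VM and the stock systemd-managed Docker on the Ubuntu 16.04 node):

systemctl status docker                  # is dockerd active, and since when?
journalctl -u docker --no-pager -n 200   # recent daemon logs around the hang
docker ps                                # hangs or errors while the daemon is unresponsive
sudo systemctl restart docker            # may recover the node without a full VM reboot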
The deployment manifest (with the Istio sidecar injected):
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  name: emailgistics-pod
spec:
  minReadySeconds: 10
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        sidecar.istio.io/status: '{"version":"ebf16d3ea0236e4b5cb4d3fc0f01da62e2e6265d005e58f8f6bd43a4fb672fdd","initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-certs"],"imagePullSecrets":null}'
      creationTimestamp: null
      labels:
        app: emailgistics-pod
    spec:
      containers:
      - image: xxxxxxxxxxxxxxxxxxxxx/emailgistics_pod:xxxxxx
        imagePullPolicy: Always
        name: emailgistics-pod
        ports:
        - containerPort: 80
        resources: {}
      - args:
        - proxy
        - sidecar
        - --configPath
        - /etc/istio/proxy
        - --binaryPath
        - /usr/local/bin/envoy
        - --serviceCluster
        - emailgistics-pod
        - --drainDuration
        - 45s
        - --parentShutdownDuration
        - 1m0s
        - --discoveryAddress
        - istio-pilot.istio-system:15005
        - --discoveryRefreshDelay
        - 1s
        - --zipkinAddress
        - zipkin.istio-system:9411
        - --connectTimeout
        - 10s
        - --proxyAdminPort
        - "15000"
        - --controlPlaneAuthPolicy
        - MUTUAL_TLS
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: INSTANCE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: ISTIO_META_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ISTIO_META_INTERCEPTION_MODE
          value: REDIRECT
        - name: ISTIO_METAJSON_LABELS
          value: |
            {"app":"emailgistics-pod"}
        image: docker.io/istio/proxyv2:1.0.4
        imagePullPolicy: IfNotPresent
        name: istio-proxy
        ports:
        - containerPort: 15090
          name: http-envoy-prom
          protocol: TCP
        resources:
          requests:
            cpu: 10m
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 1337
        volumeMounts:
        - mountPath: /etc/istio/proxy
          name: istio-envoy
        - mountPath: /etc/certs/
          name: istio-certs
          readOnly: true
      imagePullSecrets:
      - name: ga.secretname
      initContainers:
      - args:
        - -p
        - "15001"
        - -u
        - "1337"
        - -m
        - REDIRECT
        - -i
        - '*'
        - -x
        - ""
        - -b
        - "80"
        - -d
        - ""
        image: docker.io/istio/proxy_init:1.0.4
        imagePullPolicy: IfNotPresent
        name: istio-init
        resources: {}
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
          privileged: true
      volumes:
      - emptyDir:
          medium: Memory
        name: istio-envoy
      - name: istio-certs
        secret:
          optional: true
          secretName: istio.default
status: {}
---
This is currently a known bug, and no real fix has been released yet to restore normal node behavior. See the issues below:
https://github.com/kubernetes/kubernetes/issues/45419
https://github.com/kubernetes/kubernetes/issues/61117
https://github.com/Azure/AKS/issues/102
Hopefully a solution will come soon.
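In the meantime, the usual workaround is to recycle the affected node: cordon and drain it, restart the underlying VM, then uncordon it (a sketch; the node and resource group names come from your describe output, and the drain flags are as of Kubernetes 1.11):

kubectl cordon aks-agentpool-22124581-0
kubectl drain aks-agentpool-22124581-0 --ignore-daemonsets --delete-local-data
az vm restart --resource-group MC_XXXXXXXXX --name aks-agentpool-22124581-0
kubectl uncordon aks-agentpool-22124581-0

Also note that your Allocated resources show memory requests at 98% of allocatable on a Standard_B2s; PLEG and runtime hangs are often reported on heavily loaded nodes in the issues above, so a larger VM size or an additional node may make the recurrences less frequent.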