Kubernetes eviction manager evicting control plane pods to reclaim ephemeral storage

Nikhil · Jan 12, 2019 · Viewed 9k times

I am using Kubernetes v1.13.0. My master also functions as a worker node, so it runs workload pods in addition to the control plane pods.

The kubelet logs on my master show the following lines:

eviction_manager.go:340] eviction manager: must evict pod(s) to reclaim ephemeral-storage
eviction_manager.go:358] eviction manager: pods ranked for eviction: kube-controller-manager-vm2_kube-system(1631c2c238e0c5117acac446b26d9f8c), kube-apiserver-vm2_kube-system(ce43eba098d219e13901c4a0b829f43b), etcd-vm2_kube-system(91ab2b0ddf4484a5ac6ee9661dbd0b1c)

Once the kube-apiserver pod is evicted, the cluster becomes unusable.

What can I do to fix this? Should I add more ephemeral storage, and if so, how? Does that mean adding more space to the root partition on my host?

My understanding is that ephemeral storage covers the /var/log and /var/lib/kubelet directories, both of which live on the root partition.
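
For reference, here is how I check what those directories are actually using (assuming the default kubelet paths; /var/lib/kubelet/pods holds per-pod data such as emptyDir volumes):

sudo du -sh /var/log /var/lib/kubelet
sudo du -sh /var/lib/kubelet/pods/* | sort -h | tail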

A df -h on my host shows:

Filesystem                               Size  Used Avail Use% Mounted on
/dev/vda1                                 39G   33G  6.2G  85% /

So it looks like the root partition has plenty of space left (6.2G of 39G, roughly 16% free), and there is no disk pressure. So what is causing this issue? Some of my worker pods must be doing something crazy with storage, but 6G of headroom still seems like plenty.

Will adding more space to the root partition fix this issue temporarily?

kubectl describe node vm2 gives the following info:

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Fri, 11 Jan 2019 21:25:43 +0000   Wed, 05 Dec 2018 19:16:41 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Fri, 11 Jan 2019 21:25:43 +0000   Fri, 11 Jan 2019 20:58:07 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Fri, 11 Jan 2019 21:25:43 +0000   Wed, 05 Dec 2018 19:16:41 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Fri, 11 Jan 2019 21:25:43 +0000   Thu, 06 Dec 2018 17:00:02 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Capacity:
 cpu:                8
 ephemeral-storage:  40593708Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             32946816Ki
 pods:               110
Allocatable:
 cpu:                8
 ephemeral-storage:  37411161231
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             32844416Ki
 pods:               110

It seems to me that there was pressure on ephemeral-storage, and the eviction manager was trying to reclaim storage by evicting the least recently used pods. But it should not evict the control plane pods; otherwise the cluster becomes unusable.
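
Incidentally, the gap between Capacity and Allocatable above looks consistent with a 10% eviction reservation: 40593708Ki is about 41.57GB, and 90% of that is about 37.41GB, which matches the Allocatable value of 37411161231 bytes almost exactly.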

Currently, the kubelet evicts the control plane pods. I then try to manually restart the apiserver and the other control plane pods by adding and removing a space in the files under /etc/kubernetes/manifests. This does restart the apiserver, but it then gets evicted again. Ideally, the kubelet should ensure that the static pods defined in /etc/kubernetes/manifests are always running and properly managed.
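
As a slightly less hacky way to force the kubelet to recreate a static pod, moving the manifest out of the watched directory and back also works (a sketch; the sleep just gives the kubelet time to notice the file is gone):

sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sleep 20
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/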

I am trying to understand what is going on here and how to fix it, so that my Kubernetes cluster becomes more robust and I don't have to keep restarting the apiserver manually.

Answer

Matt Nicolls · Jan 30, 2019

I had this same problem and solved it by changing the threshold for evictionHard.

Looking at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf I have:

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

So my kubelet config file is /var/lib/kubelet/config.yaml.
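
(To double-check which unit file, drop-ins, and flags are actually in effect, systemctl cat kubelet prints the resolved unit together with its drop-ins.)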

Opening that file, I changed the evictionHard settings to the following (I think the defaults were nodefs.available<10% and imagefs.available<15% before; on a 39G root partition, 15% is about 5.9G, so the 6.2G you have free was barely above the threshold):

...
evictionHard:
  imagefs.available: 1%
  memory.available: 100Mi
  nodefs.available: 1%
  nodefs.inodesFree: 1%
...
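
After editing the file, restart the kubelet so it picks up the new thresholds (assuming a systemd-managed kubelet, as in the unit file above):

sudo systemctl restart kubelet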

There is also the --experimental-allocatable-ignore-eviction setting (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/), which should disable eviction completely.
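
If you want to try that flag, the drop-in above sources KUBELET_EXTRA_ARGS from /etc/default/kubelet, so one way to set it (a sketch I have not tested) is:

# /etc/default/kubelet
KUBELET_EXTRA_ARGS=--experimental-allocatable-ignore-eviction=true

followed by another sudo systemctl restart kubelet.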