Kubeadm and the Risks of Scheduling Pods on Master Node (Pods always Pending)

Flea · Apr 6, 2018 · Viewed 15.7k times

While following the Kubernetes article Using kubeadm to Create a Cluster, I got stuck because the AddOn pods I was trying to install (Nginx, Tiller, Grafana, InfluxDB, Dashboard) would always stay in a Pending state.

Running kubectl describe pod tiller-deploy-df4fdf55d-jwtcz --namespace=kube-system showed the following event:

Type     Reason            Age                From               Message
----     ------            ----               ----               -------
Warning  FailedScheduling  51s (x15 over 3m)  default-scheduler  0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
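
The taint that is blocking scheduling can be confirmed by describing the node. The node name below is a placeholder; the real one comes from kubectl get nodes:

kubectl describe node <master-node-name> | grep Taints

On a freshly initialized kubeadm cluster this typically prints node-role.kubernetes.io/master:NoSchedule, which is the taint the event above is complaining about.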

When I ran the command from the Master Isolation section, kubectl taint nodes --all node-role.kubernetes.io/master-, the AddOns installed as expected.
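
The effect is easy to verify: the same Taints grep from above should now print <none> for the master, and the Pending pods should get scheduled shortly afterwards:

kubectl get pods --namespace=kube-system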

At this point I can only suspect (since the AddOns did end up installed on the master node) that the reason was that I hadn't yet joined a worker node to the cluster for the scheduler to place the pods on.
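
That suspicion is easy to check: with no worker joined, listing the nodes shows a single entry, the master itself:

kubectl get nodes

If the only node carries the default NoSchedule taint, the scheduler has zero eligible nodes, which matches the 0/1 nodes are available event above.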

The documentation states "your cluster will not schedule pods on the master for security reasons". I know that this is a non-production environment, so there is little risk in this situation, but what is the risk of removing that taint in a production cluster?

Follow-up: If this is a risk, how can I re-add that taint so I can then uninstall the AddOn pods and try to have the scheduler install them on my Worker Node?

Environment Details:
Operating System: CentOS 7.4.1708 (Core)
Kubernetes Version: 1.10

Answer

mdaniel · Apr 8, 2018

the reason was that I hadn't connected a worker node to the cluster yet for the scheduler to schedule the pods on.

100% correct. You will for sure want some worker nodes; otherwise the idea of "scheduling work" becomes very weird.
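
If another machine is available, joining it as a worker is quick. A sketch with placeholder values, since the actual token and CA hash come from your own master: first print a fresh join command on the master,

kubeadm token create --print-join-command

then run the printed command on the worker machine, which will look roughly like:

kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>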

but what is the risk of removing that taint in a production cluster?

I am not a Kubernetes security expert, but a pragmatic risk is CPU, I/O, and/or memory exhaustion on the master nodes, which would have very severe consequences for the health of the cluster: the API server, etcd, and the scheduler all live there, and a workload that starves them of resources can take the whole cluster down with it. There is almost never a reason to run a workload on a master node, and doing so is almost entirely an increase in risk, so the advice "just don't do it" is well founded.
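
If you ever do have a narrow, deliberate reason to put one pod on a master, the finer-grained tool is a toleration on just that pod, rather than removing the taint and opening the master to every workload. A minimal sketch, where the pod name and image are made up for illustration:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  # demo pod; name and image are placeholders
  name: tolerates-master-demo
spec:
  containers:
  - name: demo
    image: nginx
  tolerations:
  # tolerate the default kubeadm master taint
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
EOF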

how can I re-add that taint so I can then uninstall the AddOn pods and try to have the scheduler install them on my Worker Node?

I'm not sure I follow that question, but I would for sure start by just adding a worker node before trying to do complicated stuff with taints and tolerations.
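
That said, re-adding the taint is the same kubectl taint command without the trailing minus; the node name is a placeholder, and the key and effect match what kubeadm set originally:

kubectl taint nodes <master-node-name> node-role.kubernetes.io/master=:NoSchedule

Once a worker has joined, deleting the AddOn pods (for example the Tiller pod from the question) lets their Deployments recreate them, and the scheduler will place the replacements on the worker:

kubectl delete pod tiller-deploy-df4fdf55d-jwtcz --namespace=kube-system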