How do I fix a "dial tcp 10.96.0.1:443: i/o timeout" error for the Rook operator pod installed via Helm?

CoderDude74 · Feb 26, 2020

I added the repo with this command:

helm repo add rook-stable https://charts.rook.io/stable

Then I ran:

helm install --namespace rook-ceph-system <NAME> <CHART VERSION>

The operator pod is created at first but then goes into CrashLoopBackOff.

Below is the log.

kubectl logs  rook-ceph-operator-5bdc9cfcb9-qml5n
2020-02-26 17:42:38.863455 I | rookcmd: starting Rook v0.9.3 with arguments '/usr/local/bin/rook ceph operator'
2020-02-26 17:42:38.863570 I | rookcmd: flag values: --alsologtostderr=false, --help=false, --log-level=INFO, --log_backtrace_at=:0, --log_dir=, --logtostderr=true, --mon-healthcheck-interval=45s, --mon-out-timeout=5m0s, --stderrthreshold=2, --v=0, --vmodule=
2020-02-26 17:42:39.056154 I | cephcmd: starting operator
failed to get pod. Get https://10.96.0.1:443/api/v1/namespaces/default/pods/rook-ceph-operator-5bdc9cfcb9-qml5n: dial tcp 10.96.0.1:443: i/o timeout

Any idea how to fix this?

Answer

Peter Dev · Mar 17, 2020

I had the same problem with almost the same setup: a Kubernetes cluster deployed on 3 VMs (via Vagrant), with Calico as the pod network.

Things I corrected:

Declare the hostnames of all 3 VMs in /etc/hosts on each VM:

192.168.100.51  kube1   kube1
192.168.100.52  kube2   kube2
192.168.100.53  kube3   kube3
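If the VMs are provisioned by a script (for example a Vagrant shell provisioner), these entries can be added idempotently so that re-running provisioning does not duplicate them. A minimal bash sketch; the function name add_hosts_entries is hypothetical, and the target file is a parameter so it can be tried on a scratch file before touching /etc/hosts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append each line read from stdin to a hosts file,
# but only if that exact line is not already present.
# Usage: add_hosts_entries FILE < entries
add_hosts_entries() {
  local hosts_file="$1" line
  # "|| [ -n "$line" ]" also processes a final line with no trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue                # skip blank lines
    # -x: match the whole line, -F: fixed string (dots are not wildcards)
    if ! grep -qxF -- "$line" "$hosts_file" 2>/dev/null; then
      printf '%s\n' "$line" >> "$hosts_file"
    fi
  done
}
```

On each VM, as root, feed it the three lines above on stdin, e.g. add_hosts_entries /etc/hosts < kube-hosts.txt (kube-hosts.txt being a hypothetical file holding those lines). Lines are compared exactly, so an entry that differs only in spacing would be appended again.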

Change the --pod-network-cidr (here to 10.10.0.0/16, which does not overlap the 192.168.100.x addresses used by the VMs):

kubeadm init --apiserver-advertise-address=192.168.100.51 --apiserver-cert-extra-sans=192.168.100.51 --node-name kube1 --pod-network-cidr=10.10.0.0/16
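A quick way to sanity-check a --pod-network-cidr value before running kubeadm init is to confirm it does not overlap the VMs' own network (assumed here to be 192.168.100.0/24, based on the addresses above). A minimal bash sketch using only shell arithmetic; ip_to_int and cidrs_overlap are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# "10#" forces base 10 so an octet like "08" is not read as octal.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Exit status 0 if the two IPv4 CIDR ranges share any address, 1 otherwise.
# Two ranges overlap exactly when their network addresses agree on the
# first N bits, where N is the shorter (wider) of the two prefix lengths.
cidrs_overlap() {
  local ip1=${1%/*} len1=${1#*/} ip2=${2%/*} len2=${2#*/}
  local len=$(( len1 < len2 ? len1 : len2 ))
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip1") & mask) == ($(ip_to_int "$ip2") & mask) ))
}
```

For example, cidrs_overlap 10.10.0.0/16 192.168.100.0/24 returns a non-zero status (no overlap), while cidrs_overlap 192.168.0.0/16 192.168.100.0/24 succeeds; 192.168.0.0/16 is the pool used in Calico's example manifests, so it would collide with these VMs. The same check can be run against kubeadm's default service CIDR, 10.96.0.0/12, which is the range containing the 10.96.0.1 address from the error.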

Use the same pod CIDR in the Calico manifest:

- name: CALICO_IPV4POOL_CIDR
  value: "10.10.0.0/16"
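To confirm the manifest really carries the same value as the kubeadm flag (in some Calico versions this env entry ships commented out, so it is easy to edit the wrong line), a small check can extract the first uncommented CALICO_IPV4POOL_CIDR value. A minimal sketch, assuming the manifest is a local file and the value sits on the line right after the name, as in the snippet above; calico_pool_cidr is a hypothetical name:

```shell
#!/usr/bin/env bash
# Print the value of the first uncommented CALICO_IPV4POOL_CIDR env entry
# in a Calico manifest file (prints nothing if there is none).
calico_pool_cidr() {
  awk '
    # Only match the name line when it is not commented out with "#".
    /^[ \t]*- name: CALICO_IPV4POOL_CIDR[ \t]*$/ { found = 1; next }
    found {
      # Expect the value on the very next line; otherwise keep searching.
      if ($0 ~ /^[ \t]*value:/) {
        v = $0
        sub(/^[ \t]*value:[ \t]*/, "", v)   # drop the "value:" key
        sub(/[ \t\r]*$/, "", v)             # drop trailing whitespace / CR
        gsub(/"/, "", v)                    # drop surrounding quotes
        print v
        exit
      }
      found = 0
    }
  ' "$1"
}
```

For example, [ "$(calico_pool_cidr calico.yaml)" = "10.10.0.0/16" ] && echo ok, where calico.yaml stands for whatever the downloaded manifest is called.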

Rook deployment:

git clone --single-branch --branch release-1.2 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml
kubectl create -f cluster-test.yaml

Now the Ceph cluster is up and running.