Kubernetes pod never gets ready

dilvan · Nov 12, 2015

I am setting up a small Kubernetes cluster using a VM (master) and 3 bare-metal servers (all running Ubuntu 14.04). I followed the Kubernetes install tutorial for Ubuntu. Each bare-metal server also has 2 TB of disk space exported using Ceph 0.94.5. Everything is working fine, but when I try to start a Replication Controller, I get the following (kubectl get pods):

NAME          READY     STATUS                                         RESTARTS   AGE
site2-zecnf   0/1       Image: site-img is ready, container is creating    0      12m

The pod stays in this Not Ready state forever, but if I kill it and start it again, it runs fine (although sometimes I have to repeat this operation a few times). Once the pod is running, everything works just fine.
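For context, the Replication Controller definition (not reproduced in full here) is roughly of the following shape; the mount path, monitor address, and pool are illustrative placeholders rather than values from the actual files:

apiVersion: v1
kind: ReplicationController
metadata:
  name: java-site2
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: java-site
    spec:
      containers:
        - name: java-site
          image: javasite-img
          volumeMounts:
            - name: java-site-vol
              mountPath: /data                 # placeholder mount path
      volumes:
        - name: java-site-vol
          rbd:
            monitors:
              - 10.70.2.1:6789                 # placeholder Ceph monitor address
            pool: rbd
            image: java-site-vol
            user: admin
            keyring: "ceph.client.admin.keyring"
            fsType: ext4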

If, for some reason, the pod dies, it's restarted by Kubernetes, but it can enter this Not Ready state again. Running:

kubectl describe pod java-site2-crctv

I get (some fields deleted):

Namespace:          default
Status:             Pending
Replication Controllers:    java-site2 (1/1 replicas created)
Containers:
  java-site:
    Image:      javasite-img
    State:      Waiting
      Reason:       Image: javasite-img is ready, container is creating
    Ready:      False
    Restart Count:  0
Conditions:
  Type      Status
  Ready     False 
Events:
  FirstSeen             LastSeen            Count   From            SubobjectPath   Reason      Message
  Sat, 14 Nov 2015 12:37:56 -0200   Sat, 14 Nov 2015 12:37:56 -0200 1   {scheduler }                scheduled   Successfully assigned java-site2-crctv to 10.70.2.3
  Sat, 14 Nov 2015 12:37:57 -0200   Sat, 14 Nov 2015 12:45:29 -0200 46  {kubelet 10.70.2.3}         failedMount Unable to mount volumes for pod "java-site2-crctv_default": exit status 22
  Sat, 14 Nov 2015 12:37:57 -0200   Sat, 14 Nov 2015 12:45:29 -0200 46  {kubelet 10.70.2.3}         failedSync  Error syncing pod, skipping: exit status 22

The pod cannot mount the volume. But if I mount the volumes (RBD block devices) by hand into a local folder on all nodes, the problem goes away (pods start without problems).

It seems to me that Kubernetes isn't able to map them (sudo rbd map java-site-vol), only to mount them (sudo mount /dev/rbd/rbd/java-site-vol /...).

Should I map all Ceph volumes that I use or should Kubernetes do that?

Answer

dilvan · Nov 16, 2015

I finally solved the problem. In the YAML files describing the Replication Controllers, I was using keyring: in the volume section:

keyring: "ceph.client.admin.keyring" 

After I generated a Ceph secret and changed the YAML files to use secretRef:

secretRef:
  name: "ceph-secret"

Kubernetes was able to map and mount the Ceph volumes and the pods began to start normally. I don't know why using keyring: doesn't work in this case.