How to debug "ImagePullBackOff"?

Xiao Peng - ZenUML.com picture Xiao Peng - ZenUML.com · Jan 18, 2016 · Viewed 144.8k times · Source

All of a sudden, I cannot deploy some images which could be deployed before. I got the following pod status:

[root@webdev2 origin]# oc get pods 
NAME                      READY     STATUS             RESTARTS   AGE 
arix-3-yjq9w              0/1       ImagePullBackOff   0          10m 
docker-registry-2-vqstm   1/1       Running            0          2d 
router-1-kvjxq            1/1       Running            0          2d 

The application just won't start. The pod is not trying to run the container. From the Event page, I have got Back-off pulling image "172.30.84.25:5000/default/arix@sha256:d326. I have verified that I can pull the image with the tag with docker pull.

I have also checked the log of the last container. It was closed for some reason. I think the pod should at least try to restart it.

I have run out of ideas to debug the issues. What can I check more?

Answer

rjdkolb picture rjdkolb · May 24, 2017

You can use the 'describe pod' syntax

For OpenShift use:

oc describe pod <pod-id>  

For vanilla Kubernetes:

kubectl describe pod <pod-id>  

Examine the events of the output. In my case it shows Back-off pulling image coredns/coredns:latest

In this case the image coredns/coredns:latest can not be pulled from the Internet.

Events:
  FirstSeen LastSeen    Count   From                SubObjectPath           Type        Reason      Message
  --------- --------    -----   ----                -------------           --------    ------      -------
  5m        5m      1   {default-scheduler }                        Normal      Scheduled   Successfully assigned coredns-4224169331-9nhxj to 192.168.122.190
  5m        1m      4   {kubelet 192.168.122.190}   spec.containers{coredns}    Normal      Pulling     pulling image "coredns/coredns:latest"
  4m        26s     4   {kubelet 192.168.122.190}   spec.containers{coredns}    Warning     Failed      Failed to pull image "coredns/coredns:latest": Network timed out while trying to connect to https://index.docker.io/v1/repositories/coredns/coredns/images. You may want to check your internet connection or if you are behind a proxy.
  4m        26s     4   {kubelet 192.168.122.190}                   Warning     FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "coredns" with ErrImagePull: "Network timed out while trying to connect to https://index.docker.io/v1/repositories/coredns/coredns/images. You may want to check your Internet connection or if you are behind a proxy."

  4m    2s  7   {kubelet 192.168.122.190}   spec.containers{coredns}    Normal  BackOff     Back-off pulling image "coredns/coredns:latest"
  4m    2s  7   {kubelet 192.168.122.190}                   Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "coredns" with ImagePullBackOff: "Back-off pulling image \"coredns/coredns:latest\""

Additional debuging steps

  1. try to pull the docker image and tag manually on your computer
  2. Identify the node by doing a 'kubectl/oc get pods -o wide'
  3. ssh into the node (if you can) that can not pull the docker image
  4. check that the node can resolve the DNS of the docker registry by performing a ping.
  5. try to pull the docker image manually on the node
  6. If you are using a private registry, check that your secret exists and the secret is correct. Your secret should also be in the same namespace. Thanks swenzel
  7. Some registries have firewalls that limit ip address access. The firewall may block the pull
  8. Some CIs create deployments with temporary docker secrets. So the secret expires after a few days (You are asking for production failures...)