Kubectl rollout restart for statefulset

livinston · Dec 4, 2019 · Viewed 21.8k times

As per the kubectl docs, kubectl rollout restart is applicable to deployments, daemonsets and statefulsets. It works as expected for deployments, but for statefulsets it restarts only one of the 2 pods.

✗ k rollout restart statefulset alertmanager-main                       (playground-fdp/monitoring)
statefulset.apps/alertmanager-main restarted

✗ k rollout status statefulset alertmanager-main                        (playground-fdp/monitoring)
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
statefulset rolling update complete 2 pods at revision alertmanager-main-59d7ccf598...

✗ kgp -l app=alertmanager                                               (playground-fdp/monitoring)
NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s

As you can see, the pod alertmanager-main-1 has been restarted and its age is 20s, whereas the other pod in the statefulset, alertmanager-main-0, has not been restarted and its age is 21h. Any idea how we can restart a statefulset after some configmap used by it has been updated?

[Update 1] Here is the statefulset configuration. As you can see the .spec.updateStrategy.rollingUpdate.partition is not set.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"monitoring.coreos.com/v1","kind":"Alertmanager","metadata":{"annotations":{},"labels":{"alertmanager":"main"},"name":"main","namespace":"monitoring"},"spec":{"baseImage":"10.47.2.76:80/alm/alertmanager","nodeSelector":{"kubernetes.io/os":"linux"},"replicas":2,"securityContext":{"fsGroup":2000,"runAsNonRoot":true,"runAsUser":1000},"serviceAccountName":"alertmanager-main","version":"v0.19.0"}}
  creationTimestamp: "2019-12-02T07:17:49Z"
  generation: 4
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
  ownerReferences:
  - apiVersion: monitoring.coreos.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: Alertmanager
    name: main
    uid: 3e3bd062-6077-468e-ac51-909b0bce1c32
  resourceVersion: "521307"
  selfLink: /apis/apps/v1/namespaces/monitoring/statefulsets/alertmanager-main
  uid: ed4765bf-395f-4d91-8ec0-4ae23c812a42
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      alertmanager: main
      app: alertmanager
  serviceName: alertmanager-operated
  template:
    metadata:
      creationTimestamp: null
      labels:
        alertmanager: main
        app: alertmanager
    spec:
      containers:
      - args:
        - --config.file=/etc/alertmanager/config/alertmanager.yaml
        - --cluster.listen-address=[$(POD_IP)]:9094
        - --storage.path=/alertmanager
        - --data.retention=120h
        - --web.listen-address=:9093
        - --web.external-url=http://10.47.0.234
        - --web.route-prefix=/
        - --cluster.peer=alertmanager-main-0.alertmanager-operated.monitoring.svc:9094
        - --cluster.peer=alertmanager-main-1.alertmanager-operated.monitoring.svc:9094
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: 10.47.2.76:80/alm/alertmanager:v0.19.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/healthy
            port: web
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: alertmanager
        ports:
        - containerPort: 9093
          name: web
          protocol: TCP
        - containerPort: 9094
          name: mesh-tcp
          protocol: TCP
        - containerPort: 9094
          name: mesh-udp
          protocol: UDP
        readinessProbe:
          failureThreshold: 10
          httpGet:
            path: /-/ready
            port: web
            scheme: HTTP
          initialDelaySeconds: 3
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          requests:
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
        - mountPath: /alertmanager
          name: alertmanager-main-db
      - args:
        - -webhook-url=http://localhost:9093/-/reload
        - -volume-dir=/etc/alertmanager/config
        image: 10.47.2.76:80/alm/configmap-reload:v0.0.1
        imagePullPolicy: IfNotPresent
        name: config-reloader
        resources:
          limits:
            cpu: 100m
            memory: 25Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/alertmanager/config
          name: config-volume
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/os: linux
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 2000
        runAsNonRoot: true
        runAsUser: 1000
      serviceAccount: alertmanager-main
      serviceAccountName: alertmanager-main
      terminationGracePeriodSeconds: 120
      volumes:
      - name: config-volume
        secret:
          defaultMode: 420
          secretName: alertmanager-main
      - emptyDir: {}
        name: alertmanager-main-db
  updateStrategy:
    type: RollingUpdate
status:
  collisionCount: 0
  currentReplicas: 2
  currentRevision: alertmanager-main-59d7ccf598
  observedGeneration: 4
  readyReplicas: 2
  replicas: 2
  updateRevision: alertmanager-main-59d7ccf598
  updatedReplicas: 2

Answer

PjoterS · Dec 11, 2019

You did not provide the whole scenario. It might depend on the readiness probe or on the update strategy.

During a rolling update, a StatefulSet replaces its pods one at a time in reverse ordinal order, from n-1 down to 0, and waits until an updated pod is Running and Ready before updating its predecessor. Details can be found here.
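
If you want to watch that order yourself during a rollout, something like this should work (label and namespace taken from your output and manifest above):

kubectl get pods -n monitoring -l app=alertmanager -w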

Reason 1

StatefulSets have four update strategies:

  • On Delete
  • Rolling Updates
  • Partitions
  • Forced Rollback

In the Partitions section you can find this information:

If a partition is specified, all Pods with an ordinal that is greater than or equal to the partition will be updated when the StatefulSet’s .spec.template is updated. All Pods with an ordinal that is less than the partition will not be updated, and, even if they are deleted, they will be recreated at the previous version. If a StatefulSet’s .spec.updateStrategy.rollingUpdate.partition is greater than its .spec.replicas, updates to its .spec.template will not be propagated to its Pods. In most cases you will not need to use a partition, but they are useful if you want to stage an update, roll out a canary, or perform a phased roll out.

So if somewhere in the StatefulSet you have set .spec.updateStrategy.rollingUpdate.partition: 1, only the pods with ordinal 1 or higher will be restarted.

Example with partition: 3 (only web-3 and above have been restarted):

NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          30m
web-1   1/1     Running   0          30m
web-2   1/1     Running   0          31m
web-3   1/1     Running   0          2m45s
web-4   1/1     Running   0          3m
web-5   1/1     Running   0          3m13s
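
For reference, a partition is set under .spec.updateStrategy in the manifest. A minimal sketch (the value 3 matches the example above; your StatefulSet has no partition set, so this particular reason does not apply to it):

  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 3   # pods with ordinal >= 3 get the new revision, lower ordinals keep the old one

You can also read the strategy straight from the live object:

kubectl get statefulset alertmanager-main -n monitoring -o jsonpath='{.spec.updateStrategy}'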

Reason 2

Configuration of the readiness probe.

If your initialDelaySeconds and periodSeconds values are high, it might take a while before the next pod is restarted. Details about those parameters can be found here.

In the example below, the pod will wait 10 seconds after it starts before the first readiness check, and the probe then repeats every 2 seconds. Depending on these values, this might be the cause of the behavior you see.

    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /
        port: 80
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 2
      successThreshold: 1
      timeoutSeconds: 1
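
One way to check which probe values are actually configured on the running StatefulSet is jsonpath; a sketch, assuming the alertmanager container is the first one in the pod template, as in your manifest:

kubectl get statefulset alertmanager-main -n monitoring -o jsonpath='{.spec.template.spec.containers[0].readinessProbe}'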

Reason 3

I saw that you have 2 containers in each pod.

NAME                  READY   STATUS    RESTARTS   AGE
alertmanager-main-0   2/2     Running   0          21h
alertmanager-main-1   2/2     Running   0          20s

As described in the docs:

Running - The Pod has been bound to a node, and all of the Containers have been created. At least one Container is still running, or is in the process of starting or restarting.

It would be good to check whether everything is OK with both containers (readinessProbe/livenessProbe, restarts, etc.).
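
A rough sketch of such a check, using the container names from your manifest (alertmanager and config-reloader):

kubectl describe pod alertmanager-main-0 -n monitoring
kubectl logs alertmanager-main-0 -c alertmanager -n monitoring
kubectl logs alertmanager-main-0 -c config-reloader -n monitoring

You can also compare which controller revision each pod is currently running, since StatefulSet pods carry a controller-revision-hash label:

kubectl get pods -n monitoring -l app=alertmanager -L controller-revision-hash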