How to backup a Postgres database in Kubernetes on Google Cloud?

MattMS · Feb 21, 2017 · Viewed 10.8k times

What is the best practice for backing up a Postgres database running on Google Cloud Container Engine?

My plan is to store the backups in Google Cloud Storage, but I am unsure how to connect the Disk/Pod to a Storage Bucket.

I am running Postgres in a Kubernetes cluster using the following configuration:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: postgres-deployment
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - image: postgres:9.6.2-alpine
          imagePullPolicy: IfNotPresent
          env:
            - name: PGDATA
              value: /var/lib/postgresql/data
            - name: POSTGRES_DB
              value: my-database-name
            - name: POSTGRES_PASSWORD
              value: my-password
            - name: POSTGRES_USER
              value: my-database-user
          name: postgres-container
          ports:
            - containerPort: 5432
          volumeMounts:
            - mountPath: /var/lib/postgresql
              name: my-postgres-volume
      volumes:
        - gcePersistentDisk:
            fsType: ext4
            pdName: my-postgres-disk
          name: my-postgres-volume

I have attempted to create a Job to run a backup:

apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-dump-job
spec:
  template:
    metadata:
      labels:
        app: postgres-dump
    spec:
      containers:
        - command:
            - pg_dump
            - my-database-name
          # `env` value matches `env` from previous configuration.
          image: postgres:9.6.2-alpine
          imagePullPolicy: IfNotPresent
          name: my-postgres-dump-container
          volumeMounts:
            - mountPath: /var/lib/postgresql
              name: my-postgres-volume
              readOnly: true
      restartPolicy: Never
      volumes:
        - gcePersistentDisk:
            fsType: ext4
            pdName: my-postgres-disk
          name: my-postgres-volume

(As far as I understand) this should run the pg_dump command and output the backup data to stdout (which should appear in the kubectl logs).

As an aside, when I inspect the Pods (with kubectl get pods), the Pod never gets out of the "Pending" state. I first assumed this was a lack of resources, but it may be that the GCE persistent disk is already attached read-write to the running Postgres Pod: a persistent disk can only be attached to one node at a time in read-write mode, so the Job's Pod cannot mount it.

Is it correct to run this process as a Job? How do I connect the Job to Google Cloud Storage? Or should I be doing something completely different?

I'm guessing it would be unwise to run pg_dump in the database Container (with kubectl exec) due to the performance hit, but maybe this is ok on a dev/staging server?

Answer

Marco Lamina · Feb 23, 2017

I think running pg_dump as a Job is a good idea, but mounting your DB's persistent disk directly is not: pg_dump needs a running server to connect to, and it cannot read the raw data files (besides, the disk is already attached read-write to the Postgres Pod). Try having pg_dump connect to your DB over the network instead! You could then have a second disk onto which your pg_dump command writes the backups. To be on the safe side, you can create regular snapshots of this second disk.
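
A sketch of what that Job could look like, assuming a Service named postgres exposes the Deployment on port 5432 and a second persistent disk my-backup-disk has been created (both names are hypothetical, and the credentials match the question's Deployment):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-dump-job
spec:
  template:
    metadata:
      labels:
        app: postgres-dump
    spec:
      containers:
        - name: my-postgres-dump-container
          image: postgres:9.6.2-alpine
          imagePullPolicy: IfNotPresent
          command:
            - sh
            - -c
            # Connect over the network via the Service instead of
            # mounting the database's own data disk.
            - pg_dump -h postgres -U my-database-user my-database-name > /backups/dump.sql
          env:
            - name: PGPASSWORD
              value: my-password
      restartPolicy: Never
          volumeMounts:
            - mountPath: /backups
              name: my-backup-volume
      volumes:
        - name: my-backup-volume
          gcePersistentDisk:
            fsType: ext4
            # Hypothetical second disk, used only for dumps.
            pdName: my-backup-disk
```

Once the dump is on the backup disk, a snapshot can be taken with gcloud compute disks snapshot my-backup-disk (plus the appropriate --zone flag), and those snapshots give you your point-in-time copies.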