What is the best practice for backing up a Postgres database running on Google Cloud Container Engine?
My thought is working towards storing the backups in Google Cloud Storage, but I am unsure of how to connect the Disk/Pod to a Storage Bucket.
I am running Postgres in a Kubernetes cluster using the following configuration:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: postgres-deployment
spec:
replicas: 1
template:
metadata:
labels:
app: postgres
spec:
containers:
- image: postgres:9.6.2-alpine
imagePullPolicy: IfNotPresent
env:
- name: PGDATA
value: /var/lib/postgresql/data
- name: POSTGRES_DB
value: my-database-name
- name: POSTGRES_PASSWORD
value: my-password
- name: POSTGRES_USER
value: my-database-user
name: postgres-container
ports:
- containerPort: 5432
volumeMounts:
- mountPath: /var/lib/postgresql
name: my-postgres-volume
volumes:
- gcePersistentDisk:
fsType: ext4
pdName: my-postgres-disk
name: my-postgres-volume
I have attempted to create a Job to run a backup:
apiVersion: batch/v1
kind: Job
metadata:
name: postgres-dump-job
spec:
template:
metadata:
labels:
app: postgres-dump
spec:
containers:
- command:
- pg_dump
- my-database-name
# `env` value matches `env` from previous configuration.
image: postgres:9.6.2-alpine
imagePullPolicy: IfNotPresent
name: my-postgres-dump-container
volumeMounts:
- mountPath: /var/lib/postgresql
name: my-postgres-volume
readOnly: true
restartPolicy: Never
volumes:
- gcePersistentDisk:
fsType: ext4
pdName: my-postgres-disk
name: my-postgres-volume
(As far as I understand) this should run the pg_dump
command and output the backup data to stdout (which should appear in the kubectl logs
).
As an aside, when I inspect the Pods (with kubectl get pods
), it shows the Pod never gets out of the "Pending" state, which I gather is due to there not being enough resources to start the Job.
Is it correct to run this process as a Job? How do I connect the Job to Google Cloud Storage? Or should I be doing something completely different?
I'm guessing it would be unwise to run pg_dump
in the database Container (with kubectl exec
) due to a performance hit, but maybe this is ok in a dev/staging server?
I think running pg_dump as a job is a good idea, but connecting directly to your DB's persistent disk is not. Try having pg_dump connect to your DB over the network! You could then have a second disk onto which your pg_dump command dumps the backups. To be on the safe side, you can create regular snapshots of this second disk.