As in this question, I set up a dumpdata
-based backup system for my database. The setup is akin to running a cron script that calls dumpdata
and moves the backup to a remote server, with the aim of simply using loaddata
to recover the database. However, I'm not sure this plays well with migrations. loaddata
now has an ignorenonexistent
switch to deal with deleted models/fields, but it is not able to resolve cases where columns were added with one-off defaults or apply RunPython
code.
The way I see it, there are two sub-problems to address:
dumpdata
output file with the current version of each appI'm stumped about how to tackle the first problem without introducing a ton of overhead. Would it be enough to save an extra file per backup that contained an {app_name: migration_number}
mapping?
The second problem I think is easier once the first one is solved, since the process is roughly:
loaddata
with the given fixture fileThere's some code in this question (linked from the bug report) that I think could be adapted for this purpose.
Since these are fairly regular/large snapshots of the database, I don't want to keep them as data migrations cluttering up the migrations directory.
I am taking the following steps to backup, restore or transfer my postgresql database between any instance of my project:
The idea is to keep the least possible migrations as if manage.py makemigrations
was run for the first time on an empty database.
Let's assume that we have a working database to our development environment. This database is a current copy of the production database that should not be open to any changes. We have added models, altered attributes etc and those actions have generated additional migrations.
Now the database is ready to be migrated to production which -as stated before- is not open to public so it is not altered in any way. In order to achieve this:
We make the changes in our development environment. No changes should happen in the production database because they will be overridden.
Before anything else, I have a backup of the project directory (which includes a requirements.txt file), a backup of the database and -of course- git
is a friend of mine.
I take a dumpdata
backup in case I need it. However, dumpdata
has some serious limitations regarding content types, permissions or other cases where a natural foreignkey should be used:
./manage.py dumpdata --exclude auth.permission --exclude contenttypes --exclude admin.LogEntry --exclude sessions --indent 2 > db.json
I take a pg_dump
backup to use:
pg_dump -U $user -Fc $database --exclude-table=django_migrations > path/to/backup-dir/db.dump
Only if I want to merge existing migrations in one, I delete all migrations from every application.
In my case the migrations
folder is a symlink, so I use the following script:
#!/bin/bash
for dir in $(find -L -name "migrations")
do
rm -Rf $dir/*
done
I delete and recreate the database:
For example, a bash script can include the following commands:
su -l postgres -c "PGPASSWORD=$password psql -c 'drop database $database ;'"
su -l postgres -c "createdb --owner $username $database"
su -l postgres -c "PGPASSWORD=$password psql $database -U $username -c 'CREATE EXTENSION $extension ;'"
I restore the database from the dump:
pg_restore -Fc -U $username -d $database path/to/backup-dir/db.dump
If migrations were deleted in step 3, I recreate them in the following way:
./manage.py makemigrations <app1> <app2> ... <appn>
... by using the following script:
#!/bin/bash
apps=()
for app in $(find ./ -maxdepth 1 -type d ! -path "./<project-folder> ! -path "./.*" ! -path "./")
do
apps+=(${app#??})
done
all_apps=$(printf "%s " "${apps[@]}")
./manage.py makemigrations $all_apps
I migrate using a fake migration:
./manage.py migrate --fake
In case something has gone completely wrong and everything is ***, (this can happen, indeed), I can use the backup to revert everything to its previous working state. If I would like to use the db.json
file from step one, it goes like this:
I perform the steps:
and then:
Apply the migrations:
./manage.py migrate
Load the data from db.json:
./manage.py loaddata path/to/db.json
Then I try to find out why my previous effort was not successful.
When the steps are performed successfully, I copy the project to the server and perform the same ones to that box.
This way, I always keep the least number of migrations and I am able to use pg_dump
and pg_restore
to any box that shares the same project.