I am writing a PHP script (which also uses linux bash commands) which will run through test cases by doing the following:
I am using a PostgreSQL database (8.4.2)...
1.) Create a DB 2.) Modify the DB 3.) Store a database dump of the DB (pg_dump)
4.) Do regression testing by doing steps 1.) and 2.), and then take another database dump and compare it (diff) with the original database dump from step number 3.)
However, I am finding that pg_dump will not always dump the database in the same way. It will dump things in a different order every time. Therefore, when I do a diff on the two database dumps, the comparison will result in the two files being different, when they are actually the same, just in a different order.
Is there a different way I can go about doing the pg_dump?
Thanks!
Here is a handy script for pre-processing pg_dump
output to make it more suitable for diffing and storing in version control:
https://github.com/akaihola/pgtricks
pg_dump_splitsort.py
splits the dump into the following files:
0000_prologue.sql
: everything up to the first COPY0001_<schema>.<table>.sql
NNNN_<schema>.<table>.sql
: data for each table sorted by the first field9999_epilogue.sql
: everything after the last COPYThe files for table data are numbered so a simple sorted concatenation of all files can be used to re-create the database:
$ cat *.sql | psql <database>
I've found that a good way to take a quick look at differences between dumps is to use the meld
tool on the whole directory:
$ meld old-dump/ new-dump/
Storing the dump in version control also gives a decent view on the differences. Here's how to configure git to use color in diffs:
# ~/.gitconfig
[color]
diff = true
[color "diff"]
frag = white blue bold
meta = white green bold
commit = white red bold
Note: If you have created/dropped/renamed tables, remember to delete all .sql
files before post-processing the new dump.