Postgres pg_dump dumps database in a different order every time

littleK · Feb 1, 2010 · Viewed 13.4k times

I am writing a PHP script (which also calls Linux bash commands) that runs through test cases against a PostgreSQL database (8.4.2). It does the following:

1.) Create a DB
2.) Modify the DB
3.) Store a database dump of the DB (pg_dump)
4.) Do regression testing by repeating steps 1.) and 2.), then take another database dump and compare it (diff) with the original database dump from step 3.)

However, I am finding that pg_dump does not always dump the database in the same way: the order of the output changes from run to run. As a result, a diff of the two database dumps reports them as different even when their contents are the same and only the order differs.

Is there a different way I can go about doing the pg_dump?

Thanks!

Answer

akaihola · Apr 22, 2010

Here is a handy script for pre-processing pg_dump output to make it more suitable for diffing and storing in version control:

https://github.com/akaihola/pgtricks

pg_dump_splitsort.py splits the dump into the following files:

  • 0000_prologue.sql: everything up to the first COPY
  • 0001_<schema>.<table>.sql through NNNN_<schema>.<table>.sql: data for each table, sorted by the first field
  • 9999_epilogue.sql: everything after the last COPY
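
To illustrate the idea behind that split, here is a minimal Python sketch. It is not the actual pgtricks code: the names split_and_sort and COPY_HEADER are made up for this example, it assumes a plain-text dump whose data sections look like "COPY <table> (...) FROM stdin;" followed by rows and a terminating "\." line, and it sorts whole rows as plain strings rather than by a parsed first field.

```python
import re

# Hypothetical helper, not part of pgtricks.
# Matches the header line pg_dump writes before a table's rows in a
# plain-text dump. Quoted table names containing spaces are not handled.
COPY_HEADER = re.compile(r"^COPY (\S+) \(.*\) FROM stdin;$")


def split_and_sort(dump_text):
    """Split a plain-text dump into (prologue, tables, epilogue).

    tables is a list of (table_name, block_text) pairs in dump order.
    Rows inside each block are sorted as plain strings, so "10" sorts
    before "2"; a real tool may want a smarter sort key.
    """
    prologue, tables, pending = [], [], []
    lines = iter(dump_text.splitlines())
    for line in lines:
        match = COPY_HEADER.match(line)
        if match is None:
            pending.append(line)  # text outside any COPY block
            continue
        if not tables:
            # Everything seen before the first COPY is the prologue.
            prologue, pending = pending, []
        rows = []
        for row in lines:  # consume rows up to the "\." terminator
            if row == "\\.":
                break
            rows.append(row)
        # Text between two COPY blocks is kept with the following table.
        block = pending + [line] + sorted(rows) + ["\\."]
        tables.append((match.group(1), "\n".join(block)))
        pending = []
    if not tables:
        return "\n".join(pending), [], ""
    # Whatever follows the last COPY block is the epilogue.
    return "\n".join(prologue), tables, "\n".join(pending)
```

Writing the prologue, each table block, and the epilogue to separately numbered files would then give a layout like the one listed above. Because the rows are sorted, two dumps with the same data in a different row order produce identical table blocks.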

The files for table data are numbered so a simple sorted concatenation of all files can be used to re-create the database:

$ cat *.sql | psql <database>

I've found that a good way to take a quick look at differences between dumps is to use the meld tool on the whole directory:

$ meld old-dump/ new-dump/
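
For an automated regression test, a GUI tool is not much help, so the same directory comparison can be scripted. The following is only a sketch using Python's standard difflib module; the function name diff_dump_dirs is made up for this example, and it assumes both directories contain the split .sql files described above.

```python
import difflib
from pathlib import Path


def diff_dump_dirs(old_dir, new_dir):
    """Return unified-diff lines for all *.sql files in two dump directories.

    An empty list means the two directories hold identical dumps.
    A file missing on one side is treated as an empty file.
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    # Union of file names, so files present on only one side are compared too.
    names = sorted(
        {p.name for p in old_dir.glob("*.sql")}
        | {p.name for p in new_dir.glob("*.sql")}
    )
    report = []
    for name in names:
        old_file, new_file = old_dir / name, new_dir / name
        old_lines = old_file.read_text().splitlines() if old_file.exists() else []
        new_lines = new_file.read_text().splitlines() if new_file.exists() else []
        report.extend(
            difflib.unified_diff(
                old_lines,
                new_lines,
                fromfile=str(old_file),
                tofile=str(new_file),
                lineterm="",
            )
        )
    return report
```

A test script could call diff_dump_dirs("old-dump", "new-dump") and fail the test case whenever the returned list is not empty, printing the lines to show what changed.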

Storing the dump in version control also gives a decent view of the differences. Here's how to configure git to use color in diffs:

# ~/.gitconfig
[color]
        diff = true
[color "diff"]
        frag = white blue bold
        meta = white green bold
        commit = white red bold

Note: If you have created/dropped/renamed tables, remember to delete all .sql files before post-processing the new dump.
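
The reason is that a file left over from a dropped or renamed table would otherwise stay in the directory and still be picked up by cat *.sql. A small sketch of that cleanup step in Python follows; the function name clear_old_sql_files is made up for this example, and it assumes the directory holds nothing but generated dump files.

```python
from pathlib import Path


def clear_old_sql_files(dump_dir):
    """Delete every *.sql file directly inside dump_dir.

    Returns the names of the removed files. Subdirectories and files
    with other extensions are left alone.
    """
    removed = []
    for path in sorted(Path(dump_dir).glob("*.sql")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

Run this on the dump directory right before post-processing the new dump. If the directory is under version control, the files of dropped tables then show up as deletions in the next diff.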