Which Postgresql WAL files can I safely remove from the WAL archive folder

undercurrent picture undercurrent · Feb 2, 2016 · Viewed 31.7k times · Source

Current situation

So I have WAL archiving set up to an independent internal harddrive on a data logging computer running Postgres. The harddrive containing the WAL archives is filling up and I'd like to remove and archive all the WAL archive files, including the initial base backup, to external backup drives.

The directory structure is like:

D:/WALBACKUP/ which is the parent folder for all the WAL files (00000110000.CA00000004 etc)

D:/WALBACKUP/BASEBACKUP/ which holds the .tar of the initial base backup

The question I have then is:

  • Can I safely move literally every single WAL file except the current WAL archive file, (000000000001.CA0000.. and so on), including the base backup, and move them to another hdd. (Note that the database is live and receiving data)

cheers!

Answer

Craig Ringer picture Craig Ringer · Feb 2, 2016

WAL archives

You can use the pg_archivecleanup command to remove WAL from an archive (not pg_xlog) that's not required by a given base backup.

In general I suggest using PgBarman or a similar tool to automate your base backups and WAL retention though. It's easier and less error prone.

pg_xlog

Never remove WAL from pg_xlog manually. If you have too much WAL then:

  • your wal_keep_segments setting is keeping WAL around;
  • you have archive_mode on and archive_command set but it isn't working correctly (check the logs);
  • your checkpoint_segments is ridiculously high so you're just generating too much WAL; or
  • you have a replication slot (see the pg_replication_slots view) that's preventing the removal of WAL.

You should fix the problem that's causing WAL to be retained. If nothing seems to have happened after changing a setting run a manual CHECKPOINT command.

If you have an offline server and need to remove WAL to start it you can use pg_archivecleanup if you must. It knows how to remove only WAL that isn't needed by the server its self ... but it might break your archive-based backups, streaming replicas, etc. So don't use it unless you must.