How to reduce PostgreSQL database size?

RayCh picture RayCh · Apr 12, 2019 · Viewed 8.8k times · Source

I was planning to move from MySQL to PostgreSQL because I wanted to make use of TimescaleDB.

Everything looked good until I did a check on the storage size used by PostgreSQL (v11.2) compared to MySQL (v5.6). For exactly the same number of rows (1,440,000) and content:

  • MySQL: 156 MB
  • PostgreSQL: 246 MB
  • PostgreSQL + TimescaleDB (partitioned/chunked data): 324 MB

The MySQL and PostgreSQL numbers are like for like (i.e. including indexes and other constraints), the PostgreSQL + TimescaleDB has overhead of adding a timestamp to the table. The table concerned looks like this:

create table cell(
    cell_id            serial not null
   ,ts                 timestamp not null
   ,parent_id          int references parent( parent_id )
   ,instance_id        smallint
   ,v                  float
   ,a                  float
   ,t                  float
   ,s1                 float
   ,s2                 float
   ,s3                 float
   ,s4                 float
   ,s5                 float
   ,primary key( cell_id )
);
create index ix_cell_pid on cell( parent_id );
create index ix_cell_inst on cell( instance_id );

Why does PostgreSQL occupy so much more storage than MySQL?
And is there some way of significantly reducing it near to the MySQL level?

Answer

Erwin Brandstetter picture Erwin Brandstetter · Apr 12, 2019

Adding a timestamp column should add no more than 11 MB in your case (1440000 * 8 bytes, no added padding).

Have you run VACUUM FULL in Postgres before you measured the size, for a fair comparison? I suspect table and index bloat.

Related:

In MySQL the data type float is the single-precision floating point type occupying 4 bytes.

In Postgres the same float is the double-precision floating point type, occupying 8 bytes (alias: float8 or double precision).

That should explain another 44 MB of the difference. To compare apples with apples, create the Postgres table with 4-byte real columns (alias float4). Note the difference to MySQL, where real is used for 8-byte floating point numbers! Unfortunate disagreements.

The MySQL manual: https://dev.mysql.com/doc/refman/8.0/en/storage-requirements.html
The Postgres manual: https://www.postgresql.org/docs/current/datatype-numeric.html

Related:

You show two indexes. Depending on what those are for, one multicolumn index might be able to replace both in Postgres - occupying just as much disk space as one of the two it replaces in this particular case (saving ~ 50 MB with the given specs).

create index ix_cell_pid on cell( parent_id, instance_id );

Consider: