How do I know if the statistics of a Postgres table are up to date?

Beibei picture Beibei · Aug 1, 2011 · Viewed 33k times · Source

In pgAdmin, whenever a table's statistics are out-of-date, it prompts:

Running VACUUM recommended

The estimated rowcount on the table schema.table deviates significantly from the actual rowcount. You should run VACUUM ANALYZE on this table.

I've tested it using pgAdmin 3 and Postgres 8.4.4, with autovacuum=off. The prompt shows up immediately whenever I click a table that has been changed.

Let's say I'm making a web-based system in Java, how do I detect if a table is out-of-date, so that I can show a prompt like the one in pgAdmin?

Because of the nature of my application, here are a few rules I have to follow:

  1. I want to know if the statistics of a certain table in pg_stats and pg_statistic are up to date.

  2. I can't set the autovacuum flag in postgresql.conf. (In other words, the autovacuum flag can be on or off. I have no control over it. I need to tell if the stats are up-to-date whether the autovacuum flag is on or off.)

  3. I can't run vacuum/analyze every time to make it up-to-date.

  4. When a user selects a table, I need to show the prompt that the table is outdated when there are any updates to this table (such as drop, insert, and update) that are not reflected in pg_stats and pg_statistic.

It seems that it's not feasible by analyzing timestamps in pg_catalog.pg_stat_all_tables. Of course, if a table hasn't been analyzed before, I can check if it has a timestamp in last_analyze to find out whether the table is up-to-date. Using this method, however, I can't detect if the table is up-to-date when there's already a timestamp. In other words, no matter how many rows I add to the table, its last_analyze timestamp in pg_stat_all_tables is always for the first analyze (assuming the autovacuum flag is off). Therefore, I can only show the "Running VACUUM recommended" prompt for the first time.

It's also not feasible by comparing the last_analyze timestamp to the current timestamp. There might not be any updates to the table for days. And there might be tons of updates in one hour.

Given this scenario, how can I always tell if the statistics of a table are up-to-date?

Answer

Sean picture Sean · Aug 1, 2011

Check the system catalogs.

test=# SELECT schemaname, relname, last_analyze FROM pg_stat_all_tables WHERE relname = 'city';
 schemaname | relname |         last_analyze          
------------+---------+-------------------------------
 pagila     | city    | 2011-07-26 19:30:59.357898-07
 world      | city    | 2011-07-26 19:30:53.119366-07
(2 rows)

All kinds of useful information in there:

test=# \d pg_stat_all_tables           View "pg_catalog.pg_stat_all_tables"
      Column       |           Type           | Modifiers 
-------------------+--------------------------+-----------
 relid             | oid                      | 
 schemaname        | name                     | 
 relname           | name                     | 
 seq_scan          | bigint                   | 
 seq_tup_read      | bigint                   | 
 idx_scan          | bigint                   | 
 idx_tup_fetch     | bigint                   | 
 n_tup_ins         | bigint                   | 
 n_tup_upd         | bigint                   | 
 n_tup_del         | bigint                   | 
 n_tup_hot_upd     | bigint                   | 
 n_live_tup        | bigint                   | 
 n_dead_tup        | bigint                   | 
 last_vacuum       | timestamp with time zone | 
 last_autovacuum   | timestamp with time zone | 
 last_analyze      | timestamp with time zone | 
 last_autoanalyze  | timestamp with time zone | 
 vacuum_count      | bigint                   | 
 autovacuum_count  | bigint                   | 
 analyze_count     | bigint                   | 
 autoanalyze_count | bigint                   |