Does Firebird need manual reindexing?

pablo picture pablo · May 19, 2009 · Viewed 8.4k times · Source

I use both Firebird embedded and Firebird Server, and from time to time I need to reindex the tables using a procedure like the following:

CREATE PROCEDURE MAINTENANCE_SELECTIVITY 
ASDECLARE VARIABLE S VARCHAR(200);
BEGIN
FOR select RDB$INDEX_NAME FROM RDB$INDICES INTO :S DO
BEGIN
S = 'SET statistics INDEX ' || s || ';';
EXECUTE STATEMENT :s;
END
SUSPEND;
END

I guess this is normal using embedded, but is it really needed using a server? Is there a way to configure the server to do it automatically when required or periodically?

Answer

Lasse V. Karlsen picture Lasse V. Karlsen · May 19, 2009

First, let me point out that I'm no Firebird expert, so I'm answering on the basis of how SQL Server works.

In that case, the answer is both yes, and no.

The indexes are of course updated on SQL Server, in the sense that if you insert a new row, all indexes for that table will contain that row, so it will be found. So basically, you don't need to keep reindexing the tables for that part to work. That's the "no" part.

The problem, however, is not with the index, but with the statistics. You're saying that you need to reindex the tables, but then you show code that manipulates statistics, and that's why I'm answering.

The short answer is that statistics goes slowly out of whack as time goes by. They might not deteriorate to a point where they're unusable, but they will deteriorate down from the perfect level they're in when you recreate/recalculate them. That's the "yes" part.

The main problem with stale statistics is that if the distribution of the keys in the indexes changes drastically, the statistics might not pick that up right away, and thus the query optimizer will pick the wrong indexes, based on the old, stale, statistics data it has on hand.

For instance, let's say one of your indexes has statistics that says that the keys are clumped together in one end of the value space (for instance, int-column with lots of 0's and 1's). Then you insert lots and lots of rows with values that make this index contain values spread out over the entire spectrum.

If you now do a query that uses a join from another table, on a column with low selectivity (also lots of 0's and 1's) against the table with this index of yours, the query optimizer might deduce that this index is good, since it will fetch many rows that will be used at the same time (they're on the same data page).

However, since the data has changed, it'll jump all over the index to find the relevant pieces, and thus not be so good after all.

After recalculating the statistics, the query optimizer might see that this index is sub-optimal for this query, and pick another index instead, which is more suited.

Basically, you need to recalculate the statistics periodically if your data is in flux. If your data rarely changes, you probably don't need to do it very often, but I would still add a maintenance job with some regularity that does this.

As for whether or not it is possible to ask Firebird to do it on its own, then again, I'm on thin ice, but I suspect there is. In SQL Server you can set up maintenance jobs that does this, on a schedule, and at the very least you should be able to kick off a batch file from the Windows scheduler to do something like it.