Does clustered index on foreign key column increase join performance vs non-clustered?

alpav picture alpav · Mar 11, 2010 · Viewed 7.7k times · Source

In many places it's recommended that clustered indexes are better utilized when used to select range of rows using BETWEEN statement. When I select joining by foreign key field in such a way that this clustered index is used, I guess, that clusterization should help too because range of rows is being selected even though they all have same clustered key value and BETWEEN is not used.

Considering that I care only about that one select with join and nothing else, am I wrong with my guess ?

Answer

mjv picture mjv · Mar 11, 2010

Discussing this type of issue in the absolute isn't very useful.

It is always a case-by-case situation !

Essentially, access by way of a clustered index saves one indirection, period.

Assuming the key used in the JOIN, is that of the clustered index, in a single read [whether from an index seek or from a scan or partial scan, doesn't matter], you get the whole row (record).

One problem with clustered indexes, is that you only get one per table. Therefore you need to use it wisely. Indeed in some cases, it is even wiser not to use any clustered index at all because of INSERT overhead and fragmentation (depending on the key and the order of new keys etc.)

Sometimes one gets the equivalent benefits of a clustered index, with a covering index, i.e. a index with the desired key(s) sequence, followed by the column values we are interested in. Just like a clustered index, a covering index doesn't require the indirection to the underlying table. Indeed the covering index may be slightly more efficient than the clustered index, because it is smaller.
However, and also, just like clustered indexes, and aside from the storage overhead, there is a performance cost associated with any extra index, during INSERT (and DELETE or UPDATE) queries.

And, yes, as indicated in other answers, the "foreign-key-ness" of the key used for the clustered index, has absolutely no bearing on the the performance of the index. FKs are constraints aimed at easing the maintenance of the integrity of the database but the underlying fields (columns) are otherwise just like any other field in the table.

To make wise decisions about index structure, one needs

  • to understands the way the various index types (and the heap) work
    (and, BTW, this varies somewhat between SQL implementations)
  • to have a good image of the statistical profile of the database(s) at hand:
    which are the big tables, which are the relations, what's the average/maximum cardinality of relation, what's the typical growth rate of the database etc.
  • to have good insight regarding the way the database(s) is (are) going to be be used/queried

Then and only then, can one can make educated guesses about the interest [or lack thereof] to have a given clustered index.