How to properly index a many-many association table?

Russ picture Russ · Jan 17, 2011 · Viewed 10.4k times · Source

In a typical many-many arrangement like this...

Movies       Actors       Movies_Actors
------       ------       -------------
movie_ID     actor_ID     FK_movie_ID
title        name         FK_actor_ID

... how should the association table ('Movies_Actors') be indexed for optimal read speed?

I usually see this done only with the composite primary key in the association table, like so:

CREATE TABLE Movies_Actors (
  FK_movie_ID INTEGER,
  FK_actor_ID INTEGER,
  PRIMARY KEY (FK_movie_ID, FK_actor_ID)
)

However, this seems like the index will only be useful when searching for both movie_ID and actor_ID (although I'm not certain on whether a composite index also works for the individual columns).

Since both "what actors are in Movie X" and "what movies has actor Y been in" will be the common queries for this table, it seems like there should be an individual index on each column to quickly locate actors and movies on their own. Does a composite index effectively do this? If not, having a composite index seems pointless on this table. And if a composite index is pointless, what to do about a primary key? The candidate key is clearly the composite of the two columns, but if the resulting composite index is pointless (it mustn't be?) it seems like a waste.

Also, this link adds some confusion and indicates that it might even be useful to actually specify two composite indices... one of them as (FK_movie_ID, FK_actor_ID), and the other in reverse as (FK_actor_ID, FK_movie_ID), with the choice of which is the primary key (and thus usually clustered) and which is 'just' a unique composite index being based on which direction is queried more.

What is the real story? Does a composite index automatically effectively index each column for searching on one or the other? Should the optimal (in read speed, not size) association table have a composite index in each direction and one on each column? What are the behind-the-scene mechancs?


EDIT: I found this related question that for some reason I didn't locate before posting... How to properly index a linking table for many-to-many connection in MySQL?

Answer

Markus Winand picture Markus Winand · Jan 17, 2011

(although I'm not certain on whether a composite index also works for the individual columns).

Yes, it can. But only the prefix: http://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys

Also, this link adds some confusion and indicates that it might even be useful to actually specify two composite indices... one of them as (FK_movie_ID, FK_actor_ID), and the other in reverse as (FK_actor_ID, FK_movie_ID),

That's actually the thing to do.

Take one as clustering index, and the other as non-clustering index that will anyways include the clustering index key--hence no need to include the that column again (thx to JNK).

CREATE CLUSTERED INDEX a on Movies_Actors (fk_movie_id, fk_actor_id);
CREATE NONCLUSTERED INDEX b on Movies_Actors (fk_actor_id);

What is the real story?

http://Use-The-Index-Luke.com/ :)

Does a composite index automatically effectively index each column for searching on one or the other?

No. Only the prefix of the index. If you have an index (a,b,c), the query a=? and b=? can use the index. However c=? can't, nor can b=? and c=?.

Should the optimal (in read speed, not size) association table have a composite index in each direction and one on each column?

If you need to join in both directions, yes ("composite index in each direction") and no ("one on each column").

What are the behind-the-scene mechanics?

Well, same link again.

Speaking SQL Server, you might eventually also consider an indexed view. That's kind of pre-joining. Two indexes, as above, might also be fast enough.