In a junction table, should I use a Primary key and a unique constraint, or a compound/composite key?

user1447679 picture user1447679 · Mar 4, 2015 · Viewed 8.8k times · Source

I have read through handfuls of what would seem to make this a duplicate question. But reading through all of these has left me uncertain. I'm hoping to get an answer based on the absolute example below, as many questions/answers trail off into debates back and forth.

If I have:

dbo.Book
--------
BookID PK int identity(1,1)

dbo.Author
----------
AuthorID PK int identity(1,1)

Now I have two choices for a simple junction table:

dbo.BookAuthor
--------------
BookID CPK and FK
AuthorID CPK and FK

The above would be a compound/composite key on both FKs, as well as set up the FK relationships for both columns - also using Cascade on delete.

OR

dbo.BookAuthor
--------------
RecordID PK int identity(1,1)
BookID FK
AuthorID FK

Foreign key relationships on BookID and AuthorID, along with Cascade on delete. Also set up a unique constraint on BookID and AuthorID.

I'm looking for a simple answer as to why one method is better than another in the ABOVE particular example. The answers that I'm reading are very detailed, and I was just about to settle on a compound key, but then watched a video where the example used an Identity column like my first example.

It seems this topic is slightly torn in half, but my gut is telling me that I should just use a composite key.

What's more efficient for querying? It seems having a PK identity column along with setting up a unique constraint on the two columns, AND the FK relationships would be more costly, even if a little.

Answer

Joel Coehoorn picture Joel Coehoorn · Mar 4, 2015

This is something I've always remembered from my database course way back in college. We were covering the section from the textbook on "Entity Design" and it was talking about junction tables... we called them intersect tables or intersection relations. I was actually paying attention in class that day. The professor said, in his experience, a many-to-many junction table almost always indicates an unidentified missing entity. These entities almost always end up with data of their own.

We were given an example of Student and Course entities. For a student to take a course, you need to junction between those two. What you actually have as a result is a new entity: an Enrollment. The additional data in this case would be things like Credit Type (audit vs regular) or Final Grade.

I remember that advice to this day... but I don't always follow it. What I will do in this situation is stop, and make sure to go back to the stakeholders on the issue and work with them on what data points we might still be missing in this junction. If we really can't find anything, then I'll use the compound key. When we do find data, we think of a better name and it gets a surrogate key.

Update in 2020
I still have the textbook, and by amazing coincidence both it and this question were brought to my attention within a few hours of each other. So for the curious, it was Chapter 5, section 6, of the 7th edition of this book:

https://www.amazon.com/Database-Processing-Fundamentals-Design-Implementation-dp-9332549958/dp/9332549958/