MongoDB and composite primary keys

herbrandson picture herbrandson · Apr 19, 2014 · Viewed 46.3k times · Source

I'm trying to determine the best way to deal with a composite primary key in a mongo db. The main key for interacting with the data in this system is made up of 2 uuids. The combination of uuids is guaranteed to be unique, but neither of the individual uuids is.

I see a couple of ways of managing this:

  1. Use an object for the primary key that is made up of 2 values (as suggested here)

  2. Use a standard auto-generated mongo object id as the primary key, store my key in two separate fields, and then create a composite index on those two fields

  3. Make the primary key a hash of the 2 uuids

  4. Some other awesome solution that I currently am unaware of

What are the performance implications of these approaches?

For option 1, I'm worried about the insert performance due to having non sequential keys. I know this can kill traditional RDBMS systems and I've seen indications that this could be true in MongoDB as well.

For option 2, it seems a little odd to have a primary key that would never be used by the system. Also, it seems that query performance might not be as good as in option 1. In a traditional RDBMS a clustered index gives the best query results. How relevant is this in MongoDB?

For option 3, this would create one single id field, but again it wouldn't be sequential when inserting. Are there any other pros/cons to this approach?

For option 4, well... what is option 4?

Also, there's some discussion of possibly using CouchDB instead of MongoDB at some point in the future. Would using CouchDB suggest a different solution?

MORE INFO: some background about the problem can be found here

Answer

Asya Kamsky picture Asya Kamsky · Apr 27, 2014

You should go with option 1.

The main reason is that you say you are worried about performance - using the _id index which is always there and already unique will allow you to save having to maintain a second unique index.

For option 1, I'm worried about the insert performance do to having non sequential keys. I know this can kill traditional RDBMS systems and I've seen indications that this could be true in MongoDB as well.

Your other options do not avoid this problem, they just shift it from the _id index to the secondary unique index - but now you have two indexes, once that's right-balanced and the other one that's random access.

There is only one reason to question option 1 and that is if you plan to access the documents by just one or just the other UUID value. As long as you are always providing both values and (this part is very important) you always order them the same way in all your queries, then the _id index will be efficiently serving its full purpose.

As an elaboration on why you have to make sure you always order the two UUID values the same way, when comparing subdocuments { a:1, b:2 } is not equal to { b:2, a:1 } - you could have a collection where two documents had those values for _id. So if you store _id with field a first, then you must always keep that order in all of your documents and queries.

The other caution is that index on _id:1 will be usable for query:

db.collection.find({_id:{a:1,b:2}}) 

but it will not be usable for query

db.collection.find({"_id.a":1, "_id.b":2})