Strategies for fast searches of billions of small documents in MongoDB

Neil picture Neil · Jul 19, 2013 · Viewed 11.1k times · Source

I need to store several billion small data structures (around 200 bytes each). So far, storing each element as a separate document is working well, with Mongo providing around 10,000 results per second. I'm using a 20-byte hash as the _id for each document, and a single index on the _id field. In testing, this is working for data sets with 5,000,000 documents.

In operation, we will be making around 10,000 requests per second, updating existing documents about 1,000 times per second, and inserting new documents maybe 100 times per second or less.

How can we manage larger data sets, when we cannot store an entire index in RAM? Will MongoDB perform better if we combine several elements into each document -- for a faster search through the index, but more data being returned in each query?

Unlike other questions on SO, I'm not only interested in how much data we can stuff into Mongo. It can clearly manage the amount of data we're looking at. My concern is how can we maximize the speed of find operations on huge collections, given constrained RAM.

Our searches will tend to be clustered; around 50,000 elements will be satisfy about 50% of the queries, but the remaining 50% will be randomly distributed across all of the data. Can we expect a performance gain by moving those 50% into their own collection, in order to keep a smaller index of the most-used data always in ram?

Would reducing the size of the _id field from 20-bytes to 8-bytes have a significant impact on MnogoDB's indexing speed?

Answer

Rob Moore picture Rob Moore · Jul 22, 2013

A few strategies come to mind:

1) Use a distinct collection/database for the 'hot' documents.

If you know which documents are in the hot set then, yes, moving them into a separate collection will help. This will ensure that the hot documents are co-resident on the same extents/pages. It will also make the index for those documents more likely to be completely in memory. This is due to it being smaller and being (completely?) used more often.

If the hot documents are randomly mixed with other documents then you will likely have to fault in more of the leaf elements of the B-Tree index when loading a document as the probability of another document having recently loaded or accessed the index block is small.

2) Shorten the indexed values.

The shorter the index value the more values that fit into a single B-Tree block. (Note: The keys are not included in the index.) The more entries in a single bucket means fewer buckets and less total memory needed for the index. That translates to the higher probability / longer lifetimes that blocks will stay in memory. In your example a 20->8 character reduction is a better than 50% savings. If you can convert those 8 bytes to a long there is a little more savings since longs do not have a length prefix (4 bytes) and a trailing null (5 bytes total).

3) Shorten the key names.

The shorter the field names the less space each document takes. This has the unfortunate side effect of decreasing readability.

4) Shard

This is really the only way to keep performance up in the face of reads across an entire corpus that exhausts memory and eventual disk bandwidth. If you do shard you will still want to shard the 'hot' collection.

5) Adjust the read-ahead on disk to a small value.

Since the 'non-hot' reads are loading a random document from disk we really only want to read/fault into memory that document and as few of the documents around it as possible. Most systems will try and read-ahead a large block of data once a user reads from a portion of a file. This is exactly the opposite of what we want.

If you see your system faulting a lot but the resident memory for the mongod process does not approach the systems available memory you are likely seeing the effect of the OS reading useless data.

6) Try to use monotonically increasing values for the keys.

This will trigger an optimization (for ObjectId based indexes) that when the index block splits it will do so at 90/10 instead of 50/50. The result is that most of the blocks in your index will be near capacity and you will need fewer of them.

If you only know the 'hot' 50,000 documents after the fact then adding them to the separate collection in index order will also trigger this optimization.

Rob.