Been working with MongoDB for a while and today I had a doubt while discussing with a colleague.
The thing is that when you create an index in MongoDB, the collection is processed and the index is built.
The index is updated within insertion and deletion of documents so I don't really see the need to run a rebuild index operation (which drops the index and then rebuild it).
According to MongoDB documentation:
Normally, MongoDB compacts indexes during routine updates. For most users, the reIndex command is unnecessary. However, it may be worth running if the collection size has changed significantly or if the indexes are consuming a disproportionate amount of disk space.
Does someone has had the need of running a rebuild index operation that worth it?
As per the MongoDB documentation, there is generally no need to routinely rebuild indexes.
NOTE: Any advice on storage becomes more interesting with MongoDB 3.0+, which introduced a pluggable storage engine API. My comments below are specifically in reference to the default MMAP storage engine in MongoDB 3.0 and earlier. WiredTiger and other storage engines have different storage implementations for data & indexes.
There may be some benefit in rebuilding an index with the MMAP storage engine if:
An index is consuming a larger than expected amount of space compared to the data. Note: you need to monitor historical data & index size to have a baseline for comparison.
You want to migrate from an older index format to a newer one. If a reindex is advisible this will be mentioned in the upgrade notes. For example, MongoDB 2.0 introduced significant index performance improvements so the release notes include a suggested reindex to the v2.0 format after upgrading. Similarly, MongoDB 2.6 introduced 2dsphere
(v2.0) indexes which have a different default behaviour (sparse by default). Existing indexes are not rebuilt after index version upgrades; the choice of if/when to upgrade is left to the database administrator.
You have changed the _id
format for a collection to or from a monotonically increasing key (eg. ObjectID) to a random value. This is a bit esoteric, but there's an index optimisation that splits b-tree buckets 90/10 (instead of 50/50) if you are inserting _id
s that are always increasing (ref: SERVER-983). If the nature of your _id
s changes significantly, it may be possible to build a more efficient b-tree with a re-index.
For more information on general B-tree behaviour, see: Wikipedia: B-tree
If you're really curious to dig into the index internals a bit more, there are some experimental commands/tools you can try. I expect these are limited to MongoDB 2.4 & 2.6 only: