MongoDB regular expression with indexed field

Eramir · Nov 12, 2011

I am building my first app with MongoDB. I created an index on a field and ran a find query with the $regex operator from the shell:

> db.foo.find({A:{$regex:'BLABLA!25500[0-9]'}}).explain()
{
        "cursor" : "BtreeCursor A_1 multi",
        "nscanned" : 500001,
        "nscannedObjects" : 10,
        "n" : 10,
        "millis" : 956,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {
                "A" : [
                        [
                                "",
                                {

                                }
                        ],
                        [
                                /BLABLA!25500[0-9]/,
                                /BLABLA!25500[0-9]/
                        ]
                ]
        }
}

This is very strange, because when I run the same query with no index on the collection, the performance is much better:

> db.foo.find({A:{$regex:'BLABLA!25500[0-9]'}}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 500002,
        "nscannedObjects" : 500002,
        "n" : 10,
        "millis" : 531,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
        "indexBounds" : {

        }
}

Obviously, searching an indexed field without a regex (i.e. looking up documents by a constant value) is much faster, but I would really like to know the reason for this behavior.

Answer

Adam Comerford · Jul 6, 2012

The likely reason for the performance difference is that, with the index in place, the query must traverse the entire index (loading it into memory) and then also load the matching documents into memory. Because your regex is not a prefix expression (it is not anchored with ^), every value in the index is scanned and tested against the regular expression, which is not very efficient. Your explain() output shows this: nscanned is 500001 even though only 10 documents match.

When you remove the index, you are just doing a full collection scan (a table scan) and matching the regex against each document as you go. That is essentially a slightly simpler operation than the indexed version.
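To make the comparison concrete, here is a rough sketch in plain JavaScript. It is only a toy model, not how MongoDB is implemented: the "collection" is an array, the "index" is a sorted array of keys, and the function names and sample data are made up for illustration. It shows that with an unanchored regex the indexed plan tests just as many values as the collection scan and then has to fetch documents on top of that.

```javascript
// Toy model only: a collection is an array of documents, and an "index" on A
// is a sorted array of { key, pos } entries pointing back into that array.
function buildIndex(docs) {
  return docs
    .map((doc, pos) => ({ key: doc.A, pos }))
    .sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
}

// Plan 1: walk every index entry, test the key, then fetch matching documents.
function unanchoredIndexScan(docs, index, regex) {
  let keysTested = 0;
  let docsFetched = 0;
  const results = [];
  for (const entry of index) {
    keysTested++;
    if (regex.test(entry.key)) {
      docsFetched++; // extra step: go back to the collection for the document
      results.push(docs[entry.pos]);
    }
  }
  return { keysTested, docsFetched, results };
}

// Plan 2: walk every document and test the field directly.
function collectionScan(docs, regex) {
  let docsTested = 0;
  const results = [];
  for (const doc of docs) {
    docsTested++;
    if (regex.test(doc.A)) results.push(doc);
  }
  return { docsTested, results };
}

// Hypothetical sample data: 1000 documents, 10 of which match the regex.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ _id: i, A: 'BLABLA!' + (255000 + i) });
}
const index = buildIndex(docs);
const regex = /BLABLA!25500[0-9]/;

const viaIndex = unanchoredIndexScan(docs, index, regex);
const viaScan = collectionScan(docs, regex);
console.log(viaIndex.keysTested, viaIndex.docsFetched, viaScan.docsTested);
// prints: 1000 10 1000
```

Both plans run the regex 1000 times; the indexed one simply adds the index traversal and the document fetches to that cost.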

You might be able to make the indexed version quicker if it were a covered query (one that returns only indexed fields, so no documents need to be loaded). It would also likely be faster if this were a compound index and you needed to combine the regex with criteria on another field.

With a prefix query, the point is not merely that an index is used, but that it is used efficiently: the literal prefix limits the scan to a small range of index keys, and that is where the real performance gains come from.
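As a rough illustration of why an anchored prefix helps, here is another toy sketch in plain JavaScript. Again, this is an assumption-laden model rather than MongoDB's actual code: a sorted array stands in for the B-tree, and lowerBound and prefixIndexScan are hypothetical helper names. The literal prefix lets the scan jump straight to the first candidate key and stop as soon as the keys no longer share that prefix.

```javascript
// Toy model only: a sorted array of keys stands in for a B-tree index.

// Binary search for the first position whose key is >= target.
function lowerBound(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Scan only the contiguous run of keys that start with the literal prefix,
// applying the full regex to each key in that run.
function prefixIndexScan(sortedKeys, literalPrefix, regex) {
  let keysTested = 0;
  const matches = [];
  for (let i = lowerBound(sortedKeys, literalPrefix); i < sortedKeys.length; i++) {
    const key = sortedKeys[i];
    if (!key.startsWith(literalPrefix)) break; // left the prefix range
    keysTested++;
    if (regex.test(key)) matches.push(key);
  }
  return { keysTested, matches };
}

// Hypothetical sample data: the same 1000 key values as in the question's style.
const keys = [];
for (let i = 0; i < 1000; i++) keys.push('BLABLA!' + (255000 + i));
keys.sort();

const prefixResult = prefixIndexScan(keys, 'BLABLA!25500', /^BLABLA!25500[0-9]/);
console.log(prefixResult.keysTested, prefixResult.matches.length);
// prints: 10 10
```

In this model only 10 keys are examined instead of all 1000. In the shell, the equivalent change would be to anchor the pattern, for example '^BLABLA!25500[0-9]', and compare nscanned in the explain() output.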