Mongo indexing on object arrays vs objects

Ghjnut · Mar 6, 2012

I'm implementing a contact database that handles quite a few fields. Most of them are predefined and can be considered bound, but there are a couple that aren't. We'll call one of these fields 'groups'. The way we currently implement it is (each document/contact has 'groups' field):

'groups' : {
   152 : 'hi',
   111 : 'group2'
}

but after some reading, it would seem I should be doing it like this:

'groups' : [
   { 'id' : 152, 'name' : 'hi' },
   { 'id' : 111, 'name' : 'group2' }
   ...
]

and then apply the index db.contact.ensureIndex({'groups.id':1});

My question is in regard to functionality. What are the differences between the 2 structures and how is the index actually built (is it simply indexing within each document/contact or is it building a full-scale index that has all the groups from all the documents/contacts?).

I'm kind of going in under the assumption that this is structurally the best way, but if I'm incorrect, let me know.

Answer

Marc · Mar 7, 2012

Querying will certainly be a lot easier in the second case, where 'groups' is an array of sub-documents, each with an 'id' and a 'name'.

Mongo does not support "wildcard" queries on key names, so if your documents were structured the first way and you wanted to find the group with the value "hi", but did not know that its key was 152, you would not be able to do it. With the second document structure, you can easily query for {"groups.name":"hi"}.
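To make the difference concrete, here is a minimal sketch in plain JavaScript of how a dotted path resolves against the two shapes. It is a simplified in-memory model, not MongoDB's implementation; the helper names valuesAtPath and matches are hypothetical, and positional paths such as "groups.0.name" are deliberately left out.

```javascript
// In-memory stand-ins for the two document shapes from the question.
const keyedContact = { _id: 1, groups: { 152: "hi", 111: "group2" } };
const arrayContact = {
  _id: 1,
  groups: [{ id: 152, name: "hi" }, { id: 111, name: "group2" }]
};

// Hypothetical helper: collect every value reachable through a dotted path.
// When a step lands on an array, the next path part is applied to each element.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    const next = [];
    for (const item of current) {
      const candidates = Array.isArray(item) ? item : [item];
      for (const candidate of candidates) {
        if (candidate !== null && typeof candidate === "object" && part in candidate) {
          next.push(candidate[part]);
        }
      }
    }
    current = next;
  }
  return current;
}

// Hypothetical helper: true if any value at the path equals the expected value.
function matches(doc, path, expected) {
  return valuesAtPath(doc, path).some(value => value === expected);
}
```

With the array shape, matches(arrayContact, "groups.name", "hi") succeeds because every element exposes the same "name" field. With the keyed shape the same path finds nothing, and the only path that reaches "hi" is "groups.152", which requires knowing the key in advance.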

For more information on querying embedded objects, please see the documentation titled "Dot Notation (Reaching into Objects)": http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29. The "Value in an Array" and "Value in an Embedded Object" sections of the "Advanced Queries" documentation are also useful: http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-ValueinanArray

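For an index on {'groups.id':1}, an index entry will be created for every "id" key in every "groups" array in every document, and all of those entries live in a single collection-wide index rather than in separate per-document indexes. With an index on "groups", each entry instead holds an entire sub-document: one entry per array element in the second structure, or one entry per document in the first structure, where "groups" is a single embedded object.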
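As a rough illustration of the {'groups.id':1} case, the sketch below builds a toy version of such an index in plain JavaScript. It only models the idea of one sorted, collection-wide list of (key, document) entries; the sample documents and the function names buildGroupIdIndex and findByGroupId are assumptions for illustration, and the real index is a B-tree managed by the server.

```javascript
// Sample contacts in the second (array of sub-documents) structure.
const contacts = [
  { _id: 1, groups: [{ id: 152, name: "hi" }, { id: 111, name: "group2" }] },
  { _id: 2, groups: [{ id: 152, name: "hi" }] }
];

// Toy model: emit one entry per "id" in every "groups" array of every
// document, then keep all entries in a single list sorted by key.
function buildGroupIdIndex(docs) {
  const entries = [];
  for (const doc of docs) {
    for (const group of doc.groups) {
      entries.push({ key: group.id, docId: doc._id });
    }
  }
  entries.sort((a, b) => a.key - b.key || a.docId - b.docId);
  return entries;
}

// Look up the _id of every document that has a group with the given id.
function findByGroupId(index, id) {
  return index.filter(entry => entry.key === id).map(entry => entry.docId);
}

const groupIdIndex = buildGroupIdIndex(contacts);
```

Here groupIdIndex holds three entries spanning both contacts, and findByGroupId(groupIdIndex, 152) returns the _id of each contact that belongs to group 152.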

If you have documents of the second type, and an index on groups, your queries will have to match entire sub-documents in order to make use of the index. For example, given the document:

{ "_id" : 1, "groups" : [ { "id" : 152, "name" : "hi" }, { "id" : 111, "name" : "group2" } ] }

The query

db.<collectionName>.find({groups:{ "id" : 152, "name" : "hi" }}) 

will make use of the index, but the queries

db.<collectionName>.find({"groups":{$elemMatch:{name:"hi"}}})

or

db.<collectionName>.find({"groups.name":"hi"})

will not.
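One way to picture why only the whole sub-document query benefits: if the keys of the "groups" index are entire sub-documents, a lookup needs the complete key. The sketch below models that in plain JavaScript by using the serialized sub-document as the key. The names buildGroupsIndex and lookupWholeSubdocument are hypothetical, and JSON.stringify merely stands in for the server's own comparison of sub-documents.

```javascript
const docs = [
  { _id: 1, groups: [{ id: 152, name: "hi" }, { id: 111, name: "group2" }] }
];

// Toy model of an index on "groups": each key is a whole sub-document,
// serialized with its fields in their stored order.
function buildGroupsIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    for (const group of doc.groups) {
      const key = JSON.stringify(group);
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

// A lookup only works when the caller can supply the complete key.
function lookupWholeSubdocument(index, subdocument) {
  return index.get(JSON.stringify(subdocument)) || [];
}

const groupsIndex = buildGroupsIndex(docs);
const exactHit = lookupWholeSubdocument(groupsIndex, { id: 152, name: "hi" });
const partialMiss = lookupWholeSubdocument(groupsIndex, { name: "hi" });
const reorderedMiss = lookupWholeSubdocument(groupsIndex, { name: "hi", id: 152 });
```

Only exactHit finds the document. The partial key finds nothing, which mirrors why a query on "groups.name" alone cannot use this index, and the reordered key also misses, since matching a whole sub-document is sensitive to field order.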

The index(es) that you create should depend on which queries you will most commonly be performing.

You can check which (if any) indexes your queries are using with the .explain() command: http://www.mongodb.org/display/DOCS/Explain The first line of its output, "cursor", tells you which index is being used; "cursor" : "BasicCursor" indicates that a full collection scan is being performed.
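If you want to check this programmatically, a small helper along these lines could inspect the "cursor" field. This is only a sketch that assumes the explain output format described above; the function name usesIndex and the sample objects are illustrative, and the index name shown in the second sample is an assumption about how the server labels an index cursor.

```javascript
// Returns true when the explain output reports anything other than
// a full collection scan ("BasicCursor").
function usesIndex(explainResult) {
  return explainResult.cursor !== "BasicCursor";
}

// Illustrative explain() results, trimmed to the one field used here.
const fullScan = { cursor: "BasicCursor" };
const indexScan = { cursor: "BtreeCursor groups.id_1" }; // assumed label

const fullScanUsesIndex = usesIndex(fullScan);
const indexScanUsesIndex = usesIndex(indexScan);

// In the mongo shell this could be applied to a real query, e.g.:
//   usesIndex(db.contact.find({ "groups.id": 152 }).explain())
```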

There is more information on indexing in the documentation: http://www.mongodb.org/display/DOCS/Indexes

The "Indexing Array Elements" section of the above links to the document titled "Multikeys": http://www.mongodb.org/display/DOCS/Multikeys

Hopefully this will improve your understanding of how to query on embedded documents, and how indexes are used. Please let us know if you have any follow-up questions!