In MongoDB, when to use a simple subdocument, when an array with 2-field elements?

hyde picture hyde · Oct 25, 2013 · Viewed 7.3k times · Source

Background

I am storing table rows as MongoDb documents, with each column having a name. Let's say table has these columns of interest: Identifier, Person, Date, Count. The MongoDb document also has some extra fields separate from the table data, represented by timestamp. Columns are not fixed (which is why I use schema-free database to store them in the first place).

There will be need to do various complex, but so far unspecified queries. I am not very concerned about performance, though query performance may conceivably become a bottleneck. Once inserted, documents will not be modifed (a new document with same Identifier will be created instead), and insertions are not very frequent (let's say, 1000 new MongoDb documents per day). So amount of data will steadily grow over time.

Example

The straight-forward approach is having a collection of MongoDb documents like:

{
   _id: XXXX,
   insertDate: ISODate("2012-10-15T21:26:17Z"),
   flag: true,
   data: {
       Identifier: "AB002",
       Person: "John002",
       Date: ISODate("2013-11-16T21:26:17Z"),
       Count: 1
   }
}

Now I have seen an alternative approach (for example in accepted answer of this question), using array with two fields per object:

{
   _id: XXXX,
   insertDate: ISODate("2012-10-15T21:26:17Z"),
   flag: true,
   data: [
       { field: "Identifier", value: "AB002" },
       { field: "Person", value: "John001" },
       { field: "Date", value: ISODate("2013-11-16T21:26:17Z") },
       { field: "Count", value: 1 }
   ]
}

Questions

Does the 2nd approach make any sense at all?

If yes, then how to choose which to use? Especially, are there some specific kinds of queries which are easy/cheap with one approach, hard/costly with another? Any "rules of thumb" on which way to go, or pro-con lists for both? Example real-life cases of one aproach being inconvenient would be especially valuable.

Answer

EmptyArsenal picture EmptyArsenal · Oct 29, 2013

I don't think that the situation in the other example here and yours are the same. In the other example, they're creating a list of items with one of two answers, which would be more appropriately in an array, and the goal is to return a list of subdocuments that match the criteria. In your example, you're really just describing an object since they all hold different types of information, and you won't need to retrieve searchable bits of the subdocuments.