Efficiently determine owner of a record in a hierarchy with MongoDB

Toby Hede picture Toby Hede · Nov 21, 2011 · Viewed 7k times · Source

I am trying to achieve the following:

Select all the records that I own, where ownership is objects I have created OR objects a user I manage has created, where user management can be in a hierarchy of users managing users

Ownership is clearly straight forward and could be handled by a simple id corresponding to the owner. The hierarchy of user management has me a little stumped to perform without heavy lifting through large lists of IDs (you can obviously just find every user that is managed and list every object created by any of those users using an IN clause or similar).

Ideally this all happens in a single query so normal paging and conditions can occur.

I was thinking that there might have been some maths to get it done - having IDs that can somehow be hashed to determine if they are owned by anyone in the chain of command.

Any references for this sort of thing?

Am I missing something obvious?

Using MongoDB if that makes a difference, but happy to think about other databases for inspiration.

UPDATE: have created a MongoDB collection with 1,000,000 records to get some solid data on exactly what constitutes a manageable number of parameters for an IN clause on a query. Will report back when I have some concrete information.

ANALYSIS:

Using ruby-mongo-driver and the ruby benchmark lib.

MongoDB Collection with 1039944 records

Records are defined as:

{
    first_name: String,
    last_name: String,
    email: String,
    phone: String,
    company: String,
    owner: BSON::ObjectId
 }

With randomly generated values for all fields.

The Owner field has an index.

Running queries with the following conditions:

conditions = {"owner" => { "$in" => id_list }}
opts = {skip: rand, limit: 100}

Results:

    # 10201 ids
    #              user     system      total        real
    # 0:       0.240000   0.000000   0.240000 (  0.265148)
    # 1:       0.240000   0.010000   0.250000 (  0.265757)
    # 2:       0.240000   0.000000   0.240000 (  0.267149)
    # 3:       0.240000   0.000000   0.240000 (  0.269981)
    # 4:       0.240000   0.000000   0.240000 (  0.270436)
    # Find:    0.240000   0.000000   0.240000 (  0.266709)


    # 5201 ids
    #              user     system      total        real
    # 0:       0.120000   0.000000   0.120000 (  0.133824)
    # 1:       0.120000   0.000000   0.120000 (  0.134787)
    # 2:       0.110000   0.000000   0.110000 (  0.133262)
    # 3:       0.110000   0.000000   0.110000 (  0.136046)
    # 4:       0.120000   0.000000   0.120000 (  0.141220)
    # Find:    0.130000   0.000000   0.130000 (  0.139110)

    # 201 ids
    #              user     system      total        real
    # 0:       0.010000   0.000000   0.010000 (  0.006044)
    # 1:       0.000000   0.000000   0.000000 (  0.004681)
    # 2:       0.010000   0.000000   0.010000 (  0.004578)
    # 3:       0.000000   0.000000   0.000000 (  0.007048)
    # 4:       0.010000   0.000000   0.010000 (  0.008487)
    # Find:    0.000000   0.000000   0.000000 (  0.005990)

    # 1 id (NOT using IN)
    #              user     system      total        real
    # 0:       0.000000   0.000000   0.000000 (  0.002868)
    # 1:       0.000000   0.000000   0.000000 (  0.004937)
    # 2:       0.010000   0.000000   0.010000 (  0.003151)
    # 3:       0.000000   0.000000   0.000000 (  0.002983)
    # 4:       0.000000   0.000000   0.000000 (  0.003313)
    # Find:    0.000000   0.000000   0.000000 (  0.002742)

Even with a list of 10k ids in the query, performance is pretty snappy.

Answer

Thilo picture Thilo · Nov 21, 2011

If you are trying to "select" records from MongoDB based on a "column" having a value from a set of possible values that you'd need a join against a user management table to determine, then NoSQL is working against you...

If the list of user IDs is still manageable you can do a where ownerId in (?,?,?,?,?...) type of query (after having first determined the list):

db.documents.find({owner:{$in: [1234, 2345, 4444, 77777, 99999]}})

The NoSQL way is probably to denormalize things, for example by including not just the ownerId in the document, but the complete path up the management hierarchy:

{  _id: 'the document A',
   owner : 1234,
   managers: [ 2345, 4444, 77777, 99999 ]
}

Of course, that will need to be updated when the user hierarchy gets shifted around.