mongodb aggregation framework group by two fields

castt picture castt · Mar 29, 2013 · Viewed 8.3k times · Source

I'm querying my database using aggregation and pipeline, with two separate queries:

 $groups_q = array(
            '$group' => array(
                '_id' => '$group_name',
                'total_sum' => array('$sum' => 1)
                )
            );

  $statuses_q = array(
            '$group' => array(
                '_id' => '$user_status',
                'total_sum' => array('$sum' => 1)
                )
            );

$data['statuses'] = $this->mongo_db->aggregate('users',$statuses_q);
$data['groups'] = $this->mongo_db->aggregate('users',$groups_q);

And I'm getting what I want:

Array
(
[statuses] => Array
    (
        [result] => Array
            (
                [0] => Array
                    (
                        [_id] => Inactive
                        [total_sum] => 2
                    )

                [1] => Array
                    (
                        [_id] => Active
                        [total_sum] => 5
                    )

            )

        [ok] => 1
    )

[groups] => Array
    (
        [result] => Array
            (
                [0] => Array
                    (
                        [_id] => Accounting 
                        [total_sum] => 1
                    )

                [1] => Array
                    (
                        [_id] => Administrator
                        [total_sum] => 2
                    )

                [2] => Array
                    (
                        [_id] => Rep
                        [total_sum] => 1
                    )
            )

        [ok] => 1
    )

)

I don't want to query my database twice. Is there is a better way to do it? How can I accomplish it with one query? Should I use $project operator?

Answer

Stennie picture Stennie · Mar 29, 2013

You can't use a single aggregate() to do two grouped counts with your desired result format. Once the data has been grouped the first time you no longer have the details needed to create the second count.

The straightforward approach is to do two queries, as you are already doing ;-).

Thoughts on alternatives

If you really wanted to get the information in one aggregation query you could group on both fields and then do some manipulation in your application code. With two fields in the group _id, results are going to be every combination of group_name and status.

Example using the mongo shell :

db.users.aggregate(
    { $group: {
         _id: { group_name: "$group_name", status: "$status" },
         'total_sum': { $sum: 1 }
    }}
)

That doesn't seem particularly efficient and lends itself to some convoluted application code because you have to iterate the results twice to get the expected groupings.

If you only wanted the unique names for each group instead of the names + counts, you could use $addToSet in a single group.

The other obvious alternative would be to do the grouping in your application code. Do a single find() projecting only the group_name and status fields, and build up your count arrays as you iterate the results.