For what would be this query in SQL (to find duplicates):
SELECT userId, name FROM col GROUP BY userId, name HAVING COUNT(*)>1
I performed this simple query in MongoDB:
res = db.col.group({key:{userId:true,name:true},
reduce: function(obj,prev) {prev.count++;},
initial: {count:0}})
I've added a simple Javascript loop to go over the result set, and performed a filter to find all the fields with a count > 1 there, like so:
for (i in res) {if (res[i].count>1) printjson(res[i])};
Is there a better way to do this other than using javascript code in the client? If this is the best/simplest way, say that it is, and this question will help someone :)
After this question was asked and answered, 10gen released Mongodb version 2.2 with an aggregation framework. The new best way to do this query is:
db.col.aggregate( [
{ $group: { _id: { userId: "$userId", name: "$name" },
count: { $sum: 1 } } },
{ $match: { count: { $gt: 1 } } },
{ $project: { _id: 0,
userId: "$_id.userId",
name: "$_id.name",
count: 1}}
] )
10gen has a handy SQL to Mongo Aggregation conversion chart worth bookmarking.