I might be a bit in over my head on this as I'm still learning the ins and outs of MongoDB, but here goes.
Right now I'm working on a tool to search/filter through a dataset, sort it by an arbitrary datapoint (eg. popularity) and then group it by an id. The only way I see I can do this is through Mongo's MapReduce functionality.
I can't use .group() because I'm working with more than 10,000 keys and I also need to be able to sort the dataset.
My MapReduce code is working just fine, except for one thing: sorting. Sorting just doesn't want to work at all.
db.runCommand({
'mapreduce': 'products',
'map': function() {
emit({
product_id: this.product_id,
popularity: this.popularity
}, 1);
},
'reduce': function(key, values) {
var sum = 0;
values.forEach(function(v) {
sum += v;
});
return sum;
},
'query': {category_id: 20},
'out': {inline: 1},
'sort': {popularity: -1}
});
I already have a descending index on the popularity datapoint, so it's definitely not working because of a lack of that:
{
"v" : 1,
"key" : { "popularity" : -1 },
"ns" : "app.products",
"name" : "popularity_-1"
}
I just cannot figure out why it doesn't want to sort.
Instead of inlining the result set, I can't output it to another collection and then run a .find().sort({popularity: -1}) on that because of the way this feature is going to work.
First of all, Mongo map/reduce are not designed to be used in as a query tool (as it is in CouchDB), it is design for you to run background tasks. I use it at work to analyze traffic data.
What you are doing wrong however is that you're applying the sort() to your input, but it is useless because when the map()
stage is done the intermediate documents are sorted by each keys
. Because your key is a document, it is being sort by product_id
, popularity
.
This is how I generated my dataset
function generate_dummy_data() {
for (i=2; i < 1000000; i++) {
db.foobar.save({
_id: i,
category_id: parseInt(Math.random() * 30),
popularity: parseInt(Math.random() * 50)
})
}
}
And this my map/reduce task:
var data = db.runCommand({
'mapreduce': 'foobar',
'map': function() {
emit({
sorting: this.popularity * -1,
product_id: this._id,
popularity: this.popularity,
}, 1);
},
'reduce': function(key, values) {
var sum = 0;
values.forEach(function(v) {
sum += v;
});
return sum;
},
'query': {category_id: 20},
'out': {inline: 1},
});
And this is the end result (very long to paste it here):
http://cesarodas.com/results.txt
This works because now we're sorting by sorting, product_id, popularity
. You can play with the sorting how ever you like just remember that the final sorting is by key
regardless of you how your input is sorted.
Anyway as I said before you should avoid doing queries with Map/Reduce it was designed for background processing. If I were you I would design my data in such a way I could access it with simple queries, there is always a trade-off in this case complex insert/updates to have simple queries (that's how I see MongoDB).