I'm running into the "aggregation result exceeds maximum document size (16MB)"
error with a MongoDB aggregation using pymongo.
I was able to work around it at first using the limit() option. However, at some point I got the
"Exceeded memory limit for $group, but didn't allow external sort. Pass allowDiskUse:true to opt in." error.
OK, I'll use the {'allowDiskUse':True} option. This option works when I use it on the command line, but when I tried to use it in my Python code
result = work1.aggregate(pipe, 'allowDiskUse:true')
I get a "TypeError: aggregate() takes exactly 2 arguments (3 given)" error (in spite of the definition given at http://api.mongodb.org/python/current/api/pymongo/collection.html#pymongo.collection.Collection.aggregate: aggregate(pipeline, **kwargs)).
I tried to use runCommand, or rather its pymongo equivalent:
db.command('aggregate','work1',pipe, {'allowDiskUse':True})
but now I'm back to the 'aggregation result exceeds maximum document size (16MB)' error.
In case you need it, here is the pipeline:
pipe = [{'$project': {'_id': 0, 'summary.trigrams': 1}}, {'$unwind': '$summary'}, {'$unwind': '$summary.trigrams'}, {'$group': {'count': {'$sum': 1}, '_id': '$summary.trigrams'}}, {'$sort': {'count': -1}}, {'$limit': 10000}]
Thank you
So, in order:
aggregate is a method. It takes 2 positional arguments (self, which is implicitly passed, and pipeline) and any number of keyword arguments (which must be passed as foo=bar; if there's no = sign, it's not a keyword argument). This means you need to call result = work1.aggregate(pipe, allowDiskUse=True).
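To see the difference without a database, here is a minimal sketch. FakeCollection is a hypothetical stand-in that only copies the aggregate(pipeline, **kwargs) signature from the docs; it is not pymongo's implementation and does not talk to a server.

```python
class FakeCollection:
    """Hypothetical stand-in: mimics only the aggregate(pipeline, **kwargs) signature."""

    def aggregate(self, pipeline, **kwargs):
        # Echo back what was received so the call can be inspected.
        return {"pipeline": pipeline, "options": kwargs}


work1 = FakeCollection()
pipe = [{"$limit": 10}]

# Keyword argument: accepted and collected into kwargs.
ok = work1.aggregate(pipe, allowDiskUse=True)

# A string passed positionally is a third positional argument,
# so Python rejects the call with TypeError, as in the question.
try:
    work1.aggregate(pipe, "allowDiskUse:true")
    raised = False
except TypeError:
    raised = True
```

The string 'allowDiskUse:true' is just a value; only the name=value form in the call makes it a keyword argument.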
Your error about maximum document size is inherent to Mongo. Mongo can never return a document (or array thereof) larger than 16 megabytes. I can't tell you why because you have given us neither your data nor your code, but it probably means that the document you're building as an end result is too large. Try decreasing the $limit parameter, maybe? Start by setting it to 1, run a test, then increase it and look at how big the result gets when you do that.
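One way to set up that experiment is to generate copies of the pipeline that differ only in the $limit stage. This is a rough sketch: with_limit is a hypothetical helper name, and the pipeline here is a shortened version of the one in the question.

```python
import copy


def with_limit(pipeline, n):
    """Return a copy of pipeline whose last $limit stage is n (appended if absent)."""
    new_pipe = copy.deepcopy(pipeline)  # leave the caller's pipeline untouched
    for stage in reversed(new_pipe):
        if "$limit" in stage:
            stage["$limit"] = n
            return new_pipe
    new_pipe.append({"$limit": n})
    return new_pipe


# Shortened version of the pipeline from the question.
pipe = [
    {"$group": {"_id": "$summary.trigrams", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
    {"$limit": 10000},
]

# Pipelines to try in order, from smallest result to largest.
trial_pipes = [with_limit(pipe, n) for n in (1, 10, 100, 1000)]
```

Each trial pipeline can then be passed to work1.aggregate(trial, allowDiskUse=True) against your own server, comparing how large the result gets from one run to the next.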