limit() and sort() order pymongo and mongodb

disruptive picture disruptive · Mar 27, 2016 · Viewed 19.3k times · Source

Despite reading peoples answers stating that the sort is done first, evidence shows something different that the limit is done before the sort. Is there a way to force sort always first?

views = mongo.db.view_logging.find().sort([('count', 1)]).limit(10)

Whether I use .sort().limit() or .limit().sort(), the limit takes precedence. I wonder if this is something to do with pymongo...

Answer

alecxe picture alecxe · Mar 30, 2016

According to the documentation, regardless of which goes first in your chain of commands, sort() would be always applied before the limit().

You can also study the .explain() results of your query and look at the execution stages - you will find that the sorting input stage examines all of the filtered (in your case all documents in the collection) and then the limit is applied.


Let's go through an example.

Imagine there is a foo database with a test collection having 6 documents:

>>> col = db.foo.test
>>> for doc in col.find():
...     print(doc)
{'time': '2016-03-28 12:12:00', '_id': ObjectId('56f9716ce4b05e6b92be87f2'), 'value': 90}
{'time': '2016-03-28 12:13:00', '_id': ObjectId('56f971a3e4b05e6b92be87fc'), 'value': 82}
{'time': '2016-03-28 12:14:00', '_id': ObjectId('56f971afe4b05e6b92be87fd'), 'value': 75}
{'time': '2016-03-28 12:15:00', '_id': ObjectId('56f971b7e4b05e6b92be87ff'), 'value': 72}
{'time': '2016-03-28 12:16:00', '_id': ObjectId('56f971c0e4b05e6b92be8803'), 'value': 81}
{'time': '2016-03-28 12:17:00', '_id': ObjectId('56f971c8e4b05e6b92be8806'), 'value': 90}

Now, let's execute queries with different order of sort() and limit() and check the results and the explain plan.

Sort and then limit:

>>> from pprint import pprint
>>> cursor = col.find().sort([('time', 1)]).limit(3)  
>>> sort_limit_plan = cursor.explain()
>>> pprint(sort_limit_plan)
{u'executionStats': {u'allPlansExecution': [],
                     u'executionStages': {u'advanced': 3,
                                          u'executionTimeMillisEstimate': 0,
                                          u'inputStage': {u'advanced': 6,
                                                          u'direction': u'forward',
                                                          u'docsExamined': 6,
                                                          u'executionTimeMillisEstimate': 0,
                                                          u'filter': {u'$and': []},
                                                          u'invalidates': 0,
                                                          u'isEOF': 1,
                                                          u'nReturned': 6,
                                                          u'needFetch': 0,
                                                          u'needTime': 1,
                                                          u'restoreState': 0,
                                                          u'saveState': 0,
                                                          u'stage': u'COLLSCAN',
                                                          u'works': 8},
                                          u'invalidates': 0,
                                          u'isEOF': 1,
                                          u'limitAmount': 3,
                                          u'memLimit': 33554432,
                                          u'memUsage': 213,
                                          u'nReturned': 3,
                                          u'needFetch': 0,
                                          u'needTime': 8,
                                          u'restoreState': 0,
                                          u'saveState': 0,
                                          u'sortPattern': {u'time': 1},
                                          u'stage': u'SORT',
                                          u'works': 13},
                     u'executionSuccess': True,
                     u'executionTimeMillis': 0,
                     u'nReturned': 3,
                     u'totalDocsExamined': 6,
                     u'totalKeysExamined': 0},
 u'queryPlanner': {u'indexFilterSet': False,
                   u'namespace': u'foo.test',
                   u'parsedQuery': {u'$and': []},
                   u'plannerVersion': 1,
                   u'rejectedPlans': [],
                   u'winningPlan': {u'inputStage': {u'direction': u'forward',
                                                    u'filter': {u'$and': []},
                                                    u'stage': u'COLLSCAN'},
                                    u'limitAmount': 3,
                                    u'sortPattern': {u'time': 1},
                                    u'stage': u'SORT'}},
 u'serverInfo': {u'gitVersion': u'6ce7cbe8c6b899552dadd907604559806aa2e9bd',
                 u'host': u'h008742.mongolab.com',
                 u'port': 53439,
                 u'version': u'3.0.7'}}

Limit and then sort:

>>> cursor = col.find().limit(3).sort([('time', 1)])
>>> limit_sort_plan = cursor.explain()
>>> pprint(limit_sort_plan)
{u'executionStats': {u'allPlansExecution': [],
                     u'executionStages': {u'advanced': 3,
                                          u'executionTimeMillisEstimate': 0,
                                          u'inputStage': {u'advanced': 6,
                                                          u'direction': u'forward',
                                                          u'docsExamined': 6,
                                                          u'executionTimeMillisEstimate': 0,
                                                          u'filter': {u'$and': []},
                                                          u'invalidates': 0,
                                                          u'isEOF': 1,
                                                          u'nReturned': 6,
                                                          u'needFetch': 0,
                                                          u'needTime': 1,
                                                          u'restoreState': 0,
                                                          u'saveState': 0,
                                                          u'stage': u'COLLSCAN',
                                                          u'works': 8},
                                          u'invalidates': 0,
                                          u'isEOF': 1,
                                          u'limitAmount': 3,
                                          u'memLimit': 33554432,
                                          u'memUsage': 213,
                                          u'nReturned': 3,
                                          u'needFetch': 0,
                                          u'needTime': 8,
                                          u'restoreState': 0,
                                          u'saveState': 0,
                                          u'sortPattern': {u'time': 1},
                                          u'stage': u'SORT',
                                          u'works': 13},
                     u'executionSuccess': True,
                     u'executionTimeMillis': 0,
                     u'nReturned': 3,
                     u'totalDocsExamined': 6,
                     u'totalKeysExamined': 0},
 u'queryPlanner': {u'indexFilterSet': False,
                   u'namespace': u'foo.test',
                   u'parsedQuery': {u'$and': []},
                   u'plannerVersion': 1,
                   u'rejectedPlans': [],
                   u'winningPlan': {u'inputStage': {u'direction': u'forward',
                                                    u'filter': {u'$and': []},
                                                    u'stage': u'COLLSCAN'},
                                    u'limitAmount': 3,
                                    u'sortPattern': {u'time': 1},
                                    u'stage': u'SORT'}},
 u'serverInfo': {u'gitVersion': u'6ce7cbe8c6b899552dadd907604559806aa2e9bd',
                 u'host': u'h008742.mongolab.com',
                 u'port': 53439,
                 u'version': u'3.0.7'}}

As you can see, in both cases the sort is applied first and affects all the 6 documents and then the limit limits the results to 3.

And, the execution plans are exactly the same:

>>> from copy import deepcopy  # just in case
>>> cursor = col.find().sort([('time', 1)]).limit(3)
>>> sort_limit_plan = deepcopy(cursor.explain())
>>> cursor = col.find().limit(3).sort([('time', 1)])
>>> limit_sort_plan = deepcopy(cursor.explain())
>>> sort_limit_plan == limit_sort_plan
True

Also see: