How to query for distinct results in mongodb with python?

Rolando · Aug 17, 2012 · Viewed 12.7k times

I have a Mongo collection with multiple documents. Suppose it contains the following (assume Tom had two teachers for History in 2012, for whatever reason):

{
"name" : "Tom",
"year" : 2012,
"class" : "History",
"Teacher" : "Forester"
}

{
"name" : "Tom",
"year" : 2011,
"class" : "Math",
"Teacher" : "Sumpra"
}

{
"name" : "Tom",
"year" : 2012,
"class" : "History",
"Teacher" : "Reiser"
}

I want to query for all the distinct classes "Tom" has ever had. Even though Tom has had multiple "History" classes with multiple teachers, I want the query to return the minimal set of documents such that Tom is in all of them and "History" shows up only once, rather than a result that contains multiple documents with "History" repeated.

I took a look at: http://mongoengine-odm.readthedocs.org/en/latest/guide/querying.html

and want to be able to try something like:

student_users = Students.objects(name = "Tom", class = "some way to say distinct?")

However, this does not appear to be documented. If this is not the syntactically correct way to do it, is it possible in mongoengine at all, or is there some way to accomplish it with another library such as pymongo? Or do I have to query for all documents with Tom and then do some post-processing to get the unique values? Syntax would be appreciated for any of these cases.

Answer

Rostyslav Dzinko · Aug 17, 2012

First of all, it's only possible to get distinct values for a single field, as explained in the MongoDB documentation on distinct.

Mongoengine's QuerySet class supports a distinct() method that does the job.

So you might try something like this to get results:

Students.objects(name="Tom").distinct(field="class")

This query runs MongoDB's distinct command, which returns a single BSON document containing the list of classes Tom attends; mongoengine hands that list back as a Python list.
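For the pymongo route the question asks about, the same operation is available as the distinct(key, filter) method on a collection. Here is a minimal sketch, assuming the documents live in a pymongo collection; the function name distinct_classes and the database and collection names in the usage comment are hypothetical, not from the original post.

```python
def distinct_classes(collection, student_name):
    """Return the distinct "class" values among one student's documents.

    `collection` is assumed to be a pymongo Collection (or any object
    exposing the same distinct(key, filter) method).
    """
    # distinct(key, filter) runs MongoDB's distinct command on the server,
    # restricted to the documents that match the filter.
    return collection.distinct("class", {"name": student_name})


# Usage sketch (assumes a local mongod; "school" and "students" are
# hypothetical names):
#   from pymongo import MongoClient
#   students = MongoClient()["school"]["students"]
#   distinct_classes(students, "Tom")
```

Taking the collection as a parameter keeps the sketch independent of any particular connection setup.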

Note that the returned value is a single document, so if it exceeds the maximum document size (16 MB) you'll get an error; in that case you have to switch to a map/reduce approach.
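The question also asks whether client-side post-processing is an option. As a rough sketch of that fallback (it is neither the distinct command nor map/reduce), the unique values can be collected while iterating over the matching documents. It assumes the documents arrive as plain dicts, as pymongo's find() yields them, and that the field values are hashable; the helper name distinct_values is illustrative.

```python
def distinct_values(documents, field):
    """Collect the distinct values of `field`, keeping first-seen order."""
    seen = set()
    ordered = []
    for doc in documents:
        if field not in doc:
            continue  # skip documents that lack the field
        value = doc[field]
        if value not in seen:  # assumes hashable values (str, int, ...)
            seen.add(value)
            ordered.append(value)
    return ordered


# The three sample documents from the question, as Python dicts.
tom_docs = [
    {"name": "Tom", "year": 2012, "class": "History", "Teacher": "Forester"},
    {"name": "Tom", "year": 2011, "class": "Math", "Teacher": "Sumpra"},
    {"name": "Tom", "year": 2012, "class": "History", "Teacher": "Reiser"},
]

print(distinct_values(tom_docs, "class"))  # ['History', 'Math']
```

This transfers every matching document to the client, so it is mainly useful when the server-side options are unavailable.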