MongoDB $geoNear aggregation pipeline (using the query option vs. using a $match pipeline stage) giving a different number of results

Akshay Mehta · Feb 11, 2017 · Viewed 11.2k times

I am using $geoNear as the first stage of an aggregation pipeline. I need to filter the results on the "tag" field. Filtering works, but there are two ways to do it and they give different results.

Sample MongoDB Document


    {
      "position": [
        40.80143,
        -73.96095
      ],
      "tag": "pizza"
    }

I have added a 2dsphere index on the "position" key


    db.restaurants.createIndex( { 'position' : "2dsphere" } )

Query 1

Uses a $match aggregation pipeline stage to filter the results by the "tag" key

    db.restaurants.aggregate([
      {
        "$geoNear": {
          "near": { type: "Point", coordinates: [ 55.8284, -4.207 ] },
          "limit": 100,
          "maxDistance": 10 * 1000,
          "distanceField": "dist.calculated",
          "includeLocs": "dist.location",
          "distanceMultiplier": 1 / 1000,
          "spherical": true
        }
      },
      {
        "$match": { "tag": "pizza" }
      },
      {
        "$group": { "_id": null, "totalDocs": { "$sum": 1 } }
      }
    ]);

Query 2

Uses the query option inside the $geoNear stage to filter the results by the "tag" key

    db.restaurants.aggregate([
      {
        "$geoNear": {
          "query": { "tag": "pizza" },
          "near": { type: "Point", coordinates: [ 55.8284, -4.207 ] },
          "limit": 100,
          "maxDistance": 10 * 1000,
          "distanceField": "dist.calculated",
          "includeLocs": "dist.location",
          "distanceMultiplier": 1 / 1000,
          "spherical": true
        }
      },
      {
        "$group": { "_id": null, "totalDocs": { "$sum": 1 } }
      }
    ]);

The $group stage is only there to count the documents returned by each query.

The totalDocs values returned by the two queries are different.

Can someone explain the difference between the two queries?

Answer

Akshay Mehta · Feb 13, 2017

A few assumptions:
1. Assume there are 300 documents that match based on the location.
2. Assume the nearest 100 of them do not have the tag "pizza". The remaining 200 documents (101 to 300) do have the tag "pizza".

Query 1:-

  • There are 2 pipeline stages, $geoNear and $match
  • The output of the $geoNear stage is the input to the $match stage
  • $geoNear returns at most 100 results (the limit we specified), sorted from nearest to farthest. Note that these 100 results are chosen purely by location, so under the assumptions above none of them has the tag "pizza".
  • These 100 results are passed to the next stage, $match, which does the filtering. Since none of the first 100 results has the tag "pizza", the output is empty.

Query 2:-

  • There is only 1 pipeline stage, $geoNear
  • The $geoNear stage includes a query field
  • $geoNear returns at most 100 results (the limit we specified), sorted from nearest to farthest, considering only documents that satisfy the query tag=pizza
  • Here results 101 to 200 are returned, because the query is applied inside the $geoNear stage, before the limit. In short, this says: find the (at most 100) nearest documents to location [x,y] that have tag=pizza.
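
The difference between the two queries can be sketched in plain JavaScript, without a database. This is only a simplified model under the assumptions listed earlier, not MongoDB's implementation: the `docs` array, the `rank` field, the placeholder tag "burger" and the `geoNear` helper are all hypothetical stand-ins for the 300 documents, taken to be already sorted nearest first.

```javascript
// Hypothetical in-memory stand-in for the collection: 300 documents already
// ordered nearest-first. The nearest 100 are not tagged "pizza";
// documents 101 to 300 are.
const docs = Array.from({ length: 300 }, (_, i) => ({
  rank: i + 1,                       // 1 = nearest to the query point
  tag: i < 100 ? "burger" : "pizza", // "burger" is an arbitrary placeholder tag
}));

// Simplified model of $geoNear: apply the optional query filter first,
// then keep at most `limit` of the nearest remaining documents.
function geoNear(sortedDocs, { query = {}, limit }) {
  const matches = (doc) =>
    Object.entries(query).every(([key, value]) => doc[key] === value);
  return sortedDocs.filter(matches).slice(0, limit);
}

// Query 1: limit inside $geoNear, then a separate $match stage.
const query1 = geoNear(docs, { limit: 100 }).filter((d) => d.tag === "pizza");

// Query 2: the filter is passed to $geoNear as its `query` option.
const query2 = geoNear(docs, { query: { tag: "pizza" }, limit: 100 });

console.log(query1.length); // 0
console.log(query2.length); // 100
console.log(query2[0].rank, query2[query2.length - 1].rank); // 101 200
```

In this model the only thing that changes between the two queries is whether the tag filter runs before or after the 100-document cut-off, which is enough to produce the different counts.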

P.S.: The $group pipeline stage is only there to get the count, so I have not covered it in the explanation.