I am looking to get a random record from a huge (100 million record) MongoDB.
What is the fastest and most efficient way to do so? The data is already there, and there is no field I could use to generate a random number and pick a random row.
Any suggestions?
Starting with the 3.2 release of MongoDB, you can get N random docs from a collection using the $sample aggregation pipeline operator:
// Get one random document from the mycoll collection.
db.mycoll.aggregate([{ $sample: { size: 1 } }])
If you want to select the random document(s) from a filtered subset of the collection, prepend a $match stage to the pipeline:
// Get one random document matching {a: 10} from the mycoll collection.
db.mycoll.aggregate([
  { $match: { a: 10 } },
  { $sample: { size: 1 } }
])
As noted in the comments, when size is greater than 1, there may be duplicates in the returned document sample.
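If duplicates matter, one option is to drop them on the client after sampling. The following is only a minimal sketch, not something built into MongoDB: dedupeById is a hypothetical helper name, and the sampled array is hand-written stand-in data for what a $sample stage with size 4 might return.

```javascript
// Hypothetical helper (not part of MongoDB): keep the first occurrence of each _id.
function dedupeById(docs) {
  const seen = new Set();
  const unique = [];
  for (const doc of docs) {
    // Compare by string form, since ObjectId instances are distinct objects.
    const key = String(doc._id);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(doc);
    }
  }
  return unique;
}

// Stand-in data (assumed) for the result of:
//   db.mycoll.aggregate([{ $sample: { size: 4 } }]).toArray()
const sampled = [
  { _id: "a1", a: 10 },
  { _id: "b2", a: 10 },
  { _id: "a1", a: 10 }, // duplicate of the first document
  { _id: "c3", a: 10 },
];

const uniqueDocs = dedupeById(sampled);
console.log(uniqueDocs.map((d) => d._id));
```

After deduplication you may end up with fewer than N documents, so you would have to sample again or ask for a somewhat larger size up front.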