Is there any way to recover recently deleted documents in MongoDB?

trex picture trex · Sep 12, 2014 · Viewed 61.4k times · Source

I have removed some documents in my last query by mistake, Is there any way to rollback my last query mongo collection.

Here it is my last query :

 db.foo.remove({ "name" : "some_x_name"}) 

Is there any rollback/undo option? Can I get my data back?

Answer

Adam Comerford picture Adam Comerford · Sep 12, 2014

There is no rollback option (rollback has a different meaning in a MongoDB context), and strictly speaking there is no supported way to get these documents back - the precautions you can/should take are covered in the comments. With that said however, if you are running a replica set, even a single node replica set, then you have an oplog. With an oplog that covers when the documents were inserted, you may be able to recover them.

The easiest way to illustrate this is with an example. I will use a simplified example with just 100 deleted documents that need to be restored. To go beyond this (huge number of documents, or perhaps you wish to only selectively restore etc.) you will either want to change the code to iterate over a cursor or write this using your language of choice outside the MongoDB shell. The basic logic remains the same.

First, let's create our example collection foo in the database dropTest. We will insert 100 documents without a name field and 100 documents with an identical name field so that they can be mistakenly removed later:

use dropTest;
for(i=0; i < 100; i++){db.foo.insert({_id : i})};
for(i=100; i < 200; i++){db.foo.insert({_id : i, name : "some_x_name"})};

Now, let's simulate the accidental removal of our 100 name documents:

> db.foo.remove({ "name" : "some_x_name"})
WriteResult({ "nRemoved" : 100 })

Because we are running in a replica set, we still have a record of these documents in the oplog (being inserted) and thankfully those inserts have not (yet) fallen off the end of the oplog (the oplog is a capped collection remember) . Let's see if we can find them:

use local;
db.oplog.rs.find({op : "i", ns : "dropTest.foo", "o.name" : "some_x_name"}).count();
100

The count looks correct, we seem to have our documents still. I know from experience that the only piece of the oplog entry we will need here is the o field, so let's add a projection to only return that (output snipped for brevity, but you get the idea):

db.oplog.rs.find({op : "i", ns : "dropTest.foo", "o.name" : "some_x_name"}, {"o" : 1});
{ "o" : { "_id" : 100, "name" : "some_x_name" } }
{ "o" : { "_id" : 101, "name" : "some_x_name" } }
{ "o" : { "_id" : 102, "name" : "some_x_name" } }
{ "o" : { "_id" : 103, "name" : "some_x_name" } }
{ "o" : { "_id" : 104, "name" : "some_x_name" } }

To re-insert those documents, we can just store them in an array, then iterate over the array and insert the relevant pieces. First, let's create our array:

var deletedDocs = db.oplog.rs.find({op : "i", ns : "dropTest.foo", "o.name" : "some_x_name"}, {"o" : 1}).toArray();
> deletedDocs.length
100

Next we remind ourselves that we only have 100 docs in the collection now, then loop over the 100 inserts, and finally revalidate our counts:

use dropTest;
db.foo.count();
100
// simple for loop to re-insert the relevant elements
for (var i = 0; i < deletedDocs.length; i++) {
    db.foo.insert({_id : deletedDocs[i].o._id, name : deletedDocs[i].o.name});
}
// check total and name counts again
db.foo.count();
200
db.foo.count({name : "some_x_name"})
100

And there you have it, with some caveats:

  • This is not meant to be a true restoration strategy, look at backups (MMS, other), delayed secondaries for that, as mentioned in the comments
  • It's not going to be particularly quick to query the documents out of the oplog (any oplog query is a table scan) on a large busy system.
  • The documents may age out of the oplog at any time (you can, of course, make a copy of the oplog for later use to give you more time)
  • Depending on your workload you might have to de-dupe the results before re-inserting them
  • Larger sets of documents will be too large for an array as demonstrated, so you will need to iterate over a cursor instead
  • The format of the oplog is considered internal and may change at any time (without notice), so use at your own risk