Delete all documents in a CouchDB database *except* the design documents

blueFast · Apr 14, 2012

Is it possible to delete all documents in a CouchDB database, except design documents, without creating a specific view for that?

My first approach has been to access the _all_docs standard view, and discard those documents starting with _design. This works but, for large databases, is too slow, since the documents need to be requested from the database (in order to get the document revision) one at a time.

If this is the only valid approach, I think it is much more practical to delete the complete database, and create it from scratch inserting the design documents again.

Answer

JasonSmith · Apr 16, 2012

I can think of a couple of ideas.

Use _all_docs

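You do not need to fetch the documents themselves, only their IDs and revisions. By default, that is all that _all_docs returns. You can request a pretty big batch at a time (10k or 100k docs should be fine), and then delete each batch with a single POST to _bulk_docs, marking every non-design document with "_deleted": true.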
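As a rough sketch of that idea, the Python below turns one _all_docs response into a _bulk_docs body that marks every non-design document as deleted. The function name build_bulk_delete and the sample rows are made up for illustration; the row shape (id, key, value.rev) is what _all_docs returns by default. Sending the payload (a POST to /db/_bulk_docs) is left out so the snippet runs without a server.

```python
import json

def build_bulk_delete(all_docs_response):
    """Build a _bulk_docs body that deletes every non-design document.

    Hypothetical helper: takes the parsed JSON of GET /db/_all_docs.
    """
    docs = []
    for row in all_docs_response["rows"]:
        if row["id"].startswith("_design/"):
            continue  # keep design documents untouched
        docs.append({
            "_id": row["id"],
            "_rev": row["value"]["rev"],  # current revision, required for deletion
            "_deleted": True,
        })
    return {"docs": docs}

# Made-up sample in the default _all_docs response shape
sample = {
    "total_rows": 3,
    "offset": 0,
    "rows": [
        {"id": "_design/app", "key": "_design/app", "value": {"rev": "1-aaa"}},
        {"id": "doc1", "key": "doc1", "value": {"rev": "2-bbb"}},
        {"id": "doc2", "key": "doc2", "value": {"rev": "1-ccc"}},
    ],
}

payload = build_bulk_delete(sample)
print(json.dumps(payload, indent=2))
```

For a large database you would presumably repeat this per batch, using the limit parameter on _all_docs to bound each request.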

Replicate then delete

You could use an _all_docs query to get the IDs of all design documents.

GET /db/_all_docs?startkey="_design/"&endkey="_design0"
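The range works because "0" is the character immediately after "/" in ASCII, so "_design0" sits just past every "_design/..." ID. A small sketch of building that URL and pulling the IDs out of the response follows; the function names, the base URL http://localhost:5984, and the sample response are assumptions for illustration. Note that the keys must be JSON-encoded (quoted) before being URL-encoded.

```python
import json
from urllib.parse import urlencode

def design_docs_url(base_url, db):
    """Build the _all_docs URL that covers only design documents."""
    params = {
        "startkey": json.dumps("_design/"),  # keys are JSON strings, quotes included
        "endkey": json.dumps("_design0"),    # "0" directly follows "/" in ASCII
    }
    return f"{base_url}/{db}/_all_docs?{urlencode(params)}"

def design_doc_ids(all_docs_response):
    """Extract the document IDs from a parsed _all_docs response."""
    return [row["id"] for row in all_docs_response["rows"]]

# Assumed local server address and database name
url = design_docs_url("http://localhost:5984", "db")
print(url)

# Made-up response for illustration
sample = {"rows": [
    {"id": "_design/ddoc_1", "key": "_design/ddoc_1", "value": {"rev": "1-aaa"}},
    {"id": "_design/ddoc_2", "key": "_design/ddoc_2", "value": {"rev": "3-bbb"}},
]}
print(design_doc_ids(sample))
```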

Then replicate them somewhere temporary.

POST /_replicator

{ "source":"db", "target":"db_ddocs", "create_target":true
, "user_ctx": {"roles":["_admin"]}
, "doc_ids": ["_design/ddoc_1", "_design/ddoc_2", "etc..."]
}

Now you can just delete the original database and replicate the temporary one back by swapping the "source" and "target" values.
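To make the swap concrete, here is a minimal sketch that builds the replication document shown above and then its reverse. The helper names replication_doc and swapped are hypothetical; the fields are the ones from the request body above. Each resulting dict would be sent as the JSON body of a POST to /_replicator, with the delete and re-create of the original database happening in between.

```python
import json

def replication_doc(source, target, doc_ids):
    """Build a _replicator document that copies only the given doc IDs."""
    return {
        "source": source,
        "target": target,
        "create_target": True,  # create the target database if it is missing
        "user_ctx": {"roles": ["_admin"]},
        "doc_ids": list(doc_ids),
    }

def swapped(doc):
    """Return a copy of a replication document with source and target exchanged."""
    result = dict(doc)
    result["source"], result["target"] = doc["target"], doc["source"]
    return result

ddoc_ids = ["_design/ddoc_1", "_design/ddoc_2"]
save = replication_doc("db", "db_ddocs", ddoc_ids)  # step 1: copy design docs out
restore = swapped(save)                             # step 2: copy them back
print(json.dumps(save))
print(json.dumps(restore))
```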

Deleting vs "deleting"

Note that these two techniques are really apples and oranges. Deleting a database wipes out the edit history of all its documents, so there are no deletion events left to replicate to any other database. When you "delete" a document in CouchDB, on the other hand, it stores a record of that deletion. If you replicate that database, those deletions will be reflected in the target. (CouchDB stores "tombstones" indicating the document ID, its revision history, and its deleted state.)

That may or may not be important to you. The first idea is probably considered more "correct"; however, I can see the value of the second. You can visualize the entire program to accomplish this in your head. It's only a few queries and you're done. No looping through _all_docs batches, no headache. Your specific situation will probably make it obvious which is better.