Best way to do one-to-many "JOIN" in CouchDB

mit picture mit · Jun 13, 2010 · Viewed 13.1k times · Source

I am looking for a CouchDB equivalent to "SQL joins".

In my example there are CouchDB documents that are list elements:

{ "type" : "el", "id" : "1", "content" : "first" } 
{ "type" : "el", "id" : "2", "content" : "second" } 
{ "type" : "el", "id" : "3", "content" : "third" } 

There is one document that defines the list:

{ "type" : "list", "elements" : ["2","1"] , "id" : "abc123" }

As you can see the third element was deleted, it is no longer part of the list. So it must not be part of the result. Now I want a view that returns the content elements including the right order.

The result could be:

{ "content" : ["second", "first"] }

In this case the order of the elements is already as it should be. Another possible result:

{ "content" : [{"content" : "first", "order" : 2},{"content" : "second", "order" : 1}] }

I started writing the map function:

map = function (doc) {
  if (doc.type === 'el') {
    emit(doc.id, {"content" : doc.content}); //emit the id and the content
    exit;
  }
  if (doc.type === 'list') {
    for ( var i=0, l=doc.elements.length; i<l; ++i ){
      emit(doc.elements[i], { "order" : i }); //emit the id and the order
    }
  }
}

This is as far as I can get. Can you correct my mistakes and write a reduce function? Remember that the third document must not be part of the result.

Of course you can write a different map function also. But the structure of the documents (one definig element document and an entry document for each entry) cannot be changed.


EDIT: Do not miss JasonSmith's comment to his answer, where he describes how to do this shorter.

Answer

JasonSmith picture JasonSmith · Jun 14, 2010

Thank you! This is a great example to show off CouchDB 0.11's new features!

You must use the fetch-related-data feature to reference documents in the view. Optionally, for more convenient JSON, use a _list function to clean up the results. See Couchio's writeup on "JOIN"s for details.

Here is the plan:

  1. Firstly, you have a uniqueness contstraint on your el documents. If two of them have id=2, that's a problem. It is necessary to use the _id field instead if id. CouchDB will guarantee uniqueness, but also, the rest of this plan requires _id in order to fetch documents by ID.

    { "type" : "el", "_id" : "1", "content" : "first" } 
    { "type" : "el", "_id" : "2", "content" : "second" } 
    { "type" : "el", "_id" : "3", "content" : "third" } 
    

    If changing the documents to use _id is absolutely impossible, you can create a simple view to emit(doc.id, doc) and then re-insert that into a temporary database. This converts id to _id but adds some complexity.

  2. The view emits {"_id": content_id} data keyed on [list_id, sort_number], to "clump" the lists with their content.

    function(doc) {
      if(doc.type == 'list') {
        for (var i in doc.elements) {
          // Link to the el document's id.
          var id = doc.elements[i];
          emit([doc.id, i], {'_id': id});
        }
      }
    }
    

    Now there is a simple list of el documents, in the correct order. You can use startkey and endkey if you want to see only a particular list.

    curl localhost:5984/x/_design/myapp/_view/els
    {"total_rows":2,"offset":0,"rows":[
    {"id":"036f3614aeee05344cdfb66fa1002db6","key":["abc123","0"],"value":{"_id":"2"}},
    {"id":"036f3614aeee05344cdfb66fa1002db6","key":["abc123","1"],"value":{"_id":"1"}}
    ]}
    
  3. To get the el content, query with include_docs=true. Through the magic of _id, the el documents will load.

    curl localhost:5984/x/_design/myapp/_view/els?include_docs=true
    {"total_rows":2,"offset":0,"rows":[
    {"id":"036f3614aeee05344cdfb66fa1002db6","key":["abc123","0"],"value":{"_id":"2"},"doc":{"_id":"2","_rev":"1-4530dc6946d78f1e97f56568de5a85d9","type":"el","content":"second"}},
    {"id":"036f3614aeee05344cdfb66fa1002db6","key":["abc123","1"],"value":{"_id":"1"},"doc":{"_id":"1","_rev":"1-852badd683f22ad4705ed9fcdea5b814","type":"el","content":"first"}}
    ]}
    

    Notice, this is already all the information you need. If your client is flexible, you can parse the information out of this JSON. The next optional step simply reformats it to match what you need.

  4. Use a _list function, which simply reformats the view output. People use them to output XML or HTML however we will make the JSON more convenient.

    function(head, req) {
      var headers = {'Content-Type': 'application/json'};
      var result;
      if(req.query.include_docs != 'true') {
        start({'code': 400, headers: headers});
        result = {'error': 'I require include_docs=true'};
      } else {
        start({'headers': headers});
        result = {'content': []};
        while(row = getRow()) {
          result.content.push(row.doc.content);
        }
      }
      send(JSON.stringify(result));
    }
    

    The results match. Of course in production you will need startkey and endkey to specify the list you want.

    curl -g 'localhost:5984/x/_design/myapp/_list/pretty/els?include_docs=true&startkey=["abc123",""]&endkey=["abc123",{}]'
    {"content":["second","first"]}