Elasticsearch relationship mappings (one to one and one to many)

Aruljothi · May 1, 2014

On my Elasticsearch server I have one index, http://localhost:9200/blog.
The blog index contains multiple types.

e.g.: http://localhost:9200/blog/posts, http://localhost:9200/blog/tags.

In the tags type I have created more than 1000 tags, and in the posts type I have created 10 posts.

e.g.: posts

{   
    "_index":"blog",
    "_type":"posts",
    "_id":"1",
    "_version":3,
    "found":true,
    "_source" : {
        "catalogId" : "1",
        "name" : "cricket",
        "url" : "http://www.wikipedia/cricket"
    }
}

e.g.: tags

{   
    "_index":"blog",
    "_type":"tags",
    "_id":"1",
    "_version":3,
    "found":true,
    "_source" : {
        "tagId" : "1",
        "name" : "game"
    }
}

I want to assign the existing tags to blog posts (i.e. a relationship mapping).

How do I map the tags to posts?

Answer

Paige Cook · May 1, 2014

There are 4 approaches that you can use within Elasticsearch for managing relationships. They are very well outlined in the Elasticsearch blog post "Managing Relations Inside Elasticsearch". I would recommend reading the entire article to get more details on each approach, and then selecting the approach that best meets your business needs while remaining technically appropriate.

Here are the highlights for the 4 approaches.

Inner Object

  • Easy, fast, performant
  • Only applicable when one-to-one relationships are maintained
  • No need for special queries
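
To make the one-to-one restriction concrete, here is a rough sketch in plain Python (it does not talk to Elasticsearch) that approximates how inner-object fields end up indexed as dotted field paths. The `tag` and `tags` field names and the `flatten` helper are assumptions made up for illustration; they are not part of the question's data.

```python
import json

def flatten(source, prefix=""):
    """Roughly mimic how inner-object fields are indexed as dotted paths.

    Arrays of objects are merged field by field, so the pairing between
    values that came from the same object is lost.
    """
    flat = {}
    for key, value in source.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(item)
    return flat

# One tag per post: the inner object keeps its meaning after flattening.
post_one_tag = {
    "catalogId": "1",
    "name": "cricket",
    "tag": {"tagId": "1", "name": "game"},
}

# Several tags per post: tagId and name values are no longer paired.
post_many_tags = {
    "name": "cricket",
    "tags": [{"tagId": "1", "name": "game"}, {"tagId": "2", "name": "sport"}],
}

print(json.dumps(flatten(post_one_tag), indent=2))
print(json.dumps(flatten(post_many_tags), indent=2))
```

With several tags the result only records that the post has the IDs `1` and `2` and the names `game` and `sport`, not which ID belongs to which name. A search for `tags.tagId: 1` together with `tags.name: sport` would therefore match, which is the problem the nested type is meant to avoid.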

Nested

  • Nested docs are stored in the same Lucene block as each other, which helps read/query performance. Reading a nested doc is faster than the equivalent parent/child.
  • Updating a single field in a nested document (parent or nested children) forces ES to reindex the entire nested document. This can be very expensive for large nested docs.
  • “Cross referencing” nested documents is impossible
  • Best suited for data that does not change frequently
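
As a minimal sketch of the nested approach, the Python below only builds the JSON bodies one might send to the mapping and search endpoints. It assumes a hypothetical `tags` field on each post holding objects with the `tagId` and `name` fields from the question. How the mapping fragment is wrapped (for example under a type name) depends on the Elasticsearch version, so treat it as a fragment rather than a complete request.

```python
import json

# Mapping fragment marking a hypothetical `tags` field as nested.
# The surrounding wrapper (e.g. a type name) varies by Elasticsearch version.
nested_mapping = {"properties": {"tags": {"type": "nested"}}}

def nested_tag_query(tag_id, tag_name):
    """Build a search body matching posts where ONE nested tag has both values."""
    return {
        "query": {
            "nested": {
                "path": "tags",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"tags.tagId": tag_id}},
                            {"match": {"tags.name": tag_name}},
                        ]
                    }
                },
            }
        }
    }

print(json.dumps(nested_mapping, indent=2))
print(json.dumps(nested_tag_query("1", "game"), indent=2))
```

Because both conditions sit inside a single `nested` clause, they have to be satisfied by the same tag object, not by two different tags on the same post.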

Parent/Child

  • Children are stored separately from the parent, but are routed to the same shard, so parent/child is slightly less performant on read/query than nested
  • Parent/child mappings have a bit of extra memory overhead, since ES maintains a “join” list in memory
  • Updating a child doc does not affect the parent or any other children, which can potentially save a lot of indexing on large docs
  • Sorting/scoring can be difficult with Parent/Child since the Has Child/Has Parent operations can be opaque at times
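
The following is a hedged sketch of what parent/child could look like for the question's `posts` and `tags` types. It assumes the `_parent` mapping style from Elasticsearch versions contemporary with this answer; later versions replaced it with a `join` field, so check the documentation for your version. The helper names are made up for illustration, and the code only builds request paths and bodies. Note that a child document has exactly one parent, so a tag shared by several posts would need one child document per post.

```python
import json
from urllib.parse import urlencode

# Older-style mapping for the child type: each `tags` doc points at a `posts` parent.
child_mapping = {"tags": {"_parent": {"type": "posts"}}}

def child_index_path(index, child_type, child_id, parent_id):
    """Path for indexing a child; `parent` routes it to the parent's shard."""
    return f"/{index}/{child_type}/{child_id}?" + urlencode({"parent": parent_id})

def posts_with_tag(tag_name):
    """Search body: parent posts that have at least one matching child tag."""
    return {
        "query": {
            "has_child": {
                "type": "tags",
                "query": {"match": {"name": tag_name}},
            }
        }
    }

print(json.dumps(child_mapping, indent=2))
print(child_index_path("blog", "tags", "1", "1"))
print(json.dumps(posts_with_tag("game"), indent=2))
```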

Denormalization

  • You get to manage all the relations yourself!
  • Most flexible, most administrative overhead
  • May be more or less performant depending on your setup
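
Denormalization happens in application code, so here is one possible sketch using in-memory dictionaries in place of real index calls. The `denormalize` and `rename_tag` helpers and the embedded `tags` field are assumptions for illustration. In a real setup each changed post would also have to be reindexed.

```python
import copy

def denormalize(post, tags_by_id, tag_ids):
    """Return a copy of `post` with full tag data embedded under `tags`."""
    enriched = copy.deepcopy(post)
    enriched["tags"] = [copy.deepcopy(tags_by_id[tag_id]) for tag_id in tag_ids]
    return enriched

def rename_tag(posts, tag_id, new_name):
    """Propagate a tag rename to every post copy; returns how many posts changed."""
    changed = 0
    for post in posts:
        hits = [t for t in post.get("tags", []) if t["tagId"] == tag_id]
        for tag in hits:
            tag["name"] = new_name
        changed += bool(hits)
    return changed

tags_by_id = {"1": {"tagId": "1", "name": "game"}}
post = {"catalogId": "1", "name": "cricket", "url": "http://www.wikipedia/cricket"}

# Each post carries its own copy of the tag data, so a plain search on
# `tags.name` works without any join.
posts = [denormalize(post, tags_by_id, ["1"])]
print(posts[0]["tags"])

# The cost: renaming a tag means touching every post that embeds it.
print(rename_tag(posts, "1", "sport"), posts[0]["tags"])
```

This is the "administrative overhead" in practice: reads stay simple, but every change to a tag has to be pushed to all the copies by your own code.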