Treat child as field of parent in elastic search query

F21 · Aug 4, 2012 · Viewed 15k times

I am reading the docs for elasticsearch and this [page][1] talks about mapping a child to a parent type using _parent.

Say I have child documents of type email attached to parent documents of type account.

Fields in each type:

account (http://localhost:9200/myapp/account/1)
========
id
name
some_other_info
state

email (http://localhost:9200/myapp/email/1?parent=1)
========
id
email

  • How can I search on the name field of account and the email field of email, provided that the state of account is active?

  • Is there a way to get all the children (of a certain type or of any type) a parent owns?

  • When indexing a child document, is it possible to pass the parent as an object property in the JSON data, as opposed to it being part of the query string?


After trying imotov's suggestion, I came up with this query:

This is executed on http://localhost:9200/myapp/account/_search

{
  "query": {
    "bool": {
      "must": [
        {
          "prefix": {
            "name": "a"
          }
        },
        {
          "term": {
            "statuses": "active"
          }
        }
      ],
      "should": [
        {
          "has_child": {
            "type": "emailaddress",
            "query": {
              "prefix": {
                "email": "a"
              }
            }
          }
        }
      ]
    }
  }
}

The problem is that the above does not give me any accounts where the email matches.

The effect I want is essentially this:

  • There is one search box
  • Users start typing and the search box autocompletes.
  • The user's query is checked against the name of each account and against every document of the emailaddress type.
  • If accounts match, just return them. If an emailaddress matches, return its parent account.
  • Limit to a maximum of x (say 10) accounts for each search.

So, I basically need to be able to OR the search between 2 types and return the parent type of matches.


Test data:

curl -XPUT http://localhost:9200/test/account/1 -d '{
    "name": "John Smith",
    "statuses": "active"
}'

curl -XPUT http://localhost:9200/test/account/2 -d '{
    "name": "Peter Smith",
    "statuses": "active"
}'

curl -XPUT http://localhost:9200/test/account/3 -d '{
    "name": "Andy Smith",
    "statuses": "active"
}'

# Set up mapping for parent/child relationship
# (the root key of the mapping body has to match the type name in the URL)

curl -XPUT 'http://localhost:9200/test/email/_mapping' -d '{
    "email" : {
        "_parent" : {"type" : "account"}
    }
}'

curl -XPUT http://localhost:9200/test/email/1?parent=1 -d '{
    "email": "[email protected]"
}'

curl -XPUT http://localhost:9200/test/email/2?parent=1 -d '{
    "email": "[email protected]"
}'

curl -XPUT http://localhost:9200/test/email/3?parent=1 -d '{
    "email": "[email protected]"
}'

curl -XPUT http://localhost:9200/test/email/4?parent=2 -d '{
    "email": "[email protected]"
}'

curl -XPUT http://localhost:9200/test/email/5?parent=3 -d '{
    "email": "[email protected]"
}'

curl -XPUT http://localhost:9200/test/email/6?parent=3 -d '{
    "email": "[email protected]"
}'

imotov's solution worked for me. Another solution I have found is to query accounts for status = active, then run a bool filter on the result and use has_child on the child type and prefix on name inside the bool filter.
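
One possible reading of that alternative, sketched in Python so the request body can be built and inspected without a running cluster. The function name is hypothetical, and the structure (a `filtered` query wrapping a `bool` filter) is an assumption about what is described above, based on the query DSL of that era rather than on the actual request that was used. Newer Elasticsearch releases no longer accept the `filtered` query, so treat this as an illustration only.

```python
import json


def active_accounts_filtered(prefix, child_type="email"):
    """Build a request body: active accounts, narrowed by a bool filter.

    The bool filter keeps an account when its name starts with `prefix`
    or when it has a child of `child_type` whose email starts with `prefix`.
    """
    return {
        "query": {
            "filtered": {
                # Step 1: query accounts whose statuses field is "active".
                "query": {"term": {"statuses": "active"}},
                # Step 2: filter that result by name prefix OR matching child.
                "filter": {
                    "bool": {
                        "should": [
                            {"prefix": {"name": prefix}},
                            {
                                "has_child": {
                                    "type": child_type,
                                    "query": {"prefix": {"email": prefix}},
                                }
                            },
                        ]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(active_accounts_filtered("a"), indent=2))
```

The printed JSON would be sent as the body of a search on http://localhost:9200/test/account/_search, the same endpoint used for the other queries.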

Answer

imotov · Aug 5, 2012

An important difference between Elasticsearch and relational databases is that Elasticsearch cannot perform joins. In Elasticsearch you always search a single index or a union of indices. With a parent/child relationship, however, it is possible to limit the results for the parent type using a query on the child type. For example, you can execute this query on the account type:

{
    "bool": {
        "must": [
            { 
                "text" : { "name": "foo" } 
            }, { 
                "term" : { "state": "active" } 
            }, {
                "has_child": {
                    "type": "email",
                    "query": {
                        "text": {"email": "bar" }
                    }
                }
            }
        ]
    }
}

This query returns only the parent documents (no child documents are returned). You can use a parent id returned by this query to find all children of that parent using the _parent field, which is stored and indexed by default:

{
    "term" : { "_parent": "1" } 
}

Or you can limit your results only to the children that contain the word bar in the field email:

{
    "bool": {
        "must": [
            { 
                "term" : { "_parent": "1" } 
            }, { 
                "text" : { "email": "bar" } 
            }
        ]
    }
}
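
The two-step lookup described above (search the parents, then fetch children by parent id) could be wired together roughly as follows. This is a sketch in Python with hypothetical helper names. It only builds the query fragments shown above and assumes the usual search response layout, where matching documents appear under `hits.hits` with an `_id` key.

```python
def parent_ids(search_response):
    """Collect the ids of the parent documents returned by the first search."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def children_query(parent_id, email_word=None):
    """Build the query fragment that finds the children of one parent.

    Without `email_word` this is the plain term query on _parent; with it,
    the children are further limited to those whose email contains the word.
    """
    by_parent = {"term": {"_parent": parent_id}}
    if email_word is None:
        return by_parent
    return {"bool": {"must": [by_parent, {"text": {"email": email_word}}]}}


if __name__ == "__main__":
    # Hand-written stand-in for a real response from the account search.
    response = {"hits": {"hits": [{"_id": "1"}, {"_id": "3"}]}}
    for pid in parent_ids(response):
        print(children_query(pid, email_word="bar"))
```

Each fragment would then be run against the child type, one request per parent id.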

I don't think it's possible to specify the parent in the JSON unless you are using _bulk indexing.
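
For the _bulk case, a rough sketch of what such a request body might look like, generated in Python. It assumes the bulk API accepts a `_parent` key in the action line that precedes each document; the key name has differed between Elasticsearch versions, so check the documentation for the version you run. The function name, the id 7 and the example.com address are made-up placeholders.

```python
import json


def bulk_child_lines(index, doc_type, docs):
    """Build a newline-delimited _bulk body for child documents.

    `docs` is an iterable of (doc_id, parent_id, source) tuples. Each
    document becomes two lines: an action line carrying the parent id,
    followed by the document source itself.
    """
    lines = []
    for doc_id, parent_id, source in docs:
        action = {
            "index": {
                "_index": index,
                "_type": doc_type,
                "_id": doc_id,
                "_parent": parent_id,  # assumed key name, see note above
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = bulk_child_lines(
        "test", "email", [("7", "1", {"email": "someone@example.com"})]
    )
    print(body, end="")
```

The resulting text would be posted to http://localhost:9200/_bulk, so the parent travels in the request body rather than in the query string.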

This is how the email lookup can be implemented using the test data provided in the question:

#!/bin/sh
curl -XDELETE 'http://localhost:9200/test' && echo 
curl -XPOST 'http://localhost:9200/test' -d '{
    "settings" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 0
    },
    "mappings" : {
      "account" : {
        "_source" : { "enabled" : true },
        "properties" : {
          "name": { "type": "string", "analyzer": "standard" },
          "statuses": { "type": "string",  "index": "not_analyzed" }
        }
      },
      "email" : {
        "_parent" : {
          "type" : "account"
        },
        "properties" : {
          "email": { "type": "string",  "analyzer": "standard" }
        }
      }
    }
}' && echo

curl -XPUT 'http://localhost:9200/test/account/1' -d '{
    "name": "John Smith",
    "statuses": "active"
}'

curl -XPUT 'http://localhost:9200/test/account/2' -d '{
    "name": "Peter Smith",
    "statuses": "active"
}'

curl -XPUT 'http://localhost:9200/test/account/3' -d '{
    "name": "Andy Smith",
    "statuses": "active"
}'

# Index the child email documents, passing each one's parent account id

curl -XPUT 'http://localhost:9200/test/email/1?parent=1' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/2?parent=1' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/3?parent=1' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/4?parent=2' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/5?parent=3' -d '{
    "email": "[email protected]"
}'

curl -XPUT 'http://localhost:9200/test/email/6?parent=3' -d '{
    "email": "[email protected]"
}'

curl -XPOST 'http://localhost:9200/test/_refresh'
echo
curl 'http://localhost:9200/test/account/_search' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "statuses": "active"
          }
        }
      ],
      "should": [
        {
          "prefix": {
            "name": "a"
          }
        },
        {
          "has_child": {
            "type": "email",
            "query": {
              "prefix": {
                "email": "a"
              }
            }
          }
        }
      ],
      "minimum_number_should_match" : 1
    }
  }
}' && echo
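
The question also asks for at most x accounts per search and for autocomplete on whatever the user types. A small sketch of how the final query above might be parameterized, in Python, with a hypothetical function name. It adds the standard `size` parameter to cap the number of hits and lowercases the input: prefix queries are not analyzed, while the standard analyzer used in the mapping above lowercases terms at index time, so an uppercase prefix would otherwise find nothing.

```python
import json


def account_autocomplete(user_input, limit=10):
    """Build the search body for the single autocomplete box.

    Returns active accounts whose name, or one of whose child emails,
    starts with what the user has typed so far.
    """
    prefix = user_input.lower()  # match the lowercased terms in the index
    return {
        "size": limit,  # at most `limit` accounts per search
        "query": {
            "bool": {
                "must": [{"term": {"statuses": "active"}}],
                "should": [
                    {"prefix": {"name": prefix}},
                    {
                        "has_child": {
                            "type": "email",
                            "query": {"prefix": {"email": prefix}},
                        }
                    },
                ],
                # At least one of the two should clauses has to match.
                "minimum_number_should_match": 1,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(account_autocomplete("A"), indent=2))
```

The output can be passed to curl with -d against http://localhost:9200/test/account/_search in place of the inline body in the script.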