Elasticsearch match phrase prefix not matching all terms

Paul T Davies picture Paul T Davies · Nov 8, 2017 · Viewed 7.7k times · Source

I am having an issue where when I use the match_phrase_prefix query in Elasticsearch, it is not returning all the results I would expect it to, particularly when the query is one word followed by one letter.

Take this index mapping (this is a contrived example to protect sensitive data):

http://localhost:9200/test/drinks/_mapping

returns:

{
  "test": {
    "mappings": {
      "drinks": {
        "properties": {
          "name": {
            "type": "text"
          }
        }
      }
    }
  }
}

And amongst millions of other records are these:

{
    "_index": "test",
    "_type": "drinks",
    "_id": "2",
    "_score": 1,
    "_source": {
        "name": "Johnnie Walker Black Label"
    }
},
{
    "_index": "test",
    "_type": "drinks",
    "_id": "1",
    "_score": 1,
    "_source": {
        "name": "Johnnie Walker Blue Label"
    }
}

The following query, which is one word followed by two letters:

POST http://localhost:9200/test/drinks/_search
{
    "query": {
        "match_phrase_prefix" : {
            "name" : "Walker Bl"
        }
    }
}

returns this:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "test",
                "_type": "drinks",
                "_id": "2",
                "_score": 0.5753642,
                "_source": {
                    "name": "Johnnie Walker Black Label"
                }
           },
           {
               "_index": "test",
               "_type": "drinks",
               "_id": "1",
               "_score": 0.5753642,
               "_source": {
                   "name": "Johnnie Walker Blue Label"
                }
            }
        ]
    }
}

Whereas this query with one word and one letter:

POST http://localhost:9200/test/drinks/_search
{
    "query": {
        "match_phrase_prefix" : {
            "name" : "Walker B"
        }
    }
}

returns no results. What could be happening here?

Answer

Rlarroque picture Rlarroque · Nov 8, 2017

I will assume that you are working with Elasticsearch 5.0 and above. I think it might have to be because of the max_expansions default value.

As seen in the documentation here, the max_expansions parameters is used to control how many prefixes the last term will be expanded with. The default value is 50 and it might explain why you find "black" and "blue" with the two first letters B and L, but not with the B only.

The documentation is pretty clear about it:

The match_phrase_prefix query is a poor-man’s autocomplete. It is very easy to use, which let’s you get started quickly with search-as-you-type but it’s results, which usually are good enough, can sometimes be confusing.

Consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown). Then it looks at the sorted term dictionary to find the first 50 terms that begin with f, and adds these terms to the phrase query.

The problem is that the first 50 terms may not include the term fox so the phase quick brown fox will not be found. This usually isn’t a problem as the user will continue to type more letters until the word they are looking for appears

I wouldn't be able to tell you if it's ok to increase this parameter above 50 if you are looking for good performances since I never tried myself.