Elasticsearch: search_analyzer vs index_analyzer

Pavan K Mutt · Apr 10, 2013 · Viewed 26.7k times

I was looking at http://euphonious-intuition.com/2012/08/more-complicated-mapping-in-elasticsearch/ which explains ElasticSearch analyzers.

I did not understand the part about having different search and index analyzers. The second example of custom mapping goes like this:
->the index analyzer is an edgeNgram
->the search analyzer is:

"full_name":{
    "filter":[
        "standard",
        "lowercase",
        "asciifolding"
    ],
    "type":"custom",
    "tokenizer":"standard"
}

If we want the query "Race" not to return results like *ra*pport and *rac*ial because of edgeNgram, why index with edgeNgram in the first place?

Please explain with an example where different analyzers are useful.

Answer

javanna · Apr 10, 2013

You usually have a similar analysis chain at both index time and query time. Similar doesn't mean exactly the same, but the way you index documents usually reflects the way you query them.

The ngrams example is a really good fit though, since it's one of the main reasons why you would use different analyzers at index and query time.

For partial matches you index with edge ngrams, so that "elasticsearch" becomes (with min_gram 3 and max_gram 20):

"ela", "elas", "elast", "elasti", "elastic", "elastics", "elasticse", "elasticsea", "elasticsear", "elasticsearc" and "elasticsearch"

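To make that token list concrete, here is a minimal Python sketch of what an edge ngram filter does to a single term. It is only an illustration, not Elasticsearch's own code; the function name `edge_ngrams` is made up for this example, and the defaults mirror the min_gram 3 and max_gram 20 used above.

```python
def edge_ngrams(term: str, min_gram: int = 3, max_gram: int = 20) -> list[str]:
    """Return every prefix of `term` whose length lies in [min_gram, max_gram].

    Hypothetical helper that imitates an edge ngram token filter for one term.
    """
    longest = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, longest + 1)]


tokens = edge_ngrams("elasticsearch")
print(tokens)       # prefixes from "ela" up to "elasticsearch"
print(len(tokens))  # 11
```

Each of those prefixes is stored in the index as its own term, which is what makes the lookup described next work.
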
Let's now query the created field. If we query for the term "elastic" there's a match and we get back the expected result. Because the prefixes were indexed as terms, what we called a partial match above has effectively become an exact match. There's no need to apply ngrams to the query too. If we did, we would query for all of the following terms:

"ela", "elas","elast","elasti" and "elastic"

That would make the query far more complex and would lead to odd results as well. Say you index the term "elapsed" in another document, in the same field. You would get the following ngrams:

"ela", "elap", "elaps", "elapse", "elapsed"

If you search for "elastic" and apply ngrams to the query, the term "ela" would match this second document too, so you would get it back together with the first document, even though none of its terms contains the whole "elastic" term you were looking for.
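
The difference can be reproduced with a toy inverted index in plain Python. This is a rough sketch under a few assumptions: the helper names (`edge_ngrams`, `search`) are invented for the example, each document holds a single term, and the query terms are combined with OR.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=3, max_gram=20):
    """Hypothetical stand-in for an edge ngram filter applied to one term."""
    return [term[:n] for n in range(min_gram, min(len(term), max_gram) + 1)]


# Index time: both documents go through the edge ngram analyzer.
docs = {1: "elasticsearch", 2: "elapsed"}
index = defaultdict(set)  # token -> ids of the documents containing it
for doc_id, term in docs.items():
    for token in edge_ngrams(term):
        index[token].add(doc_id)


def search(query_tokens):
    """Return ids of documents matching at least one query token (OR)."""
    hits = set()
    for token in query_tokens:
        hits |= index.get(token, set())
    return hits


plain = search(["elastic"])                # search analyzer without ngrams
ngrammed = search(edge_ngrams("elastic"))  # ngrams applied to the query too
print(plain)     # {1}
print(ngrammed)  # {1, 2}
```

With the plain query only the "elasticsearch" document comes back; once the query is ngrammed as well, "ela" pulls in the "elapsed" document.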

I would suggest having a look at the analyze API to play around with different analyzers and compare their results.
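
As a starting point, here is a small sketch that builds two requests for the `_analyze` endpoint, one for the index-time chain and one for the search-time chain. It makes several assumptions: a node listening on `http://localhost:9200`, a reasonably recent Elasticsearch version that accepts a JSON body with inline filter definitions, and a helper name (`build_analyze_request`) invented for this example. The requests are only constructed here, not sent.

```python
import json
import urllib.request


def build_analyze_request(text, filters, base_url="http://localhost:9200"):
    """Build (but do not send) a POST request for the _analyze endpoint.

    Hypothetical helper; base_url assumes a local node on the default port.
    """
    body = {"tokenizer": "standard", "filter": filters, "text": text}
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Index-time chain: lowercase, asciifolding, then edge ngrams (3 to 20).
index_time = build_analyze_request(
    "elasticsearch",
    ["lowercase", "asciifolding",
     {"type": "edge_ngram", "min_gram": 3, "max_gram": 20}],
)

# Search-time chain: the same filters, without the edge ngrams.
search_time = build_analyze_request("elastic", ["lowercase", "asciifolding"])

# With a running cluster, the tokens could be inspected like this:
# with urllib.request.urlopen(index_time) as resp:
#     print([t["token"] for t in json.load(resp)["tokens"]])
```

Comparing the tokens returned for the two requests shows directly which indexed terms a given query term can match.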