Analyzers in elasticsearch

Vamsi Krishna picture Vamsi Krishna · Oct 11, 2012 · Viewed 26.4k times · Source

I'm having trouble understanding the concept of analyzers in elasticsearch with tire gem. I'm actually a newbie to these search concepts. Can someone here help me with some reference article or explain what actually the analyzers do and why they are used?

I see different analyzers being mentioned at elasticsearch like keyword, standard, simple, snowball. Without the knowledge of analyzers I couldn't make out what actually fits my need.

Answer

dadoonet picture dadoonet · Oct 11, 2012

Let me give you a short answer.

An analyzer is used at index Time and at search Time. It's used to create an index of terms.

To index a phrase, it could be useful to break it in words. Here comes the analyzer.

It applies tokenizers and token filters. A tokenizer could be a Whitespace tokenizer. It split a phrase in tokens at each space. A lowercase tokenizer will split a phrase at each non-letter and lowercase all letters.

A token filter is used to filter or convert some tokens. For example, a ASCII folding filter will convert characters like ê, é, è to e.

An analyzer is a mix of all of that.

You should read Analysis guide and look at the right all different options you have.

By default, Elasticsearch applies the standard analyzer. It will remove all common english words (and many other filters)

You can also use the Analyze Api to understand how it works. Very useful.