Retrieve analyzed tokens from ElasticSearch documents

Clay Wardell · Nov 15, 2012 · Viewed 22.6k times

Trying to access the analyzed/tokenized text in my ElasticSearch documents.

I know you can use the Analyze API to analyze arbitrary text according to your analysis modules. So I could copy and paste data from my documents into the Analyze API to see how it was tokenized.

This seems unnecessarily time-consuming, though. Is there any way to instruct ElasticSearch to return the tokenized text in search results? I've looked through the docs and haven't found anything.

Answer

Torsten Engelbrecht · Jun 24, 2014

This question is a little old, but I think an additional answer is necessary.

With ElasticSearch 1.0.0, the Term Vector API was added, which gives you direct access to the tokens ElasticSearch stores under the hood on a per-document basis. The API docs are not very clear on this (it is only mentioned in the example), but in order to use the API you first have to indicate in your mapping definition that you want to store term vectors, by setting the term_vector property on each field.
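To make the two steps concrete, here is a rough Python sketch of what the mapping body and the token extraction might look like. It does not talk to a cluster. The index, type, and field names (`myindex`, `mytype`, `body`) are made up for illustration. The `_termvector` path and the `string` field type are assumptions based on the 1.x releases; later versions renamed or changed both, so check the docs for your version. The sample response is hand-written in the general shape the API returns, not real output.

```python
import json

# Hypothetical names, chosen only for this illustration.
INDEX, DOC_TYPE, FIELD = "myindex", "mytype", "body"


def build_mapping(doc_type, field):
    """Mapping body that asks for term vectors to be stored on one field."""
    return {
        doc_type: {
            "properties": {
                field: {
                    "type": "string",  # assumption: 1.x-era field type
                    "term_vector": "with_positions_offsets",
                }
            }
        }
    }


def termvector_path(index, doc_type, doc_id):
    """Request path for one document (assumption: 1.x-style endpoint name)."""
    return "/{}/{}/{}/_termvector".format(index, doc_type, doc_id)


def extract_tokens(response, field):
    """Return the stored terms for `field`, ordered by token position."""
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    positioned = []
    for term, info in terms.items():
        for token in info.get("tokens", []):
            positioned.append((token.get("position", 0), term))
    return [term for _, term in sorted(positioned)]


# Hand-written sample in the general shape of a term vector response.
sample_response = {
    "term_vectors": {
        "body": {
            "terms": {
                "fox": {"term_freq": 1, "tokens": [
                    {"position": 1, "start_offset": 6, "end_offset": 9}]},
                "quick": {"term_freq": 1, "tokens": [
                    {"position": 0, "start_offset": 0, "end_offset": 5}]},
            }
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(build_mapping(DOC_TYPE, FIELD), indent=2))
    print(termvector_path(INDEX, DOC_TYPE, "1"))
    print(extract_tokens(sample_response, FIELD))
```

You would send the mapping body when creating the index (or via the put-mapping endpoint) before indexing documents, then issue a GET against the term vector path with whatever HTTP client you prefer and pass the parsed JSON to `extract_tokens`.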