Lucene indexing: Store and indexing modes explained

Boris Callens · Mar 16, 2009

I think I'm still not understanding the lucene indexing options.

The options are

  • Store.Yes
  • Store.No

and

  • Index.Tokenized
  • Index.Un_Tokenized
  • Index.No
  • Index.No_Norms

I don't really understand the store option. Why would you ever want to NOT store your field?
Tokenizing is splitting up the content and removing the noise words/separators (like "and", "or", etc.).
I don't have a clue what norms could be. How are tokenized values stored?
What happens if I store a value "my string" in "fieldName"? Why doesn't a query

fieldName:my string

return anything?

Answer

dustyburwell · Mar 17, 2009

Store.Yes

Means that the value of the field will be stored in the index.

Store.No

Means that the value of the field will NOT be stored in the index.

Store.Yes/No does not affect indexing or searching in Lucene. It only tells Lucene whether you want it to act as a data store for the values of that field. If you use Store.Yes, the field's value will be included in the Documents returned in your search results.

If you're storing your data in a database and only using the Lucene index for searching, you can get away with Store.No on all of your fields except the key you need to look the row up again. However, if you're using the index as storage as well, you'll want Store.Yes.
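To make the split concrete, here is a toy in-memory model in plain Java. It is not Lucene's API: the class, the field names id/body/thumb and their values are made up for illustration. "index" decides whether a value can be matched by a search, "store" decides whether it comes back with the hit, and the two are independent (the thumb field is the Index.No + Store.Yes combination covered further down).

```java
import java.util.*;

/** Toy model, NOT Lucene's API: indexing and storing are two separate switches. */
public class StoreVsIndexDemo {
    /** One field of a document; the two flags are independent. */
    public static final class FieldSpec {
        final String name;
        final String value;
        final boolean store;
        final boolean index;

        public FieldSpec(String name, String value, boolean store, boolean index) {
            this.name = name;
            this.value = value;
            this.store = store;
            this.index = index;
        }
    }

    // Inverted index: field name -> term -> ids of documents containing it.
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    // Stored fields: list position = document id, map = field name -> original value.
    private final List<Map<String, String>> storedFields = new ArrayList<>();

    public int addDocument(FieldSpec... fields) {
        int docId = storedFields.size();
        Map<String, String> kept = new HashMap<>();
        for (FieldSpec f : fields) {
            if (f.index) {
                // Searchable: the whole value is one term (tokenizing is left out here).
                postings.computeIfAbsent(f.name, k -> new HashMap<>())
                        .computeIfAbsent(f.value, k -> new TreeSet<>())
                        .add(docId);
            }
            if (f.store) {
                // Retrievable: the original value is kept verbatim.
                kept.put(f.name, f.value);
            }
        }
        storedFields.add(kept);
        return docId;
    }

    /** Ids of documents whose indexed field holds exactly this term. */
    public Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field, Map.of()).getOrDefault(term, Set.of());
    }

    /** What a hit hands back: only the stored fields. */
    public Map<String, String> document(int docId) {
        return storedFields.get(docId);
    }

    public static void main(String[] args) {
        StoreVsIndexDemo idx = new StoreVsIndexDemo();
        int doc = idx.addDocument(
                new FieldSpec("id", "42", true, true),               // stored + indexed: the database key
                new FieldSpec("body", "lucene", false, true),        // Store.No + indexed: search only
                new FieldSpec("thumb", "/img/42.png", true, false)); // Store.Yes + Index.No: payload only

        System.out.println(idx.search("body", "lucene"));          // [0]
        System.out.println(idx.search("thumb", "/img/42.png"));    // []
        System.out.println(idx.document(doc).containsKey("body")); // false
    }
}
```

In the database pattern above, only the id field needs Store.Yes: a search on body finds the document, and the stored id is what you take back to the database.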

Index.Tokenized

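Means that the field will be tokenized when it's indexed (you got that one): the value is passed through the analyzer, which splits it into terms and, depending on the analyzer, may also lowercase them and drop stop words. This is useful for long, free-text fields with multiple words.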

Index.Un_Tokenized

Means that the field will not be analyzed and will be indexed as a single term, exactly as given. This is useful for keyword/single-word fields and for some short multi-word fields that should only ever match as a whole.
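A rough sketch of the difference in plain Java. This is not Lucene's analyzer: the splitting rule and the tiny stop list are assumptions standing in for what an analyzer such as StandardAnalyzer does. It shows which terms end up in the index for the same value under each option.

```java
import java.util.*;

public class TokenizedVsUntokenized {
    // Hypothetical mini stop list; real analyzers ship their own, longer lists.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "or", "the");

    /** Tokenized (analyzed): split on non-letters/digits, lowercase, drop stop words. */
    public static List<String> tokenized(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    /** Un-tokenized: the exact value, case and spaces included, is a single term. */
    public static List<String> untokenized(String value) {
        return List.of(value);
    }

    public static void main(String[] args) {
        String value = "The Lucene and Solr FAQ";
        System.out.println(tokenized(value));    // [lucene, solr, faq]
        System.out.println(untokenized(value));  // [The Lucene and Solr FAQ]
        // A query term "lucene" matches the first list but not the second.
        System.out.println(tokenized(value).contains("lucene"));    // true
        System.out.println(untokenized(value).contains("lucene"));  // false
    }
}
```

Applied to the question's value "my string", the tokenized field would contain the two terms my and string, while the un-tokenized field would contain only the single term my string.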

Index.No

Exactly what it says: the field will not be indexed and is therefore not searchable. However, you can use Index.No together with Store.Yes to store a value that you don't want to be searchable.

Index.No_Norms

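Same as Index.Un_Tokenized, except that the field's norms (normalization data) are not written, which saves one byte per document for that field. Norms are what Lucene uses for index-time boosting and field-length normalization, so a field indexed this way ignores index-time boosts and does not favor shorter values.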
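As far as I know, the classic DefaultSimilarity in Lucene 2.x/3.x computed a field's norm as its index-time boost times 1/sqrt(number of terms in the field), then squeezed the result into a single byte. The sketch below reproduces only that formula in plain Java (not the byte encoding, and not Lucene's actual class), to show why norms make a match in a short field count for more.

```java
/** Sketch of the classic length norm: boost * 1/sqrt(numTerms). Not Lucene's class. */
public class LengthNormDemo {
    public static float lengthNorm(int numTerms, float fieldBoost) {
        // Fewer terms -> larger factor, so a hit in a short field scores higher.
        return fieldBoost * (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        System.out.println(lengthNorm(4, 1.0f));    // 0.5  (e.g. a short title)
        System.out.println(lengthNorm(100, 1.0f));  // 0.1  (e.g. a long body)
        System.out.println(lengthNorm(4, 2.0f));    // 1.0  (same title, boosted x2)
    }
}
```

With norms omitted (Index.No_Norms), scoring behaves as if this factor were always 1.0: every document's field counts the same regardless of length or boost.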

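For further reading, the Lucene javadocs are priceless (current API version 4.4.0). Note that the names have changed since this question: Lucene 2.4 renamed Tokenized and Un_Tokenized to ANALYZED and NOT_ANALYZED, and 4.x deprecates Field.Index in favor of field types such as TextField and StringField.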

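For your last question, about why your query isn't returning anything: without knowing any more about how you're indexing that field, I'd say it's because your fieldName qualifier is only attached to the term 'my'. The query parser applies a field prefix only to the term (or quoted phrase, or parenthesized group) that immediately follows it, so 'string' is searched in the parser's default field instead. To search for the phrase "my string" you want: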

fieldName:"my string"

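A search for the individual words "my" and "string" in the fieldName field (with the query parser's default OR operator this matches documents containing either word; write fieldName:(+my +string) to require both):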

fieldName:(my string)
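Lucene's QueryParser applies a field: prefix only to the term, quoted phrase or parenthesized group glued to it; everything else goes to the default field passed to the parser's constructor. The toy below is plain Java, not QueryParser: it handles only bare whitespace-separated terms (no quotes, parentheses or operators), and the default field name "contents" is hypothetical. It shows how the original query gets split.

```java
import java.util.*;

public class FieldPrefixDemo {
    /**
     * Toy parser for bare terms only: a "field:" prefix applies to the single
     * term it is attached to; every other term falls back to the default field.
     */
    public static List<String> parse(String query, String defaultField) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            int colon = token.indexOf(':');
            if (colon > 0) {
                clauses.add(token);                      // already field:term
            } else {
                clauses.add(defaultField + ":" + token); // no prefix: default field
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // "contents" stands in for whatever default field your parser was built with.
        System.out.println(parse("fieldName:my string", "contents"));
        // [fieldName:my, contents:string]
    }
}
```

So the original query asks for "my" in fieldName or "string" in the default field, and neither clause has to match the "my string" value you indexed.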