Lucene indexing: Store and indexing modes explained

Question 1

lucene

Boris Callens · Mar 16, 2009 · Viewed 12.7k times · Source

Answer

Store.Yes

Means that the value of the field will be stored in the index

Store.No

Means that the value of the field will NOT be stored in the index

Store.Yes/No does not affect indexing or searching with Lucene. It only tells Lucene whether you want it to act as a datastore for the values in the field. If you use Store.Yes, then when you search, the value of that field will be included in the Documents returned as search results.

If you're storing your data in a database and only using the Lucene index for searching, then you can get away with Store.No on all of your fields. However, if you're using the index as storage as well, then you'll want Store.Yes.
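To make the distinction concrete, here is a minimal sketch in Python. It is a toy model, not Lucene's API: the ToyIndex class, its method names and the sample field values are all made up for illustration. It keeps the searchable structure (an inverted index) separate from the optional stored values, which is the separation the store flag controls.

```python
class ToyIndex:
    """Toy model only: an inverted index for searching plus optional stored values."""

    def __init__(self):
        self.postings = {}  # (field, term) -> set of doc ids, used for searching
        self.stored = {}    # doc id -> {field: original value}, used for retrieval

    def add(self, doc_id, field, value, store):
        # The field is indexed regardless of the store flag.
        for term in value.lower().split():
            self.postings.setdefault((field, term), set()).add(doc_id)
        # Only fields added with store=True keep their original value.
        self.stored.setdefault(doc_id, {})
        if store:
            self.stored[doc_id][field] = value

    def search(self, field, term):
        # Return (doc id, stored fields) for every matching document.
        hits = self.postings.get((field, term.lower()), set())
        return [(doc_id, self.stored[doc_id]) for doc_id in sorted(hits)]


index = ToyIndex()
index.add(1, "title", "Lucene in Action", store=True)
index.add(1, "body", "Searching with an inverted index", store=False)

title_hits = index.search("title", "lucene")
body_hits = index.search("body", "inverted")
print(title_hits)
print(body_hits)
```

Both searches find document 1, but the returned document only carries the title. The body was searchable without ever being retrievable, which is the situation where you would fetch the full record from your database instead.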

Index.Tokenized

Means that the field will be tokenized when it's indexed (you got that one). This is useful for long fields with multiple words.

Index.Un_Tokenized

Means that the field will not be analyzed and will be indexed as a single term. This is useful for keyword/single-word and some short multi-word fields.
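The difference between the two modes can be sketched with a toy function in Python. This is an illustration only: index_terms is a hypothetical helper, and lowercasing plus whitespace splitting is a stand-in for whatever analyzer you actually configure.

```python
def index_terms(value, tokenized):
    """Return the terms a toy indexer would write for one field value."""
    if tokenized:
        # Stand-in for an analyzer: lowercase and split on whitespace.
        return value.lower().split()
    # Untokenized: the whole value becomes exactly one term, unchanged.
    return [value]


tokenized_terms = index_terms("My String", tokenized=True)
untokenized_terms = index_terms("My String", tokenized=False)

# A single word matches the tokenized field but not the untokenized one.
print("string" in tokenized_terms)
print("string" in untokenized_terms)
# The untokenized field matches only on the exact, complete value.
print("My String" in untokenized_terms)
```

This is why an untokenized field suits identifiers and keywords: a search has to supply the exact value, including case and spaces.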

Index.No

Exactly what it says. The field will not be indexed and therefore unsearchable. However, you can use Index.No along with Store.Yes to store a value that you don't want to be searchable.

Index.No_Norms

Same as Index.Un_Tokenized, except that a few bytes are saved by not storing norms (normalization data). This data is what is used for index-time boosting and field-length normalization.
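To give a feel for what norms do, here is a rough sketch in Python. It assumes the 1/sqrt(number of terms) form of length normalization used in classic TF-IDF style scoring; the function names and sample fields are hypothetical, and real scoring involves more factors than this.

```python
import math


def length_norm(num_terms):
    # Assumed form: shorter fields get a larger normalization factor.
    return 1.0 / math.sqrt(num_terms)


def toy_score(term, field_terms, use_norms):
    # Term frequency, optionally scaled by the field-length norm.
    tf = field_terms.count(term)
    norm = length_norm(len(field_terms)) if use_norms else 1.0
    return tf * norm


short_field = "lucene index".split()
long_field = "lucene stores norms per field per document in the index".split()

with_norms = (toy_score("lucene", short_field, True),
              toy_score("lucene", long_field, True))
without_norms = (toy_score("lucene", short_field, False),
                 toy_score("lucene", long_field, False))
print(with_norms)
print(without_norms)
```

With norms, the match in the short field scores higher than the same match in the long field. Without norms, both score the same, which is usually acceptable for keyword-style fields where length carries no meaning.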

For further reading, the Lucene javadocs are priceless (current API version 4.4.0).

As for your last question, about why your query isn't returning anything: without knowing any more about how you're indexing that field, I'd say it's because your fieldName qualifier is only attached to the 'my' term, so 'string' is searched for in the default field instead. To search for the phrase "my string", you want:

fieldName:"my string"

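A search for the words "my" and "string", both in the fieldName field (write my AND string inside the parentheses if your parser's default operator is OR and you need both words to be present):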

fieldName:(my string)
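How the field qualifier binds in the three query forms can be sketched with a toy parser in Python. This is a simplified model, not Lucene's query parser: the parse function, its regular expression and the default field name "contents" are assumptions made for illustration.

```python
import re

# A clause is an optional `field:` prefix followed by a quoted phrase,
# a parenthesized group, or a bare word.
CLAUSE = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|\(([^)]*)\)|(\S+))')


def parse(query, default_field="contents"):
    """Toy parser: return a list of (field, text, is_phrase) clauses."""
    clauses = []
    for field, phrase, group, word in CLAUSE.findall(query):
        field = field or default_field  # unqualified terms use the default field
        if phrase:
            clauses.append((field, phrase, True))
        elif group:
            # The qualifier applies to every word inside the parentheses.
            clauses.extend((field, w, False) for w in group.split())
        else:
            clauses.append((field, word, False))
    return clauses


bare = parse('fieldName:my string')
phrase = parse('fieldName:"my string"')
grouped = parse('fieldName:(my string)')
print(bare)
print(phrase)
print(grouped)
```

In the bare form, only "my" lands in fieldName and "string" falls through to the default field. The quoted and parenthesized forms keep both words in fieldName.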

Question 2

I think I'm still not understanding the lucene indexing options.

The available options are

Store.Yes
Store.No

and

Index.Tokenized
Index.Un_Tokenized
Index.No
Index.No_Norms

I don't really understand the store option. Why would you ever want to NOT store your field?
Tokenizing is splitting up the content and removing the noise words/separators (like "and", "or" etc)
I don't have a clue what norms could be. How are tokenized values stored?
What happens if I store a value "my string" in "fieldName"? Why doesn't a query

fieldName:my string

return anything?
