Using multiple tokenizers in Solr

Matt Dell · Aug 5, 2010 · Viewed 9.3k times

I want to run a query that is case-insensitive and that also matches partial words from the index.

I have a Solr schema that I have modified so that queries return results regardless of case. So, if I search for iPOd, I will see iPod returned. The field type that does this is:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
...
</fieldType>

I have found this code that will allow us to do a partial word match query, but I don't think I can have two tokenizers on one field.

<fieldType name="text" class="solr.TextField" >
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
...
</fieldType>

So what can I do to perform this tokenizer on the field as well?
Or is there a way to merge them?
Or is there another way I can accomplish this task?

Answer

Mauricio Scheffer · Aug 5, 2010

Declare a second fieldType, under a different name, that uses the NGram tokenizer. Then declare two fields: one that uses the NGram fieldType and one that uses the standard "text" fieldType. Use copyField to copy one into the other, so the same data is indexed both ways. See Indexing same data in multiple fields.
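
A minimal sketch of what that could look like in schema.xml. The names "text_ngram", "name" and "name_ngram" are hypothetical, chosen only for illustration, and the query-time analyzer is an assumption (the question's snippet elides it). The existing "text" fieldType from the question is left unchanged.

```xml
<!-- Second fieldType for partial-word matching.
     The name "text_ngram" is an assumption, not from the original schema. -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Same tokenizer and filter as the NGram snippet in the question -->
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Assumed query-side analysis: lowercase only, no n-grams,
       so a typed fragment is compared against the indexed grams as-is -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Two fields holding the same data, analyzed differently.
     Field names are hypothetical. -->
<field name="name" type="text" indexed="true" stored="true"/>
<field name="name_ngram" type="text_ngram" indexed="true" stored="false"/>

<!-- Copy the raw input of "name" into "name_ngram" at index time -->
<copyField source="name" dest="name_ngram"/>
```

With something like this in place, a query can target either field or both, for example `q=name:ipod OR name_ngram:ipo`. Documents need to be reindexed after the schema change for the new field to be populated.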