How to configure Solr to do partial word matching

Question 1

How to configure Solr to do partial word matching

solr lucene sunspot

Andrew Hubbs · Feb 26, 2015 · Viewed 10k times · Source

Answer

I think I figured it out, though I welcome other answers and corrections.

The solution appears to be to apply the EdgeNGramFilterFactory only when indexing, not when querying. This makes sense when you think about it: the n-grams belong in the index, but the query should match only the actual search term. If the query analyzer also generates n-grams, a search for stan is expanded to st, sta and stan, and the st gram matches every value containing a word that starts with "st".

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <!-- Index time: emit front edge n-grams so that prefixes of each word are searchable. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
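  <!-- Query time: same chain without the n-gram filter, so the search term is not expanded into shorter prefixes. -->
  <analyzer type="query">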
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
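
To put the field type to use, a field has to reference it, and the multi-word case (st un) also depends on how the query terms are combined. The following is only a sketch under assumptions, not a tested setup: the field name "name" is hypothetical, the standard query parser is assumed, and the solrQueryParser element is an older schema.xml setting that newer Solr releases dropped in favor of the q.op request parameter.

```xml
<!-- Hypothetical field that uses the "text" field type defined above.
     The field name "name" is an assumption, not taken from the original post. -->
<field name="name" type="text" indexed="true" stored="true"/>

<!-- Assumption: an older schema.xml that still accepts this element.
     On newer Solr versions, send q.op=AND with the request instead. -->
<solrQueryParser defaultOperator="AND"/>
```

With the default OR operator, st un would probably still return St. Johns College, because st is one of its indexed grams. Requiring every term to match should leave only the two values that also contain a word starting with un.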

Question 2

Given the following set of values, how do I configure the field to return values that are partial word matches but that also match the entire search term?

Values:

Texas State University
Stanford University
St. Johns College

Desired results examples:

Search Term: sta

Desired Results:

Texas State University
Stanford University

Search Term: stan

Desired Results:

Stanford University

Search Term: st un

Desired Results:

Texas State University
Stanford University

This is what I've tried so far:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>

I think my problem is with the EdgeNGramFilterFactory. With the configuration above, the second search (stan) returns all three values instead of only Stanford University. Without the EdgeNGramFilterFactory, though, partial words don't match at all.

What is the correct configuration for a Solr field to return values that are partial word matches but that also match the entire search term?
