I'm currently integrating Apache Solr search into my platform and using the Suggester functionality for autocompletion. However, the Suggester module does not return spelling suggestions. For example, if I search for:
shi
the Suggester module returns, among others, the following:
shirt
shirts
However, if I search for:
shrt
no suggestions are returned. What I'd like to know is:
a) Is this the result of an incorrect configuration of the Suggester module?
b) Is the Suggester module built in such a way that it does not return spelling suggestions?
c) How can I get the Suggester module to return spelling suggestions as well, without having to make a second request for spelling correction suggestions?
I have read the Solr documentation but cannot seem to make headway with this.
You need to configure a spell check component to generate alternate spelling options as described at https://lucene.apache.org/solr/guide/8_1/spell-checking.html
The task consists of the following steps:
First, update schema.xml with a spellcheck field. This often means creating a new field and copying multiple fields into that single spellcheck field:
<field name="spellcheck" type="text_general"
       indexed="true"
       stored="false"
       multiValued="true"/>
<copyField source="id" dest="spellcheck"/>
<copyField source="name" dest="spellcheck"/>
<copyField source="description" dest="spellcheck"/>
<copyField source="longdescription" dest="spellcheck"/>
<copyField source="category" dest="spellcheck"/>
<copyField source="source" dest="spellcheck"/>
<copyField source="merchant" dest="spellcheck"/>
<copyField source="contact" dest="spellcheck"/>
In solrconfig.xml, create a solr.SpellCheckComponent and add it to the request handler you use for searching:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <!-- decide between dictionary based vs index based spelling suggestions,
         in most cases it makes sense to use index based spell checker
         as it only generates terms which are
         actually present in your search corpus -->
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- field to use -->
    <str name="field">spellcheck</str>
    <!-- buildOnCommit|buildOnOptimize -->
    <str name="buildOnCommit">true</str>
    <!-- $solr.solr.home/data/spellchecker-->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="accuracy">0.7</str>
    <float name="thresholdTokenFrequency">.0001</float>
  </lst>
</searchComponent>
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">defaultSearchField</str>
    <!-- spell check component configuration -->
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
  </lst>
  <!-- add spell check processing after
       the default search component. This is
       the search component name. -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
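To get a feel for what the accuracy value of 0.7 in the configuration above means, here is a rough Python sketch. It assumes the spell checker uses a Levenshtein-based distance measure that scores a candidate as 1 minus the edit distance divided by the length of the longer word, and that candidates scoring below the accuracy value are dropped. The function names are made up for illustration, and the real component also narrows candidates through its spelling index first, which is not modelled here.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalised score in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def passes_accuracy(query, candidate, accuracy=0.7):
    """Hypothetical helper: would this candidate survive the accuracy cut-off?"""
    return similarity(query, candidate) >= accuracy


if __name__ == "__main__":
    for query, candidate in [("shrt", "shirt"), ("coachin", "cochin"), ("shrt", "skirts")]:
        print(query, candidate, round(similarity(query, candidate), 3),
              passes_accuracy(query, candidate))
```

Under these assumptions "shirt" is one edit away from "shrt" and clears the 0.7 threshold, while "skirts" is three edits away and does not. If you get too few or too many suggestions, the accuracy value is probably the first thing to adjust.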
Then reindex the corpus.
Finally, test that suggestions are working. For example:
http://localhost:8983/solr/select/?q=coachin
{
  "responseHeader": {
    "status": 0,
    "QTime": 12,
    "params": {
      "indent": "true",
      "q": "coachin"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  },
  "spellcheck": {
    "suggestions": [
      "coachin", {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 7,
        "suggestion": ["cochin"]
      }
    ]
  }
}
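On the client side you would typically pull the suggested words out of that response. Below is a minimal Python sketch, assuming the flat `[term, info, term, info, ...]` layout shown above. The helper names `build_spellcheck_url` and `extract_suggestions` are hypothetical, and the base URL is simply the one from the test request; adjust it to your own core.

```python
import json
from urllib.parse import urlencode


def build_spellcheck_url(base_url, query):
    """Build a select URL that passes the spellcheck parameters explicitly."""
    params = {
        "q": query,
        "wt": "json",
        "spellcheck": "true",
        "spellcheck.count": 5,
        "spellcheck.collate": "true",
    }
    return base_url + "?" + urlencode(params)


def extract_suggestions(response):
    """Map each misspelled term to the list of words Solr suggested for it."""
    flat = response.get("spellcheck", {}).get("suggestions", [])
    result = {}
    # Pair up the flat list: even positions are terms, odd positions are details.
    for term, info in zip(flat[0::2], flat[1::2]):
        if not isinstance(info, dict):
            continue  # skip entries that are not per-term details (e.g. a collation string)
        words = []
        for entry in info.get("suggestion", []):
            # Entries may be plain strings or dicts carrying a "word" key.
            words.append(entry["word"] if isinstance(entry, dict) else entry)
        result[term] = words
    return result


# The spellcheck part of the response shown above, used as sample input.
SAMPLE = json.loads("""
{
  "response": {"numFound": 0, "start": 0, "docs": []},
  "spellcheck": {
    "suggestions": [
      "coachin",
      {"numFound": 1, "startOffset": 0, "endOffset": 7, "suggestion": ["cochin"]}
    ]
  }
}
""")

if __name__ == "__main__":
    print(build_spellcheck_url("http://localhost:8983/solr/select", "coachin"))
    print(extract_suggestions(SAMPLE))
```

Since the spellcheck defaults are already set on the handler, passing them in the URL is optional; it just makes it easier to experiment with different values per request.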