I need a way for my search engine to handle small typos in search strings and still return the right results.
According to the Elasticsearch docs, there are three parameters that are relevant to fuzzy matching in text queries: fuzziness, max_expansions, and prefix_length.
Unfortunately, there is not a lot of detail available on exactly what these parameters do, and what sane values for them are. I do know that fuzziness is supposed to be a float between 0 and 1.0, and the other two are integers.
Can anyone recommend reasonable "starting point" values for these parameters? I'm sure I will have to tune by trial and error, but I'm just looking for ballpark values to correctly handle typos and misspellings.
When using fuzzy matching, I found it helpful to combine a plain match query with a fuzzy match query for the same search term. The fuzzy clause retrieves results for typos, while the plain clause ensures that documents containing the search word exactly as entered appear highest in the results.
For example:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "_all": search_term
          }
        },
        {
          "match": {
            "_all": {
              "query": search_term,
              "fuzziness": "1",
              "prefix_length": 2
            }
          }
        }
      ]
    }
  }
}
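If you build the request body in code, the same structure can be generated from the user's input. This is only a minimal sketch: the function name `build_typo_tolerant_query` and its defaults are my own choices for illustration, not part of any Elasticsearch client, and the field is a parameter because `_all` may not exist in your index (newer Elasticsearch versions dropped it).

```python
import json


def build_typo_tolerant_query(search_term, field="_all", fuzziness="1", prefix_length=2):
    """Return a request body with an exact match clause and a fuzzy match clause.

    Hypothetical helper: the name and default values are illustrative only.
    """
    # Plain match clause: scores documents containing the term as typed.
    exact_clause = {"match": {field: search_term}}
    # Fuzzy match clause: also accepts terms within the given edit distance.
    fuzzy_clause = {
        "match": {
            field: {
                "query": search_term,
                "fuzziness": fuzziness,
                "prefix_length": prefix_length,
            }
        }
    }
    # Both clauses go in "should", so a document can match either one.
    return {"query": {"bool": {"should": [exact_clause, fuzzy_clause]}}}


if __name__ == "__main__":
    print(json.dumps(build_typo_tolerant_query("elasticsaerch"), indent=2))
```

A document that contains the exact word matches both clauses, and a bool query adds up the scores of the should clauses that match, which is why exact hits tend to end up above typo-only hits.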
A few more details are listed here: https://medium.com/@wampum/fuzzy-queries-ae47b66b325c
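To get a feel for what the three parameters mean, here is a rough simulation in plain Python. It is not how Elasticsearch works internally, and it makes simplifying assumptions: it uses plain Levenshtein distance (Elasticsearch can also count a swap of two adjacent characters as a single edit), it treats fuzziness as a maximum number of edits, and the names `edit_distance` and `fuzzy_candidates` are made up for this sketch.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_candidates(term, indexed_terms, max_edits=1, prefix_length=2, max_expansions=50):
    """Illustrative only: pick indexed terms a fuzzy match might expand to."""
    # prefix_length: the first characters must match exactly.
    prefix = term[:prefix_length]
    matches = [
        candidate
        for candidate in indexed_terms
        if candidate.startswith(prefix) and edit_distance(term, candidate) <= max_edits
    ]
    # max_expansions: cap on how many matching terms are kept.
    return matches[:max_expansions]


if __name__ == "__main__":
    print(fuzzy_candidates("searcg", ["search", "seared", "research", "sea"]))
```

With these inputs only "search" survives: "research" fails the two-character prefix check, and "seared" and "sea" are more than one edit away. A longer prefix_length and a smaller max_expansions both shrink the set of terms that have to be considered, at the cost of missing typos near the start of a word.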