I am trying to write a NEST query that returns results based on an exact string match. I have searched the web and found suggestions to use Term, Match, or MatchPhrase. I have tried all of those, but my searches still return results that contain only part of the search string. For example, in my database I have the following rows of email addresses:
Irrespective of whether I use:
client.Search<Emails>(s => s.From(0)
.Size(MaximumSearchResultsSize)
.Query(q => q.Term(p => p.OnField(fieldName).Value(fieldValue))))
or
client.Search<Emails>(s => s.From(0).
Size(MaximumPaymentSearchResults).
Query(q=>q.Match(p=>p.OnField(fieldName).Query(fieldValue))));
My search results always include rows that match only part of the search string.
So, if I provide the search string "ter", I still get all 3 rows. [email protected]
I expect no rows to be returned if the search string is "ter". If the search string is "[email protected]", then I would like to see only "[email protected]".
I am not sure what I am doing wrong.
Based on the information you have provided in the question, it sounds like the field that contains the email address has been indexed with the Standard Analyzer. This is the default analyzer applied to string fields when no other analyzer has been specified and the field is not marked as not_analyzed.
The implications of the standard analyzer on a given string input can be seen by using the Analyze API of Elasticsearch:
curl -XPOST "http://localhost:9200/_analyze?analyzer=standard&text=ter%40gmail.com"
The text input needs to be URL-encoded, as demonstrated here with the @ symbol (%40). The results of running this query are:
{
"tokens": [
{
"token": "ter",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "gmail.com",
"start_offset": 4,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 2
}
]
}
We can see that the standard analyzer produces two tokens for the input, ter and gmail.com, and this is what will be stored in the inverted index for the field.
Now, running a Match query will cause the input to the match query to be analyzed, by default using the same analyzer as the one found in the mapping definition for the field on which the match query is being applied.
The tokens resulting from this match query analysis are then combined, by default, into a boolean OR query, so that any document containing any one of the tokens in the inverted index for the field will be a match. For the example text ter@gmail.com, this means any document that has a match for ter or gmail.com in the field will be a hit:
// Indexing
input: ter@gmail.com -> standard analyzer -> ter,gmail.com in inverted index
// Querying
input: ter@gmail.com -> match query -> docs with ter or gmail.com are a hit!
Clearly, for an exact match, this is not what we intend at all!
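The index-then-match behaviour described above can be sketched with a small toy model. This is not how Elasticsearch is implemented: `analyze` is a crude regex stand-in for the standard analyzer, the function names are made up for illustration, and the three sample documents are hypothetical rather than the rows from the question.

```python
import re
from collections import defaultdict

def analyze(text):
    """Crude stand-in for the standard analyzer (an assumption, not the real
    tokenizer): lowercase, then keep runs of letters/digits joined by dots."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())

def build_index(docs):
    """Inverted index: token -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyze(value):
            index[token].add(doc_id)
    return index

def match_query(index, text):
    """Analyze the query text, then OR together the hits for each token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits

# Hypothetical sample data.
docs = {1: "ter@gmail.com", 2: "ter@hotmail.com", 3: "bob@gmail.com"}
index = build_index(docs)

print(analyze("ter@gmail.com"))                     # ['ter', 'gmail.com']
print(sorted(match_query(index, "ter@gmail.com")))  # [1, 2, 3]
```

Every document shares at least one token with the query, so all three come back even though only document 1 holds the exact address.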
Running a Term query does not analyze its input; it looks for an exact match to the term as given. Running it against a field that was analyzed at index time is a problem, though. The field value has been broken into tokens by the analyzer but the input to the term query has not, so the only hits are documents whose analyzed tokens exactly equal the term input. For example:
// Indexing
input: ter@gmail.com -> standard analyzer -> ter,gmail.com in inverted index
// Querying
input: ter@gmail.com -> term query -> No exact matches for ter@gmail.com
input: ter -> term query -> docs with ter in inverted index are a hit!
This is not what we want either!
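The same kind of toy model shows the term query case. Again, this is only a sketch under assumptions: `analyze` is a rough regex approximation of the standard analyzer, `term_query` is a hypothetical helper, and the sample documents are invented.

```python
import re

def analyze(text):
    """Rough approximation of the standard analyzer (assumption)."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())

def term_query(index, term):
    """Look the term up verbatim; the input is NOT analyzed."""
    return index.get(term, set())

# Hypothetical sample data, indexed through the analyzer.
docs = {1: "ter@gmail.com", 2: "ter@hotmail.com", 3: "bob@gmail.com"}
analyzed_index = {}
for doc_id, value in docs.items():
    for token in analyze(value):
        analyzed_index.setdefault(token, set()).add(doc_id)

# The full address was never stored as a single token, so it finds nothing.
print(sorted(term_query(analyzed_index, "ter@gmail.com")))  # []
# The bare token "ter" is in the index, so it matches two documents.
print(sorted(term_query(analyzed_index, "ter")))            # [1, 2]
```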
What we probably want to do with this field is set it to be not_analyzed in the mapping definition:
putMappingDescriptor
.MapFromAttributes()
.Properties(p => p
.String(s => s.Name(n => n.FieldName).Index(FieldIndexOption.NotAnalyzed))
);
With this in place, we can search for exact matches with a Term filter using a Filtered query:
// change dynamic to your type
var docs = client.Search<dynamic>(b => b
.Query(q => q
.Filtered(fq => fq
.Filter(f => f
.Term("fieldName", "[email protected]")
)
)
)
);
which will produce the following query DSL
{
"query": {
"filtered": {
"filter": {
"term": {
"fieldName": "[email protected]"
}
}
}
}
}
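To see why a not_analyzed field makes the term lookup exact, here is a minimal toy sketch. It assumes that a not_analyzed field stores the whole value as a single term; the helper names and sample documents are hypothetical and this is not Elasticsearch's actual implementation.

```python
def build_not_analyzed_index(docs):
    """Index each value verbatim as one term, with no tokenizing or lowercasing."""
    index = {}
    for doc_id, value in docs.items():
        index.setdefault(value, set()).add(doc_id)
    return index

def term_query(index, term):
    """Exact lookup of the unanalyzed term."""
    return index.get(term, set())

# Hypothetical sample data.
docs = {1: "ter@gmail.com", 2: "ter@hotmail.com", 3: "bob@gmail.com"}
index = build_not_analyzed_index(docs)

print(sorted(term_query(index, "ter@gmail.com")))  # [1]
print(sorted(term_query(index, "ter")))            # []
```

Only the document holding the full address is returned, and the partial string "ter" no longer matches anything, which is the behaviour the question asks for.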