I'm using Lucene.net, but I am tagging this question for both .NET and Java versions because the API is the same and I'm hoping there are solutions on both platforms.
I'm sure other people have addressed this issue, but I haven't been able to find any good discussions or examples.
By default, Lucene is very picky about query syntax. For example, I just got the following error:
[ParseException: Cannot parse 'hi there!': Encountered "<EOF>" at line 1, column 9.
Was expecting one of:
"(" ...
"*" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
"[" ...
"{" ...
<NUMBER> ...
]
Lucene.Net.QueryParsers.QueryParser.Parse(String query) +239
What is the best way to prevent ParseExceptions when processing queries from users? It seems to me that the most usable search interface is one that always executes a query, even if it might be the wrong query.
It seems that there are a few possible, and complementary, strategies:
I don't really have any great ideas about how to do any of those strategies. Has anyone else addressed this issue? Are there any "simple" or "graceful" parsers that I don't know about?
Yo can make Lucene ignore the special characters by sanitizing the query with something like
query = QueryParser.Escape(query)
If you do not want your users to ever use advanced syntax in their queries, you can do this always.
If you want your users to use advanced syntax but you also want to be more forgiving with the mistakes you should only sanitize after a ParseException has occured.