How to make the Lucene QueryParser more forgiving?

Winston Fassett · Nov 4, 2008

I'm using Lucene.net, but I am tagging this question for both .NET and Java versions because the API is the same and I'm hoping there are solutions on both platforms.

I'm sure other people have addressed this issue, but I haven't been able to find any good discussions or examples.

By default, Lucene is very picky about query syntax. For example, I just got the following error:

[ParseException: Cannot parse 'hi there!': Encountered "<EOF>" at line 1, column 9.
Was expecting one of:
    "(" ...
    "*" ...
    <QUOTED> ...
    <TERM> ...
    <PREFIXTERM> ...
    <WILDTERM> ...
    "[" ...
    "{" ...
    <NUMBER> ...
    ]
   Lucene.Net.QueryParsers.QueryParser.Parse(String query) +239

What is the best way to prevent ParseExceptions when processing queries from users? It seems to me that the most usable search interface is one that always executes a query, even if it might be the wrong query.

It seems that there are a few possible, and complementary, strategies:

  • "Clean" the query prior to sending it to the QueryParser
  • Handle exceptions gracefully
    • Show an intelligent error message to the user
    • Perhaps execute a simpler query, leaving off the erroneous bit

I don't really have any great ideas about how to implement any of those strategies. Has anyone else addressed this issue? Are there any "simple" or "graceful" parsers that I don't know about?

Answer

ljorquera · Nov 5, 2008

You can make Lucene treat the special characters as literal text, rather than as query syntax, by sanitizing the query with something like

query = QueryParser.Escape(query);  // Lucene.Net; in Java: QueryParser.escape(query)

If you never want your users to use advanced syntax in their queries, you can always do this.
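To make the effect of escaping concrete, here is a minimal stand-alone sketch that approximates what the library's escape method does: it puts a backslash in front of every character that the query syntax treats as special. The class name `QueryEscaper` and the exact character list are assumptions for illustration only (the set of escaped characters has differed between Lucene versions); in real code, call `QueryParser.Escape` / `QueryParser.escape` instead.

```java
// Illustrative stand-in for the library's escape method; not Lucene's own code.
final class QueryEscaper {

    // Characters ASSUMED to be special in the classic query syntax.
    // The real list depends on the Lucene version in use.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    private QueryEscaper() {
        // static helper only
    }

    /** Returns the query with every special character prefixed by a backslash. */
    static String escape(String query) {
        StringBuilder sb = new StringBuilder(query.length() + 8);
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // mark the next character as literal text
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Applied to the query from the question, `hi there!` becomes `hi there\!`, so the trailing `!` is read as part of a term instead of as a NOT operator with nothing after it.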

If you want your users to be able to use advanced syntax but also want to be more forgiving of mistakes, you should sanitize the query only after a ParseException has occurred.
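The "escape only after a failure" approach can be sketched as follows. To keep the example runnable without Lucene on the classpath, the parser and its exception are represented by hypothetical stand-ins (`Parser`, `QuerySyntaxException`, and the class name `ForgivingParser` are all assumptions); with Lucene you would pass something equivalent to `QueryParser.parse` as the parser and `QueryParser.escape` as the escaper, and catch `ParseException`.

```java
import java.util.function.UnaryOperator;

// Sketch of a parse-then-fallback wrapper; all names here are hypothetical.
final class ForgivingParser {

    /** Stand-in for the checked exception a real parser throws (e.g. ParseException). */
    static final class QuerySyntaxException extends Exception {
        QuerySyntaxException(String message) {
            super(message);
        }
    }

    /** Stand-in for a query parser such as Lucene's QueryParser. */
    interface Parser<Q> {
        Q parse(String query) throws QuerySyntaxException;
    }

    private ForgivingParser() {
        // static helper only
    }

    /**
     * Tries the user's input as-is so that valid advanced syntax keeps working.
     * If that fails, escapes the whole input and parses it again as literal text.
     */
    static <Q> Q parseLeniently(Parser<Q> parser,
                                UnaryOperator<String> escaper,
                                String userInput) throws QuerySyntaxException {
        try {
            return parser.parse(userInput);
        } catch (QuerySyntaxException firstFailure) {
            // Second attempt: may still throw, which the caller must handle.
            return parser.parse(escaper.apply(userInput));
        }
    }
}
```

One consequence of this design is that a query with a single syntax mistake loses all of its operators on the second attempt, because the whole string is escaped. That trade-off matches the goal stated in the question: always execute some query, even if it is not exactly the one the user intended.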