Too many boolean clauses exception in Solr

Ankit Ostwal · Jun 3, 2013 · Viewed 9.4k times

I am facing this problem when using the OR logical operator to build a query. I don't want to increase the maxBooleanClauses value. Is there any other option? My OR list can grow to around 2 million values. I would rather have Solr split up the query when the maxBooleanClauses limit is exceeded and then merge the results of all the subqueries. Is something of this sort possible? Or can any of you suggest a better technique for doing this?

I want to plot a graph where the user provides a date range, e.g. 2013-03-01 to 2013-06-01, and the graph shows all the visitors who visited the app in that period. For this I want to build a query that is an OR of all the unique IDs, e.g.:

      uniqueId:(1001 OR 1003 OR 1009 OR ........ OR 102467)

Help is appreciated.

Answer

Nick Zadrozny · Jun 3, 2013

Solr imposes a maxBooleanClauses limit precisely because this kind of query is outside of its sweet spot. Ultimately, if you need millions of searches, then you will need to do your own distribution and aggregation outside of Solr.
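As a rough illustration of what "your own distribution and aggregation" could look like on the client side, here is a minimal Python sketch. It splits the ID list into batches that stay under the clause limit, builds one OR query per batch, and unions the results. The `run_query` callable is a hypothetical stand-in for whatever Solr client you use, and the default of 1024 is only an assumption matching Solr's stock setting. With 2 million IDs this still means roughly 2,000 round trips, so treat it as a workaround rather than a recommendation.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Set


def chunked(values: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` values."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(values)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_or_query(field: str, ids: Iterable[int]) -> str:
    """Build a query such as 'uniqueId:(1001 OR 1003 OR 1009)'."""
    return f"{field}:({' OR '.join(str(i) for i in ids)})"


def search_in_batches(
    field: str,
    ids: Iterable[int],
    run_query: Callable[[str], Iterable[str]],  # hypothetical: returns matching doc IDs
    max_clauses: int = 1024,  # assumption: keep each batch within the configured limit
) -> Set[str]:
    """Run one OR query per batch and merge the matching document IDs."""
    merged: Set[str] = set()
    for batch in chunked(ids, max_clauses):
        merged.update(run_query(build_or_query(field, batch)))
    return merged
```

Merging by set union only works when you need the matching documents themselves; counts, facets, or relevance-sorted pages would need their own aggregation logic.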

I am going to go out on a limb and guess that these clauses are graph-related, which is the most common place I see these kinds of queries. In that case, it may be possible for you to stay somewhat inside Solr's strengths here.

Sometimes it makes sense to invert the logic of your filter, and instead of passing in a large set of values to filter by, index those values onto the documents you are searching so you can pass a single value later.

For example, say you have an index of people. And say you want to search for people who are friends with some specific person. You could generate the list of IDs of all their friends in order to filter your search. But then you'll have a similar problem to what you're seeing here: lots and lots of OR clauses.

Alternatively, you can index each person's list of friends into Solr. Now you'll have a field with thousands of values in it, but your query filter will have only one value: the ID of the person whose network you are filtering the search by.
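To make the inversion concrete, here is a small Python sketch of the two pieces involved: the document you would index and the filter you would send. The field name `friend_ids` and both helper functions are hypothetical, chosen only for illustration, and the field is assumed to be multi-valued in your schema.

```python
from typing import Dict, Iterable, List, Union


def person_document(
    person_id: int, name: str, friend_ids: Iterable[int]
) -> Dict[str, Union[str, List[int]]]:
    """Build the document to index: one person plus the IDs of all their friends."""
    return {
        "id": str(person_id),
        "name": name,
        # Hypothetical multi-valued field; duplicates are dropped before indexing.
        "friend_ids": sorted(set(friend_ids)),
    }


def friends_of_filter(person_id: int) -> str:
    """Build the single-clause filter that replaces the long OR list."""
    return f"friend_ids:{person_id}"
```

Instead of sending `id:(3 OR 7 OR 19 OR ...)` with one clause per friend, you would send `friends_of_filter(42)`, which is just `friend_ids:42`, as the filter query.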

This plays more toward Solr's strengths as far as the mechanics of searching are concerned. However, there is a cost. You'll need to manage the denormalization yourself, and you'll probably be making a lot of updates to your documents, or accepting some latency before changes to your graph show up in the index.
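One way to keep that update cost manageable is to send small partial updates instead of reindexing whole documents. The sketch below builds the JSON body for such an update when two people become friends. It assumes a hypothetical multi-valued `friend_ids` field, a Solr version that supports atomic updates with the `add` modifier, and a schema that meets the requirements for atomic updates (for example, stored fields). Check all of these against your own setup before relying on it.

```python
import json
from typing import Dict, List


def add_friendship_updates(a: int, b: int) -> List[Dict]:
    """Build atomic-update documents for a new, symmetric friendship.

    Both people's documents change, because each one lists the other.
    """
    return [
        {"id": str(a), "friend_ids": {"add": [b]}},
        {"id": str(b), "friend_ids": {"add": [a]}},
    ]


def to_update_body(updates: List[Dict]) -> str:
    """Serialize the updates as the JSON array you would send to the update handler."""
    return json.dumps(updates)
```

Note that a single new edge touches two documents, which is exactly the write amplification described above; a person with thousands of friends who deletes their account would trigger thousands of such updates.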

If that proves too onerous, you may need to consider a different technology better optimized for graph traversal.