I'm using Lucene on a site of mine and I want to show the total result count for a query, for example:
Showing results x to y of z
But I can't find any method that returns the total number of potential results. I can only find methods that require you to specify the number of results you want, and since I only want 10 per page, it seems logical to pass in 10.
Or am I doing this wrong? Should I be passing in, say, 1000 and then taking just the 10 in the range I need?
BTW, since I know you personally, I should point out for others that I already knew you were referring to Lucene.Net and not Lucene :) although the API would be the same.
In versions prior to 2.9.x you could call IndexSearcher.Search(Query query, Filter filter), which returned a Hits object; one of its properties (methods, technically, because of the Java port) was Length().
This is now marked Obsolete since it will be removed in 3.0. The only results Search returns going forward are TopDocs or TopFieldDocs objects.
Your alternatives are:
a) Use IndexSearcher.Search(Query query, int count), which returns a TopDocs object. TopDocs.TotalHits will show you the total possible hits, but at the expense of actually creating <count> results.
b) A faster way is to implement your own Collector object (inherit from Lucene.Net.Search.Collector) and call IndexSearcher.Search(Query query, Collector collector). The search method will call Collect(int docId) on your collector for every match, so if you keep track of those calls internally you have a way of garnering all the results.
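For option (a), the paging arithmetic behind "Showing results x to y of z" can be sketched roughly as below. This is plain Java with no Lucene dependency (a Lucene.Net version should look much the same). PageWindow and its members are hypothetical names made up for illustration, and totalHits stands for whatever TopDocs.TotalHits reports. The assumption is that you request the top page * pageSize hits and display only the last pageSize of them:

```java
/** Hypothetical helper: paging arithmetic for a "Showing results x to y of z" line. */
final class PageWindow {
    final int from;       // x: 1-based position of the first hit on the page (0 if the page is empty)
    final int to;         // y: 1-based position of the last hit on the page (0 if the page is empty)
    final int total;      // z: total hit count as reported by the search
    final int fetchCount; // how many top hits to request so that this page is covered

    private PageWindow(int from, int to, int total, int fetchCount) {
        this.from = from;
        this.to = to;
        this.total = total;
        this.fetchCount = fetchCount;
    }

    /** page is 1-based; totalHits is the total reported by the search. */
    static PageWindow of(int page, int pageSize, int totalHits) {
        if (page < 1 || pageSize < 1 || totalHits < 0) {
            throw new IllegalArgumentException("page and pageSize must be >= 1, totalHits >= 0");
        }
        // A top-N search returns hits 1..N, so page p needs N = p * pageSize.
        int fetchCount = Math.multiplyExact(page, pageSize);
        // Hits that belong to earlier pages and are skipped when rendering.
        int skip = fetchCount - pageSize;
        if (skip >= totalHits) {
            return new PageWindow(0, 0, totalHits, fetchCount); // page is past the last hit
        }
        return new PageWindow(skip + 1, Math.min(fetchCount, totalHits), totalHits, fetchCount);
    }

    String label() {
        return from == 0
                ? "No results on this page (" + total + " total)"
                : "Showing results " + from + " to " + to + " of " + total;
    }
}
```

With 37 total hits and 10 per page, PageWindow.of(2, 10, 37) asks for the top 20 hits and labels the page "Showing results 11 to 20 of 37". So passing in 1000 is not needed; the count requested only has to reach the end of the current page.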
It should be noted Lucene is not a total-resultset query environment and is designed to stream the most relevant results to you (the developer) as fast as possible. Any method which gives you a "total results" count is just a wrapper enumerating over all the matches (as with the Collector method).
The trick is to keep this enumeration as fast as possible. The most expensive part is deserialisation of Documents from the index, populating each field etc. At least the newer API design, which requires you to write your own Collector, makes the principle clear: avoid deserialising each result from the index, since only the matching document Ids and a score are provided by default.
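To illustrate the principle without tying it to a particular Lucene version, here is a toy sketch in plain Java. Nothing in it is the real Lucene API: MatchCollector, CountingCollector and ToySearcher are hypothetical stand-ins, with the "index" reduced to a range of document ids and the "query" to a predicate on the id. The real Collector base class has further members to implement, so treat this only as a picture of the callback pattern:

```java
import java.util.function.IntPredicate;

/** Hypothetical stand-in for a collector callback: called once per matching document id. */
interface MatchCollector {
    void collect(int docId);
}

/** Counts matches by id only; no document is ever loaded or deserialised. */
final class CountingCollector implements MatchCollector {
    private int count;

    @Override
    public void collect(int docId) {
        count++; // the id itself is not needed for a plain total
    }

    int getCount() {
        return count;
    }
}

/** Toy searcher: the "index" is doc ids 0..maxDoc-1, the "query" is a predicate on the id. */
final class ToySearcher {
    private final int maxDoc;

    ToySearcher(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Enumerates every match and hands its id to the collector. */
    void search(IntPredicate query, MatchCollector collector) {
        for (int docId = 0; docId < maxDoc; docId++) {
            if (query.test(docId)) {
                collector.collect(docId);
            }
        }
    }
}
```

Usage would be along the lines of creating a CountingCollector, passing it to search, and reading getCount() afterwards. The total still costs one callback per match, but each callback is just an integer increment rather than a document load.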