Lucene.Net: How can I add a date filter to my search results?

Chase Florell picture Chase Florell · Dec 30, 2010 · Viewed 7.4k times · Source

I've got my searcher working really well, however it does tend to return results that are obsolete. My site is much like NerdDinner whereby events in the past become irrelevant.

I'm currently indexing like this
note: my example is in VB.NET, but I don't care if examples are given in C#

    Public Function AddIndex(ByVal searchableEvent As [Event]) As Boolean Implements ILuceneService.AddIndex

        Dim writer As New IndexWriter(luceneDirectory, New StandardAnalyzer(), False)

        Dim doc As Document = New Document

        doc.Add(New Field("id", searchableEvent.ID, Field.Store.YES, Field.Index.UN_TOKENIZED))
        doc.Add(New Field("fullText", FullTextBuilder(searchableEvent), Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("user", If(searchableEvent.User.UserName = Nothing,
                                     "User" & searchableEvent.User.ID,
                                     searchableEvent.User.UserName),
                                 Field.Store.YES,
                                 Field.Index.TOKENIZED))
        doc.Add(New Field("title", searchableEvent.Title, Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("location", searchableEvent.Location.Name, Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("date", searchableEvent.EventDate, Field.Store.YES, Field.Index.UN_TOKENIZED))

        writer.AddDocument(doc)

        writer.Optimize()
        writer.Close()
        Return True

    End Function

Notice how I have a "date" index that stores the event date.

My search then looks like this

''# code omitted
        Dim reader As IndexReader = IndexReader.Open(luceneDirectory)
        Dim searcher As IndexSearcher = New IndexSearcher(reader)
        Dim parser As QueryParser = New QueryParser("fullText", New StandardAnalyzer())
        Dim query As Query = parser.Parse(q.ToLower)

        ''# We're using 10,000 as the maximum number of results to return
        ''# because I have a feeling that we'll never reach that full amount
        ''# anyways.  And if we do, who in their right mind is going to page
        ''# through all of the results?
        Dim topDocs As TopDocs = searcher.Search(query, Nothing, 10000)
        Dim doc As Document = Nothing

        ''# loop through the topDocs and grab the appropriate 10 results based
        ''# on the submitted page number
        While i <= last AndAlso i < topDocs.totalHits
                doc = searcher.Doc(topDocs.scoreDocs(i).doc)
                IDList.Add(doc.[Get]("id"))
                i += 1
        End While
''# code omitted

I did try the following, but it was to no avail (threw a NullReferenceException).

        While i <= last AndAlso i < topDocs.totalHits
            If Date.Parse(doc.[Get]("date")) >= Date.Today Then
                doc = searcher.Doc(topDocs.scoreDocs(i).doc)
                IDList.Add(doc.[Get]("id"))
                i += 1
            End If
        End While

I also found the following documentation, but I can't make heads or tails of it
http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/search/DateFilter.html

Answer

sisve picture sisve · Dec 30, 2010

You're linking to the api documentation of Lucene 1.4.3. Lucene.Net is currently at 2.9.2. I think an upgrade is due.

First, you're using Store.Yes alot. Stored fields will make your index larger, which may be a performance issue. Your date problem can easily be solved by storing dates as strings in the format of "yyyyMMddHHmmssfff" (that's really high resolution, down to milliseconds). You may want to reduce the resolution to create fewer tokens to reduce your index size.

var dateValue = DateTools.DateToString(searchableEvent.EventDate, DateTools.Resolution.MILLISECOND);
doc.Add(new Field("date", dateValue, Field.Store.YES, Field.Index.NOT_ANALYZED));

Then you apply a filter to your search (the second parameter, where you currently pass in Nothing/null).

var dateValue = DateTools.DateToString(DateTime.Now, DateTools.Resolution.MILLISECOND);
var filter = FieldCacheRangeFilter.NewStringRange("date", 
                 lowerVal: dateValue, includeLower: true, 
                 upperVal: null, includeUpper: false);
var topDocs = searcher.Search(query, filter, 10000);

You can do this using a BooleanQuery combining your normal query with a RangeQuery, but that would also affect scoring (which is calculated on the query, not the filter). You may also want to avoid modifying the query for simplicity, so you know what query is executed.