I am having trouble searching for an exact phrase using Lucene.NET 2.0.0.4
For example I am searching for "scope attribute sets the variable" (including quotes) but receive no matches, I have confirmed 100% that the phrase exists.
Can anyone suggest where I am going wrong? Is this even supported with Lucene.NET? As usual the API documentation is not too helpful and a few CodeProject articles I've read don't specifically touch on this.
Using the following code to create the index:
Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory("Index", true);
Analyzer analyzer = new Lucene.Net.Analysis.SimpleAnalyzer();
IndexWriter indexWriter = new Lucene.Net.Index.IndexWriter(dir, analyzer,true);
//create a document, add in a single field
Lucene.Net.Documents.Document doc = new Lucene.Net.Documents.Document();
Lucene.Net.Documents.Field fldContent = new Lucene.Net.Documents.Field(
"content", File.ReadAllText(@"Documents\100.txt"),
Lucene.Net.Documents.Field.Store.YES,
Lucene.Net.Documents.Field.Index.TOKENIZED);
doc.Add(fldContent);
//write the document to the index
indexWriter.AddDocument(doc);
I then search for a phrase using:
//state the file location of the index
Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory("Index", false);
//create an index searcher that will perform the search
IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(dir);
QueryParser qp = new QueryParser("content", new SimpleAnalyzer());
// txtSearch.Text Contains a phrase such as "this is a phrase"
Query q=qp.Parse(txtSearch.Text);
//execute the query
Lucene.Net.Search.Hits hits = searcher.Search(q);
The target document is about 7 MB plain text.
I have seen this previous question however I don't want a proximity search, just an exact phrase search.
Shashikant Kore is correct with his answer, you need to enable term positions...
However, I would recommend not storing the text of the document in the field unless you absolutely need it to return back to you in the search results... Setting the store to 'NO' might help reduce the size of your index a bit.
Lucene.Net.Documents.Field fldContent =
new Lucene.Net.Documents.Field("content",
File.ReadAllText(@"Documents\100.txt"),
Lucene.Net.Documents.Field.Store.NO,
Lucene.Net.Documents.Field.Index.TOKENIZED,
Lucene.Net.Documents.Field.TermVector.WITH_POSITIONS_OFFSETS);