java - tf*idf implementation?

Aravind Chinta · Apr 18, 2012

I am building a search engine and want to use tf*idf to rank my XML documents against a search query. How do I implement it, and where should I start? Any help is appreciated.
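For reference, here is one common variant of the weighting. In it, f_{t,d} is the number of times term t occurs in document d, N is the number of documents, and n_t is the number of documents that contain t. Other variants use raw or log-scaled counts for tf, or a smoothed idf, so treat this as one reasonable choice rather than the only definition:

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \log\frac{N}{n_t}, \qquad
\mathrm{score}(q,d) = \sum_{t \in q} \mathrm{tf}(t,d)\,\mathrm{idf}(t)
```

A term that appears in every document gets idf = log 1 = 0, so it contributes nothing to the ranking, while rare query terms dominate the score.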

Answer

daveb · Apr 18, 2012

I did this in the past, and I used Lucene to get the TF*IDF data.

It took a fair amount of fiddling around, though, so if anyone knows of an easier solution, use that instead.

Start by looking at TermFreqVector and other classes in org.apache.lucene.index.
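To see what the ranking step itself involves, independent of Lucene, here is a minimal plain-Java sketch. It works from per-document term counts, the kind of information a Lucene term vector stores. It assumes the XML text content has already been extracted to plain strings, uses a naive lowercase/split tokenizer, length-normalized tf, and idf(t) = ln(N / n_t). `TfIdfRanker`, `addDocument`, `score` and `rank` are hypothetical names chosen for illustration, not part of Lucene or any other library.

```java
import java.util.*;

/** Hypothetical in-memory TF-IDF ranker (illustrative sketch, not a Lucene API). */
class TfIdfRanker {
    // docId -> (term -> number of occurrences in that document)
    private final Map<String, Map<String, Integer>> termCounts = new LinkedHashMap<>();
    // term -> number of documents containing the term (n_t)
    private final Map<String, Integer> docFreq = new HashMap<>();

    /** Naive tokenizer (assumption): lowercase, split on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) tokens.add(tok);
        }
        return tokens;
    }

    /** Indexes one document; IDs must be unique so document frequencies stay correct. */
    void addDocument(String docId, String text) {
        if (termCounts.containsKey(docId)) throw new IllegalArgumentException("duplicate id: " + docId);
        Map<String, Integer> counts = new HashMap<>();
        for (String tok : tokenize(text)) counts.merge(tok, 1, Integer::sum);
        termCounts.put(docId, counts);
        for (String term : counts.keySet()) docFreq.merge(term, 1, Integer::sum);
    }

    /** tf(t,d): occurrences of the term divided by the document's total token count. */
    double tf(String term, String docId) {
        Map<String, Integer> counts = termCounts.getOrDefault(docId, Map.of());
        int total = 0;
        for (int c : counts.values()) total += c;
        return total == 0 ? 0.0 : counts.getOrDefault(term, 0) / (double) total;
    }

    /** idf(t) = ln(N / n_t); terms never seen in any document get 0. */
    double idf(String term) {
        int n = docFreq.getOrDefault(term, 0);
        return n == 0 ? 0.0 : Math.log((double) termCounts.size() / n);
    }

    /** Sums tf * idf over the distinct query terms. */
    double score(String query, String docId) {
        double s = 0.0;
        for (String term : new LinkedHashSet<>(tokenize(query))) s += tf(term, docId) * idf(term);
        return s;
    }

    /** Returns all document IDs sorted by descending score (ties keep insertion order). */
    List<String> rank(String query) {
        Map<String, Double> scores = new HashMap<>();
        for (String id : termCounts.keySet()) scores.put(id, score(query, id));
        List<String> ids = new ArrayList<>(termCounts.keySet());
        ids.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ids;
    }

    public static void main(String[] args) {
        TfIdfRanker r = new TfIdfRanker();
        r.addDocument("a.xml", "lucene search engine lucene index");
        r.addDocument("b.xml", "xml parsing in java");
        r.addDocument("c.xml", "search engine ranking with tf idf");
        for (String id : r.rank("lucene search")) {
            System.out.printf("%s %.4f%n", id, r.score("lucene search", id));
        }
    }
}
```

With this toy data, a.xml ranks first because it is the only document containing the rarer term "lucene", c.xml second (it matches only "search", which appears in two of the three documents), and b.xml last with a score of 0. Scoring every document on every query is fine for a sketch, but a real engine would use an inverted index and precomputed document lengths, which is much of what Lucene gives you.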