Word frequency algorithm for natural language processing
While the answer for that question is excellent, I was wondering if I could make use of all the time I spent getting to know SOLR for my NLP.
I thought of SOLR because:
Although the above reasons are good, I don't know SOLR THAT well, so I need to know if it would be appropriate for my requirements.
Ideally, I'd like to configure SOLR, and then be able to send SOLR some text, and retrieve the indexed tonkenized content.
I'm working on a small component of a bigger recommendation engine.
I guess you can use Solr and combine it with other tools. Tokenization, stop word removal, stemming, and even synonyms come out of the box with Solr. If you need named entity recognition or base noun-phrase extraction, you need to use OpenNLP or an equivalent tool as a pre-processing stage. You will probably need term vectors for your retrieval purposes. Integrating Apache Mahout with Apache Lucene and Solr may be useful as it discusses Lucene and Solr integration with a machine learning (including recommendation) engine. Other then that, feel free to ask further more specific questions.