Related questions
Python: tf-idf-cosine: to find document similarity
I was following a tutorial which was available at Part 1 & Part 2. Unfortunately the author didn't have the time for the final section which involved using cosine similarity to actually find the distance between two documents. I followed the examples …
Python's NLTK vs. related Java Libraries?
I've used LingPipe, Stanford's NER, RiTa and various sentence similarity libraries for my previous Java projects that focused on text (pre)processing (indexing, xml tagging, topic detection, etc.) of large amounts of English text (around 10,000 documents summing to > 1gb …
Lemmatization of non-English words?
I would like to apply lemmatization to reduce the inflectional forms of words. I know that for English language WordNet provides such a functionality, but I am also interested in applying lemmatization for Dutch, French, Spanish and Italian words. Is …