I tried all the nltk methods for stemming, but they give me weird results with some words.
Examples
It often cuts the ends off words when it shouldn't:
or doesn't stem very well:
Do you know of other stemming libraries in Python, or a good dictionary?
Thank you
The results you are getting are (generally) expected for a stemmer in English. You say you tried "all the nltk methods" but when I try your examples, that doesn't seem to be the case.
Here are some examples using the PorterStemmer:
>>> import nltk
>>> ps = nltk.stem.PorterStemmer()
>>> ps.stem('grows')
'grow'
>>> ps.stem('leaves')
'leav'
>>> ps.stem('fairly')
'fairli'
The results are 'grow', 'leav' and 'fairli', which, even if they are not what you wanted, are stemmed versions of the original words.
If we switch to the Snowball stemmer, we have to provide the language as a parameter.
>>> import nltk
>>> sno = nltk.stem.SnowballStemmer('english')
>>> sno.stem('grows')
'grow'
>>> sno.stem('leaves')
'leav'
>>> sno.stem('fairly')
'fair'
The results are the same as before for 'grows' and 'leaves', but 'fairly' is now stemmed to 'fair'.
So in both cases (and there are more than two stemmers available in nltk), words that you say are not stemmed, in fact, are. The LancasterStemmer will return 'easy' when provided with 'easily' or 'easy' as input.
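To see how the stemmers differ on the same input, a small side-by-side comparison can help. This is only a sketch: the word list and the `compare_stemmers` helper are made up for illustration, and the exact Lancaster output may differ between nltk versions, so no output is shown for it.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Three of the stemmers that ship with nltk; none of them needs a corpus download.
stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}

# Illustrative word list (taken from the examples discussed above).
words = ["grows", "leaves", "fairly", "easily"]


def compare_stemmers(words, stemmers):
    """Hypothetical helper: map each word to the stem produced by each stemmer."""
    return {
        word: {name: stemmer.stem(word) for name, stemmer in stemmers.items()}
        for word in words
    }


results = compare_stemmers(words, stemmers)
for word, stems in results.items():
    print(word, stems)
```

Running this over your own problem words should show quickly whether any of the stemmers gives output you can live with, or whether stemming is the wrong tool altogether.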
Maybe you really wanted a lemmatizer? That would return 'article' and 'poodle' unchanged.
>>> import nltk
>>> lemma = nltk.stem.WordNetLemmatizer()
>>> lemma.lemmatize('article')
'article'
>>> lemma.lemmatize('leaves')
'leaf'