What is the best stemming method in Python?

Question 1

python nltk stemming

PeYoTlL · Jul 9, 2014 · Viewed 56k times · Source

Answer

The results you are getting are (generally) expected for a stemmer in English. You say you tried "all the nltk methods" but when I try your examples, that doesn't seem to be the case.

Here are some examples using the PorterStemmer:

>>> import nltk
>>> ps = nltk.stem.PorterStemmer()
>>> ps.stem('grows')
'grow'
>>> ps.stem('leaves')
'leav'
>>> ps.stem('fairly')
'fairli'

The results are 'grow', 'leav' and 'fairli', which, even if they are not what you wanted, are stemmed versions of the original words.

If we switch to the Snowball stemmer, we have to provide the language as a parameter.

>>> import nltk
>>> sno = nltk.stem.SnowballStemmer('english')
>>> sno.stem('grows')
'grow'
>>> sno.stem('leaves')
'leav'
>>> sno.stem('fairly')
'fair'

The results are as before for 'grows' and 'leaves', but 'fairly' is stemmed to 'fair'.

So in both cases (and there are more than two stemmers available in nltk), words that you say are not stemmed, in fact, are. The LancasterStemmer will return 'easy' when provided with 'easily' or 'easy' as input.
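To see how the three stemmers mentioned above differ on the words from the question, a small side-by-side comparison can help. This is only a sketch: the `STEMMERS` dictionary and the `compare_stems` helper are names made up for illustration, not part of NLTK. The stemmer classes themselves come from `nltk.stem`.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Three stemmers that ship with NLTK. The dictionary keys are
# illustrative labels chosen for this sketch, not NLTK identifiers.
STEMMERS = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}


def compare_stems(words):
    """Return {word: {stemmer_label: stem}} for every word and stemmer."""
    return {
        word: {label: stemmer.stem(word) for label, stemmer in STEMMERS.items()}
        for word in words
    }


if __name__ == "__main__":
    sample = ["grows", "leaves", "fairly", "easily", "easy"]
    for word, stems in compare_stems(sample).items():
        print(word, stems)
```

Running it on your own word list makes it easier to judge which stemmer is aggressive enough (or too aggressive) for your data before committing to one.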

Maybe you really wanted a lemmatizer? That would return 'article' and 'poodle' unchanged.

>>> import nltk
>>> # requires the WordNet corpus: nltk.download('wordnet')
>>> lemma = nltk.stem.WordNetLemmatizer()
>>> lemma.lemmatize('article')
'article'
>>> lemma.lemmatize('leaves')
'leaf'

Question 2

I tried all the nltk methods for stemming, but they give me weird results with some words.

Examples

They often cut off the end of a word when they shouldn't:

poodle => poodl
article => articl

or they don't stem very well:

easily and easy are not stemmed to the same word
leaves, grows, fairly are not stemmed

Do you know of other stemming libraries in Python, or a good dictionary?

Thank you
