What is the difference between lemmatization vs stemming?

TIMEX picture TIMEX · Nov 24, 2009 · Viewed 83.2k times · Source

When do I use each ?

Also...is the NLTK lemmatization dependent upon Parts of Speech? Wouldn't it be more accurate if it was?

Answer

miku picture miku · Nov 24, 2009

Short and dense: http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html

The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form.

However, the two words differ in their flavor. Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and often includes the removal of derivational affixes. Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma .

From the NLTK docs:

Lemmatization and stemming are special cases of normalization. They identify a canonical representative for a set of related word forms.