Lemmatization is crucial to many performant natural language processing services. A word’s lemma is it’s dictionary form. And so the process of lemmatization involves standardizing all inflections of a given word into it’s base. Stemming, a related task involves simply removing the suffixes of a given word. Lemmatization on the other hand relies on discerning the meaning of a on lemmatized word in the context it is mentioned to surface the proper lemma.
While lemmatization tends to support more accurate natural language understanding, lemmatization is typically harder to implement and more resource intensive (slower) to run than stemming. This is because additional computing power is used to determine the part of speech of a given word being lemmatized.
Stemming versus lemmatization
Input | Lemmatization | Stemming |
Caring | Care | Car |
Stemming | Stem | Stemm |
Swimming | Swim | Swimm |