Tuesday, October 22, 2019

Stemming and Lemmatization

Stemming matches words across different pluralizations and tenses by reducing them to a common root. This is tricky because you cannot just add or remove an s on the end of an English word and call it a universal win for pluralization. The words mouse and mice won't play well together in that circumstance. Lemmatization is stemming and then some: it uses a dictionary to map each word to its base form (its lemma), so mice resolves to mouse. Search engines often pair it with thesaurus-based matching for similar words or outright synonyms. Microsoft's FAST (Fast Search & Transfer) search engine uses lemmatization.
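To make the mouse and mice problem concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how FAST or any real engine works: the names naive_stem, toy_lemmatize, and IRREGULAR_LEMMAS are made up for illustration, and the tiny lookup table stands in for the full dictionary a real lemmatizer would consult.

```python
# Hypothetical toy data: a real lemmatizer uses a full dictionary,
# not a four-entry table.
IRREGULAR_LEMMAS = {
    "mice": "mouse",
    "geese": "goose",
    "ran": "run",
    "went": "go",
}


def naive_stem(word: str) -> str:
    """Strip a few common English suffixes; no dictionary involved."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str) -> str:
    """Check the irregular-forms table first, then fall back to stemming."""
    word = word.lower()
    return IRREGULAR_LEMMAS.get(word, naive_stem(word))


if __name__ == "__main__":
    for w in ("cats", "jumped", "mice", "mouse"):
        print(w, "stem:", naive_stem(w), "lemma:", toy_lemmatize(w))
```

Suffix stripping handles cats and jumped, but it leaves mice and mouse as two unrelated strings. Only the dictionary lookup brings them together, which is the extra work lemmatization does.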
