Does Cludo support lemmatization?


Lemmatization is the process of grouping together the inflected forms of a word so they can be analyzed as a single item, identified by the word’s lemma, or dictionary form. Unlike stemming, lemmatization depends on correctly identifying the intended part of speech and meaning of a word in a sentence, as well as within the larger context surrounding that sentence, such as neighboring sentences or even an entire document.
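To make the role of part of speech concrete, here is a minimal sketch of a dictionary-based lemma lookup. The `LEMMAS` table and the `lemmatize` function are hypothetical and exist only for illustration; a real lemmatizer relies on a full lexicon and a part-of-speech tagger.

```python
# Hypothetical lookup table: (surface form, part of speech) -> lemma.
# A real lemmatizer would use a full lexicon plus a part-of-speech tagger.
LEMMAS = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("grew", "VERB"): "grow",
    ("better", "ADJ"): "good",
}


def lemmatize(word, pos):
    """Return the lemma for a word given its part of speech.

    Falls back to the lowercased word when the pair is not in the table.
    """
    key = (word.lower(), pos)
    return LEMMAS.get(key, word.lower())


print(lemmatize("saw", "VERB"))  # the verb form maps to "see"
print(lemmatize("saw", "NOUN"))  # the noun form stays "saw"
```

The same surface form ("saw") resolves to different lemmas depending on its part of speech, which is why lemmatization needs sentence context.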

Cludo does not support lemmatization. However, the following Natural Language Processing features are supported, which should cover the vast majority of cases:

  • Tokenization – Splitting a sentence into individual words
  • Elision – Removing elided prefixes. For example, in French: l’amour → amour, m’appelle → appelle
  • Stop words – Removing filler words such as a, an, it, is, that, this, me, you, and your, as they add little context about the content
  • Stemming – Reducing words to their root form, for example pilots → pilot, grew → grow, running → run
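The four steps above can be sketched as a small text-analysis pipeline. This is an illustrative toy, not Cludo's implementation: the function names, the elision prefix list, the stop-word set, and the suffix rules are all assumptions made for the example, and production analyzers use far larger, language-specific resources.

```python
import re

# Hypothetical, deliberately tiny resources chosen for this example only.
ELISION_PREFIXES = ("l", "m", "t", "d", "j", "n", "s", "qu")
STOP_WORDS = {"a", "an", "it", "is", "that", "this", "me", "you", "your"}
SUFFIXES = ("ing", "s")  # toy suffix-stripping rules


def tokenize(text):
    """Split text into lowercase word tokens, keeping inner apostrophes."""
    return re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text.lower())


def remove_elision(token):
    """Drop an elided prefix such as l' or m' from the front of a token."""
    for apostrophe in ("'", "’"):
        prefix, sep, rest = token.partition(apostrophe)
        if sep and rest and prefix in ELISION_PREFIXES:
            return rest
    return token


def stem(token):
    """Strip one known suffix, then collapse a doubled final consonant."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stemmed = token[: -len(suffix)]
            if stemmed[-1] == stemmed[-2] and stemmed[-1] not in "aeiou":
                stemmed = stemmed[:-1]
            return stemmed
    return token


def analyze(text):
    """Run tokenization, elision removal, stop-word removal, and stemming."""
    tokens = [remove_elision(t) for t in tokenize(text)]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]


print(analyze("It is your pilots running"))
print(analyze("l’amour"))
```

Note that this toy stemmer only strips suffixes, so it leaves irregular forms such as "grew" unchanged; handling those requires additional rules or a dictionary.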