What languages does Cludo support?
The natural language processing at Cludo consists of multiple steps:
- Tokenization – Splitting a sentence into individual words
- Elision – Removing elided prefixes. For example, in French: l’amour → amour, m’appelle → appelle
- Stop words – Removing filler words such as a, an, it, is, that, this, me, you, your, etc., as they add little meaning to the content
- Stemming – Converting words to their root form, e.g. pilots → pilot, grew → grow, living → live, so that derived forms of a word match each other
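
The steps above can be sketched as a small pipeline. This is only an illustration of the general technique, not Cludo's implementation: the function names, the stop-word list, the elision prefixes, and the suffix rules are all assumptions made for the example.

```python
import re

# Illustrative data only: these tiny lists are assumptions, not Cludo's resources.
STOP_WORDS = {"a", "an", "it", "is", "that", "this", "me", "you", "your", "the"}
ELISION_PREFIXES = ("l", "m", "t", "d", "j", "n", "s", "qu")  # French-style elided forms
SUFFIXES = ("ing", "s")  # toy suffix stripping; real stemmers apply many more rules


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens, keeping apostrophes inside words."""
    return re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text.lower())


def remove_elision(token: str) -> str:
    """Drop an elided prefix such as l' or m' (l'amour -> amour)."""
    prefix, sep, rest = token.replace("’", "'").partition("'")
    if sep and rest and prefix in ELISION_PREFIXES:
        return rest
    return token


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str) -> list[str]:
    """Run the four steps in order: tokenize, remove elisions, drop stop words, stem."""
    tokens = tokenize(text)
    tokens = [remove_elision(t) for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]


print(analyze("The pilots is landing"))
print(analyze("l’amour"))
```

The toy stemmer only strips suffixes, so it reduces "living" to "liv" rather than "live" and cannot map "grew" to "grow". Language-specific stemmers handle such cases with larger rule sets or dictionaries, which is why stemming support varies by language.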
Supported Languages
Language | ISO code | Tokenization | Elision | Stop words | Stemming |
---|---|---|---|---|---|
Arabic | ar | ✅ | ✅ | ✅ | ✅ |
Armenian | hy | ✅ | ✅ | ✅ | ✅ |
Basque | eu | ✅ | ✅ | ✅ | ✅ |
Brazilian Portuguese | pt-br | ✅ | ✅ | ✅ | ✅ |
Bulgarian | bg | ✅ | ✅ | ✅ | ✅ |
Catalan | ca | ✅ | ✅ | ✅ | ✅ |
Chinese (Simplified) | zh | ✅ | ✅ | ✅ | ✅ |
Czech | cs | ✅ | ✅ | ✅ | ✅ |
Danish | da | ✅ | ✅ | ✅ | ✅ |
Dutch | nl | ✅ | ✅ | ✅ | ✅ |
English | en | ✅ | ✅ | ✅ | ✅ |
Estonian | et | ✅ | ❌ | ❌ | ❌ |
Finnish | fi | ✅ | ✅ | ✅ | ✅ |
French | fr | ✅ | ✅ | ✅ | ✅ |
Galician | gl | ✅ | ✅ | ✅ | ✅ |
German | de | ✅ | ✅ | ✅ | ✅ |
Greek | el | ✅ | ✅ | ✅ | ✅ |
Hindi | hi | ✅ | ✅ | ✅ | ✅ |
Hungarian | hu | ✅ | ✅ | ✅ | ✅ |
Icelandic | is | ✅ | ❌ | ❌ | ❌ |
Indonesian | id | ✅ | ✅ | ✅ | ✅ |
Irish | ga | ✅ | ✅ | ✅ | ✅ |
Italian | it | ✅ | ✅ | ✅ | ✅ |
Japanese | ja | ✅ | ✅ | ✅ | ✅ |
Korean | ko | ✅ | ❌ | ❌ | ❌ |
Latvian | lv | ✅ | ✅ | ✅ | ✅ |
Lithuanian | lt | ✅ | ✅ | ✅ | ✅ |
Norwegian (Bokmål) | no | ✅ | ✅ | ✅ | ✅ |
Norwegian (Nynorsk) | nn | ✅ | ✅ | ✅ | ✅ |
Persian | fa | ✅ | ✅ | ✅ | ❌ |
Polish | pl | ✅ | ✅ | ✅ | ✅ |
Portuguese | pt | ✅ | ✅ | ✅ | ✅ |
Romanian | ro | ✅ | ✅ | ✅ | ✅ |
Russian | ru | ✅ | ✅ | ✅ | ✅ |
Serbian | sr | ✅ | ❌ | ❌ | ❌ |
Sorani (Kurdish) | ku | ✅ | ✅ | ✅ | ✅ |
Spanish | es | ✅ | ✅ | ✅ | ✅ |
Swahili | sw | ✅ | ❌ | ❌ | ❌ |
Swedish | sv | ✅ | ✅ | ✅ | ✅ |
Thai | th | ✅ | ✅ | ✅ | ❌ |
Turkish | tr | ✅ | ✅ | ✅ | ✅ |
Ukrainian | uk | ✅ | ✅ | ✅ | ✅ |
Vietnamese | vi | ✅ | ❌ | ❌ | ❌ |
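
If you need to check programmatically which steps apply to a given language, the table can be encoded as a simple lookup. The sketch below copies only a few rows from the table, and the `Support` type and `enabled_steps` function are hypothetical names for this example, not part of any Cludo API.

```python
from typing import NamedTuple


class Support(NamedTuple):
    tokenization: bool
    elision: bool
    stop_words: bool
    stemming: bool


# A few rows copied from the table above; the remaining rows follow the same pattern.
SUPPORT = {
    "fr": Support(True, True, True, True),
    "et": Support(True, False, False, False),
    "fa": Support(True, True, True, False),
    "th": Support(True, True, True, False),
}


def enabled_steps(iso_code: str) -> list[str]:
    """Return the names of the processing steps enabled for a language code."""
    support = SUPPORT.get(iso_code.lower())
    if support is None:
        return []  # language not listed in this sketch
    return [name for name, enabled in support._asdict().items() if enabled]


print(enabled_steps("fa"))
```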