Hailay
/

geez-tokenizer

Token Classification

byte-pair-encoding

morphology-aware

Model card Files Files and versions

Hailay commited on Jul 15, 2025

Commit

98aeb25

·

verified ·

1 Parent(s): 725597f

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -27,7 +27,7 @@ model-index:
 # Geez Tokenizer (`Hailay/geez-tokenizer`)
-A **BPE tokenizer** specifically trained for **Geez-script languages**, including **Tigrinya** and **Amharic**. The tokenizer is trained on monolingual corpora derived from the [HornMT](https://github.com/HornMT) project and supports morphologically rich low-resource languages.
 ## 🧠 Motivation

 # Geez Tokenizer (`Hailay/geez-tokenizer`)
+A **BPE tokenizer** specifically trained for **Geez-script languages**, including **Tigrinya** and **Amharic**. The tokenizer is trained on monolingual corpora  and supports morphologically rich low-resource languages.
 ## 🧠 Motivation