Token Classification
Transformers
Amharic
Tigrinya
tokenizer
byte-pair-encoding
bpe
geez-script
amharic
tigrinya
low-resource
nlp
morphology-aware
Horn of Africa
Instructions to use Hailay/geez-tokenizer with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Hailay/geez-tokenizer with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("token-classification", model="Hailay/geez-tokenizer")

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("Hailay/geez-tokenizer", dtype="auto")

- Notebooks
- Google Colab
- Kaggle
Update README.md
README.md CHANGED
@@ -27,7 +27,7 @@ model-index:
 
 # Geez Tokenizer (`Hailay/geez-tokenizer`)
 
-A **BPE tokenizer** specifically trained for **Geez-script languages**, including **Tigrinya** and **Amharic**. The tokenizer is trained on monolingual corpora
+A **BPE tokenizer** specifically trained for **Geez-script languages**, including **Tigrinya** and **Amharic**. The tokenizer is trained on monolingual corpora and supports morphologically rich low-resource languages.
 
 ## 🧠 Motivation
 
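The README line changed above describes the tokenizer as BPE-based. For readers unfamiliar with the technique, here is a minimal character-level sketch of how BPE learns merges and applies them to Geez-script text. It is not the author's training code: the function names (`learn_bpe_merges`, `merge_pair`, `encode`), the three-word toy corpus, and the merge count are all assumptions made for illustration, and the published tokenizer may differ in pre-tokenization, vocabulary size, and byte-level handling.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Each word starts as a tuple of single characters; Counter tracks word frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken deterministically by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize `word` by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (an assumption for illustration, not the real training data).
corpus = ["ሰላም", "ሰላም", "ሰላምታ"]
merges = learn_bpe_merges(corpus, num_merges=2)
print(encode("ሰላምታ", merges))  # ['ሰላም', 'ታ']
```

With two merges, the frequent stem ሰላም becomes a single token and the suffix ታ stays separate, which is the kind of stem/affix split that makes BPE attractive for morphologically rich languages.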