Update README.md
README.md
CHANGED
|
@@ -11,22 +11,60 @@ license: apache-2.0
|
|
| 11 |
|
| 12 |
# Kabyle POS Tagger Demo
|
| 13 |
|
| 14 |
-
Interactive demo for the [boffire/kabyle-pos](https://huggingface.co/boffire/kabyle-pos) model — a Part-of-Speech tagger for **Kabyle** (`kab`), a Berber language spoken in Algeria.
|
| 15 |
|
| 16 |
## Model Details
|
| 17 |
- **Base:** XLM-RoBERTa-base
|
| 18 |
- **Task:** Token Classification (POS tagging)
|
| 19 |
-
- **Test F1:**
|
| 20 |
- **Tagset:** Universal Dependencies (17 tags)
|
| 21 |
|
| 22 |
## Usage
|
| 23 |
Type or paste a Kabyle sentence and click **Submit** to see predicted POS tags with confidence scores.
|
| 24 |
|
| 25 |
## Limitations
|
| 26 |
-
-
|
| 27 |
-
-
|
| 28 |
-
-
|
| 29 |
-
- No diacritic normalization
|
| 30 |
|
| 31 |
## Citation
|
| 32 |
-
Side part of the **Masakhane** initiative for African NLP. See the model card for full citation details.
|
| 11 |
|
| 12 |
# Kabyle POS Tagger Demo
|
| 13 |
|
| 14 |
+
Interactive demo for the [boffire/kabyle-pos-v2](https://huggingface.co/boffire/kabyle-pos-v2) model — a Part-of-Speech tagger for **Kabyle** (`kab`), a Berber language spoken in Algeria.
|
| 15 |
|
| 16 |
## Model Details
|
| 17 |
- **Base:** XLM-RoBERTa-base
|
| 18 |
- **Task:** Token Classification (POS tagging)
|
| 19 |
+
- **Test F1:** 93.8%
|
| 20 |
+
- **Training Data:** 10,000 sentences with a 214,000-entry lexicon
|
| 21 |
- **Tagset:** Universal Dependencies (17 tags)
|
| 22 |
|
| 23 |
+
## Features
|
| 24 |
+
- **Punctuation-aware tokenization:** Attached punctuation (e.g., `medden.`) is automatically split and tagged as `PUNCT`.
|
| 25 |
+
- **Clitic handling:** Hyphenated possessive, accusative, dative, and directional clitics are split and tagged correctly.
|
| 26 |
+
- **Post-processing lookup table:** A linguistically curated override table fixes misclassifications for closed-class morphemes (e.g., `-nneɣ`, `-is`, `d-`, `-agi`).
|
| 27 |
+
- **High-contrast visualization:** Color-coded tokens with confidence scores.
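The first two features can be illustrated with a minimal sketch. This is not the Space's actual code, which the README does not show. The function names, the punctuation set, and the rule that only `d` and `n` attach as hyphenated proclitics are all assumptions made for illustration.

```python
# Hypothetical pre-tokenizer: names and rules below are assumptions,
# not taken from the Space's source.
PUNCT_CHARS = ".,;:!?\"()«»…"
PROCLITICS = {"d", "n"}  # assumed from the `d-` / `n-` forms listed in the README


def split_punct(chunk):
    """Peel punctuation characters off both ends of a whitespace-delimited chunk."""
    lead, trail = [], []
    while chunk and chunk[0] in PUNCT_CHARS:
        lead.append(chunk[0])
        chunk = chunk[1:]
    while chunk and chunk[-1] in PUNCT_CHARS:
        trail.insert(0, chunk[-1])
        chunk = chunk[:-1]
    return lead, chunk, trail


def split_clitics(word):
    """Split a hyphenated word into host and clitics, keeping the hyphen on the clitic."""
    parts = word.split("-")
    if len(parts) == 1:
        return [word]
    tokens = []
    if parts[0].lower() in PROCLITICS:
        # Proclitic: d-yusan -> "d-" + "yusan"
        tokens.append(parts[0] + "-")
        tokens.append(parts[1])
        rest = parts[2:]
    else:
        # Enclitics: adlis-is -> "adlis" + "-is"
        tokens.append(parts[0])
        rest = parts[1:]
    tokens.extend("-" + p for p in rest if p)
    return [t for t in tokens if t not in ("", "-")]


def tokenize(sentence):
    """Whitespace split, then punctuation split, then clitic split."""
    tokens = []
    for chunk in sentence.split():
        lead, core, trail = split_punct(chunk)
        tokens.extend(lead)
        if core:
            tokens.extend(split_clitics(core))
        tokens.extend(trail)
    return tokens
```

Under these assumptions, `tokenize("Tameddakelt-nneɣ teɣra adlis-is.")` separates `-nneɣ`, `-is`, and the final `.` into their own tokens before tagging. A standalone `d` with no hyphen is left untouched.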
|
| 28 |
+
|
| 29 |
+
## Supported Clitics
|
| 30 |
+
The app recognizes and correctly tags the following Kabyle grammatical morphemes:
|
| 31 |
+
|
| 32 |
+
### Possessive Affixes
|
| 33 |
+
- Singular: `-w`/`-iw`, `-k`/`-ik`, `-m`/`-im`, `-s`/`-is`
|
| 34 |
+
- Plural: `-nneɣ`, `-wen`/`-nwen`, `-kent`/`-nkent`, `-sen`/`-nsen`, `-sent`/`-nsent`
|
| 35 |
+
|
| 36 |
+
### Direct Object Pronouns (Accusative)
|
| 37 |
+
- `-iyi`/`-yi`, `-k`/`-ik`, `-kem`, `-t`/`-tt`, `-itt`, `-aɣ`/`-yaɣ`, `-ken`, `-kent`, `-ten`, `-tent`
|
| 38 |
+
|
| 39 |
+
### Indirect Object Pronouns (Dative)
|
| 40 |
+
- `-iyi`/`-yi`, `-ak`, `-am`, `-as`/`-asen`, `-aneɣ`/`-anaɣ`, `-awen`, `-akent`, `-asen`/`-atsen`, `-asent`/`-atsent`
|
| 41 |
+
|
| 42 |
+
### Directional & Copula Particles
|
| 43 |
+
- `d-`/`-d`/`-id` — Proximal particle (toward speaker / "it is")
|
| 44 |
+
- `n-`/`-in` — Distal particle (away from speaker)
|
| 45 |
+
|
| 46 |
+
### Demonstratives & Determiners
|
| 47 |
+
- `-agi`/`-a` — This / These
|
| 48 |
+
- `-nni` — That / Those (previously mentioned)
|
| 49 |
+
- `-nniḍen`/`-niḍen` — Other / Another
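A post-processing override over closed-class morphemes like those listed above might look like the following sketch. The table contents and the tags assigned here (`PRON`, `PART`, `DET`) are illustrative assumptions. The README does not state which tag each morpheme receives, and `apply_overrides` is a hypothetical name.

```python
# Hypothetical override table: the tags are assumptions for illustration only.
OVERRIDES = {
    "-nneɣ": "PRON",  # possessive affix (assumed tag)
    "-is": "PRON",    # possessive affix (assumed tag)
    "d-": "PART",     # proximal particle (assumed tag)
    "-agi": "DET",    # demonstrative (assumed tag)
}


def apply_overrides(tagged):
    """Replace the model's tag for any token found in the override table.

    `tagged` is a list of (token, tag, score) tuples. The score is kept
    unchanged so the original model confidence stays visible.
    """
    result = []
    for token, tag, score in tagged:
        forced = OVERRIDES.get(token.lower())
        result.append((token, forced if forced is not None else tag, score))
    return result
```

Lookup is case-insensitive, so a sentence-initial `D-` would match the `d-` entry. Tokens absent from the table pass through with the model's prediction.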
|
| 50 |
+
|
| 51 |
## Usage
|
| 52 |
Type or paste a Kabyle sentence and click **Submit** to see predicted POS tags with confidence scores.
|
| 53 |
|
| 54 |
+
### Example Sentences
|
| 55 |
+
- `Aṭas n medden i yessen.`
|
| 56 |
+
- `Taqbaylit d tutlayt deg Lezzayer.`
|
| 57 |
+
- `Yella wuccen ameqqran deg taddart.`
|
| 58 |
+
- `Tameddakelt-nneɣ teɣra adlis-is.`
|
| 59 |
+
- `D nekkni i d-yusan d imezwura.`
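Outside the web interface, sentences like these could in principle be tagged programmatically. The sketch below assumes the model repository loads with the standard `transformers` token-classification pipeline, which this README does not confirm. `tag_sentence` and `format_predictions` are hypothetical helper names. With the pipeline's default settings, predictions come back per subword piece rather than per word, and none of the Space's clitic splitting or override logic is applied.

```python
def tag_sentence(sentence, model_id="boffire/kabyle-pos-v2"):
    """Run the tagger on one sentence and return the raw pipeline predictions.

    Assumes the model repo is loadable by the standard pipeline.
    """
    # Imported lazily so format_predictions works without transformers installed.
    from transformers import pipeline

    tagger = pipeline("token-classification", model=model_id)
    return tagger(sentence)


def format_predictions(predictions):
    """Render predictions as tab-separated lines: piece, tag, confidence."""
    lines = []
    for p in predictions:
        # Unaggregated output uses "entity"; aggregated output uses "entity_group".
        label = p.get("entity_group", p.get("entity"))
        lines.append(f"{p['word']}\t{label}\t{p['score']:.2f}")
    return "\n".join(lines)
```

Calling `print(format_predictions(tag_sentence("Aṭas n medden i yessen.")))` would download the model on first use. The pipeline's built-in aggregation strategies are left off here because they can merge adjacent pieces that share a label, which is undesirable for POS tagging.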
|
| 60 |
+
|
| 61 |
## Limitations
|
| 62 |
+
- Capitalized sentence-initial words may be biased toward `NOUN`/`PROPN` due to training data distribution.
|
| 63 |
+
- Domain bias toward short translated sentences (Tatoeba corpus).
|
| 64 |
+
- No diacritic normalization.
|
| 65 |
|
| 66 |
## Citation
|
| 67 |
+
Side part of the **Masakhane** initiative for African NLP. See the model card for full citation details.
|
| 68 |
+
|
| 69 |
+
## Acknowledgments
|
| 70 |
+
- Model trained by **boffire** (ButterflyOfFire)
|