---
title: Kabyle POS Tagger
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
---
# Kabyle POS Tagger Demo
Interactive demo for the [boffire/kabyle-pos-v2](https://huggingface.co/boffire/kabyle-pos-v2) model, a part-of-speech (POS) tagger for **Kabyle** (`kab`), a Berber language spoken in Algeria.
## Model Details
- **Base:** XLM-RoBERTa-base
- **Task:** Token Classification (POS tagging)
- **Test F1:** 93.8%
- **Training Data:** 10,000 sentences with a 214,000-entry lexicon
- **Tagset:** Universal Dependencies (17 tags)
## Features
- **Punctuation-aware tokenization:** Attached punctuation (e.g., `medden.`) is automatically split and tagged as `PUNCT`.
- **Clitic handling:** Hyphenated possessive, accusative, dative, and directional clitics are split and tagged correctly.
- **Post-processing lookup table:** A linguistically curated override table fixes misclassifications for closed-class morphemes (e.g., `-nneɣ`, `-is`, `d-`, `-agi`).
- **High-contrast visualization:** Color-coded tokens with confidence scores.
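The punctuation and clitic splitting described above can be illustrated with a short sketch. This is not the app's actual code: the function names, the regular expression, and the `PROCLITICS` set (taken from the `d-` and `n-` forms listed in this README) are assumptions made for illustration.

```python
import re
from typing import List

# Assumption: morphemes written with a trailing hyphen attach before the host word.
PROCLITICS = {"d", "n"}

# A word (optionally containing hyphens) or a single punctuation character.
_CHUNK_RE = re.compile(r"\w[\w-]*|[^\w\s]")


def split_clitics(word: str) -> List[str]:
    """Split a hyphenated word into host and clitics, keeping the hyphen on the clitic."""
    parts = [p for p in word.split("-") if p]
    if len(parts) < 2:
        return [word]
    tokens = []
    if parts[0].lower() in PROCLITICS:
        tokens.append(parts.pop(0) + "-")      # proclitic, e.g. "d-"
    tokens.append(parts[0])                    # host word
    tokens.extend("-" + p for p in parts[1:])  # enclitics, e.g. "-is"
    return tokens


def is_punct(token: str) -> bool:
    """A token with no letters or digits would be tagged PUNCT."""
    return not any(ch.isalnum() for ch in token)


def tokenize(sentence: str) -> List[str]:
    """Detach punctuation first, then split hyphenated clitics."""
    tokens = []
    for chunk in _CHUNK_RE.findall(sentence):
        tokens.extend(split_clitics(chunk))
    return tokens


print(tokenize("Tameddakelt-nneɣ teɣra adlis-is."))
```

Keeping the hyphen on the clitic side (`-is`, `d-`) matches the way the morphemes are written in this README, so the split tokens can be looked up directly in a table keyed by those forms. The regular expression assumes precomposed characters; decomposed combining marks would be treated as punctuation.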
## Supported Clitics
The app recognizes and correctly tags the following Kabyle grammatical morphemes:
### Possessive Affixes
- Singular: `-w`/`-iw`, `-k`/`-ik`, `-m`/`-im`, `-s`/`-is`
- Plural: `-nneɣ`, `-wen`/`-nwen`, `-kent`/`-nkent`, `-sen`/`-nsen`, `-sent`/`-nsent`
### Direct Object Pronouns (Accusative)
- `-iyi`/`-yi`, `-k`/`-ik`, `-kem`, `-t`/`-tt`, `-itt`, `-aɣ`/`-yaɣ`, `-ken`, `-kent`, `-ten`, `-tent`
### Indirect Object Pronouns (Dative)
- `-iyi`/`-yi`, `-ak`, `-am`, `-as`, `-aneɣ`/`-anaɣ`, `-awen`, `-akent`, `-asen`/`-atsen`, `-asent`/`-atsent`
### Directional & Copula Particles
- `d-`/`-d`/`-id` — Proximal particle (toward speaker / "it is")
- `n-`/`-in` — Distal particle (away from speaker)
### Demonstratives & Determiners
- `-agi`/`-a` — This / These
- `-nni` — That / Those (previously mentioned)
- `-nniḍen`/`-niḍen` — Other / Another
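A post-processing lookup over closed-class morphemes like the ones listed in this section could look roughly like the sketch below. The table contents are hypothetical: the four keys come from this README, but the tags assigned to them here (`PRON`, `PART`, `DET`) are assumptions, and the app's curated table may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical override table; the tag choices are illustrative assumptions.
OVERRIDES: Dict[str, str] = {
    "-nneɣ": "PRON",
    "-is": "PRON",
    "d-": "PART",
    "-agi": "DET",
}


def apply_overrides(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, bool]]:
    """Replace the model's tag when a token is in the table.

    Returns (token, tag, overridden) triples so the caller can see
    which predictions were changed.
    """
    result = []
    for token, tag in tagged:
        forced = OVERRIDES.get(token.lower())
        if forced is not None and forced != tag:
            result.append((token, forced, True))
        else:
            result.append((token, tag, False))
    return result


model_output = [("Tameddakelt", "NOUN"), ("-nneɣ", "NOUN"), ("adlis", "NOUN"), ("-is", "PRON")]
for token, tag, overridden in apply_overrides(model_output):
    print(token, tag, "(override)" if overridden else "")
```

The lookup is keyed on the hyphenated form, so a free-standing `d` is left to the model while the attached `d-` is overridden.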
## Usage
Type or paste a Kabyle sentence and click **Submit** to see predicted POS tags with confidence scores.
### Example Sentences
- `Aṭas n medden i yessen.`
- `Taqbaylit d tutlayt deg Lezzayer.`
- `Yella wuccen ameqqran deg taddart.`
- `Tameddakelt-nneɣ teɣra adlis-is.`
- `D nekkni i d-yusan d imezwura.`
## Limitations
- Capitalized sentence-initial words may be biased toward `NOUN`/`PROPN` due to training data distribution.
- Domain bias toward short translated sentences (Tatoeba corpus).
- No diacritic normalization.
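Because the app does not normalize diacritics, one possible workaround is to normalize text yourself before submitting it. The sketch below uses Unicode NFC composition from Python's standard library. It assumes the model was trained mostly on precomposed characters, which this README does not confirm, and `normalize_input` is a hypothetical helper name.

```python
import unicodedata


def normalize_input(text: str) -> str:
    """Compose base letters and combining marks into single code points (NFC).

    Assumption: precomposed forms match the training data better than
    decomposed ones; this has not been verified against the model.
    """
    return unicodedata.normalize("NFC", text)


# "t" followed by COMBINING DOT BELOW (U+0323) becomes the single letter U+1E6D.
decomposed = "At\u0323as n medden i yessen."
composed = normalize_input(decomposed)
print(len(decomposed), len(composed))
```

NFC only changes how characters are encoded, not which letters the text contains, so it is safe to apply to input that is already composed.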
## Citation
This work is a side contribution to the **Masakhane** initiative for African NLP. See the model card for full citation details.
## Acknowledgments
- Model trained by **boffire** (ButterflyOfFire)