---
title: Kabyle POS Tagger
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
---

# Kabyle POS Tagger Demo

Interactive demo for the [boffire/kabyle-pos-v2](https://huggingface.co/boffire/kabyle-pos-v2) model, a Part-of-Speech tagger for **Kabyle** (`kab`), a Berber language spoken in Algeria.

## Model Details

- **Base:** XLM-RoBERTa-base
- **Task:** Token Classification (POS tagging)
- **Test F1:** 93.8%
- **Training Data:** 10,000 sentences with a 214,000-entry lexicon
- **Tagset:** Universal Dependencies (17 tags)

## Features

- **Punctuation-aware tokenization:** Attached punctuation (e.g., `medden.`) is automatically split and tagged as `PUNCT`.
- **Clitic handling:** Hyphenated possessive, accusative, dative, and directional clitics are split and tagged correctly.
- **Post-processing lookup table:** A linguistically curated override table fixes misclassifications for closed-class morphemes (e.g., `-nneɣ`, `-is`, `d-`, `-agi`).
- **High-contrast visualization:** Color-coded tokens with confidence scores.

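
The splitting and override steps listed under Features could be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function names (`split_punct`, `split_clitics`, `tokenize`, `apply_overrides`), the `PROCLITICS` set, and the tags stored in `OVERRIDES` are all assumptions made for the example.

```python
import re

# Assumption: forms written with a trailing hyphen (`d-`, `n-`) attach before the host word.
PROCLITICS = {"d", "n"}

# Hypothetical override table. The morphemes come from this README;
# the tags assigned to them here are illustrative guesses only.
OVERRIDES = {"-nneɣ": "PRON", "-is": "PRON", "d-": "PART", "-agi": "DET"}

# Leading punctuation, core word, trailing punctuation (hyphens stay with the core).
_EDGES = re.compile(r"([^\w\s-]*)(.*?)([^\w\s-]*)")


def split_punct(word):
    """Separate punctuation attached to the start or end of a word."""
    lead, core, trail = _EDGES.fullmatch(word).groups()
    return list(lead) + ([core] if core else []) + list(trail)


def split_clitics(token):
    """Split a hyphenated token into proclitics, host, and enclitics."""
    parts = [p for p in token.split("-") if p]
    if len(parts) < 2:
        return [token]
    out, i = [], 0
    # Peel off known proclitics, keeping the hyphen on the clitic side.
    while i < len(parts) - 1 and parts[i].lower() in PROCLITICS:
        out.append(parts[i] + "-")
        i += 1
    out.append(parts[i])                        # host word
    out.extend("-" + p for p in parts[i + 1:])  # enclitics
    return out


def tokenize(sentence):
    """Whitespace split, then punctuation split, then clitic split."""
    tokens = []
    for word in sentence.split():
        for piece in split_punct(word):
            tokens.extend(split_clitics(piece))
    return tokens


def apply_overrides(tokens, predicted_tags):
    """Replace the model's tag wherever the token is in the override table."""
    return [OVERRIDES.get(tok.lower(), tag) for tok, tag in zip(tokens, predicted_tags)]


print(tokenize("Tameddakelt-nneɣ teɣra adlis-is."))
```

Keeping the hyphen on the clitic (`-is`, `d-`) lets the lookup table distinguish a clitic from a free-standing word with the same letters, such as the copula `d`.
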
## Supported Clitics

The app recognizes and tags the following Kabyle grammatical morphemes:

### Possessive Affixes

- Singular: `-w`/`-iw`, `-k`/`-ik`, `-m`/`-im`, `-s`/`-is`
- Plural: `-nneɣ`, `-wen`/`-nwen`, `-kent`/`-nkent`, `-sen`/`-nsen`, `-sent`/`-nsent`

### Direct Object Pronouns (Accusative)

- `-iyi`/`-yi`, `-k`/`-ik`, `-kem`, `-t`/`-tt`, `-itt`, `-aɣ`/`-yaɣ`, `-ken`, `-kent`, `-ten`, `-tent`

### Indirect Object Pronouns (Dative)

- `-iyi`/`-yi`, `-ak`, `-am`, `-as`/`-asen`, `-aneɣ`/`-anaɣ`, `-awen`, `-akent`, `-asen`/`-atsen`, `-asent`/`-atsent`

### Directional & Copula Particles

- `d-`/`-d`/`-id`: Proximal particle (toward speaker / "it is")
- `n-`/`-in`: Distal particle (away from speaker)

### Demonstratives & Determiners

- `-agi`/`-a`: This / These
- `-nni`: That / Those (previously mentioned)
- `-nniḍen`/`-niḍen`: Other / Another

## Usage

Type or paste a Kabyle sentence and click **Submit** to see the predicted POS tags with confidence scores.

### Example Sentences

- `Aṭas n medden i yessen.`
- `Taqbaylit d tutlayt deg Lezzayer.`
- `Yella wuccen ameqqran deg taddart.`
- `Tameddakelt-nneɣ teɣra adlis-is.`
- `D nekkni i d-yusan d imezwura.`

## Limitations

- Capitalized sentence-initial words may be biased toward `NOUN`/`PROPN` due to the training data distribution.
- The training data is biased toward short translated sentences (Tatoeba corpus).
- No diacritic normalization is applied.

## Citation

A side project within the **Masakhane** initiative for African NLP. See the model card for full citation details.

## Acknowledgments

- Model trained by **boffire** (ButterflyOfFire)