metadata
title: Kabyle POS Tagger
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
Kabyle POS Tagger Demo
Interactive demo for the boffire/kabyle-pos-v2 model — a Part-of-Speech tagger for Kabyle (kab), a Berber language spoken in Algeria.
Model Details
- Base: XLM-RoBERTa-base
- Task: Token Classification (POS tagging)
- Test F1: 93.8%
- Training Data: 10,000 sentences with a 214,000-entry lexicon
- Tagset: Universal Dependencies (17 tags)
Features
- Punctuation-aware tokenization: Attached punctuation (e.g., "medden.") is automatically split and tagged as PUNCT.
- Clitic handling: Hyphenated possessive, accusative, dative, and directional clitics are split and tagged correctly.
- Post-processing lookup table: A linguistically curated override table fixes misclassifications for closed-class morphemes (e.g., -nneɣ, -is, d-, -agi).
- High-contrast visualization: Color-coded tokens with confidence scores.
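The punctuation splitting and override lookup described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the names `PUNCT_CHARS`, `split_punctuation`, `tokenize`, and `apply_overrides` are hypothetical, and the tags in the example override table are placeholders, since the curated table itself is not reproduced here.

```python
import string

# Characters treated as punctuation. The hyphen is excluded on purpose,
# because hyphens mark clitic boundaries (assumption for this sketch).
PUNCT_CHARS = (set(string.punctuation) - {"-"}) | {"«", "»", "…"}


def split_punctuation(chunk):
    """Peel leading and trailing punctuation off one whitespace-delimited chunk."""
    leading, trailing = [], []
    while chunk and chunk[0] in PUNCT_CHARS:
        leading.append(chunk[0])
        chunk = chunk[1:]
    while chunk and chunk[-1] in PUNCT_CHARS:
        trailing.insert(0, chunk[-1])
        chunk = chunk[:-1]
    core = [chunk] if chunk else []
    return leading + core + trailing


def tokenize(text):
    """Split on whitespace, then detach punctuation from each chunk."""
    tokens = []
    for chunk in text.split():
        tokens.extend(split_punctuation(chunk))
    return tokens


def apply_overrides(tagged, overrides):
    """Post-process (token, tag) pairs.

    Pure-punctuation tokens are forced to PUNCT; tokens found in the
    override table take the table's tag; everything else keeps the
    tag predicted by the model.
    """
    result = []
    for token, tag in tagged:
        if all(ch in PUNCT_CHARS for ch in token):
            tag = "PUNCT"
        else:
            tag = overrides.get(token.lower(), tag)
        result.append((token, tag))
    return result


if __name__ == "__main__":
    # Placeholder override table: the tag here is illustrative only.
    example_overrides = {"-is": "PRON"}
    print(tokenize("Aṭas n medden i yessen."))
    print(apply_overrides([("-is", "NOUN"), (".", "NOUN")], example_overrides))
```

Keeping the override step separate from the model call means the table can be extended without retraining, at the cost of the override always winning over the model's prediction for the listed forms.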
Supported Clitics
The app recognizes and correctly tags the following Kabyle grammatical morphemes:
Possessive Affixes
- Singular: -w/-iw, -k/-ik, -m/-im, -s/-is
- Plural: -nneɣ, -wen/-nwen, -kent/-nkent, -sen/-nsen, -sent/-nsent
Direct Object Pronouns (Accusative)
-iyi/-yi, -k/-ik, -kem, -t/-tt, -itt, -aɣ/-yaɣ, -ken, -kent, -ten, -tent
Indirect Object Pronouns (Dative)
-iyi/-yi, -ak, -am, -as/-asen, -aneɣ/-anaɣ, -awen, -akent, -asen/-atsen, -asent/-atsent
Directional & Copula Particles
- d-/-d/-id: Proximal particle (toward speaker / "it is")
- n-/-in: Distal particle (away from speaker)
Demonstratives & Determiners
- -agi/-a: This / These
- -nni: That / Those (previously mentioned)
- -nniḍen/-niḍen: Other / Another
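Several of the forms listed above belong to more than one paradigm (for example, -k/-ik is both possessive and accusative), so the surface form alone does not always determine the function. The sketch below is one possible way to split a hyphenated word and look up candidate categories. It is an illustration only: `CLITIC_INVENTORY`, `split_clitics`, and `clitic_categories` are hypothetical names, the category labels are chosen for this example, and the app's own splitting logic may differ.

```python
# Inventory copied from the lists above; category names are labels
# chosen for this sketch, not tags emitted by the model.
CLITIC_INVENTORY = {
    "possessive": ["-w", "-iw", "-k", "-ik", "-m", "-im", "-s", "-is",
                   "-nneɣ", "-wen", "-nwen", "-kent", "-nkent",
                   "-sen", "-nsen", "-sent", "-nsent"],
    "accusative": ["-iyi", "-yi", "-k", "-ik", "-kem", "-t", "-tt", "-itt",
                   "-aɣ", "-yaɣ", "-ken", "-kent", "-ten", "-tent"],
    "dative": ["-iyi", "-yi", "-ak", "-am", "-as", "-asen", "-aneɣ", "-anaɣ",
               "-awen", "-akent", "-atsen", "-asent", "-atsent"],
    "directional": ["d-", "-d", "-id", "n-", "-in"],
    "demonstrative": ["-agi", "-a", "-nni", "-nniḍen", "-niḍen"],
}

# Reverse index: surface form -> sorted list of categories it may belong to.
CLITIC_CATEGORIES = {}
for category, forms in CLITIC_INVENTORY.items():
    for form in forms:
        CLITIC_CATEGORIES.setdefault(form, set()).add(category)
CLITIC_CATEGORIES = {f: sorted(c) for f, c in CLITIC_CATEGORIES.items()}


def split_clitics(word):
    """Split a hyphenated word into an optional proclitic, a host, and enclitics."""
    parts = word.split("-")
    if len(parts) == 1:
        return [word]
    pieces = []
    # A leading piece such as "d-" or "n-" is treated as a proclitic.
    first = parts[0] + "-"
    if first.lower() in CLITIC_CATEGORIES:
        pieces.append(first)
        parts = parts[1:]
    host, rest = parts[0], parts[1:]
    if host:
        pieces.append(host)
    # Everything after the host keeps its leading hyphen.
    pieces.extend("-" + p for p in rest if p)
    return pieces


def clitic_categories(piece):
    """Return every category the piece could belong to (empty list if none)."""
    return CLITIC_CATEGORIES.get(piece.lower(), [])


if __name__ == "__main__":
    for word in ["Tameddakelt-nneɣ", "adlis-is", "d-yusan"]:
        pieces = split_clitics(word)
        print(word, [(p, clitic_categories(p)) for p in pieces])
```

Returning a list of candidate categories rather than a single label keeps the ambiguity visible; resolving it requires context, which is the tagger's job.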
Usage
Type or paste a Kabyle sentence and click Submit to see predicted POS tags with confidence scores.
Example Sentences
- Aṭas n medden i yessen.
- Taqbaylit d tutlayt deg Lezzayer.
- Yella wuccen ameqqran deg taddart.
- Tameddakelt-nneɣ teɣra adlis-is.
- D nekkni i d-yusan d imezwura.
Limitations
- Capitalized sentence-initial words may be biased toward NOUN/PROPN due to training data distribution.
- Domain bias toward short translated sentences (Tatoeba corpus).
- No diacritic normalization.
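Because the app does not normalize diacritics, input that encodes the same letter differently (for instance, "ṭ" as one precomposed code point versus "t" plus a combining dot below) may be tagged inconsistently. A minimal sketch of normalizing text before submitting it is shown below. The helper name `normalize_kabyle` is hypothetical, and the optional Greek-to-Latin mapping assumes that the model expects Latin ɣ and ɛ; neither is part of the demo.

```python
import unicodedata

# Optional mapping of Greek look-alikes to the Latin letters.
# Assumption: the model was trained on Latin ɣ (U+0263) and ɛ (U+025B).
GREEK_TO_LATIN = str.maketrans({
    "\u03b3": "\u0263",  # Greek small gamma   -> Latin small gamma
    "\u03b5": "\u025b",  # Greek small epsilon -> Latin small open e
})


def normalize_kabyle(text, map_greek=True):
    """Compose combining marks (NFC) and optionally map Greek look-alikes."""
    text = unicodedata.normalize("NFC", text)
    if map_greek:
        text = text.translate(GREEK_TO_LATIN)
    return text


if __name__ == "__main__":
    decomposed = "A" + "t\u0323" + "as"  # "t" followed by combining dot below
    print(normalize_kabyle(decomposed) == "A\u1e6das")
```

NFC is used here because it yields precomposed characters where they exist; whether that matches the encoding of the training data is not stated in this README, so results should be compared with and without normalization.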
Citation
A side project within the Masakhane initiative for African NLP. See the model card for full citation details.
Acknowledgments
- Model trained by boffire (ButterflyOfFire)