metadata
title: Kabyle POS Tagger
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0

Kabyle POS Tagger Demo

Interactive demo for the boffire/kabyle-pos-v2 model — a Part-of-Speech tagger for Kabyle (kab), a Berber language spoken in Algeria.

Model Details

  • Base: XLM-RoBERTa-base
  • Task: Token Classification (POS tagging)
  • Test F1: 93.8%
  • Training Data: 10,000 sentences with a 214,000-entry lexicon
  • Tagset: Universal Dependencies (17 tags)

Features

  • Punctuation-aware tokenization: Attached punctuation (e.g., medden.) is automatically split and tagged as PUNCT.
  • Clitic handling: Hyphenated possessive, accusative, dative, and directional clitics are split and tagged correctly.
  • Post-processing lookup table: A linguistically curated override table fixes misclassifications for closed-class morphemes (e.g., -nneɣ, -is, d-, -agi).
  • High-contrast visualization: Color-coded tokens with confidence scores.
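
The punctuation splitting and the override lookup described above can be pictured as a pre-processing and a post-processing step around the model. The following is a minimal sketch under stated assumptions, not the code in app.py: the regular expression, the function names `split_punctuation` and `apply_overrides`, and the tags stored in `OVERRIDES` are all illustrative placeholders. Only the example morphemes are taken from this README.

```python
import re

# Hypothetical override table. The morphemes come from the README;
# the tags assigned to them here are placeholders for illustration only.
OVERRIDES = {"-nneɣ": "PRON", "-is": "PRON", "d-": "PART", "-agi": "DET"}

# Either a run of word characters and hyphens, or one non-space symbol.
_TOKEN_RE = re.compile(r"[\w-]+|[^\w\s]")


def split_punctuation(text):
    """Split attached punctuation into separate tokens.

    Hyphenated forms such as "adlis-is" are kept whole here; clitic
    splitting would be a separate step.
    """
    return _TOKEN_RE.findall(text)


def apply_overrides(tagged):
    """Replace the predicted tag of any token listed in OVERRIDES."""
    return [(token, OVERRIDES.get(token, tag)) for token, tag in tagged]
```

With this sketch, `split_punctuation("medden.")` yields `["medden", "."]`, and a pair such as `("-nneɣ", "NOUN")` passed through `apply_overrides` comes back with the tag from the table instead of the model's prediction.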

Supported Clitics

The app recognizes and correctly tags the following Kabyle grammatical morphemes:

Possessive Affixes

  • Singular: -w/-iw, -k/-ik, -m/-im, -s/-is
  • Plural: -nneɣ, -wen/-nwen, -kent/-nkent, -sen/-nsen, -sent/-nsent

Direct Object Pronouns (Accusative)

  • -iyi/-yi, -k/-ik, -kem, -t/-tt, -itt, -aɣ/-yaɣ, -ken, -kent, -ten, -tent

Indirect Object Pronouns (Dative)

  • -iyi/-yi, -ak, -am, -as, -aneɣ/-anaɣ, -awen, -akent, -asen/-atsen, -asent/-atsent

Directional & Copula Particles

  • d-/-d/-id: Proximal directional particle (toward the speaker); d also serves as the copula ("it is")
  • n-/-in: Distal directional particle (away from the speaker)

Demonstratives & Determiners

  • -agi/-a — This / These
  • -nni — That / Those (previously mentioned)
  • -nniḍen/-niḍen — Other / Another
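
One way to picture how hyphenated clitics from the lists above might be separated from their host word is sketched below. This is an assumption-laden illustration rather than the app's implementation: the names `CLITICS`, `split_clitics`, and `clitic_classes` are hypothetical, and the class labels are chosen for this sketch. Because some forms appear in more than one list (for example -k and -iyi), the lookup returns every matching class instead of picking one.

```python
# Clitic inventories copied from the lists above. The class names are
# labels chosen for this sketch, not identifiers from the app.
CLITICS = {
    "possessive": {"-w", "-iw", "-k", "-ik", "-m", "-im", "-s", "-is",
                   "-nneɣ", "-wen", "-nwen", "-kent", "-nkent",
                   "-sen", "-nsen", "-sent", "-nsent"},
    "accusative": {"-iyi", "-yi", "-k", "-ik", "-kem", "-t", "-tt", "-itt",
                   "-aɣ", "-yaɣ", "-ken", "-kent", "-ten", "-tent"},
    "dative": {"-iyi", "-yi", "-ak", "-am", "-as", "-asen", "-aneɣ",
               "-anaɣ", "-awen", "-akent", "-atsen", "-asent", "-atsent"},
    "directional": {"d-", "-d", "-id", "n-", "-in"},
    "demonstrative": {"-agi", "-a", "-nni", "-nniḍen", "-niḍen"},
}
KNOWN = set().union(*CLITICS.values())


def split_clitics(token):
    """Split a hyphenated token into proclitics, host, and enclitics."""
    parts = [p for p in token.split("-") if p]
    if len(parts) <= 1:
        return [token]
    pieces = []
    i = 0
    # Leading parts count as proclitics only if listed with a trailing hyphen.
    while i < len(parts) - 1 and parts[i].lower() + "-" in KNOWN:
        pieces.append(parts[i] + "-")
        i += 1
    pieces.append(parts[i])  # the host word
    # Everything after the host is treated as an enclitic.
    pieces.extend("-" + p for p in parts[i + 1:])
    return pieces


def clitic_classes(piece):
    """Return every class whose inventory contains this piece."""
    return sorted(name for name, forms in CLITICS.items() if piece.lower() in forms)
```

For instance, `split_clitics("d-yusan")` separates the proclitic d- from the verb, while `split_clitics("Tameddakelt-nneɣ")` separates the enclitic -nneɣ from the noun. Deciding between competing classes for an ambiguous form would need context that this sketch does not model.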

Usage

Type or paste a Kabyle sentence and click Submit to see predicted POS tags with confidence scores.
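
For readers who would rather call the model from code than through the demo, the sketch below shows one possible approach with the Hugging Face `transformers` library. It rests on several assumptions: that boffire/kabyle-pos-v2 loads with `AutoModelForTokenClassification`, that a fast tokenizer is available, and that each word's tag should be read from its first sub-token. The helper names are hypothetical, and the demo's punctuation, clitic, and lookup-table steps are not reproduced here, so results may differ from what the app shows.

```python
MODEL_ID = "boffire/kabyle-pos-v2"  # model named in this README


def first_subtoken_positions(word_ids):
    """Return, for each word, the position of its first sub-token.

    word_ids holds one entry per sub-token: the index of the word it
    belongs to, or None for special tokens.
    """
    positions, seen = [], set()
    for pos, wid in enumerate(word_ids):
        if wid is not None and wid not in seen:
            seen.add(wid)
            positions.append(pos)
    return positions


def tag_words(words, model_id=MODEL_ID):
    """Tag pre-split words; returns (word, tag, confidence) triples."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForTokenClassification.from_pretrained(model_id)
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        probs = model(**enc).logits[0].softmax(dim=-1)

    results = []
    # Assumption: the tag of a word is read from its first sub-token.
    for word, pos in zip(words, first_subtoken_positions(enc.word_ids(0))):
        score, label_id = probs[pos].max(dim=-1)
        label = model.config.id2label[label_id.item()]
        results.append((word, label, score.item()))
    return results
```

Calling `tag_words(["Taqbaylit", "d", "tutlayt", "deg", "Lezzayer", "."])` downloads the model on first use and returns one triple per input word.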

Example Sentences

  • Aṭas n medden i yessen.
  • Taqbaylit d tutlayt deg Lezzayer.
  • Yella wuccen ameqqran deg taddart.
  • Tameddakelt-nneɣ teɣra adlis-is.
  • D nekkni i d-yusan d imezwura.

Limitations

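  • Predictions for capitalized sentence-initial words may skew toward NOUN/PROPN because of the training data distribution.
  • The model is biased toward short translated sentences (Tatoeba corpus), so longer or out-of-domain text may be tagged less accurately.
  • Input is not diacritic-normalized, so the same word written with different diacritic encodings or spellings may be tagged differently.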
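
Because diacritics are not normalized by the app, callers may want to normalize input themselves before submitting it. A minimal sketch using Python's standard library follows. Whether NFC is the form that matches the model's training data is an assumption, and `normalize_nfc` is a hypothetical helper name.

```python
import unicodedata


def normalize_nfc(text):
    """Compose base letters and combining marks into single code points
    where a precomposed form exists (Unicode NFC)."""
    return unicodedata.normalize("NFC", text)
```

This only unifies encodings of the same letter, such as "t" followed by a combining dot below versus the precomposed "ṭ". It does not restore diacritics that were omitted when typing.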

Citation

This work is a side project associated with the Masakhane initiative for African NLP. See the model card for full citation details.

Acknowledgments

  • Model trained by boffire (ButterflyOfFire)