title: Kabyle POS Tagger
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0

Kabyle POS Tagger Demo

Interactive demo for the boffire/kabyle-pos-v2 model — a Part-of-Speech tagger for Kabyle (kab), a Berber language spoken in Algeria.

Punctuation-aware tokenization: Attached punctuation (e.g., medden.) is automatically split and tagged as PUNCT.
Clitic handling: Hyphenated possessive, accusative, dative, and directional clitics are split and tagged correctly.
Post-processing lookup table: A linguistically curated override table fixes misclassifications for closed-class morphemes (e.g., -nneɣ, -is, d-, -agi).
High-contrast visualization: Color-coded tokens with confidence scores.
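The punctuation and clitic splitting described above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the `PUNCT` set, the `PROCLITICS` set, and the function names are assumptions chosen for the example.

```python
# Illustrative punctuation set; the app's actual set may differ (assumption).
PUNCT = set(".,;:!?…\"()«»")
# Morphemes written before their host with a trailing hyphen (assumed set).
PROCLITICS = {"d"}


def split_clitics(word: str) -> list[str]:
    """Split a hyphenated word into host and clitics, keeping the hyphen on the clitic."""
    parts = [p for p in word.split("-") if p]
    if len(parts) < 2:
        return [word]
    out = []
    i = 0
    # Leading proclitics keep a trailing hyphen, e.g. "d-".
    while i < len(parts) - 1 and parts[i].lower() in PROCLITICS:
        out.append(parts[i] + "-")
        i += 1
    out.append(parts[i])  # the host word
    # Everything after the host is treated as a suffix clitic, e.g. "-nneɣ".
    out.extend("-" + p for p in parts[i + 1:])
    return out


def tokenize(sentence: str) -> list[str]:
    """Whitespace-split, peel attached punctuation, then split clitics."""
    tokens = []
    for chunk in sentence.split():
        lead, trail = [], []
        while chunk and chunk[0] in PUNCT:
            lead.append(chunk[0])
            chunk = chunk[1:]
        while chunk and chunk[-1] in PUNCT:
            trail.insert(0, chunk[-1])
            chunk = chunk[:-1]
        tokens.extend(lead)
        if chunk:
            tokens.extend(split_clitics(chunk))
        tokens.extend(trail)
    return tokens
```

With this sketch, `tokenize("medden.")` yields `["medden", "."]`, so the full stop can be tagged as PUNCT on its own.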

The app recognizes and correctly tags the following Kabyle grammatical morphemes:

Possessive clitics:
Singular: -w/-iw, -k/-ik, -m/-im, -s/-is
Plural: -nneɣ, -wen/-nwen, -kent/-nkent, -sen/-nsen, -sent/-nsent

Accusative (direct object) clitics:
-iyi/-yi, -k/-ik, -kem, -t/-tt, -itt, -aɣ/-yaɣ, -ken, -kent, -ten, -tent

Dative (indirect object) clitics:
-iyi/-yi, -ak, -am, -as/-asen, -aneɣ/-anaɣ, -awen, -akent, -asen/-atsen, -asent/-atsent
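A post-processing override for closed-class morphemes like these could look like the sketch below. The table contents are placeholders: the tag assigned to each morpheme here (`PRON`, `PART`, `DET`) is an assumption for illustration and may not match the app's curated table. The function name `apply_overrides` is likewise hypothetical.

```python
# Hypothetical override table; the tags are illustrative assumptions,
# not the values used by the app.
OVERRIDES = {
    "-nneɣ": "PRON",
    "-is": "PRON",
    "d-": "PART",
    "-agi": "DET",
}


def apply_overrides(tagged, overrides=OVERRIDES):
    """Replace the model's tag for any token found in the override table.

    `tagged` is a list of (token, tag, score) tuples; the model's score
    is kept unchanged even when the tag is overridden.
    """
    fixed = []
    for token, tag, score in tagged:
        forced = overrides.get(token.lower())
        fixed.append((token, forced if forced is not None else tag, score))
    return fixed
```

Because the lookup is keyed on the surface form including its hyphen, it only fires on tokens that have already been split off as clitics.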

Type or paste a Kabyle sentence and click Submit to see predicted POS tags with confidence scores.
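As a rough idea of how tagged output with confidence scores might be turned into color-coded HTML, here is a small standard-library sketch. The color palette, the `render_tokens` name, and the markup are all assumptions; the demo's actual rendering may be quite different.

```python
from html import escape

# Hypothetical palette keyed by POS tag; unknown tags fall back to DEFAULT_COLOR.
TAG_COLORS = {"NOUN": "#1f77b4", "PROPN": "#9467bd", "PUNCT": "#7f7f7f"}
DEFAULT_COLOR = "#444444"


def render_tokens(tagged) -> str:
    """Render (token, tag, score) tuples as colored HTML spans."""
    spans = []
    for token, tag, score in tagged:
        color = TAG_COLORS.get(tag, DEFAULT_COLOR)
        spans.append(
            f'<span style="background:{color};color:#fff;padding:2px 4px">'
            f"{escape(token)} <small>{escape(tag)} {score:.0%}</small></span>"
        )
    return " ".join(spans)
```

White text on a dark background keeps the contrast high, and escaping the token guards against markup in user input.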

Capitalized sentence-initial words may be mis-tagged as NOUN or PROPN because of the training data distribution.
The model is biased toward short translated sentences, reflecting the Tatoeba corpus it draws on.
No diacritic normalization is applied to the input.
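Since the app does not normalize diacritics, callers who want consistent input could apply Unicode normalization themselves before submitting text. The sketch below uses NFC, which composes a base letter plus combining mark into a single code point where one exists. Whether NFC matches the form used in the training data is an assumption, and `normalize_input` is a hypothetical helper name.

```python
import unicodedata


def normalize_input(text: str) -> str:
    """Compose base letters and combining marks (NFC) before tagging.

    Assumption: the model was trained on precomposed characters.
    """
    return unicodedata.normalize("NFC", text)
```

For example, "d" followed by a combining dot below (U+0323) becomes the single character "ḍ" (U+1E0D), so both spellings of a word reach the tagger in the same form.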

A side project within the Masakhane initiative for African NLP. See the model card for full citation details.