---
title: Kabyle POS Tagger
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.15.2
app_file: app.py
pinned: false
license: apache-2.0
---

# Kabyle POS Tagger Demo

Interactive demo for the [boffire/kabyle-pos-v2](https://huggingface.co/boffire/kabyle-pos-v2) model — a Part-of-Speech tagger for **Kabyle** (`kab`), a Berber language spoken in Algeria.

## Model Details
- **Base:** XLM-RoBERTa-base
- **Task:** Token Classification (POS tagging)
- **Test F1:** 93.8%
- **Training Data:** 10,000 sentences with a 214,000-entry lexicon
- **Tagset:** Universal Dependencies (17 tags)

## Features
- **Punctuation-aware tokenization:** Attached punctuation (e.g., `medden.`) is automatically split and tagged as `PUNCT`.
- **Clitic handling:** Hyphenated possessive, accusative, dative, and directional clitics are split and tagged correctly.
- **Post-processing lookup table:** A linguistically curated override table fixes misclassifications for closed-class morphemes (e.g., `-nneɣ`, `-is`, `d-`, `-agi`).
- **High-contrast visualization:** Color-coded tokens with confidence scores.
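
The punctuation and clitic splitting described above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the punctuation inventory, the `PROCLITICS` set, and the function names `split_clitics` and `tokenize` are all assumptions made for the example.

```python
import re

# Assumed punctuation inventory; the app's real list may differ.
PUNCT_CHARS = ".,;:!?…\"«»()[]"

# Assumed proclitics (written with a trailing hyphen, e.g. `d-`, `n-`).
PROCLITICS = {"d", "n"}

# Match either one punctuation character or a run of non-space, non-punctuation characters.
_WORD_OR_PUNCT = re.compile(
    rf"[{re.escape(PUNCT_CHARS)}]|[^\s{re.escape(PUNCT_CHARS)}]+"
)


def split_clitics(word: str) -> list[str]:
    """Split a hyphenated word, keeping the hyphen on the clitic side."""
    parts = [p for p in word.split("-") if p]
    if len(parts) < 2:
        return [word]
    if parts[0].lower() in PROCLITICS:
        # Proclitic, then host, then any enclitics: "d-yusan" -> ["d-", "yusan"]
        return [parts[0] + "-", parts[1]] + ["-" + p for p in parts[2:]]
    # Host followed by enclitics: "adlis-is" -> ["adlis", "-is"]
    return [parts[0]] + ["-" + p for p in parts[1:]]


def tokenize(text: str) -> list[str]:
    """Split attached punctuation first, then hyphenated clitics."""
    tokens: list[str] = []
    for piece in _WORD_OR_PUNCT.findall(text):
        if len(piece) == 1 and piece in PUNCT_CHARS:
            tokens.append(piece)
        else:
            tokens.extend(split_clitics(piece))
    return tokens
```

Under these assumptions, `tokenize("medden.")` yields `["medden", "."]`, so the period reaches the tagger as its own token.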

## Supported Clitics
The app recognizes and correctly tags the following Kabyle grammatical morphemes:

### Possessive Affixes
- Singular: `-w`/`-iw`, `-k`/`-ik`, `-m`/`-im`, `-s`/`-is`
- Plural: `-nneɣ`, `-wen`/`-nwen`, `-kent`/`-nkent`, `-sen`/`-nsen`, `-sent`/`-nsent`

### Direct Object Pronouns (Accusative)
- `-iyi`/`-yi`, `-k`/`-ik`, `-kem`, `-t`/`-tt`, `-itt`, `-aɣ`/`-yaɣ`, `-ken`, `-kent`, `-ten`, `-tent`

### Indirect Object Pronouns (Dative)
- `-iyi`/`-yi`, `-ak`, `-am`, `-as`, `-aneɣ`/`-anaɣ`, `-awen`, `-akent`, `-asen`/`-atsen`, `-asent`/`-atsent`

### Directional & Copula Particles
- `d-`/`-d`/`-id` — Proximal particle (toward speaker / "it is")
- `n-`/`-in` — Distal particle (away from speaker)

### Demonstratives & Determiners
- `-agi`/`-a` — This / These
- `-nni` — That / Those (previously mentioned)
- `-nniḍen`/`-niḍen` — Other / Another
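
A post-processing override for closed-class morphemes like those listed above might look like the sketch below. The surface forms are taken from this README, but the tags assigned to them, the `OVERRIDES` name, and the `apply_overrides` helper are illustrative assumptions; the app's curated table may assign different tags.

```python
# Hypothetical override table: forms come from this README, but the tags
# are illustrative guesses, not the app's actual curated entries.
OVERRIDES = {
    "-nneɣ": "PRON",
    "-is": "PRON",
    "d-": "PART",
    "-agi": "DET",
}


def apply_overrides(tokens, tags, overrides=OVERRIDES):
    """Return a copy of `tags` where listed closed-class tokens get their table tag."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    # Tokens not in the table keep the model's prediction.
    return [overrides.get(tok.lower(), tag) for tok, tag in zip(tokens, tags)]
```

Because the lookup only fires on exact matches, open-class words are never affected; only the closed set of clitic forms is corrected.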

## Usage
Type or paste a Kabyle sentence and click **Submit** to see predicted POS tags with confidence scores.

### Example Sentences
- `Aṭas n medden i yessen.`
- `Taqbaylit d tutlayt deg Lezzayer.`
- `Yella wuccen ameqqran deg taddart.`
- `Tameddakelt-nneɣ teɣra adlis-is.`
- `D nekkni i d-yusan d imezwura.`
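
Outside the demo UI, the model can presumably be queried with the `transformers` token-classification pipeline. The sketch below assumes the model repository loads with `pipeline(...)` as-is and that subword pieces carry the SentencePiece word-start marker `▁`, as XLM-RoBERTa tokenizers normally produce. The helper names `load_tagger` and `merge_subwords` are hypothetical, and the app's own tokenization and override steps are not reproduced here.

```python
WORD_START = "\u2581"  # SentencePiece marker "▁" used by XLM-RoBERTa tokenizers


def load_tagger(model_id: str = "boffire/kabyle-pos-v2"):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    return pipeline("token-classification", model=model_id)


def merge_subwords(predictions):
    """Merge subword predictions into words, keeping the first piece's tag and score."""
    words = []
    for p in predictions:
        piece = p["word"]
        if piece.startswith(WORD_START) or not words:
            # A new word starts here: record its text, tag, and score.
            words.append([piece.lstrip(WORD_START), p["entity"], float(p["score"])])
        else:
            # Continuation piece: extend the text of the current word.
            words[-1][0] += piece
    return [(w, tag, round(score, 3)) for w, tag, score in words if w]


# Example call (requires `transformers`, a backend such as PyTorch, and network access):
#   tagger = load_tagger()
#   for word, tag, score in merge_subwords(tagger("Taqbaylit d tutlayt deg Lezzayer.")):
#       print(word, tag, score)
```

The pipeline is left at its default (no aggregation) because the built-in aggregation strategies group adjacent tokens that share a label, which would fuse neighbouring words with the same POS tag.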

## Limitations
- Capitalized sentence-initial words may be biased toward `NOUN`/`PROPN` due to training data distribution.
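- The training data is skewed toward short translated sentences (Tatoeba corpus), so accuracy may be lower on longer or domain-specific text.
- No diacritic normalization is applied to the input text.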
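
Since the input is not normalized, one possible client-side workaround is to apply Unicode NFC composition before submitting text, so that a base letter followed by a combining dot (for example `t` + U+0323) becomes the single code point `ṭ`. This is a sketch of a preprocessing step a caller could add, not something the app does; `normalize_kabyle` is a hypothetical helper name.

```python
import unicodedata


def normalize_kabyle(text: str) -> str:
    """Compose base letter + combining mark sequences into single code points (NFC)."""
    return unicodedata.normalize("NFC", text)
```

NFC only unifies canonically equivalent sequences; it does not correct look-alike characters or spelling variants.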

## Citation
A side project of the **Masakhane** initiative for African NLP. See the model card for full citation details.

## Acknowledgments
- Model trained by **boffire** (ButterflyOfFire)