Commit 980a752 by boffire · verified · 1 Parent(s): 71601f6

Update README.md

Files changed (1)
  1. README.md +45 -7
README.md CHANGED
@@ -11,22 +11,60 @@ license: apache-2.0
 
  # Kabyle POS Tagger Demo
 
- Interactive demo for the [boffire/kabyle-pos](https://huggingface.co/boffire/kabyle-pos) model — a Part-of-Speech tagger for **Kabyle** (`kab`), a Berber language spoken in Algeria.
+ Interactive demo for the [boffire/kabyle-pos-v2](https://huggingface.co/boffire/kabyle-pos-v2) model — a Part-of-Speech tagger for **Kabyle** (`kab`), a Berber language spoken in Algeria.
 
  ## Model Details
  - **Base:** XLM-RoBERTa-base
  - **Task:** Token Classification (POS tagging)
- - **Test F1:** 87.5%
+ - **Test F1:** 93.8%
+ - **Training Data:** 10,000 sentences with a 214,000-entry lexicon
  - **Tagset:** Universal Dependencies (17 tags)
 
+ ## Features
+ - **Punctuation-aware tokenization:** Attached punctuation (e.g., `medden.`) is automatically split and tagged as `PUNCT`.
+ - **Clitic handling:** Hyphenated possessive, accusative, dative, and directional clitics are split and tagged correctly.
+ - **Post-processing lookup table:** A linguistically curated override table fixes misclassifications for closed-class morphemes (e.g., `-nneɣ`, `-is`, `d-`, `-agi`).
+ - **High-contrast visualization:** Color-coded tokens with confidence scores.
+
+ ## Supported Clitics
+ The app recognizes and correctly tags the following Kabyle grammatical morphemes:
+
+ ### Possessive Affixes
+ - Singular: `-w`/`-iw`, `-k`/`-ik`, `-m`/`-im`, `-s`/`-is`
+ - Plural: `-nneɣ`, `-wen`/`-nwen`, `-kent`/`-nkent`, `-sen`/`-nsen`, `-sent`/`-nsent`
+
+ ### Direct Object Pronouns (Accusative)
+ - `-iyi`/`-yi`, `-k`/`-ik`, `-kem`, `-t`/`-tt`, `-itt`, `-aɣ`/`-yaɣ`, `-ken`, `-kent`, `-ten`, `-tent`
+
+ ### Indirect Object Pronouns (Dative)
+ - `-iyi`/`-yi`, `-ak`, `-am`, `-as`/`-asen`, `-aneɣ`/`-anaɣ`, `-awen`, `-akent`, `-asen`/`-atsen`, `-asent`/`-atsent`
+
+ ### Directional & Copula Particles
+ - `d-`/`-d`/`-id` — Proximal particle (toward speaker / "it is")
+ - `n-`/`-in` — Distal particle (away from speaker)
+
+ ### Demonstratives & Determiners
+ - `-agi`/`-a` — This / These
+ - `-nni` — That / Those (previously mentioned)
+ - `-nniḍen`/`-niḍen` — Other / Another
+
  ## Usage
  Type or paste a Kabyle sentence and click **Submit** to see predicted POS tags with confidence scores.
 
+ ### Example Sentences
+ - `Aṭas n medden i yessen.`
+ - `Taqbaylit d tutlayt deg Lezzayer.`
+ - `Yella wuccen ameqqran deg taddart.`
+ - `Tameddakelt-nneɣ teɣra adlis-is.`
+ - `D nekkni i d-yusan d imezwura.`
+
  ## Limitations
- - Trained on ~1,200 sentences (small dataset)
- - Struggles with complex cliticized verb forms
- - Domain bias toward short translated sentences (Tatoeba corpus)
- - No diacritic normalization
+ - Capitalized sentence-initial words may be biased toward `NOUN`/`PROPN` due to training data distribution.
+ - Domain bias toward short translated sentences (Tatoeba corpus).
+ - No diacritic normalization.
 
  ## Citation
- Side part of the **Masakhane** initiative for African NLP. See the model card for full citation details.
+ Side part of the **Masakhane** initiative for African NLP. See the model card for full citation details.
+
+ ## Acknowledgments
+ - Model trained by **boffire** (ButterflyOfFire)
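
Note on the Features added in this commit: the README describes punctuation splitting, clitic splitting, and a post-processing override table, but the diff does not show the app code. The Python sketch below is one possible reading of those three steps, not the demo's actual implementation. The names `tokenize`, `split_clitics`, `apply_overrides`, `OVERRIDES`, and `PROCLITICS` are hypothetical, and the tags in the override table are illustrative guesses rather than the curated values the README refers to.

```python
import re

# Hypothetical override table. The README names these morphemes as examples,
# but the tags assigned here are assumptions made for illustration only.
OVERRIDES = {
    "-nneɣ": "PRON",
    "-is": "PRON",
    "d-": "PART",
    "-agi": "DET",
}

# Particles the README lists with a trailing hyphen (they precede their host).
PROCLITICS = {"d", "n"}

# Leading punctuation, core word, trailing punctuation. Hyphens stay in the core.
_EDGE_PUNCT = re.compile(r"^([^\w-]*)(.*?)([^\w-]*)$")


def split_clitics(word):
    """Split a hyphenated word into its host and hyphen-marked clitics."""
    parts = [p for p in word.split("-") if p]
    if len(parts) < 2:
        return [word] if word else []
    if parts[0].lower() in PROCLITICS:
        # Proclitic keeps a trailing hyphen: "d-yusan" -> "d-", "yusan"
        head, rest = [parts[0] + "-", parts[1]], parts[2:]
    else:
        head, rest = [parts[0]], parts[1:]
    # Enclitics keep a leading hyphen: "adlis-is" -> "adlis", "-is"
    return head + ["-" + p for p in rest]


def tokenize(sentence):
    """Whitespace-split, peel off edge punctuation, then split clitics."""
    tokens = []
    for chunk in sentence.split():
        lead, core, trail = _EDGE_PUNCT.match(chunk).groups()
        tokens.extend(lead)            # one token per punctuation character
        tokens.extend(split_clitics(core))
        tokens.extend(trail)
    return tokens


def apply_overrides(tokens, predicted_tags):
    """Force PUNCT on punctuation and apply the closed-class lookup table."""
    fixed = []
    for token, tag in zip(tokens, predicted_tags):
        if not any(ch.isalnum() for ch in token):
            fixed.append("PUNCT")
        else:
            fixed.append(OVERRIDES.get(token.lower(), tag))
    return fixed
```

In this sketch, one of the README's example sentences, `Tameddakelt-nneɣ teɣra adlis-is.`, is split into a host noun, the clitic `-nneɣ`, the verb, a second host, the clitic `-is`, and the final period; `apply_overrides` then replaces whatever the model predicted for the two clitics and the period. A lookup keyed on the surface form cannot distinguish homographs such as possessive `-k` from accusative `-k`, so a real table would likely need context as well.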