---
language:
- kab
- ber
tags:
- emotion-classification
- african-languages
- amazigh
- low-resource
- goemotions
- afro-asiatic
license: apache-2.0
library_name: transformers
base_model: Davlan/afro-xlmr-large
model-index:
- name: kabyle-emotion-afro-xlmr
  results:
  - task:
      type: text-classification
      name: Emotion Classification
    dataset:
      type: silver-labeled
      name: English-Kabyle Parallel Corpus (Tatoeba + Round-trip)
    metrics:
    - type: f1
      value: 0.817
      name: Validation Weighted F1
    - type: accuracy
      value: 0.815
      name: Validation Accuracy
    - type: f1
      value: 0.641
      name: Test Weighted F1
    - type: accuracy
      value: 0.648
      name: Test Accuracy
---

# Kabyle Emotion Classifier (AfroXLMR-Large + GoEmotions)

A fine-tuned **AfroXLMR-Large** model for **28-class emotion recognition in Kabyle** (Taqbaylit), a low-resource Amazigh (Berber) language of the Afro-Asiatic family spoken in Algeria.

This is the third iteration of the Kabyle emotion model, upgrading from XLM-RoBERTa-base to AfroXLMR-Large and from 6-class Ekman labels to 28-class GoEmotions fine-grained labels.

---

## Model Details

| Attribute | Value |
|-----------|-------|
| **Base model** | `Davlan/afro-xlmr-large` (AfroXLMR-Large) |
| **Architecture** | XLM-RoBERTa for Sequence Classification |
| **Parameters** | ~560M |
| **Language** | Kabyle (`kab`) |
| **Task** | Text Classification (Emotion Detection) |
| **Classes** | 28 — GoEmotions taxonomy |
| **Best checkpoint** | Epoch 5 (loaded via `load_best_model_at_end`) |

### 28 Emotion Classes

`admiration`, `amusement`, `anger`, `annoyance`, `approval`, `caring`, `confusion`, `curiosity`, `desire`, `disappointment`, `disapproval`, `disgust`, `embarrassment`, `excitement`, `fear`, `gratitude`, `grief`, `joy`, `love`, `nervousness`, `neutral`, `optimism`, `pride`, `realization`, `relief`, `remorse`, `sadness`, `surprise`

---

## Training Data

The model was trained via **cross-lingual label transfer** from English to Kabyle using parallel sentence pairs:

1. **Round-trip parallel corpus** (`eng_kab_roundtrip_good.tsv`) — 131,301 English-Kabyle sentence pairs with back-translation quality scores.
2. **Tatoeba parallel corpus** — 138,353 additional English-Kabyle linked sentences from tatoeba.org.

**Labeling pipeline:**

- English sentences were labeled with `cirimus/modernbert-base-go-emotions` (28-class GoEmotions classifier).
- The single best GoEmotions label and its raw sigmoid confidence were transferred to the Kabyle side via sentence alignment.
- Per-class adaptive thresholds and caps were applied to balance the dataset across all 28 labels.
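
The thresholding and capping step above can be pictured roughly as follows. This is a minimal sketch, not the pipeline that was actually used: the function name `filter_and_cap`, the row layout, and every threshold and cap value are hypothetical, since the real per-class values are not listed in this card.

```python
from collections import defaultdict

def filter_and_cap(rows, thresholds, caps, default_threshold=0.5, default_cap=1000):
    """Keep confident rows per label, then cap each label's row count.

    rows: iterable of (kabyle_text, label, confidence) tuples, where label and
    confidence come from the English-side classifier (hypothetical layout).
    """
    by_label = defaultdict(list)
    for text, label, conf in rows:
        # Drop rows whose confidence is below the label's own threshold
        if conf >= thresholds.get(label, default_threshold):
            by_label[label].append((text, label, conf))

    kept = []
    for label, items in by_label.items():
        # Prefer the most confident rows when a label exceeds its cap
        items.sort(key=lambda r: r[2], reverse=True)
        kept.extend(items[: caps.get(label, default_cap)])
    return kept

# Toy data with placeholder sentences; all numbers are illustrative only
rows = [
    ("kab sentence 1", "neutral", 0.95),
    ("kab sentence 2", "neutral", 0.90),
    ("kab sentence 3", "neutral", 0.40),
    ("kab sentence 4", "grief", 0.35),
    ("kab sentence 5", "grief", 0.10),
]
thresholds = {"neutral": 0.8, "grief": 0.3}  # rarer labels get a lower bar
caps = {"neutral": 1, "grief": 10}           # frequent labels get a tighter cap

balanced = filter_and_cap(rows, thresholds, caps)
print([(label, conf) for _, label, conf in balanced])
```

Lower thresholds for rare labels trade some label noise for coverage, while caps keep frequent labels from dominating the balanced set.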

**Final balanced dataset:**

- **Total labeled rows (raw):** ~204,000
- **Final training set:** 46,516 rows
- **Validation set:** 6,203 rows
- **Test set:** 9,304 rows
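
As a quick sanity check, the three split sizes above can be summed and turned into proportions. The snippet below only reproduces that arithmetic from the numbers listed; the variable names are illustrative, and the splitting procedure itself (for example, whether it was stratified by label) is not specified in this card.

```python
# Split sizes copied from the list above
splits = {"train": 46_516, "validation": 6_203, "test": 9_304}

total = sum(splits.values())
fractions = {name: round(n / total, 3) for name, n in splits.items()}

print(total)      # total rows in the balanced dataset
print(fractions)  # share of each split, rounded to three decimals
```

The proportions come out at roughly 75% train, 10% validation, and 15% test.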

---

## Performance

### Validation Set (Epoch 5)

| Metric | Score |
|--------|-------|
| **F1 (weighted)** | **0.817** |
| **Accuracy** | **0.815** |

### Test Set Results (9,304 samples)

| Emotion | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| admiration | 0.663 | 0.523 | 0.585 | 900 |
| amusement | 0.746 | 0.730 | 0.738 | 137 |
| anger | 0.577 | 0.518 | 0.546 | 326 |
| annoyance | 0.326 | 0.127 | 0.183 | 118 |
| approval | 0.519 | 0.388 | 0.444 | 417 |
| caring | 0.622 | 0.313 | 0.416 | 521 |
| confusion | 0.701 | 0.653 | 0.676 | 288 |
| **curiosity** | **0.938** | **0.977** | **0.957** | 1200 |
| **desire** | **0.880** | **0.885** | **0.882** | 479 |
| disappointment | 0.319 | 0.285 | 0.301 | 130 |
| disapproval | 0.691 | 0.724 | 0.707 | 648 |
| disgust | 0.108 | 0.061 | 0.078 | 66 |
| embarrassment | 0.231 | 0.500 | 0.316 | 42 |
| excitement | 0.201 | 0.243 | 0.220 | 111 |
| fear | 0.738 | 0.684 | 0.710 | 247 |
| **gratitude** | **0.957** | **0.892** | **0.923** | 148 |
| grief | 0.273 | 0.882 | 0.417 | 17 |
| joy | 0.677 | 0.417 | 0.516 | 357 |
| **love** | **0.832** | **0.780** | **0.805** | 513 |
| nervousness | 0.280 | 0.535 | 0.368 | 99 |
| neutral | 0.579 | 0.833 | 0.683 | 1200 |
| optimism | 0.502 | 0.779 | 0.611 | 280 |
| pride | 0.476 | 0.833 | 0.606 | 36 |
| realization | 0.150 | 0.570 | 0.237 | 100 |
| relief | 0.111 | 0.071 | 0.087 | 14 |
| **remorse** | **0.718** | **0.761** | **0.739** | 134 |
| sadness | 0.537 | 0.225 | 0.317 | 547 |
| surprise | 0.802 | 0.672 | 0.732 | 229 |

- **Accuracy:** 0.648
- **Weighted Avg F1:** **0.641**
- **Macro Avg F1:** 0.529
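
The gap between the weighted and macro averages follows directly from the per-class table: macro F1 treats all 28 classes equally, while weighted F1 scales each class by its support. A small sketch that recomputes both from the table values (the variable names are illustrative, and results can differ in the last digit because the table is already rounded):

```python
# (F1, support) per class, copied from the test-set table above
per_class = {
    "admiration": (0.585, 900), "amusement": (0.738, 137),
    "anger": (0.546, 326), "annoyance": (0.183, 118),
    "approval": (0.444, 417), "caring": (0.416, 521),
    "confusion": (0.676, 288), "curiosity": (0.957, 1200),
    "desire": (0.882, 479), "disappointment": (0.301, 130),
    "disapproval": (0.707, 648), "disgust": (0.078, 66),
    "embarrassment": (0.316, 42), "excitement": (0.220, 111),
    "fear": (0.710, 247), "gratitude": (0.923, 148),
    "grief": (0.417, 17), "joy": (0.516, 357),
    "love": (0.805, 513), "nervousness": (0.368, 99),
    "neutral": (0.683, 1200), "optimism": (0.611, 280),
    "pride": (0.606, 36), "realization": (0.237, 100),
    "relief": (0.087, 14), "remorse": (0.739, 134),
    "sadness": (0.317, 547), "surprise": (0.732, 229),
}

total_support = sum(support for _, support in per_class.values())

# Macro: plain mean over classes, so rare classes count as much as frequent ones
macro_f1 = sum(f1 for f1, _ in per_class.values()) / len(per_class)

# Weighted: each class contributes in proportion to its number of test examples
weighted_f1 = sum(f1 * support for f1, support in per_class.values()) / total_support

print(f"support={total_support} macro={macro_f1:.3f} weighted={weighted_f1:.3f}")
```

Because `curiosity` and `neutral` together account for about a quarter of the test set, their relatively high scores lift the weighted average well above the macro average.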

---

## How to Use

### Quick inference with `transformers`

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="boffire/kabyle-emotion-afro-xlmr",
    device=0  # use -1 for CPU
)

# Example sentences
examples = [
    "Ur d-yelli ara wid akken ttwali",
    "Lliɣ d aɣeznay i uqeddic-agi",
    "Ihi, ma yella, ad nerr",
    "Ahat ad yemmut umdan-nni",
    "Tameddakelt-iw tezwared-iyi",
]

for text in examples:
    # Without top_k, the pipeline returns a one-item list holding the top prediction.
    # Pass top_k=None instead to get scores for all 28 labels.
    top = classifier(text)[0]
    print(f"{text} -> {top['label']} ({top['score']:.3f})")
```

### Loading the model directly

```python
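import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("boffire/kabyle-emotion-afro-xlmr")
model = AutoModelForSequenceClassification.from_pretrained("boffire/kabyle-emotion-afro-xlmr")
model.eval()  # disable dropout for inference

# Tokenize and predict (max_length=96 matches the training setup)
inputs = tokenizer("Tura, Jeǧǧiga tesɛa 20 n yiseggasen.", return_tensors="pt", truncation=True, max_length=96)
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit back to its emotion label
pred_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])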
```
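
When loading the model directly, the raw logits still need to be turned into probabilities and labels. The sketch below shows one way to do that in plain Python, assuming the model is a single-label classifier (as the one-label-per-sentence training data suggests), so that a softmax over the logits is appropriate. The helper name `top_k_emotions` and the toy label map are hypothetical.

```python
import math

def top_k_emotions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    # Numerically stable softmax: subtract the max logit before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Toy example with three made-up classes. With the real model, one would pass
# outputs.logits[0].tolist() and model.config.id2label instead.
toy_id2label = {0: "joy", 1: "sadness", 2: "neutral"}
toy_logits = [2.0, 0.5, 1.0]

for label, prob in top_k_emotions(toy_logits, toy_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

Looking at the top few labels rather than only the argmax can be useful here, since several GoEmotions classes (for example `annoyance` and `anger`) are close in meaning.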

---

## Training Details

| Hyperparameter | Value |
|----------------|-------|
| Epochs | 5 (early stopping patience=2) |
| Batch size | 16 per device (effective 64 with gradient accumulation) |
| Gradient accumulation | 4 |
| Learning rate | 2e-5 |
| Max sequence length | 96 |
| Weight decay | 0.01 |
| Warmup steps | ~10% of total steps |
| Optimizer | AdamW |
| Class weights | Balanced (`sklearn.utils.class_weight.compute_class_weight`) |
| Mixed precision | None (float32) |
| Best checkpoint | Epoch 5 |
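
Two rows of the table lend themselves to a short worked example: the balanced class weights and the batch and warmup arithmetic. The sketch below reimplements the "balanced" heuristic, `n_samples / (n_classes * class_count)`, in plain Python on a toy label list, then derives approximate step counts from the table. It assumes a single training device, and the step counts are estimates: the exact numbers depend on how the trainer rounds partial batches.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}

# Toy label list (illustrative only): the rarest class gets the largest weight
toy_labels = ["neutral"] * 6 + ["joy"] * 3 + ["grief"] * 1
weights = balanced_class_weights(toy_labels)
print({c: round(w, 3) for c, w in weights.items()})

# Schedule arithmetic from the table above (single device assumed)
train_rows, per_device_batch, grad_accum, epochs = 46_516, 16, 4, 5
effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_rows / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = total_steps // 10  # roughly 10% of all optimizer steps
print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

In a weighted cross-entropy loss, these weights make each mistake on a rare class such as `grief` cost more than a mistake on a frequent class such as `neutral`.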

---

## Limitations & Caveats

1. **Silver labels:** Ground-truth emotions were projected from an English GoEmotions classifier. Some labels may not perfectly capture Kabyle cultural or emotional nuance.
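2. **Rare class weakness:** Classes with very few test examples (`relief`: 14, `grief`: 17, `disgust`: 66) have low F1 scores (0.087, 0.417, and 0.078, respectively). With so little support, the model struggles to learn reliable patterns for these classes, and the per-class metrics themselves are noisy.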
3. **Neutral class:** While `neutral` now comes from a real GoEmotions label (not synthetic uncertainty), it still dominates the raw distribution and is capped to 2,000 training examples.
4. **Translation quality:** The parallel corpus includes round-trip translated sentences. Imperfect translations may introduce label noise.
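5. **No native speaker validation:** The test set was held out from the same silver-labeled pool, so the reported scores measure agreement with projected labels rather than with human judgment. A small native-annotated benchmark would give a more reliable estimate of real-world performance.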
6. **Domain limitation:** Training data comes from Tatoeba (simple, short sentences) and round-trip translations. Performance may degrade on longer, more complex Kabyle text (social media, literature, etc.).
7. **Kabyle not in AfroXLMR pre-training corpus:** AfroXLMR-Large was trained on 17 African languages, but Kabyle was not among them. The model relies on transfer from related Afro-Asiatic languages (e.g., Amharic, Arabic).

---

## Intended Use

- **Research** in low-resource NLP and Afro-Asiatic / Amazigh language processing.
- **Downstream applications** requiring fine-grained emotion signals in Kabyle text (e.g., content moderation, mental-health screening, customer feedback analysis).
- **Baseline** for future Kabyle emotion models trained on native annotations.

---

## Citation

If you use this model, please cite:

```bibtex
@misc{boffire_kabyle_emotion_afro_xlmr,
  title = {Kabyle Emotion Classifier (AfroXLMR-Large + GoEmotions)},
  author = {Boffire},
  year = {2026},
  howpublished = {\url{https://huggingface.co/boffire/kabyle-emotion-afro-xlmr}},
  note = {Fine-tuned AfroXLMR-Large for 28-class GoEmotions detection in Kabyle via cross-lingual label transfer from English}
}
```

---

## Acknowledgments

- **Davlan** for the `afro-xlmr-large` base model and African-centric pre-training.
- **cirimus** for the `modernbert-base-go-emotions` English emotion classifier.
- **Google Research** for the GoEmotions dataset.
- **Tatoeba Project** for the English-Kabyle parallel corpus.
- **Hugging Face** `transformers`, `datasets`, and `accelerate` teams for the training infrastructure.

---

## License

This model is released under the **Apache 2.0** license.

The base model (`Davlan/afro-xlmr-large`) and English emotion classifier (`cirimus/modernbert-base-go-emotions`) are subject to their respective **MIT** licenses. The GoEmotions dataset is **Apache 2.0**.