---
language:
- hy
tags:
- asr
- audio
- speech
- whisper
- low-resource
- generated_from_trainer
datasets:
- Chillarmo/common_voice_20_armenian
- mozilla-foundation/common_voice_20_0
metrics:
- wer
model-index:
- name: Morpheme-Aware Whisper (Armenian)
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 20.0
      type: mozilla-foundation/common_voice_20_0
      config: hy
      split: test
    metrics:
    - name: Wer
      type: wer
      value: 40.1556
    - name: Exact Match
      type: exact_match
      value: 12.1414
    - name: Evaluation Loss
      type: loss
      value: 1.3123
---

# Morpheme-Aware Whisper for Low-Resource Armenian ASR

This model is a fine-tuned version of **Whisper Tiny** that keeps the encoder frozen and retrains the decoder with a custom **morpheme tokenizer**, targeting cost-effective and accurate speech-to-text for Armenian. It is trained on the **Common Voice 20.0** dataset and outperforms standard OpenAI Whisper models in speed and Armenian-specific accuracy.

## Model Details

- **Model Architecture:** Whisper Tiny (Frozen Encoder, Retrained Decoder)
- **Language:** Armenian (`hy`)
- **Tokenizer:** Custom Morpheme Tokenizer
- **Dataset:** [Chillarmo/common_voice_20_armenian](https://huggingface.co/datasets/Chillarmo/common_voice_20_armenian)
- **Paper:** *Morpheme-Aware Whisper for Low-Resource Armenian ASR* (Movsesyan, 2025)
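The frozen-encoder, retrained-decoder setup listed above can be sketched roughly as follows. This is a minimal illustration, not the training code used for this model: it builds a small, randomly initialised Whisper (the reduced layer sizes are assumptions chosen so the snippet runs without downloading weights) and disables gradients for every encoder parameter so that only the decoder side is updated.

```python
from transformers import WhisperConfig, WhisperForConditionalGeneration

# Assumption: a deliberately small, randomly initialised config for illustration.
# A real run would start from pretrained Whisper Tiny weights instead.
config = WhisperConfig(
    d_model=64,
    encoder_layers=1,
    decoder_layers=1,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=128,
    decoder_ffn_dim=128,
)
model = WhisperForConditionalGeneration(config)

# Freeze the audio encoder: its parameters receive no gradient updates.
for param in model.model.encoder.parameters():
    param.requires_grad = False

# Everything that still requires gradients belongs to the decoder side.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} of {total}")
```

Freezing the encoder keeps the pretrained acoustic representation intact and reduces the number of parameters the optimizer has to update, which is one plausible reason for this design on a low-resource dataset.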

## Abstract & Motivation

For a language such as Armenian, the ability to transcribe speech into text accurately and cost-effectively would be highly valuable. Because Armenian is a low-resource language, however, no such system exists yet. This project uses Whisper Tiny as its base model to close that gap.

The problem with current generic models is that they are not robust enough for real-world use. A business professional or an Armenian organization cannot simply deploy them and expect them to work; the high error rates produce nonsensical output. This creates accessibility problems at events where the speaker uses Armenian but part of the audience does not fully understand the language: listeners have to concentrate on decoding the speech rather than on understanding the ideas.

## Evaluation Results

| Metric | Value |
| :--- | :--- |
| **WER (Word Error Rate)** | **40.16%** |
| **Exact Match** | **12.14%** |
| **Eval Loss** | **1.31** |

*Note: WER was calculated after strict Armenian text normalization (punctuation and non-Armenian characters removed).*
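For readers who want to reproduce comparable numbers, the sketch below shows one way WER and exact match could be computed after such a normalization. It is an illustration under stated assumptions, not the evaluation script used for this model: "Armenian characters" is taken to mean letters in the Armenian Unicode block, text is lowercased, all other characters are deleted, and WER is pooled over the whole corpus. The function names are hypothetical.

```python
import re

# Assumption: keep only Armenian letters (U+0531-U+0556, U+0561-U+0587) and whitespace.
# Armenian punctuation such as the in-word question mark (U+055E) is deleted.
_NON_ARMENIAN = re.compile(r"[^\u0531-\u0556\u0561-\u0587\s]")


def normalize(text: str) -> str:
    """Lowercase, drop non-Armenian characters, and collapse whitespace."""
    return " ".join(_NON_ARMENIAN.sub("", text.lower()).split())


def edit_distance(ref: list, hyp: list) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def corpus_wer(refs: list, hyps: list) -> float:
    """Total word errors divided by total reference words, in percent."""
    errors = words = 0
    for ref, hyp in zip(refs, hyps):
        ref_words = normalize(ref).split()
        errors += edit_distance(ref_words, normalize(hyp).split())
        words += len(ref_words)
    return 100.0 * errors / words


def exact_match(refs: list, hyps: list) -> float:
    """Share of utterances whose normalized text matches exactly, in percent."""
    hits = sum(normalize(r) == normalize(h) for r, h in zip(refs, hyps))
    return 100.0 * hits / len(refs)


refs = ["Բարև, աշխարհ!", "ինչպե՞ս ես"]
hyps = ["բարև աշխարհ", "ինչպես էս"]
print(corpus_wer(refs, hyps), exact_match(refs, hyps))
```

In this toy example one of four reference words is wrong, and one of two utterances matches exactly after normalization.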

## Training Procedure

### Hyperparameters

The following hyperparameters were used during training:
- **learning_rate:** 5e-05
- **train_batch_size:** 8
- **eval_batch_size:** 16
- **seed:** 42
- **optimizer:** adamw_torch_fused (betas=(0.9,0.999), epsilon=1e-08)
- **lr_scheduler_type:** linear
- **num_epochs:** 3.0
- **mixed_precision_training:** Native AMP
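
As a rough guide, the values above map onto Hugging Face `TrainingArguments` keyword names as sketched below. Some details are assumptions rather than facts from this card: "Native AMP" is taken to mean `fp16` (it could equally be `bf16`), the batch sizes are treated as per-device values, and the schedule is shown without warmup because none is listed. `linear_lr` is a hypothetical helper that only illustrates the shape of the linear schedule.

```python
# Keyword names follow transformers.TrainingArguments; the dict itself is illustrative.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3.0,
    "fp16": True,  # assumption: "Native AMP" interpreted as fp16 mixed precision
}


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate under a linear schedule: optional warmup, then decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, midpoint, and end of a hypothetical 1000-step run.
for step in (0, 500, 1000):
    print(step, linear_lr(step, total_steps=1000))
```

On a CUDA machine these keyword arguments could be passed to `Seq2SeqTrainingArguments(output_dir=..., **training_kwargs)`; the fused optimizer and `fp16` both expect a GPU.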

### Framework Versions

- Transformers 4.56.2
- PyTorch 2.8.0+cu129
- Datasets 3.5.0
- Tokenizers 0.22.1

## Citation

If you use this model, please cite the following work:

```bibtex
@inproceedings{movsesyan2025morpheme,
  author    = {Movses Movsesyan},
  title     = {Morpheme-Aware Whisper for Low-Resource Armenian ASR},
  booktitle = {ACM},
  year      = {2025},
  url       = {https://doi.org/10.1145/nnnnnnn.nnnnnnn}
}
```