---
language:
- hy
tags:
- asr
- audio
- speech
- whisper
- low-resource
- generated_from_trainer
datasets:
- Chillarmo/common_voice_20_armenian
- mozilla-foundation/common_voice_20_0
metrics:
- wer
model-index:
- name: Morpheme-Aware Whisper (Armenian)
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 20.0
      type: mozilla-foundation/common_voice_20_0
      config: hy
      split: test
    metrics:
    - name: Wer
      type: wer
      value: 40.1556
    - name: Exact Match
      type: exact_match
      value: 12.1414
    - name: Evaluation Loss
      type: loss
      value: 1.3123
---

# Morpheme-Aware Whisper for Low-Resource Armenian ASR

This model is a fine-tuned version of **Whisper Tiny** that combines a frozen encoder with a custom **morpheme tokenizer** to provide cost-effective and accurate speech-to-text for the Armenian language. It is trained on the **Common Voice 20.0** dataset and outperforms standard OpenAI Whisper models in speed and in Armenian-specific accuracy.

## Model Details

- **Model Architecture:** Whisper Tiny (Frozen Encoder, Retrained Decoder)
- **Language:** Armenian (`hy`)
- **Tokenizer:** Custom Morpheme Tokenizer
- **Dataset:** [Chillarmo/common_voice_20_armenian](https://huggingface.co/datasets/Chillarmo/common_voice_20_armenian)
- **Paper:** *Morpheme-Aware Whisper for Low-Resource Armenian ASR* (Movsesyan, 2025)

## Abstract & Motivation

For a language such as Armenian, the ability to transcribe speech into text accurately and cost-effectively is highly valuable. However, because Armenian is a low-resource language, this capability does not exist yet. This project uses Whisper Tiny as the core technology to achieve this goal.

The problem with current generic models is that they are not robust enough for real-world usage. A business professional or an Armenian organization cannot simply deploy current methods and expect them to work; the high error rates produce nonsense outputs.
This leads to accessibility issues at events where the speaker uses Armenian but the audience may not fully comprehend the language, forcing listeners to focus on decoding speech rather than on understanding concepts.

## Evaluation Results

| Metric | Value |
| :--- | :--- |
| **WER (Word Error Rate)** | **40.16%** |
| **Exact Match** | **12.14%** |
| **Eval Loss** | **1.31** |

*Note: The WER was calculated using strict Armenian text normalization (excluding punctuation and non-Armenian characters).*

## Training Procedure

### Hyperparameters

The following hyperparameters were used during training:

- **learning_rate:** 5e-05
- **train_batch_size:** 8
- **eval_batch_size:** 16
- **seed:** 42
- **optimizer:** adamw_torch_fused (betas=(0.9,0.999), epsilon=1e-08)
- **lr_scheduler_type:** linear
- **num_epochs:** 3.0
- **mixed_precision_training:** Native AMP

### Framework Versions

- Transformers 4.56.2
- Pytorch 2.8.0+cu129
- Datasets 3.5.0
- Tokenizers 0.22.1

## Citation

If you use this model, please cite the following work:

```bibtex
@inproceedings{movsesyan2025morpheme,
  author = {Movses Movsesyan},
  title = {Morpheme-Aware Whisper for Low-Resource Armenian ASR},
  booktitle = {ACM},
  year = {2025},
  url = {https://doi.org/10.1145/nnnnnnn.nnnnnnn}
}
```