OmniVoice Amharic — Open Voice AI for 60M Speakers
Part of ድምፄ / ethiopian-Demtse — open-source speech AI for Ethiopian languages.
Built under Voices For All / african-low-resource.
This aims to be the highest-quality open Amharic TTS model available today; objective and MOS evaluations are still in progress. It generates natural, expressive Amharic speech from text and can clone a speaker's voice zero-shot from a 10-second audio sample.
🚀 Quick Try (No Install)
Live Demo: Try it in your browser →
📊 At a Glance
| Property | Value |
|---|---|
| Languages | Amharic (primary), English, Chinese (base model) |
| Architecture | Non-autoregressive discrete diffusion |
| Parameters | 612.6M (Qwen3-0.6B + HiggsAudioV2, 8 codebooks) |
| Training data | ~81,731 samples / ~331 hours |
| Best loss | 3.9518 (step 10,000 / 12,000) |
| License | Apache 2.0 |
| Inference cost | Runs on free Google Colab T4 (~3GB VRAM) |
| Voice cloning | Zero-shot, 10s reference audio |
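The ~3GB VRAM figure can be sanity-checked with simple arithmetic: in float16 the weights alone take roughly 1.2 GB, and the rest goes to activations, the CUDA context and audio decoding. A minimal sketch, assuming the 612.6M parameter count above and 2 bytes per float16 parameter:

```python
# Back-of-the-envelope weight memory for the 612.6M-parameter model.
# Assumption: every parameter is stored at the dtype's width. Activations,
# the CUDA context and decoding buffers are NOT included.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str = "float16") -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("float32", "float16"):
        print(f"{dtype}: {weight_memory_gb(612.6e6, dtype):.2f} GB")
```

This prints 2.45 GB for float32 and 1.23 GB for float16. To see the real peak on your GPU, call `torch.cuda.max_memory_allocated()` after a generation.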
🎯 What Makes This Special
1. Actually Sounds Like Amharic
Most "multilingual" TTS models (MMS, XTTS) produce Amharic that sounds robotic or mispronounces the ejective consonants (ቀ, ጠ, ጨ, ጰ, ጸ, ፀ). This model was fine-tuned exclusively on Amharic audio and preserves:
- Correct ejective / glottalic consonant articulation
- Natural prosody and rhythm (not English rhythm overlaid on Amharic words)
- Gemination (lengthened consonants, which the Ge'ez script does not mark: ገና is read gäna "still, yet" or gänna "Christmas")
- Pitch patterns for questions vs statements
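To audition ejective articulation, it helps to pick test sentences that actually contain ejective syllables. A minimal sketch of a hypothetical helper `find_ejective_syllables`, assuming the standard Unicode Ethiopic layout, where each consonant series (seven vowel orders plus a -wa form) occupies 8 consecutive code points starting at its first-order character:

```python
# Hypothetical helper: list the ejective syllables in an Amharic string so that
# listening-test sentences actually exercise ejectives.
# Assumption: standard Unicode Ethiopic layout, where each consonant series
# (seven vowel orders plus a -wa form) spans 8 consecutive code points starting
# at its first-order character. Labialized forms such as ቋ live in a separate
# block and are not covered here.
EJECTIVE_BASES = "ቀጠጨጰጸፀ"  # q', t', ch', p', and ts' (written with two letters)

_EJECTIVE_CODEPOINTS = {
    cp for base in EJECTIVE_BASES for cp in range(ord(base), ord(base) + 8)
}


def find_ejective_syllables(text: str) -> list[str]:
    """Return ejective syllables in order of appearance (duplicates kept)."""
    return [ch for ch in text if ord(ch) in _EJECTIVE_CODEPOINTS]


if __name__ == "__main__":
    sentence = "ሰላም፣ እንኳን ደህና መጣችሁ። ይህ የአማርኛ ንግግር ሙከራ ነው።"
    print(find_ejective_syllables(sentence))
```

Sentences with few or no matches make weak probes for ejective quality; the Quick Start sentence, for example, contains only ጣ.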
2. Voice Cloning Works
Given about 10 seconds of audio from an Amharic speaker, it synthesizes new sentences in that voice. Tested on:
- Male/female voices
- Formal news-reading style
- Casual conversational style
- Different Ethiopian dialects (Addis Ababa, Gondar, Wollo)
3. Open Everything
- ✅ Open weights (Apache 2.0)
- ✅ Open training code
- ✅ Open datasets (or documented sources)
- ✅ Open benchmarks (MOS scores will be published; evaluation in progress)
- ✅ No API keys, no cloud lock-in
🛠️ Quick Start — Colab
```python
# Cell 1: Install
!pip install -q omnivoice soundfile

# Cell 2: Load model
import soundfile as sf
import torch
from omnivoice import OmniVoice, OmniVoiceGenerationConfig

model = OmniVoice.from_pretrained(
    "ethiopian-Demtse/omnivoice-amharic",
    device_map="cuda:0",
    dtype=torch.float16,  # float16: T4 GPUs lack fast bfloat16 support
)

# Cell 3: Generate speech
text = "ሰላም፣ እንኳን ደህና መጣችሁ። ይህ የአማርኛ ንግግር ሙከራ ነው።"
audio = model.generate(
    text=text,
    language="Amharic",
    # num_step: diffusion sampling steps; guidance_scale: guidance strength
    generation_config=OmniVoiceGenerationConfig(num_step=32, guidance_scale=2.0),
)

sf.write("output.wav", audio[0], 24000)  # output sample rate: 24 kHz
print("✅ Saved to output.wav")
```
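The Quick Start synthesizes a short text in one call. For longer passages, one common approach (an assumption, not a documented feature of this model) is to synthesize sentence by sentence and join the waveforms with short pauses. The sketch below assumes, as the Quick Start implies, that `model.generate(...)[0]` is a 1-D waveform at 24 kHz; `split_sentences` and `synthesize_long_text` are hypothetical helpers.

```python
import re
from typing import Callable

import numpy as np

# Split after the Ethiopic full stop (።), Ethiopic question mark (፧), "?" or "!",
# keeping the punctuation with its sentence. The Ethiopic comma (፣) is not a split point.
_SENTENCE_END = re.compile(r"(?<=[።፧?!])\s*")


def split_sentences(text: str) -> list[str]:
    """Split Amharic text into sentences, dropping empty pieces."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def synthesize_long_text(
    generate: Callable[[str], np.ndarray],
    text: str,
    sample_rate: int = 24000,
    pause_s: float = 0.25,
) -> np.ndarray:
    """Call `generate` once per sentence and join the results with short silences."""
    silence = np.zeros(int(sample_rate * pause_s), dtype=np.float32)
    pieces: list[np.ndarray] = []
    for i, sentence in enumerate(split_sentences(text)):
        if i > 0:
            pieces.append(silence)
        pieces.append(np.asarray(generate(sentence), dtype=np.float32).reshape(-1))
    return np.concatenate(pieces) if pieces else np.zeros(0, dtype=np.float32)


# Usage with the Quick Start objects (assumes generate(...)[0] is a 1-D waveform):
# config = OmniVoiceGenerationConfig(num_step=32, guidance_scale=2.0)
# wave = synthesize_long_text(
#     lambda s: model.generate(text=s, language="Amharic", generation_config=config)[0],
#     long_text,
# )
# sf.write("long.wav", wave, 24000)
```

Sentence-level chunking also bounds memory per call; the trade-off is that prosody is not planned across sentence boundaries.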
Voice Cloning
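```python
# Upload a ~10-second reference WAV of the target speaker as speaker.wav
# (uses `model`, `OmniVoiceGenerationConfig` and `sf` from the Quick Start cells)
# ref_text=None: no transcript of the reference clip is provided
prompt = model.create_voice_clone_prompt(ref_audio="speaker.wav", ref_text=None)
audio = model.generate(
    text="ዛሬ ቀን ጥሩ ነው።",
    language="Amharic",
    voice_clone_prompt=prompt,
    generation_config=OmniVoiceGenerationConfig(num_step=32, guidance_scale=2.0),
)
sf.write("cloned.wav", audio[0], 24000)
```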
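The card asks for about 10 seconds of reference audio but does not state other requirements. A minimal preprocessing sketch, assuming a mono, roughly 10-second, non-clipping clip is a reasonable input; `prepare_reference` is a hypothetical helper, and it does not resample because the expected reference sample rate is not documented here.

```python
import numpy as np


def prepare_reference(
    samples: np.ndarray,
    sample_rate: int,
    max_seconds: float = 10.0,
    peak: float = 0.95,
) -> np.ndarray:
    """Downmix to mono, trim to `max_seconds`, and peak-normalize a reference clip.

    `samples` is shaped (frames,) or (frames, channels), the layout that
    soundfile.read returns. No resampling is done (assumption: the input
    rate is acceptable to create_voice_clone_prompt).
    """
    audio = np.asarray(samples, dtype=np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average channels -> mono
    audio = audio[: int(max_seconds * sample_rate)]  # keep the first max_seconds
    max_abs = float(np.max(np.abs(audio))) if audio.size else 0.0
    if max_abs > 0:
        audio = audio * (peak / max_abs)  # scale so the loudest sample equals `peak`
    return audio


# Usage (soundfile as in the Quick Start):
# data, sr = sf.read("raw_speaker.wav")
# sf.write("speaker.wav", prepare_reference(data, sr), sr)
```

Trimming blindly at 10 s can cut a word in half; picking a clean 10-second stretch by hand usually gives a better prompt.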
📈 Training Details
| Parameter | Value |
|---|---|
| Base model | k2-fsa/OmniVoice |
| Backbone | Qwen3-0.6B (636M params) |
| Audio tokenizer | HiggsAudioV2 (8 codebooks, 1025 vocab) |
| Learning rate | 2e-5 |
| LR schedule | Cosine |
| Max steps | 12,000 |
| Epochs | ~10 |
| Batch tokens | 28,672 |
| Precision | bf16 |
| Codebook weights | [8, 8, 6, 6, 4, 4, 2, 2] |
| Best loss | 3.9518 @ step 10,000 |
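Two rows in this table describe how the objective and optimizer behave. A minimal sketch under stated assumptions: the codebook weights are taken to scale per-codebook cross-entropy in a weighted average (so the first codebooks count most), and the cosine schedule is taken to decay from 2e-5 to zero over 12,000 steps with no warmup, since the card mentions neither warmup nor a floor. Both are assumptions, not confirmed details of the OmniVoice training code.

```python
import math

import numpy as np

# Assumption: weights scale per-codebook cross-entropy and are normalized to sum to 1.
CODEBOOK_WEIGHTS = np.array([8, 8, 6, 6, 4, 4, 2, 2], dtype=np.float64)


def weighted_codebook_loss(per_codebook_ce, weights=CODEBOOK_WEIGHTS) -> float:
    """Weighted average of the 8 per-codebook cross-entropy values."""
    ce = np.asarray(per_codebook_ce, dtype=np.float64)
    if ce.shape != weights.shape:
        raise ValueError("expected one loss value per codebook")
    return float(np.dot(weights, ce) / weights.sum())


def cosine_lr(
    step: int, max_steps: int = 12_000, peak_lr: float = 2e-5, min_lr: float = 0.0
) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at max_steps (assumed: no warmup)."""
    progress = min(max(step / max_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(weighted_codebook_loss([4.0] * 8))
    for step in (0, 6_000, 10_000, 12_000):
        print(step, f"{cosine_lr(step):.2e}")
```

With weights summing to 40, the first two codebooks together carry 40% of the loss and the last two only 10%.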
Datasets
| Dataset | Hours | Role |
|---|---|---|
| google/WaxalNLP | ~200h | Core corpus |
| gheero-Leyu/leyu-amharic-addis-ababa-dialect | ~50h | Dialect diversity |
| surafelabebe/amharic_clear_audio_tts | ~40h | Clean TTS data |
| chappM/amharic-bdu-asr | ~41h | ASR-aligned quality |
| Total | ~331h | |
Training History
| Run | Steps | Best Loss | Notes |
|---|---|---|---|
| 1 | 0→1,500 | ~4.15 | Init from v3 |
| 2 | 1,500→6,000 | 3.9994 (step 4,190) | Checkpoints lost to a storage issue |
| 3 | 2,700→12,000 | 3.9518 (step 10,000) | Final best |
🧪 Evaluation
We evaluate on a held-out test set (10% of combined data, never seen in training).
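The card does not say how the 10% split was drawn. One common way to make such a split reproducible, shown here as a sketch rather than the procedure used for this model, is to hash each utterance ID into a uniform bucket; `is_test_utterance` and the salt string are hypothetical.

```python
import hashlib


def is_test_utterance(
    utterance_id: str, test_fraction: float = 0.10, salt: str = "omnivoice-amharic"
) -> bool:
    """Deterministically assign an utterance to the test split.

    The same ID always lands in the same split, independent of dataset order,
    so the split is stable across runs and machines. The salt is hypothetical.
    """
    digest = hashlib.sha256(f"{salt}:{utterance_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < test_fraction


if __name__ == "__main__":
    ids = [f"utt{i}" for i in range(10_000)]
    held_out = sum(is_test_utterance(u) for u in ids)
    print(f"{held_out} of {len(ids)} utterances held out")
```

Hashing a speaker ID instead of an utterance ID keeps every recording of a speaker on one side of the split, which avoids evaluating voice cloning on voices seen in training.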
Objective Metrics
| Metric | Value | Comparison (MMS-TTS-amh) |
|---|---|---|
| Mel-Cepstral Distortion (MCD) | TBD | TBD |
| F0 RMSE | TBD | TBD |
| Character Error Rate (ASR-back) | TBD | TBD |
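For reference while these values are pending, a sketch of the standard definitions: CER as Levenshtein distance over the ASR transcript divided by reference length, MCD in dB over time-aligned mel-cepstral frames with the energy term c0 excluded, and F0 RMSE over frames voiced in both signals. Frame alignment (typically DTW), mel-cepstrum extraction, F0 tracking and the ASR model are assumptions left outside the sketch.

```python
import math

import numpy as np


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; each Ethiopic syllable is one code point, so one 'character'."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


def mcd(ref_mcep: np.ndarray, syn_mcep: np.ndarray) -> float:
    """Frame-averaged mel-cepstral distortion in dB for aligned (frames, dims) arrays; c0 excluded."""
    if ref_mcep.shape != syn_mcep.shape:
        raise ValueError("mel-cepstra must be time-aligned and the same shape")
    diff = ref_mcep[:, 1:] - syn_mcep[:, 1:]
    per_frame = (10.0 / math.log(10.0)) * np.sqrt(2.0 * np.sum(diff**2, axis=1))
    return float(per_frame.mean())


def f0_rmse(ref_f0, syn_f0) -> float:
    """RMSE in Hz over frames voiced (F0 > 0) in both aligned contours."""
    ref_f0 = np.asarray(ref_f0, dtype=np.float64)
    syn_f0 = np.asarray(syn_f0, dtype=np.float64)
    voiced = (ref_f0 > 0) & (syn_f0 > 0)
    if not voiced.any():
        return float("nan")
    return float(np.sqrt(np.mean((ref_f0[voiced] - syn_f0[voiced]) ** 2)))
```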
Subjective Metrics (MOS)
| Criterion | Score (1-5) | N evaluators |
|---|---|---|
| Naturalness | TBD | TBD |
| Speaker similarity (cloning) | TBD | TBD |
| Ejective consonant accuracy | TBD | TBD |
| Prosody / rhythm | TBD | TBD |
Objective and subjective evaluations are in progress.
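MOS results are usually reported with a confidence interval. A minimal sketch using a normal approximation (z = 1.96 for 95%); `mos_with_ci` is a hypothetical helper, and it treats every rating as independent, which ignores per-rater and per-utterance correlation.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z: float = 1.96) -> tuple[float, float]:
    """Return (mean opinion score, half-width of a normal-approximation CI).

    Assumes independent ratings; z = 1.96 gives an approximate 95% interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    m = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, half_width


if __name__ == "__main__":
    naturalness = [4, 5, 3, 4, 4, 5, 3, 4]  # hypothetical 1-5 ratings
    m, h = mos_with_ci(naturalness)
    print(f"MOS {m:.2f} ± {h:.2f}")
```

With more raters a mixed-effects model or a bootstrap over raters gives a more honest interval; the normal approximation is simply the shortest to state.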
🇪🇹 Roadmap — Ethiopian Languages (ድምፄ)
This model is Phase 1 of building speech AI for all Ethiopian languages:
- አማርኛ / Amharic (60M speakers) — TTS + voice cloning ✅
- ኦሮምኛ / Afaan Oromoo (40M speakers) — TTS + voice cloning
- ትግርኛ / Tigrinya (10M speakers) — TTS
- ሶማሊኛ / Somali (7M speakers in Ethiopia) — TTS
- ሲዳምኛ / Sidamo (4M speakers) — TTS
- ወላይትኛ / Wolaytta (3M speakers) — TTS
- ጉራግኛ / Gurage (2M speakers) — TTS
- ሐዲይኛ / Hadiyya (2M speakers) — TTS
- አፋርኛ / Afar (2M speakers) — TTS
- ገሞኛ / Gamo (1.5M speakers) — TTS
- Self-service fine-tuning toolkit for any Ethiopian language with 50h+ audio
Follow ethiopian-Demtse for updates.
⚠️ Limitations & Biases
- Gender representation: Training data skews male (65%). Female voices may sound less natural.
- Dialect coverage: Heavy Addis Ababa bias. Rural Ethiopian accents (Tigray, Harar, Sidama) are underrepresented.
- Code-mixing: Switching mid-sentence between Amharic and English is unpredictable.
- Numerals/dates: Ethiopian-calendar dates and large numbers are sometimes mispronounced.
- Emotional range: Neutral/news-reading style only; no whispering, shouting, or singing.
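Until numeral handling improves, one workaround is to spell numbers out in words before synthesis. A sketch of a hypothetical normalizer for plain integers below one million; it ignores Ethiopic numerals (፩, ፪, ...), ordinals, decimals and calendar-specific date phrasing, and its spellings follow one convention (ሦስት/ሶስት and አሥር/አስር are both in use).

```python
import re

# Hypothetical Amharic number spelling for 0 <= n < 1,000,000 (one spelling convention).
ONES = ["", "አንድ", "ሁለት", "ሦስት", "አራት", "አምስት", "ስድስት", "ሰባት", "ስምንት", "ዘጠኝ"]
TENS = ["", "አሥር", "ሃያ", "ሠላሳ", "አርባ", "ሃምሳ", "ስልሳ", "ሰባ", "ሰማንያ", "ዘጠና"]


def _below_100(n: int) -> str:
    tens, ones = divmod(n, 10)
    if tens == 0:
        return ONES[ones]
    if ones == 0:
        return TENS[tens]
    ten_word = "አሥራ" if tens == 1 else TENS[tens]  # 11-19 use the combining form አሥራ
    return f"{ten_word} {ONES[ones]}"


def _below_1000(n: int) -> str:
    hundreds, rest = divmod(n, 100)
    parts = [f"{ONES[hundreds]} መቶ"] if hundreds else []
    if rest:
        parts.append(_below_100(rest))
    return " ".join(parts)


def amharic_number_words(n: int) -> str:
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0 <= n < 1,000,000 is supported")
    if n == 0:
        return "ዜሮ"
    thousands, rest = divmod(n, 1000)
    parts = [f"{_below_1000(thousands)} ሺህ"] if thousands else []
    if rest:
        parts.append(_below_1000(rest))
    return " ".join(parts)


def normalize_numbers(text: str) -> str:
    """Replace ASCII digit runs with Amharic words; out-of-range numbers are left as-is."""
    def repl(match: re.Match) -> str:
        value = int(match.group())
        return amharic_number_words(value) if value < 1_000_000 else match.group()

    return re.sub(r"[0-9]+", repl, text)
```

For example, `normalize_numbers("2016")` yields ሁለት ሺህ አሥራ ስድስት, which can then be passed as `text=` to `model.generate`.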
We actively seek more diverse training data. If you have audio recordings in any Ethiopian language (any dialect, any speaker), contact us.
🤝 Citation
```bibtex
@software{omnivoice_amharic_2026,
  author  = {demeleww and {Voices For All}},
  title   = {OmniVoice Amharic: Open Voice AI for 60M Speakers},
  year    = {2026},
  url     = {https://huggingface.co/ethiopian-Demtse/omnivoice-amharic},
  license = {Apache-2.0}
}
```
Base model:
```bibtex
@article{omnivoice2026,
  title   = {OmniVoice: High-Quality Voice Cloning TTS for 600+ Languages},
  journal = {arXiv preprint arXiv:2604.00688},
  year    = {2026}
}
```
📬 Contact
- Organization: ድምፄ / ethiopian-Demtse
- Parent initiative: Voices For All / african-low-resource
- Lead: demeleww
- Issues: Open a discussion on this repo
- Email: sowwen0@gmail.com
Built with ❤️ for 120M+ Ethiopians who deserve voice AI in their mother tongue.