OmniVoice Amharic — Open Voice AI for 60M Speakers

Part of ድምፄ / ethiopian-Demtse — open-source speech AI for Ethiopian languages.

Built under Voices For All / african-low-resource.

We believe this is the highest-quality open Amharic TTS model available today; formal evaluation is still in progress. It generates natural, expressive speech from text and can clone a speaker's voice, zero-shot, from a 10-second audio sample.

🚀 Quick Try (No Install)

Live Demo: Try it in your browser →

📊 At a Glance


Languages	Amharic (primary), English, Chinese (base model)
Architecture	Non-autoregressive discrete diffusion
Parameters	612.6M (Qwen3-0.6B + HiggsAudioV2, 8 codebooks)
Training data	81,731 samples (~331 hours)
Best loss	3.9518 (step 10,000 / 12,000)
License	Apache 2.0
Inference cost	Runs on free Google Colab T4 (~3GB VRAM)
Voice cloning	Zero-shot, 10s reference audio

🎯 What Makes This Special

1. Actually Sounds Like Amharic

Most "multilingual" TTS models (MMS, XTTS) produce Amharic that sounds robotic or mispronounces ejective consonants (ጠ, ጰ, ጸ, ፀ, ቀ, ጨ). This model was fine-tuned exclusively on Amharic audio and preserves:

Correct ejective / glottalic consonant articulation
Natural prosody and rhythm (not English rhythm overlaid on Amharic words)
Gemination (double consonants: ሀበተ vs ሀብቴ)
Pitch patterns for questions vs statements
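To spot-check the ejective articulation listed above, it can help to know which syllables in a test sentence belong to an ejective series. The sketch below is an illustration only. It assumes the ejective series are those headed by ጠ, ጨ, ጰ, ጸ, ፀ and ቀ, and it relies on the Unicode Ethiopic block laying out each consonant series in aligned groups of eight code points. `EJECTIVE_BASES`, `series_base` and `find_ejectives` are hypothetical names, not part of the omnivoice package.

```python
# Assumption: first-order letters of the ejective series, given as code points
# (U+1240 qa, U+1320 tha, U+1328 cha, U+1330 pha, U+1338 tsa, U+1340 tza).
EJECTIVE_BASES = frozenset(
    chr(cp) for cp in (0x1240, 0x1320, 0x1328, 0x1330, 0x1338, 0x1340)
)

ETHIOPIC_START = 0x1200  # first syllable of the Unicode Ethiopic block
ETHIOPIC_END = 0x135A    # last syllable before combining marks and punctuation


def series_base(ch: str) -> str:
    """Return the first-order letter of the 8-code-point series containing ch."""
    offset = ord(ch) - ETHIOPIC_START
    return chr(ETHIOPIC_START + (offset // 8) * 8)


def find_ejectives(text: str) -> list[str]:
    """List, in order, the syllables in text that belong to an ejective series."""
    return [
        ch
        for ch in text
        if ETHIOPIC_START <= ord(ch) <= ETHIOPIC_END
        and series_base(ch) in EJECTIVE_BASES
    ]


# Example sentence taken from the voice cloning snippet in this card.
print(find_ejectives("ዛሬ ቀን ጥሩ ነው።"))
```

Sentences with many hits make good candidates for a pronunciation listening test. Change the set of base letters if your phonological analysis differs.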

2. Voice Cloning Works

Give it 10 seconds of any Amharic speaker and it will synthesize new sentences in that voice. Tested on:

Male/female voices
Formal news-reading style
Casual conversational style
Different Ethiopian dialects (Addis Ababa, Gondar, Wollo)

3. Open Everything

✅ Open weights (Apache 2.0)
✅ Open training code
✅ Open datasets (or documented sources)
✅ Open benchmarks (MOS scores will be published once evaluation is complete)
✅ No API keys, no cloud lock-in

🛠️ Quick Start — Colab

# Cell 1: Install
!pip install -q omnivoice soundfile

# Cell 2: Load model
import torch
from omnivoice import OmniVoice, OmniVoiceGenerationConfig

# Load the weights in float16 on the first GPU
model = OmniVoice.from_pretrained(
    "ethiopian-Demtse/omnivoice-amharic",
    device_map="cuda:0",
    dtype=torch.float16,
)

# Cell 3: Generate speech
text = "ሰላም፣ እንኳን ደህና መጣችሁ። ይህ የአማርኛ ንግግር ሙከራ ነው።"
audio = model.generate(
    text=text,
    language="Amharic",
    generation_config=OmniVoiceGenerationConfig(num_step=32, guidance_scale=2.0),
)

import soundfile as sf

# Save the first generated waveform as a 24 kHz WAV file
sf.write("output.wav", audio[0], 24000)
print("✅ Saved to output.wav")

Voice Cloning

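# Upload a 10-second reference WAV (here "speaker.wav") to the Colab session first.
# ref_text=None means no transcript of the reference clip is supplied.
prompt = model.create_voice_clone_prompt(ref_audio="speaker.wav", ref_text=None)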

audio = model.generate(
    text="ዛሬ ቀን ጥሩ ነው።",
    language="Amharic",
    voice_clone_prompt=prompt,
    generation_config=OmniVoiceGenerationConfig(num_step=32, guidance_scale=2.0),
)
sf.write("cloned.wav", audio[0], 24000)
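If your recording is longer than the 10 seconds suggested above, one option is to trim it before building the clone prompt. The sketch below uses only Python's standard `wave` module and assumes an uncompressed PCM WAV file. `trim_wav` is a hypothetical helper, not part of omnivoice, and the file names are placeholders. Whether the model requires a particular sample rate or channel count is not stated in this card, so the helper leaves both unchanged.

```python
import wave


def trim_wav(src_path: str, dst_path: str, max_seconds: float = 10.0) -> float:
    """Copy at most max_seconds of audio from src_path to dst_path.

    Returns the duration, in seconds, of the file that was written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_to_keep = min(src.getnframes(), int(max_seconds * src.getframerate()))
        frames = src.readframes(frames_to_keep)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and frame rate; the frame count in the
        # header is corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    return frames_to_keep / params.framerate
```

For example, `trim_wav("long_recording.wav", "speaker.wav")` would produce a clip suitable for the `ref_audio` argument shown above.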

📈 Training Details

Parameter	Value
Base model	k2-fsa/OmniVoice
Backbone	Qwen3-0.6B (636M params)
Audio tokenizer	HiggsAudioV2 (8 codebooks, 1025 vocab)
Learning rate	2e-5
LR schedule	Cosine
Max steps	12,000
Epochs	~10
Batch tokens	28,672
Precision	bf16
Codebook weights	[8, 8, 6, 6, 4, 4, 2, 2]
Best loss	3.9518 @ step 10,000
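As a rough sanity check, the figures in the table above can be combined with the dataset size from "At a Glance". The arithmetic below is a sketch under two assumptions that this card does not confirm: "Batch tokens" is taken to mean tokens per optimizer step, and every step is assumed to process a full batch.

```python
# Figures quoted in this model card.
MAX_STEPS = 12_000
BATCH_TOKENS = 28_672
EPOCHS = 10          # the card says "~10"
NUM_SAMPLES = 81_731
TOTAL_HOURS = 331    # the card says "~331"

total_seconds = TOTAL_HOURS * 3600

# Assumption: every step consumes exactly BATCH_TOKENS tokens.
total_tokens = MAX_STEPS * BATCH_TOKENS
tokens_per_epoch = total_tokens / EPOCHS

# Derived averages over the whole corpus.
avg_clip_seconds = total_seconds / NUM_SAMPLES
tokens_per_second = tokens_per_epoch / total_seconds

print(f"total tokens processed : {total_tokens:,}")
print(f"tokens per epoch       : {tokens_per_epoch:,.0f}")
print(f"average clip length    : {avg_clip_seconds:.1f} s")
print(f"tokens per audio second: {tokens_per_second:.1f}")
```

The last figure is only an implied average: it includes whatever text and padding tokens are counted in a batch, so it should not be read as the tokenizer's frame rate.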

Datasets

Dataset	Hours	Role
google/WaxalNLP	~200h	Core corpus
gheero-Leyu/leyu-amharic-addis-ababa-dialect	~50h	Dialect diversity
surafelabebe/amharic_clear_audio_tts	~40h	Clean TTS data
chappM/amharic-bdu-asr	~41h	ASR-aligned quality
Total	~331h

Training History

Run	Steps	Best Loss	Notes
1	0→1,500	~4.15	Init from v3
2	1,500→6,000	3.9994 (step 4,190)	Storage issue lost checkpoints
3	2,700→12,000	3.9518 (step 10,000)	Final best
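To get a feel for where on the learning-rate curve these checkpoints sit, the cosine schedule from the Training Details table can be sketched as follows. This is an illustration, not the training code: it assumes no warmup, decay to zero, and a single schedule spanning all 12,000 steps that was not restarted between runs. None of these details are stated in this card.

```python
import math

PEAK_LR = 2e-5      # "Learning rate" from the table
MAX_STEPS = 12_000  # "Max steps" from the table


def cosine_lr(step: int, peak_lr: float = PEAK_LR,
              max_steps: int = MAX_STEPS, min_lr: float = 0.0) -> float:
    """Cosine decay from peak_lr at step 0 to min_lr at max_steps (no warmup)."""
    progress = min(max(step / max_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Steps mentioned in the training history above.
for step in (0, 1_500, 4_190, 10_000, 12_000):
    print(f"step {step:>6,}: lr = {cosine_lr(step):.2e}")
```

Under these assumptions the best checkpoint at step 10,000 falls late in the decay, where the learning rate is a small fraction of its peak.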

🧪 Evaluation

We evaluate on a held-out test set (10% of combined data, never seen in training).
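This card does not describe how the 10% split was drawn. One common way to make such a split reproducible is to hash a stable identifier for each sample, sketched below. `assign_split` and the `clip_00000`-style IDs are hypothetical and only illustrate the idea.

```python
import hashlib


def assign_split(sample_id: str, test_fraction: float = 0.10) -> str:
    """Deterministically map a sample ID to "train" or "test"."""
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


# Hypothetical IDs standing in for real dataset keys.
ids = [f"clip_{i:05d}" for i in range(10_000)]
n_test = sum(assign_split(sample_id) == "test" for sample_id in ids)
print(f"{n_test} of {len(ids)} clips assigned to test")
```

Because the assignment depends only on the ID, a sample never moves between splits when the dataset grows. Hashing a speaker ID instead of a clip ID would keep speakers disjoint between train and test, which matters when evaluating voice cloning.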

Objective Metrics

Metric	Value	Comparison (MMS-TTS-amh)
Mel-Cepstral Distortion (MCD)	TBD	TBD
F0 RMSE	TBD	TBD
Character Error Rate (ASR-back)	TBD	TBD
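For the ASR-back metric, the synthesized audio is transcribed by a speech recognizer and the transcript is compared with the input text. The sketch below covers only the comparison step and assumes the transcript is already available. Text normalization (punctuation, whitespace) is not specified in this card, so none is applied. Each Ethiopic syllable is a single character in a Python string, so the rate is counted per fidel.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, using one row of memory."""
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_char != hyp_char)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Note that CER can exceed 1.0 when the hypothesis is much longer than the reference.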

Subjective Metrics (MOS)

Criterion	Score (1-5)	N evaluators
Naturalness	TBD	TBD
Speaker similarity (cloning)	TBD	TBD
Ejective consonant accuracy	TBD	TBD
Prosody / rhythm	TBD	TBD

Subjective evaluation in progress.
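Once ratings are collected, each row of the table above could be summarized as a mean with a confidence interval. The sketch below uses a normal-approximation 95% interval, which is one common choice and not necessarily the method this project will use. The ratings in the example are made-up placeholders, not results for this model.

```python
import math
import statistics


def mos_with_ci(ratings: list[float], z: float = 1.96) -> tuple[float, float]:
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Placeholder 1-5 ratings for illustration only; NOT measured data.
placeholder_ratings = [4, 5, 3, 4, 4]
mean, half = mos_with_ci(placeholder_ratings)
print(f"MOS = {mean:.2f} +/- {half:.2f} (n={len(placeholder_ratings)})")
```

With only a handful of evaluators the interval is wide, which is why the table also reports the number of evaluators.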

🇪🇹 Roadmap — Ethiopian Languages (ድምፄ)

This model is Phase 1 of building speech AI for all Ethiopian languages:

አማርኛ / Amharic (60M speakers) — TTS + voice cloning ✅
ኦሮምኛ / Afaan Oromoo (40M speakers) — TTS + voice cloning
ትግርኛ / Tigrinya (10M speakers) — TTS
ሶማሊኛ / Somali (7M speakers in Ethiopia) — TTS
ሲዳምኛ / Sidamo (4M speakers) — TTS
ወላይትኛ / Wolaytta (3M speakers) — TTS
ጉራግኛ / Gurage (2M speakers) — TTS
ሐዲይኛ / Hadiyya (2M speakers) — TTS
አፋርኛ / Afar (2M speakers) — TTS
ገሞኛ / Gamo (1.5M speakers) — TTS
Self-service fine-tuning toolkit for any Ethiopian language with 50h+ audio

Follow ethiopian-Demtse for updates.

⚠️ Limitations & Biases

Gender representation: Training data skews male (65%). Female voices may sound less natural.
Dialect coverage: Heavy Addis Ababa bias. Rural Ethiopian accents (Tigray, Harar, Sidama) are underrepresented.
Code-mixing: Output is unpredictable when a sentence switches between Amharic and English.
Numerals/dates: Amharic calendar dates and large numbers are sometimes mispronounced.
Emotional range: Neutral/news-reading style only. No whisper, shouting, or singing.

We actively seek more diverse training data. If you have audio recordings in any Ethiopian language (any dialect, any speaker), contact us.

🤝 Citation

@software{omnivoice_amharic_2026,
  author = {demeleww and Voices For All},
  title = {OmniVoice Amharic: Open Voice AI for 60M Speakers},
  year = {2026},
  url = {https://huggingface.co/ethiopian-Demtse/omnivoice-amharic},
  license = {Apache-2.0}
}

Base model:

@article{omnivoice2026,
  title={OmniVoice: Towards Omnilingual Zero-Shot Text-to-Speech with Diffusion Language Models},
  journal={arXiv preprint arXiv:2604.00688},
  year={2026}
}

📬 Contact

Organization: ድምፄ / ethiopian-Demtse
Parent initiative: Voices For All / african-low-resource
Lead: demeleww
Issues: Open a discussion on this repo
Email: sowwen0@gmail.com

Built with ❤️ for 120M+ Ethiopians who deserve voice AI in their mother tongue.
