---
language: si
license: apache-2.0
library_name: coqui-tts
pipeline_tag: text-to-speech
inference: false
tags:
- text-to-speech
- sinhala
- tts
- vits
- coqui-tts
- speech-synthesis
datasets:
- sinhala-tts
metrics:
- mos
---

# 🗣️ Sinhala TTS VITS 🇱🇰

**Sinhala Text-to-Speech** — A [Coqui TTS](https://github.com/coqui-ai/TTS) VITS model that generates natural Sinhala speech from text, with **16 distinct voices** to choose from.

## 🎯 Model Details

| Attribute | Value |
|-----------|-------|
| **Architecture** | VITS (Variational Inference Text-to-Speech) |
| **Language** | 🇱🇰 Sinhala (සිංහල) |
| **Speakers** | 16 voices |
| **Sample Rate** | 16 kHz |
| **Parameters** | ~30M |
| **Vocab** | 97 characters (74 Sinhala Unicode + 19 punctuation + 4 special tokens) |
| **Framework** | [Coqui TTS](https://github.com/idiap/coqui-ai-TTS) 0.27.x (the `coqui-tts` package) |
| **License** | Apache 2.0 |
| **Model Format** | SafeTensors (.safetensors) |

## 🗣️ Available Speakers

| ID | Speaker Name | Description |
|----|-------------|-------------|
| 0 | **mettananda** | Male voice 1 |
| 1 | **oshadi** | Female voice 1 |
| 2 | **pn_sin_01** | Voice 3 |
| 3 | **sin_01** | Voice 4 |
| 4 | **sin_2241** | Voice 5 |
| 5 | **sin_2282** | Voice 6 |
| 6 | **sin_3531** | Voice 7 |
| 7 | **sin_3688** | Voice 8 |
| 8 | **sin_3976** | Voice 9 |
| 9 | **sin_4191** | Voice 10 |
| 10 | **sin_4499** | Voice 11 |
| 11 | **sin_5681** | Voice 12 |
| 12 | **sin_6314** | Voice 13 |
| 13 | **sin_6897** | Voice 14 |
| 14 | **sin_7183** | Voice 15 |
| 15 | **sin_9228** | Voice 16 |
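
The speaker names above are the values to pass when synthesizing. As a minimal sketch, the helpers below validate a speaker before synthesis and accept either a name or a numeric ID. They assume `speakers.json` is a flat JSON object mapping speaker names to integer IDs, which is not confirmed here, so check the file first. `load_speaker_ids` and `resolve_speaker` are hypothetical helper names, not part of Coqui TTS.

```python
import json
from pathlib import Path


def load_speaker_ids(path):
    """Read a name -> ID mapping from a JSON file (assumed to be a flat object)."""
    with Path(path).open(encoding="utf-8") as f:
        mapping = json.load(f)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object mapping speaker names to IDs")
    return mapping


def resolve_speaker(mapping, speaker):
    """Accept a speaker name or a numeric ID and return the speaker name."""
    if isinstance(speaker, int):
        # Invert the mapping so an ID from the table can be looked up.
        by_id = {idx: name for name, idx in mapping.items()}
        if speaker not in by_id:
            raise KeyError(f"unknown speaker ID: {speaker}")
        return by_id[speaker]
    if speaker not in mapping:
        raise KeyError(f"unknown speaker: {speaker!r}; choose from {sorted(mapping)}")
    return speaker
```

With the mapping loaded, `resolve_speaker(mapping, 1)` would return `"oshadi"` if the file matches the table above.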

## 🚀 Usage

### Option 1: Coqui TTS (Recommended)

```python
import soundfile as sf
import torch
from safetensors.torch import load_file
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import Vits
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text import TTSTokenizer
from TTS.utils.audio import AudioProcessor

# Load config
config = VitsConfig()
config.load_json("config.json")

# Initialize the audio processor, tokenizer, and speaker manager
ap = AudioProcessor.init_from_config(config)
tokenizer, new_config = TTSTokenizer.init_from_config(config)
speaker_manager = SpeakerManager()
speaker_manager.load_ids_from_file("speakers.json")

# Create the model and load the SafeTensors weights
model = Vits(new_config, ap, tokenizer, speaker_manager)
state_dict = load_file("sinhala_tts_vits_model.safetensors")
# strict=False tolerates key mismatches, so report them instead of ignoring them
missing, unexpected = model.load_state_dict(state_dict, strict=False)
if missing or unexpected:
    print(f"Missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
model.eval()

# Synthesize (inference only, so gradients are not needed)
text = "ආයුබෝවන්! ඔබට කොහොමද?"
with torch.no_grad():
    outputs = model.synthesize(text, config=new_config, speaker="mettananda")

# Save audio at the model's 16 kHz sample rate
sf.write("output.wav", outputs["wav"], 16000)
```

### Option 2: REST API (with included server.py)

```bash
# Start the server
python server.py

# Generate speech
curl -X POST http://localhost:8081/tts \
  -H "Content-Type: application/json" \
  -d '{
    "text": "ආයුබෝවන්!",
    "speaker": "mettananda",
    "emotion": "neutral"
  }' \
  --output output.wav

# Health check
curl http://localhost:8081/health

# List speakers
curl http://localhost:8081/speakers
```
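
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the `curl` call: the `/tts` path, port 8081, and the `text`, `speaker`, and `emotion` fields come from the example above. It assumes the server answers with raw audio bytes, as the `--output output.wav` flag suggests. `build_tts_request` and `synthesize_to_file` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"  # port taken from the curl examples


def build_tts_request(text, speaker="mettananda", emotion="neutral", base_url=BASE_URL):
    """Build the POST request for /tts with the same JSON body as the curl call."""
    payload = json.dumps(
        {"text": text, "speaker": speaker, "emotion": emotion},
        ensure_ascii=False,  # keep Sinhala text readable instead of \u escapes
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path="output.wav", **kwargs):
    """Send the request and write the response body to disk (assumed to be audio bytes)."""
    request = build_tts_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the server running, `synthesize_to_file("ආයුබෝවන්!", speaker="oshadi")` would write the response to `output.wav`.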

### Option 3: HuggingFace Inference API

> ⚠️ This model uses Coqui TTS (not Transformers) and cannot be used via the standard HF Inference API. Use Coqui TTS directly or the included REST API server.

### Option 4: Docker Deployment

```bash
docker build -t sinhala-tts-server .
docker run -p 8081:8081 sinhala-tts-server
```
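
The container needs some time to load the model before it can serve requests. A minimal sketch of a readiness check follows: it polls the `/health` endpoint shown in Option 2 until it answers with HTTP 200. The response body of `/health` is not documented here, so only the status code is checked. `http_ok` and `wait_until_ready` are hypothetical helper names, and the retry count and interval are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET on url answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe, attempts=30, interval=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, sleeping between tries; True once it succeeds."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

A typical call after `docker run` might be `wait_until_ready(lambda: http_ok("http://localhost:8081/health"))`.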

## 🛠️ Development Platforms

| Platform | GPU | Cost | Best For |
|----------|-----|------|----------|
| [![Kaggle](https://img.shields.io/badge/Kaggle-20BEFF?logo=kaggle&logoColor=white)](https://kaggle.com) | P100/T4 | Free (~30 hrs/week) | Quick experiments |
| [![Colab](https://img.shields.io/badge/Colab-F9AB00?logo=googlecolab&logoColor=white)](https://colab.research.google.com) | T4/A100 | Free / $10/mo Pro | Training runs |
| [![Modal](https://img.shields.io/badge/Modal-1D2C3E?logo=modal&logoColor=white)](https://modal.com) | A100 80GB | $20 free credit | Full training |
| [![RunPod](https://img.shields.io/badge/RunPod-6C1EE7?logo=runpod&logoColor=white)](https://runpod.io) | RTX 4090/A100 | $0.34–$2.00/hr | Production |

## 📦 Files

| File | Description | Size |
|------|-------------|------|
| `sinhala_tts_vits_model.safetensors` | Model weights (SafeTensors) | 316 MB |
| `config.json` | Model configuration | 8 KB |
| `speakers.json` | Speaker ID mapping | 300 B |
| `server.py` | FastAPI REST inference server | 6 KB |
| `Dockerfile` | Docker build for production | 2 KB |
| `DEVELOPER_GUIDE.md` | Training & development guide | 15 KB |

## 🎓 Training & Fine-Tuning

For detailed instructions, see [DEVELOPER_GUIDE.md](./DEVELOPER_GUIDE.md), which covers:

- **Setup**: Environment configuration and dependency installation
- **Training from scratch**: Full training pipeline with the Sinhala dataset
- **Fine-tuning**: Adapting the model to new voices or domains
- **Dataset preparation**: Preprocessing Sinhala audio data
- **Export to SafeTensors**: Converting PyTorch checkpoints to SafeTensors format
- **Cloud GPU training**: Step-by-step guides for Kaggle, Colab, and Modal

## 🌐 Deployment Options

| Method | Description | Best For |
|--------|-------------|----------|
| **HuggingFace Space** | Gradio web UI (live demo) | Quick testing |
| **FastAPI Server** | REST API with Docker | Production APIs |
| **Local Python** | Direct model loading | Development |
| **Kubernetes** | Docker container in K8s | Scalable deployment |

## ⚠️ Limitations

- **Audio quality**: Trained on a limited dataset (~200 samples for each of the 16 speakers), so output quality may vary
- **Inference speed**: CPU inference is slower than GPU inference; a GPU is recommended for production
- **Emotion control**: Basic emotion prefixes are supported but effects are subtle
- **Proper nouns**: May struggle with non-Sinhala words or names
- **Out-of-vocabulary characters**: Input is limited to the 97-symbol vocabulary listed under Model Details (93 characters plus 4 special tokens)
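
Because input is limited to a fixed character set, it can help to check text before sending it to the model. The sketch below reports characters that fall outside a given vocabulary. The vocabulary used here is a tiny hypothetical one for illustration; the real character set is defined in the model's `config.json`, and how to read it from there is left out because the exact layout is not shown in this card. `find_unsupported_chars` is a hypothetical helper name.

```python
def find_unsupported_chars(text, vocabulary):
    """Return the characters in text that are not in vocabulary, in first-seen order."""
    unsupported = []
    for ch in text:
        if ch not in vocabulary and ch not in unsupported:
            unsupported.append(ch)
    return unsupported


# Hypothetical, tiny vocabulary for illustration only.
greeting = "ආයුබෝවන්"
demo_vocab = set(greeting + "! ")

# Latin letters are outside the demo vocabulary, so they are reported.
print(find_unsupported_chars(greeting + " OK!", demo_vocab))  # → ['O', 'K']
```

Flagged characters could then be transliterated or removed before synthesis, depending on the application.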

## 📝 License

This model is released under the **Apache 2.0 License**.

## 🙏 Maintainer

**Death Legion Team** — [🤗 HuggingFace](https://huggingface.co/deathlegionteam)

---

<p align="center">
  <a href="https://huggingface.co/spaces/deathlegionteam/sinhala-tts-demo">🎧 Try the Live Demo</a> •
  <a href="./DEVELOPER_GUIDE.md">📖 Developer Guide</a> •
  <a href="https://huggingface.co/deathlegionteam">🏠 Death Legion Team</a>
</p>