---
license: mit
datasets:
- MahtaFetrat/Mana-TTS
language:
- fa
base_model:
- rhasspy/piper-voices
---

# Mana Persian Piper (fa-IR)

This repository hosts a **Persian (fa-IR) Piper TTS model** trained for **low-latency, high-quality speech synthesis**.

The model is a **medium-sized Piper checkpoint**, fine-tuned on the **Mana-TTS** dataset to produce natural and intelligible Persian speech while remaining suitable for **real-time and on-device inference**.

---

## Model Description

* **Architecture:** Piper (medium)
* **Language:** Persian (fa-IR)
* **Base Checkpoint:**
  [https://huggingface.co/SadeghK/persian-text-to-speech/tree/main/farsi/amir](https://huggingface.co/SadeghK/persian-text-to-speech/tree/main/farsi/amir)
* **Fine-tuning:**
  ~1000 epochs on Mana-TTS
* **Training Dataset:**
  [https://huggingface.co/datasets/MahtaFetrat/Mana-TTS](https://huggingface.co/datasets/MahtaFetrat/Mana-TTS)

This model was trained as part of a broader effort to build efficient Persian TTS systems that integrate well with **lightweight and context-aware phonemization pipelines**.

---

## Inference

### Install Piper

```bash
pip install piper-tts
```

### Download the Model

```bash
git lfs install  # Hugging Face typically stores ONNX weights with Git LFS
git clone https://huggingface.co/MahtaFetrat/Mana-Persian-Piper
```

### Run Inference (Python)

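```python
import wave
from piper import PiperVoice

# Path is relative to the directory where the repository was cloned
voice = PiperVoice.load("Mana-Persian-Piper/fa_IR-mana-medium.onnx")

# Synthesize the Persian greeting "Hello, everyone!" into a WAV file
with wave.open("test.wav", "wb") as wav_file:
    voice.synthesize_wav("سلام به همگی!", wav_file)
```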

This will generate a `test.wav` file containing synthesized Persian speech.
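
To sanity-check a generated file, its header can be read back with Python's standard `wave` module. The following is a minimal sketch; the helper name `describe_wav` is hypothetical and not part of Piper.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (illustrative helper)."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration follows from frame count divided by sample rate
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("test.wav")` after the synthesis step should report a non-zero duration if audio was written.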

---

## Model Files

* `fa_IR-mana-medium.onnx` – Piper acoustic model
* `fa_IR-mana-medium.onnx.json` – Model configuration and metadata
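
The configuration file is plain JSON, so it can be inspected without loading the model. The sketch below assumes the field layout commonly found in Piper voice configs (`audio.sample_rate`, `espeak.voice`, `num_speakers`); these keys are assumptions about this particular file, and the hypothetical helper `summarize_voice_config` returns `None` for any that are absent.

```python
import json


def summarize_voice_config(config_path):
    """Read a few commonly used fields from a Piper-style voice config."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    return {
        # Each key is assumed, not guaranteed; missing keys yield None
        "sample_rate": config.get("audio", {}).get("sample_rate"),
        "espeak_voice": config.get("espeak", {}).get("voice"),
        "num_speakers": config.get("num_speakers"),
    }
```

For example, `summarize_voice_config("Mana-Persian-Piper/fa_IR-mana-medium.onnx.json")` gives a quick view of the sample rate to expect in the output audio.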

---

## Recommended Usage

This model is **best used in conjunction with context-aware phonemization**, as proposed in the paper:

> **Beyond Unified Models: A Service-Oriented Approach to Low-Latency, Context-Aware Phonemization for Real-Time TTS**

In particular, combining this Piper model with:

* Lightweight G2P
* Ezafe-aware context disambiguation

results in improved pronunciation accuracy while preserving real-time performance.

The full system implementation is available in the companion repository associated with the paper.
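
As a rough illustration of why Ezafe awareness matters: Ezafe is the linking vowel between a noun and its modifier, and it is usually not written in Persian script, so a word-by-word G2P lookup cannot recover it. The toy sketch below is not the paper's implementation. The lexicon, the romanization, the `Token` class, and the `has_ezafe` flag (which a context-aware tagger would supply) are all hypothetical stand-ins.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: Persian word -> illustrative romanization
TOY_LEXICON = {
    "کتاب": "ketâb",  # book
    "خانه": "xâne",   # house
    "من": "man",      # I / my
}

VOWELS = ("a", "e", "o", "â", "i", "u")


@dataclass
class Token:
    text: str
    has_ezafe: bool = False  # assumed to come from a context-aware tagger


def lookup_g2p(word):
    """Lightweight G2P stand-in: lexicon lookup, falling back to the raw text."""
    return TOY_LEXICON.get(word, word)


def phonemize(tokens):
    """Join per-word pronunciations, appending the Ezafe vowel where flagged."""
    out = []
    for token in tokens:
        phones = lookup_g2p(token.text)
        if token.has_ezafe:
            # Ezafe surfaces as "-ye" after a vowel and "-e" after a consonant
            phones += "-ye" if phones.endswith(VOWELS) else "-e"
        out.append(phones)
    return " ".join(out)
```

With these toy entries, `phonemize([Token("کتاب", has_ezafe=True), Token("من")])` yields `ketâb-e man` ("my book"), whereas a context-free lookup would produce `ketâb man`.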

---

## Citation

If you use this model in your research or applications, please cite the following paper:

```bibtex
@misc{fetrat2025servicetts,
      title={Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS}, 
      author={Mahta Fetrat and Donya Navabi and Zahra Dehghanian and Morteza Abolghasemi and Hamid R. Rabiee},
      year={2025},
      eprint={2512.08006},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2512.08006}, 
}
```