---
license: mit
datasets:
- MahtaFetrat/Mana-TTS
language:
- fa
base_model:
- rhasspy/piper-voices
---
# Mana Persian Piper (fa-IR)
This repository hosts a **Persian (fa-IR) Piper TTS model** trained for **low-latency, high-quality speech synthesis**.
The model is a **medium-sized Piper checkpoint**, fine-tuned on the **Mana-TTS** dataset to produce natural and intelligible Persian speech while remaining suitable for **real-time and on-device inference**.
---
## Model Description
* **Architecture:** Piper (medium)
* **Language:** Persian (fa-IR)
* **Base Checkpoint:** [https://huggingface.co/SadeghK/persian-text-to-speech/tree/main/farsi/amir](https://huggingface.co/SadeghK/persian-text-to-speech/tree/main/farsi/amir)
* **Fine-tuning:** ~1000 epochs on Mana-TTS
* **Training Dataset:** [https://huggingface.co/datasets/MahtaFetrat/Mana-TTS](https://huggingface.co/datasets/MahtaFetrat/Mana-TTS)
This model was trained as part of a broader effort to build efficient Persian TTS systems that integrate well with **lightweight and context-aware phonemization pipelines**.
---
## Inference
### Install Piper
```bash
pip install piper-tts
```
### Download the Model
```bash
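# Git LFS is needed so the clone fetches the actual .onnx file rather than a pointer
git lfs install
git clone https://huggingface.co/MahtaFetrat/Mana-Persian-Piper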
```
### Run Inference (Python)
```python
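import wave
from piper import PiperVoice

# Path assumes the repository was cloned into /content (e.g. in Colab); adjust as needed.
voice = PiperVoice.load("/content/Mana-Persian-Piper/fa_IR-mana-medium.onnx")

# Synthesize the text and write the audio to test.wav
with wave.open("test.wav", "wb") as wav_file:
    voice.synthesize_wav("سلام به همگی!", wav_file)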
```
This will generate a `test.wav` file containing synthesized Persian speech.
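To confirm that the output file was written correctly, the generated WAV can be inspected with Python's standard `wave` module. The following is a minimal sketch; the helper name `wav_info` is introduced here for illustration and is not part of Piper.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        channels = wav_file.getnchannels()
        frames = wav_file.getnframes()
    # Duration is the number of frames divided by frames per second.
    return sample_rate, channels, frames / sample_rate
```

Calling `wav_info("test.wav")` after running the inference snippet should report a non-zero duration if synthesis succeeded.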
---
## Model Files
* `fa_IR-mana-medium.onnx` – Piper acoustic model
* `fa_IR-mana-medium.onnx.json` – Model configuration and metadata
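The configuration file is plain JSON, so it can be inspected without loading the model. The sketch below assumes the key layout commonly found in Piper voice configs (`audio.sample_rate`, `num_speakers`, `phoneme_id_map`, `inference.length_scale`); these names have not been verified against this repository's file, so missing keys are returned as `None` rather than raising. Both function names are hypothetical helpers.

```python
import json


def load_piper_config(path):
    """Load a Piper .onnx.json config file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_piper_config(config):
    """Pick out a few fields; the key names are assumptions (see text above)."""
    audio = config.get("audio", {})
    inference = config.get("inference", {})
    return {
        "sample_rate": audio.get("sample_rate"),
        "num_speakers": config.get("num_speakers"),
        "num_phoneme_symbols": len(config.get("phoneme_id_map", {})),
        "length_scale": inference.get("length_scale"),
    }
```

For example, `summarize_piper_config(load_piper_config("Mana-Persian-Piper/fa_IR-mana-medium.onnx.json"))` gives a quick overview of the audio settings, assuming the repository was cloned into the current directory.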
---
## Recommended Usage
This model is **best used in conjunction with context-aware phonemization**, as proposed in the paper:
> **Beyond Unified Models: A Service-Oriented Approach to Low-Latency, Context-Aware Phonemization for Real-Time TTS**
In particular, combining this Piper model with:
* Lightweight G2P
* Ezafe-aware context disambiguation
results in improved pronunciation accuracy while preserving real-time performance.
The full system implementation is available in the companion repository associated with the paper.
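To make the idea of combining a lightweight G2P step with Ezafe handling concrete, here is a toy sketch. It is not the paper's implementation: the lexicon, the romanized phoneme strings, and every function name are hypothetical, and the Ezafe decisions are supplied by hand where a real system would use a context-aware tagger. It only illustrates how the two stages compose.

```python
# Toy illustration only: lexicon entries and phoneme strings are hypothetical
# and are NOT taken from the paper or its companion repository.
TOY_LEXICON = {
    "کتاب": "ketAb",
    "من": "man",
    "خانه": "xAne",
}

VOWELS = set("aeiouA")


def toy_g2p(word):
    """Lightweight G2P stand-in: a plain dictionary lookup."""
    return TOY_LEXICON.get(word, word)


def add_ezafe(phonemes):
    """Append the Ezafe vowel: 'ye' after a vowel, 'e' after a consonant."""
    return phonemes + ("ye" if phonemes[-1] in VOWELS else "e")


def phonemize(words, ezafe_flags):
    """Phonemize words; ezafe_flags[i] says whether word i carries an Ezafe.

    The flags are given by hand here; in a real pipeline they would come
    from a context-aware disambiguation step.
    """
    out = []
    for word, has_ezafe in zip(words, ezafe_flags):
        phonemes = toy_g2p(word)
        out.append(add_ezafe(phonemes) if has_ezafe else phonemes)
    return out
```

For instance, `phonemize(["کتاب", "من"], [True, False])` attaches the Ezafe vowel to the first word only. Because the Ezafe is not written in Persian script, the flag has to be inferred from context, which is the role of the disambiguation step.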
---
## Citation
If you use this model in your research or applications, please cite the following paper:
```bibtex
@misc{fetrat2025servicetts,
title={Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS},
author={Mahta Fetrat and Donya Navabi and Zahra Dehghanian and Morteza Abolghasemi and Hamid R. Rabiee},
year={2025},
eprint={2512.08006},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2512.08006},
}
```