---
license: mit
datasets:
- MahtaFetrat/Mana-TTS
language:
- fa
base_model:
- rhasspy/piper-voices
---

# Mana Persian Piper (fa-IR)

This repository hosts a **Persian (fa-IR) Piper TTS model** trained for **low-latency, high-quality speech synthesis**.

The model is a **medium-sized Piper checkpoint** fine-tuned on the **Mana-TTS** dataset to produce natural and intelligible Persian speech while remaining suitable for **real-time and on-device inference**.

---

## Model Description

* **Architecture:** Piper (medium)
* **Language:** Persian (fa-IR)
* **Base Checkpoint:** [SadeghK/persian-text-to-speech (`farsi/amir`)](https://huggingface.co/SadeghK/persian-text-to-speech/tree/main/farsi/amir)
* **Fine-tuning:** ~1,000 epochs on Mana-TTS
* **Training Dataset:** [MahtaFetrat/Mana-TTS](https://huggingface.co/datasets/MahtaFetrat/Mana-TTS)

This model was trained as part of a broader effort to build efficient Persian TTS systems that integrate well with **lightweight and context-aware phonemization pipelines**.

---

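## Inference

### Install Piper

```bash
pip install piper-tts
```

### Download the Model

Large files on the Hugging Face Hub, such as the `.onnx` weights, are fetched through Git LFS. Make sure Git LFS is installed and enabled before cloning; otherwise the model file may be checked out as a small pointer file instead of the actual weights.

```bash
git lfs install
git clone https://huggingface.co/MahtaFetrat/Mana-Persian-Piper
```
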
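To confirm that the clone fetched the real weights, a small check like the one below can help. It is a sketch that assumes the two file names listed under Model Files and the standard Git LFS pointer header; `check_model_files` is a hypothetical helper, not part of Piper.

```python
from pathlib import Path
from typing import List

# Git LFS pointer files start with this header instead of the real binary data.
LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"

# File names as listed in the "Model Files" section of this card.
MODEL_FILES = ("fa_IR-mana-medium.onnx", "fa_IR-mana-medium.onnx.json")


def check_model_files(repo_dir: str) -> List[str]:
    """Return a list of problems found in a cloned model directory (empty if it looks OK)."""
    problems: List[str] = []
    root = Path(repo_dir)
    for name in MODEL_FILES:
        path = root / name
        if not path.is_file():
            problems.append(f"missing: {name}")
            continue
        # Read only as many bytes as the pointer header; real weights will not match it.
        with path.open("rb") as f:
            head = f.read(len(LFS_POINTER_HEADER))
        if head == LFS_POINTER_HEADER:
            problems.append(f"LFS pointer, not real data: {name}")
    return problems
```

Calling `check_model_files("Mana-Persian-Piper")` from the directory where you ran `git clone` should return an empty list. A pointer entry means the weights still need to be fetched, for example by running `git lfs pull` inside the repository.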
### Run Inference (Python)

```python
import wave

from piper import PiperVoice

# Path assumes the repository was cloned into the current working directory.
# Piper loads the matching config (fa_IR-mana-medium.onnx.json) from the same folder.
voice = PiperVoice.load("Mana-Persian-Piper/fa_IR-mana-medium.onnx")

# Synthesize "Hello everyone!" and write the audio to test.wav.
with wave.open("test.wav", "wb") as wav_file:
    voice.synthesize_wav("سلام به همگی!", wav_file)
```

This will generate a `test.wav` file containing synthesized Persian speech.

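Because the model targets real-time use, it can be useful to measure the real-time factor (RTF) on your own hardware: synthesis time divided by the duration of the generated audio. The sketch below is a generic timing helper, not part of Piper; `real_time_factor` and `wav_duration_seconds` are hypothetical names, and the helper accepts any callable that writes a WAV file to a given path, so it can wrap the `synthesize_wav` call shown above. Results depend on hardware and text length.

```python
import time
import wave
from typing import Callable


def wav_duration_seconds(path: str) -> float:
    """Length of a WAV file in seconds (number of frames / frames per second)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def real_time_factor(synthesize: Callable[[str], None], out_path: str) -> float:
    """Time synthesize(out_path) and return elapsed seconds / audio seconds.

    An RTF below 1.0 means audio is generated faster than it plays back.
    """
    start = time.perf_counter()
    synthesize(out_path)
    elapsed = time.perf_counter() - start
    return elapsed / wav_duration_seconds(out_path)


# Example with the `voice` object from the inference script (requires piper-tts and the model):
#
# def synthesize_to(path: str) -> None:
#     with wave.open(path, "wb") as wav_file:
#         voice.synthesize_wav("سلام به همگی!", wav_file)
#
# synthesize_to("warmup.wav")  # the first call may include one-time setup cost
# print(f"RTF: {real_time_factor(synthesize_to, 'test.wav'):.3f}")
```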
---

## Model Files

* `fa_IR-mana-medium.onnx` – Piper voice model (ONNX weights)
* `fa_IR-mana-medium.onnx.json` – Model configuration and metadata (audio settings, phoneme map, inference defaults)

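The JSON config can be read with the standard library, for example to check the output sample rate or the default synthesis settings. This sketch assumes the key layout found in typical Piper voice configs (`audio.sample_rate`, `espeak.voice`, `inference.noise_scale`, `inference.length_scale`, `inference.noise_w`, `num_speakers`, `phoneme_id_map`); verify the names against the actual file. `summarize_piper_config` is a hypothetical helper, and missing keys are reported as `None` rather than raising.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_piper_config(config_path: str) -> Dict[str, Any]:
    """Pick out commonly used fields from a Piper .onnx.json config.

    Key names follow the layout of typical Piper voice configs (an assumption,
    not verified against this particular file); absent keys come back as None.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    audio = config.get("audio", {})
    espeak = config.get("espeak", {})
    inference = config.get("inference", {})
    return {
        "sample_rate": audio.get("sample_rate"),
        "quality": audio.get("quality"),
        "espeak_voice": espeak.get("voice"),
        "num_speakers": config.get("num_speakers"),
        "phoneme_id_map_size": len(config.get("phoneme_id_map", {})),
        "noise_scale": inference.get("noise_scale"),
        "length_scale": inference.get("length_scale"),
        "noise_w": inference.get("noise_w"),
    }


# Example (path assumes the repository was cloned into the current directory):
# print(summarize_piper_config("Mana-Persian-Piper/fa_IR-mana-medium.onnx.json"))
```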
---

## Recommended Usage

This model is **best used in conjunction with context-aware phonemization**, as proposed in the paper:

> **Beyond Unified Models: A Service-Oriented Approach to Low-Latency, Context-Aware Phonemization for Real-Time TTS**

In particular, combining this Piper model with the following components improves pronunciation accuracy while preserving real-time performance:

* Lightweight G2P (grapheme-to-phoneme conversion)
* Ezafe-aware context disambiguation (resolving the usually unwritten ezafe vowel that links words in Persian phrases)

The full system implementation is available in the companion repository associated with the paper.

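As a rough illustration of the service-oriented idea, the sketch below chains a G2P stage and an ezafe stage and times each one, which is one way a per-stage latency budget could be checked. Everything here is a hypothetical stand-in: the toy lexicon, the romanized output, and the ezafe rule are not the paper's method or this model's phoneme set, and passing phonemes into Piper is not shown. The only linguistic behavior it encodes is that ezafe typically surfaces as "-e" after a consonant and "-ye" after a vowel (e.g. xāne-ye bozorg, "big house").

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# Toy romanized vowel set; a real G2P service would use its own phoneme inventory.
VOWELS = set("aāeiou")


def apply_ezafe(words: List[str], ezafe_flags: List[bool]) -> List[str]:
    """Add the ezafe vowel to flagged words: "e" after a consonant, "ye" after a vowel."""
    result = []
    for word, flag in zip(words, ezafe_flags):
        if flag and word:
            word += "ye" if word[-1] in VOWELS else "e"
        result.append(word)
    return result


def run_pipeline(
    text: str, stages: List[Tuple[str, Callable[[Any], Any]]]
) -> Tuple[Any, Dict[str, float]]:
    """Feed text through named stages in order, recording each stage's latency in milliseconds."""
    timings_ms: Dict[str, float] = {}
    value: Any = text
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)
        timings_ms[name] = (time.perf_counter() - start) * 1000.0
    return value, timings_ms


# Hypothetical stand-ins for the G2P and ezafe services.
TOY_LEXICON = {"کتاب": "ketāb", "خانه": "xāne", "بزرگ": "bozorg", "من": "man"}


def toy_g2p(text: str) -> List[str]:
    """Dictionary lookup per word; unknown words pass through unchanged."""
    return [TOY_LEXICON.get(w, w) for w in text.split()]


def toy_ezafe(words: List[str]) -> List[str]:
    """Hypothetical rule, valid only for two-word noun phrases: all but the last word take ezafe."""
    flags = [i < len(words) - 1 for i in range(len(words))]
    return apply_ezafe(words, flags)


phonemes, timings = run_pipeline(
    "خانه بزرگ",  # "big house"
    [("g2p", toy_g2p), ("ezafe", toy_ezafe), ("join", " ".join)],
)
print(phonemes)  # → xāneye bozorg
```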
---

## Citation

If you use this model in your research or applications, please cite the following paper:

```bibtex
@misc{fetrat2025servicetts,
  title={Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS},
  author={Mahta Fetrat and Donya Navabi and Zahra Dehghanian and Morteza Abolghasemi and Hamid R. Rabiee},
  year={2025},
  eprint={2512.08006},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2512.08006},
}
```