---
language:
- ar
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
tags:
- speech
- arabic
license: fair-noncommercial-research-license
---
# F5-TTS Arabic
## Arabic Text-to-Speech Model

Arabic text-to-speech model fine-tuned from F5-TTS on about 300 hours of clean Arabic audio. It produces consistent, high-quality speech for Modern Standard Arabic from fully diacritized input text.

## Model Details

**Base Model:** [F5-TTS](https://github.com/SWivid/F5-TTS)  
**Training Data:** ~300 hours of clean Arabic audio  
**Language:** Modern Standard Arabic (MSA)  

## Usage

### Quick Start

For inference with text chunking, see the [Colab notebook](https://colab.research.google.com/drive/1udTmXSarrw9upSVp3tV5xl0Ki-2bTnhU?usp=sharing).

```python
import subprocess
import sys

from huggingface_hub import hf_hub_download

REPO_ID = "IbrahimSalah/Arabic-F5-TTS-v2"

# Download model files
vocab_file = hf_hub_download(repo_id=REPO_ID, filename="vocab.txt")
ckpt_file = hf_hub_download(repo_id=REPO_ID, filename="model_547500_8_18.pt")
config_file = hf_hub_download(repo_id=REPO_ID, filename="F5TTS_Base_8_18.yaml")
ref_audio = hf_hub_download(repo_id=REPO_ID, filename="reference.wav")

# Run inference via the F5-TTS CLI (works in a plain script as well as in Colab)
subprocess.run(
    [
        sys.executable, "-m", "f5_tts.infer.infer_cli",
        "--model_cfg", config_file,
        "--output_file", "./output.wav",
        "--model", "F5TTS_Base",
        "--ckpt_file", ckpt_file,
        "--vocab_file", vocab_file,
        "--ref_audio", ref_audio,
        "--nfe_step", "32",
        "--cfg_strength", "1.8",
        # Transcript of the reference audio, fully diacritized
        "--ref_text", "YOUR_REFERENCE_TEXT_WITH_TASHKEEL",
        # Text to synthesize, fully diacritized
        "--gen_text", "YOUR_GENERATION_TEXT_WITH_TASHKEEL",
        "--speed", "0.9",
    ],
    check=True,
)
```

## Key Features

- High-quality Arabic speech synthesis
- Consistent voice cloning from reference audio
- Works best with moderate text lengths (chunking recommended for long texts)
- Supports speed adjustment
- Fine-tunable for specific use cases
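For long texts, chunking can be as simple as splitting at punctuation and packing the pieces up to a length budget. The sketch below is a minimal illustration under that assumption, not the chunking used in the Colab notebook: the function name `chunk_text`, the punctuation set (Latin `.` `!` `?` plus the Arabic comma, semicolon, and question mark), and the 200-character default are all hypothetical. Diacritics count toward the length. Each resulting chunk can then be passed as `--gen_text` in a separate CLI call.

```python
import re

# Split points: Latin . ! ? and the Arabic question mark (U+061F),
# comma (U+060C), and semicolon (U+061B). The punctuation stays attached
# to the preceding segment; the following whitespace is consumed.
_SPLIT_RE = re.compile(r"(?<=[.!?\u061F\u060C\u061B])\s+")


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack punctuation-delimited segments into chunks of at most max_chars.

    max_chars is a hypothetical default. A single segment longer than
    max_chars is kept whole rather than split mid-word.
    """
    segments = [s.strip() for s in _SPLIT_RE.split(text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for seg in segments:
        candidate = f"{current} {seg}" if current else seg
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = seg
    if current:
        chunks.append(current)
    return chunks
```

Packing greedily keeps chunks close to the budget while never cutting a clause in the middle, which tends to preserve natural prosody at chunk boundaries.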

## Input Requirements

**Critical:** Text must include full Arabic diacritization (tashkeel). The model is trained exclusively on fully diacritized text and will not perform well on non-diacritized input.

Example of correct input:
```
إِنَّ الْعِلْمَ نُورٌ يُقْذَفُ فِي الْقَلْبِ
```
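Before synthesis, a rough heuristic can flag input that is missing tashkeel. This check is an assumption, not part of the model or of F5-TTS: it measures the fraction of Arabic letters immediately followed by a diacritic (U+064B to U+0652, plus superscript alef U+0670), and the 0.5 threshold in `looks_diacritized` is a hypothetical default. Fully diacritized text still leaves some letters unmarked (for example long-vowel letters and the alef of the definite article), so the fraction stays below 1.0 even for correct input.

```python
# Tanwin, harakat, shadda and sukun (U+064B..U+0652) plus superscript alef (U+0670).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}
# Core Arabic letters (hamza U+0621 .. yeh U+064A), excluding tatweel (U+0640).
LETTERS = {chr(c) for c in range(0x0621, 0x064B)} - {"\u0640"}


def diacritized_fraction(text: str) -> float:
    """Return the fraction of Arabic letters immediately followed by a diacritic."""
    letters = 0
    marked = 0
    for i, ch in enumerate(text):
        if ch in LETTERS:
            letters += 1
            if i + 1 < len(text) and text[i + 1] in DIACRITICS:
                marked += 1
    return marked / letters if letters else 0.0


def looks_diacritized(text: str, threshold: float = 0.5) -> bool:
    """Heuristic check; the threshold is a hypothetical default, not from the model card."""
    return diacritized_fraction(text) >= threshold
```

For the example sentence above, most letters carry a mark, so `looks_diacritized` returns `True`; the same sentence without tashkeel returns `False`.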

## Sample Output

**Text:** إِنَّ الْعِلْمَ لَيْسَ بِكَثْرَةِ الرِّوَايَةِ، وَإِنَّمَا هُوَ نُورٌ يُقْذَفُ فِي الْقَلْبِ، يَفْهَمُ بِهِ الْعَبْدُ حَقَائِقَ الْأُمُورِ. وَالْحِكْمَةُ ضَالَّةُ الْمُؤْمِنِ، فَحَيْثُمَا وَجَدَهَا فَهُوَ أَحَقُّ بِهَا. وَمَنْ طَلَبَ الْعُلَا مِنْ غَيْرِ كَدٍّ، أَضَاعَ الْعُمُرَ فِي طَلَبِ الْمُحَالِ. فَاصْبِرْ عَلَى مُرِّ الْحَقِّ، وَلَا تَسْتَعْجِلْ قَطْفَ الثَّمَرَةِ قَبْلَ نُضْجِهَا، فَإِنَّ لِكُلِّ شَيْءٍ أَوَانًا، وَلِكُلِّ مَقَامٍ مَقَالًا.

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/645098004f731658826cfe57/OOaZF_Sn0WJ_hw3mX68NR.wav"></audio>

## Reference Audio

<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/645098004f731658826cfe57/2sx0msKFUqHM-1MzRXuiD.wav"></audio>

## Further Fine-tuning

The model can be further fine-tuned for:
- Non-diacritized text (requires additional training)
- Specific voice characteristics
- Domain-specific vocabulary
- Dialectal variations

## License

This model is released under a **Non-Commercial License** (`fair-noncommercial-research-license` in the model card metadata).

- You may use this model for research, educational, and personal non-commercial purposes.
- Commercial use is strictly prohibited without explicit permission.
- If you wish to use this model for commercial purposes, please contact the model author.


## Limitations

- Requires fully diacritized Arabic text as input
- Optimized for Modern Standard Arabic (MSA), not dialectal Arabic
- Performance may vary with very long texts without chunking
- Voice cloning quality depends on reference audio quality and length
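When a long text is synthesized chunk by chunk, the per-chunk waveforms still have to be joined into one file. One option, sketched here as an assumption rather than the method used in the Colab notebook, is a short linear crossfade at each boundary to avoid clicks. `concat_with_crossfade` and its 50 ms default are hypothetical; the sketch requires NumPy and expects mono float arrays at a shared sample rate (for example as returned by `soundfile.read`).

```python
import numpy as np


def concat_with_crossfade(
    chunks: list[np.ndarray], sample_rate: int, fade_s: float = 0.05
) -> np.ndarray:
    """Join mono waveforms, linearly crossfading fade_s seconds at each boundary."""
    if not chunks:
        return np.zeros(0, dtype=np.float32)
    out = np.asarray(chunks[0], dtype=np.float32)
    fade_n = int(round(fade_s * sample_rate))
    for nxt in chunks[1:]:
        nxt = np.asarray(nxt, dtype=np.float32)
        # The overlap cannot be longer than either side of the boundary.
        n = min(fade_n, len(out), len(nxt))
        if n == 0:
            out = np.concatenate([out, nxt])
            continue
        fade_out = np.linspace(1.0, 0.0, n, dtype=np.float32)
        # Tail of the previous audio fades out while the head of the next fades in.
        overlap = out[-n:] * fade_out + nxt[:n] * (1.0 - fade_out)
        out = np.concatenate([out[:-n], overlap, nxt[n:]])
    return out
```

Each boundary shortens the result by the overlap length, so keep the fade short relative to the pauses between sentences.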