---
base_model: openai/whisper-small
datasets:
- tarteel-ai/everyayah
language:
- ar
license: apache-2.0
metrics:
- wer
- cer
model_name: NightPrince/stt-arabic-whisper-finetuned-diactires
tags:
- whisper
- automatic-speech-recognition
- arabic
- quran
- tashkeel
---

# NightPrince/stt-arabic-whisper-finetuned-diactires

Fine-tuned [`openai/whisper-small`](https://huggingface.co/openai/whisper-small) on
[`tarteel-ai/everyayah`](https://huggingface.co/datasets/tarteel-ai/everyayah) for
**Quranic Arabic Automatic Speech Recognition with full tashkeel (diacritics)**.

## Model Description

| Property | Value |
|---|---|
| Base model | openai/whisper-small (244 M params) |
| Language | Arabic (ar) |
| Task | Automatic Speech Recognition |
| Dataset | tarteel-ai/everyayah |
| Output | Arabic text with **full tashkeel preserved** |
| Fine-tuning type | Full fine-tuning (not LoRA) |
| Precision | fp16 mixed precision |
| Hardware | 4× NVIDIA RTX 2080 Ti (44 GB VRAM total) |

## Why Full Fine-Tuning?

- **Domain gap**: Quranic Tajweed recitation differs substantially from conversational Arabic
- **Tashkeel precision**: All 12 decoder layers need to adapt for reliable diacritic generation
- **6 diverse reciters**: Broad acoustic variety prevents reciter-specific overfitting

## Usage

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="NightPrince/stt-arabic-whisper-finetuned-diactires",
    generate_kwargs={"language": "arabic", "task": "transcribe"},
)

result = pipe("your_quran_audio.mp3")
print(result["text"])
# → بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِيمِ
```

## Training Details

| Setting | Value |
|---|---|
| Learning rate | 1e-05 |
| LR scheduler | cosine |
| Effective batch size | 8 × 4 × 4 GPUs = 128 |
| Max steps | 8000 |
| Warmup steps | 500 |
| Weight decay | 0.05 |
| Dropout | 0.1 |
| Early stopping | patience=5 (eval every 500 steps) |
| Best model criterion | CER (Character Error Rate with tashkeel) |
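To make the schedule concrete, here is a minimal sketch of how the learning rate would evolve over training. It assumes the usual shape of a cosine schedule with linear warmup (ramp from 0 to the peak, then cosine decay to 0 at the last step), and it assumes that `8 × 4 × 4 GPUs` means per-device batch size × gradient accumulation steps × number of GPUs. The training script is not part of this card, so `lr_at` and the constant names are illustrative only.

```python
import math

# Values copied from the table above.
PEAK_LR = 1e-5
WARMUP_STEPS = 500
MAX_STEPS = 8000

# Assumed reading of "8 × 4 × 4 GPUs": per-device batch, grad accumulation, GPUs.
PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_GPUS = 8, 4, 4
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS  # 128


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step (hypothetical helper, assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to 0 at MAX_STEPS.
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(f"effective batch size: {EFFECTIVE_BATCH}")
    for step in (0, 250, 500, 4250, 8000):
        print(f"step {step:>5}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate peaks at step 500 and has fallen to half the peak by step 4250, the midpoint of the decay phase. Early stopping can end training before step 8000, in which case the tail of the curve is never reached.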

## Evaluation Metrics

| Metric | Description |
|---|---|
| `cer` | Character Error Rate, **with** full tashkeel (primary metric) |
| `wer` | Word Error Rate, **with** full tashkeel |
| `wer_normalized` | Word Error Rate, **without** tashkeel (normalized comparison) |
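The difference between the three metrics can be illustrated with a small, dependency-free sketch. It is not the evaluation code used for this model: the set of stripped characters (harakat U+064B to U+0652 plus the superscript alef U+0670) and the helper names `strip_tashkeel`, `edit_distance`, `cer`, `wer`, and `wer_normalized` are assumptions made for illustration.

```python
import re

# Assumed tashkeel set: harakat U+064B..U+0652 plus superscript alef U+0670.
TASHKEEL = re.compile("[\u064B-\u0652\u0670]")


def strip_tashkeel(text: str) -> str:
    """Remove diacritics so only the base letters remain."""
    return TASHKEEL.sub("", text)


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate, diacritics included."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate, diacritics included."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def wer_normalized(ref: str, hyp: str) -> float:
    """Word error rate after stripping tashkeel from both sides."""
    return wer(strip_tashkeel(ref), strip_tashkeel(hyp))


# Reference: "bismi llahi" with full tashkeel, written with escapes for clarity.
ref = "\u0628\u0650\u0633\u0652\u0645\u0650 \u0627\u0644\u0644\u0651\u064E\u0647\u0650"
# Hypothesis: identical except the final kasra is replaced by a damma.
hyp = ref[:-1] + "\u064F"

if __name__ == "__main__":
    print(cer(ref, hyp), wer(ref, hyp), wer_normalized(ref, hyp))
```

In this example a single wrong final vowel counts as one word error out of two when tashkeel is kept, and as no error once tashkeel is stripped, which is why `cer` and `wer` are the stricter metrics for this task.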

## Intended Use

Transcribing Quranic recitation audio to text with complete harakat (tashkeel).
Suitable for Quran learning apps, recitation evaluation, and Islamic education tools.

## Training Data

[`tarteel-ai/everyayah`](https://huggingface.co/datasets/tarteel-ai/everyayah) contains
verse-level (ayah-level) recordings from multiple Quranic reciters.
Text labels contain complete tashkeel from the Uthmani script.

Training uses **6 reciters (~24,191 samples total)**:

| Reciter | Samples |
|---|---|
| abdulsamad | ~4,269 |
| abdul_basit | ~4,269 |
| abdullah_basfar | ~4,269 |
| husary | ~4,269 |
| menshawi | ~2,846 |
| minshawi | ~4,269 |