---
base_model: openai/whisper-small
datasets:
- tarteel-ai/everyayah
language:
- ar
license: apache-2.0
metrics:
- wer
- cer
model_name: NightPrince/stt-arabic-whisper-finetuned-diactires
tags:
- whisper
- automatic-speech-recognition
- arabic
- quran
- tashkeel
---

# NightPrince/stt-arabic-whisper-finetuned-diactires

Fine-tuned [`openai/whisper-small`](https://huggingface.co/openai/whisper-small) on [`tarteel-ai/everyayah`](https://huggingface.co/datasets/tarteel-ai/everyayah) for **Quranic Arabic Automatic Speech Recognition with full tashkeel (diacritics)**.

## Model Description

| Property | Value |
|---|---|
| Base model | openai/whisper-small (244 M params) |
| Language | Arabic (ar) |
| Task | Automatic Speech Recognition |
| Dataset | tarteel-ai/everyayah |
| Output | Arabic text with **full tashkeel preserved** |
| Fine-tuning type | Full fine-tuning (not LoRA) |
| Precision | fp16 mixed precision |
| Hardware | 4× NVIDIA RTX 2080 Ti (44 GB VRAM total) |

## Why Full Fine-Tuning?

- **Domain gap**: Quranic Tajweed recitation differs substantially from conversational Arabic
- **Tashkeel precision**: All 12 decoder layers need to adapt for reliable diacritic generation
- **6 diverse reciters**: Broad acoustic variety reduces reciter-specific overfitting

## Usage

```python
from transformers import pipeline

# Load the fine-tuned checkpoint and force Arabic transcription
pipe = pipeline(
    "automatic-speech-recognition",
    model="NightPrince/stt-arabic-whisper-finetuned-diactires",
    generate_kwargs={"language": "arabic", "task": "transcribe"},
)

result = pipe("your_quran_audio.mp3")
print(result["text"])
# → بِسْمِ اللَّهِ الرَّحْمَنِ الرَّحِيمِ
```

## Training Details

| Setting | Value |
|---|---|
| Learning rate | 1e-05 |
| LR scheduler | cosine |
| Effective batch size | 8 × 4 × 4 GPUs = 128 |
| Max steps | 8000 |
| Warmup steps | 500 |
| Weight decay | 0.05 |
| Dropout | 0.1 |
| Early stopping | patience=5 (eval every 500 steps) |
| Best model criterion | CER (Character Error Rate with tashkeel) |

## Evaluation Metrics

| Metric | Description |
|---|---|
| `cer` | Character Error Rate, **with** full tashkeel (primary metric) |
| `wer` | Word Error Rate, **with** full tashkeel |
| `wer_normalized` | Word Error Rate, **without** tashkeel (normalized comparison) |

## Intended Use

Transcribing Quranic recitation audio to text with complete harakat (tashkeel). Suitable for Quran learning apps, recitation evaluation, and Islamic education tools.

## Training Data

[`tarteel-ai/everyayah`](https://huggingface.co/datasets/tarteel-ai/everyayah) contains verse-level (ayah-level) recordings from multiple Quranic reciters. Text labels contain complete tashkeel from the Uthmani script.

Training uses **6 reciters (~24,191 samples total)**:

| Reciter | Samples |
|---|---|
| abdulsamad | ~4,269 |
| abdul_basit | ~4,269 |
| abdullah_basfar | ~4,269 |
| husary | ~4,269 |
| menshawi | ~2,846 |
| minshawi | ~4,269 |
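
To make the difference between `cer`, `wer`, and `wer_normalized` in the Evaluation Metrics table concrete, here is a minimal pure-Python sketch. It is not the evaluation script used for this model. The helper names and the set of code points treated as tashkeel (U+064B to U+0652 plus the dagger alif U+0670) are assumptions made for illustration, and the actual normalization may differ.

```python
import re

# Assumption: tashkeel = combining marks U+064B..U+0652 (fathatan through
# sukun) plus the dagger alif U+0670. The real normalization may differ.
TASHKEEL = re.compile("[\u064B-\u0652\u0670]")


def strip_tashkeel(text: str) -> str:
    """Remove diacritic marks, keeping base letters and spaces."""
    return TASHKEEL.sub("", text)


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate over code points; assumes a non-empty reference."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate on whitespace tokens; assumes a non-empty reference."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def wer_normalized(ref: str, hyp: str) -> float:
    """Word error rate after stripping tashkeel from both sides."""
    return wer(strip_tashkeel(ref), strip_tashkeel(hyp))


if __name__ == "__main__":
    reference = "بِسْمِ اللَّهِ"
    hypothesis = "بِسْمَ اللَّهِ"  # same letters, one wrong haraka on the first word
    print("cer:", cer(reference, hypothesis))
    print("wer:", wer(reference, hypothesis))
    print("wer_normalized:", wer_normalized(reference, hypothesis))
```

A single wrong haraka counts as a full word error in `wer` but disappears in `wer_normalized`, while `cer` penalizes it as one character among many. This is one reason a character-level metric with tashkeel is a sensible criterion for selecting the best checkpoint.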