---
license: apache-2.0
language:
- sr
base_model:
- openai/whisper-small
datasets:
- google/fleurs
- Sagicc/audio-lmb-ds
- espnet/yodas_owsmv4
- classla/ParlaSpeech-RS
metrics:
- wer
model-index:
- name: Whisper Small
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 24.0
      type: mozilla-foundation/common_voice_24_0
      config: sr
      split: test
      args: sr
    metrics:
    - name: Wer
      type: wer
      value: 0.065924219787
library_name: transformers
---
# whisper-small-sr
A fine-tuned version of **OpenAI Whisper Small** (`openai/whisper-small`) for Serbian automatic speech recognition.
**Output script:** this model is intended to produce **Serbian Latin** only.
- **WER** on Common Voice 24.0 Serbian test: **6.59%**
## Model description
## Training and evaluation data
This model was fine-tuned on a **mixture of publicly available Serbian speech corpora**, including:
- Mozilla Common Voice 24.0 (evaluation uses the Serbian `test` split)
- FLEURS Serbian
- ParlaSpeech-RS (subset of the full dataset)
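- Additional Serbian corpora used in the training pipeline, including `Sagicc/audio-lmb-ds` and `espnet/yodas_owsmv4` (both listed in the model card metadata)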
## Training procedure
- Epochs: 9
- Batch size: 32 / 20
- Optimizer: AdamW
- LR: 6e-5 with warmup (50 steps) + cosine decay to min_lr = 1e-7
- Mixed precision: bfloat16 (fp32 in the final epoch)
- SpecAugment: frequency + time masking
- Sampling: weighted sampling across datasets
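
The learning-rate schedule listed above can be sketched as a small function. This is a minimal illustration, not the training code used for this model: it assumes the warmup is linear (the card only says "warmup"), and `total_steps` is a hypothetical parameter, since the card does not state the number of optimizer steps.

```python
import math


def lr_at_step(step, total_steps, peak_lr=6e-5, warmup_steps=50, min_lr=1e-7):
    """Learning rate for a 0-indexed optimizer step.

    Assumptions (not confirmed by the model card): linear warmup, and
    `total_steps` supplied by the caller.
    """
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Cosine factor goes from 1 (start of decay) to 0 (end of training).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


if __name__ == "__main__":
    # Hypothetical run length, for illustration only.
    for s in (0, 49, 50, 500, 1000):
        print(s, lr_at_step(s, total_steps=1000))
```

With these defaults the rate peaks at 6e-5 once warmup ends and approaches 1e-7 at the final step.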
### Training results
| Epoch | Train loss | CV WER |
|------:|------------------:|-------:|
| 1 | 0.333 | 0.1614 |
| 2 | 0.344 | 0.1278 |
| 3 | 0.251 | 0.1112 |
| 4 | 0.202 | 0.1032 |
| 5 | 0.167 | 0.0934 |
| 6 | 0.138 | 0.0790 |
| 7 | 0.118 | 0.0740 |
| 8 | 0.103 | 0.0709 |
| 9 | 0.096 | 0.0659 |
## Evaluation Metrics
- **WER (normalized)** on **Common Voice 24.0 Serbian test**: **6.59%**
- Text normalization used for WER:
  - punctuation removed
  - lowercased
  - Cyrillic → Latin conversion
  - numbers converted to words
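
The normalization steps above could look roughly like the following sketch. It is an illustration under stated assumptions, not the evaluation script used for this model: the function names are hypothetical, the order of the steps is a guess, and number-to-word conversion is left as an optional caller-supplied hook (`number_to_words`) because the card does not say which tool was used.

```python
import re

# Serbian Cyrillic to Latin (lowercase only; text is lowercased first).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def normalize(text, number_to_words=None):
    """Lowercase, optionally spell out digits, transliterate, strip punctuation."""
    text = text.lower()
    if number_to_words is not None:
        # Hypothetical hook: a callable mapping an int to Serbian words.
        text = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
    text = "".join(CYR_TO_LAT.get(ch, ch) for ch in text)
    # Replace anything that is not a word character or whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference is empty after normalization")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(normalize("Љубав, Џем!"))
    print(word_error_rate("Добар дан, свима!", "dobar dan"))
```

This computes WER for a single utterance. A corpus-level figure such as the one reported above is normally obtained by summing edit counts and reference lengths over all utterances before dividing.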