whisper-small - Fine-tuned for Ukrainian ASR
This model is a fine-tuned version of openai/whisper-small on the ukr-dialects-audio-dataset for Ukrainian speech recognition.
Model Description
A fine-tune of openai/whisper-small on the ukr-dialects-audio-dataset for Ukrainian automatic speech recognition (ASR).
Training Details
Training Data
Training Hyperparameters
| Parameter | Value |
| --- | --- |
| Base model | openai/whisper-small |
| Learning rate | 1e-05 |
| Warmup steps | 500 |
| Max steps | 5000 |
| Batch size (per device) | 16 |
| Gradient accumulation steps | 2 |
| Effective batch size | 32 |
| FP16 | True |
| Gradient checkpointing | False |
| Eval strategy | steps |
| Eval/Save steps | 500 |
| Metric for best model | cer |
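The table gives warmup and max steps but not the scheduler type. Assuming the Hugging Face Trainer default (`lr_scheduler_type="linear"`: linear warmup to the peak rate, then linear decay to zero), the per-step learning rate and the effective batch size can be sketched as follows. The function names are hypothetical and not taken from the training code.

```python
def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 1) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device * grad_accum * num_gpus


def linear_warmup_lr(step: int, peak_lr: float = 1e-5,
                     warmup_steps: int = 500, max_steps: int = 5000) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to 0.

    Same shape as transformers' get_linear_schedule_with_warmup. This is an
    assumption: the card does not state which scheduler was used.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, max_steps - step)
    return peak_lr * remaining / max(1, max_steps - warmup_steps)


if __name__ == "__main__":
    bs = effective_batch_size(16, 2)
    print("effective batch size:", bs)   # 32
    print("samples seen:", bs * 5000)    # 160000
    for s in (0, 250, 500, 2750, 5000):
        print(s, linear_warmup_lr(s))
```

Under this assumption the rate peaks at 1e-05 at step 500 and reaches half of that midway through the decay phase (step 2750).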
Training Results
The model was trained for 5000 steps, with evaluation on the validation set every 500 steps. The best checkpoint was selected by lowest validation CER.
| Step | Train Loss | Eval Loss | Eval CER (%) | Eval WER (%) |
| --- | --- | --- | --- | --- |
| 500 | 0.538 | 0.599 | 17.2 | 41.97 |
| 1000 | 0.374 | 0.518 | 19.39 | 42.48 |
| 1500 | 0.365 | 0.486 | 18.63 | 40.4 |
| 2000 | 0.23 | 0.482 | 15.01 | 34.86 |
| 2500 | 0.247 | 0.476 | 17.21 | 37.67 |
| 3000 | 0.137 | 0.498 | 16.12 | 35.5 |
| 3500 | 0.099 | 0.509 | 14.23 | 33.22 |
| 4000 | 0.101 | 0.523 | 13.91 | 33.04 |
| 4500 | 0.094 | 0.54 | 13.59 | 32.36 |
| 5000 | 0.081 | 0.544 | 13.73 | 32.45 |
Best model checkpoint: step 4500 (lowest validation CER, 13.59%; WER 32.36%)
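Selecting the best checkpoint by lowest CER amounts to an argmin over the evaluation history. The sketch below replays that rule over the values in the training-results table. In the Hugging Face Trainer this corresponds to `metric_for_best_model="cer"` combined with `greater_is_better=False`; the card does not list the latter, so treat it as an assumption.

```python
# (step, eval CER %, eval WER %) copied from the training-results table.
EVAL_HISTORY = [
    (500, 17.2, 41.97),
    (1000, 19.39, 42.48),
    (1500, 18.63, 40.4),
    (2000, 15.01, 34.86),
    (2500, 17.21, 37.67),
    (3000, 16.12, 35.5),
    (3500, 14.23, 33.22),
    (4000, 13.91, 33.04),
    (4500, 13.59, 32.36),
    (5000, 13.73, 32.45),
]


def best_checkpoint(history, metric_index: int = 1):
    """Return the row with the lowest value in column `metric_index` (lower is better)."""
    return min(history, key=lambda row: row[metric_index])


if __name__ == "__main__":
    step, cer, wer = best_checkpoint(EVAL_HISTORY)
    print(f"best step: {step} (CER {cer}%, WER {wer}%)")  # best step: 4500 (CER 13.59%, WER 32.36%)
```

In this run, selecting by WER (`metric_index=2`) would pick the same step.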
Final Evaluation Metrics
Validation Set
| Metric | Value |
| --- | --- |
| CER | 13.73% |
| WER | 32.45% |
| Eval Loss | 0.544 |
Test Set
| Metric | Value |
| --- | --- |
| CER | 12.09% |
| WER | 30.44% |
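CER and WER are both edit-distance rates: the minimum number of substitutions, deletions and insertions that turn the hypothesis into the reference, divided by the reference length, counted in characters for CER and in whitespace-separated words for WER. The `evaluate` package listed under Environment provides `cer` and `wer` metrics; the stdlib-only sketch below just illustrates the definition. It applies no text normalization, and the card does not say which normalization, if any, was used for the reported scores.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref item missing from hyp
                curr[j - 1] + 1,         # insertion: extra hyp item
                prev[j - 1] + (r != h),  # substitution, or free match if equal
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate (spaces count as characters); assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)


def corpus_wer(pairs) -> float:
    """Pool word edits and reference words over all (reference, hypothesis) pairs."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return edits / words


if __name__ == "__main__":
    ref, hyp = "добрий день усім", "добрий дерь усім"
    print(wer(ref, hyp))  # one substituted word out of three
    print(cer(ref, hyp))  # 0.0625 (one substituted character out of 16)
```

A single wrong letter costs one character but a whole word, which is one reason the WER figures above are more than twice the CER figures.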
Usage
Using Pipeline (Recommended)
```python
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="KSE-RESEARCH-Group/whisper-small-ukr-dialects",
    device=device,
)

result = pipe(
    "path/to/audio.wav",
    # Force Ukrainian transcription instead of language detection or translation.
    generate_kwargs={
        "task": "transcribe",
        "language": "ukrainian",
    },
    # Split long recordings into 30-second windows (Whisper's input length).
    chunk_length_s=30,
)
print(result["text"])
```
Using Transformers Directly
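```python
import librosa  # used here only to load and resample the audio file
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "KSE-RESEARCH-Group/whisper-small-ukr-dialects"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

# Whisper expects 16 kHz mono audio; librosa resamples on load.
audio_array, _ = librosa.load("path/to/audio.wav", sr=16000, mono=True)

# The processor pads/truncates to 30 s; use the pipeline above for longer files.
input_features = processor(
    audio_array,
    sampling_rate=16000,
    return_tensors="pt",
).input_features.to(device)

# Force Ukrainian transcription instead of relying on language detection.
with torch.no_grad():
    predicted_ids = model.generate(input_features, language="ukrainian", task="transcribe")

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```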
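The direct example needs `audio_array` to be a 16 kHz mono waveform of floats. If the input is already a 16 kHz, 16-bit PCM WAV file, a dependency-free loader can be sketched with the standard library alone. `load_pcm16_wav` is a hypothetical helper, not part of Transformers; it does not resample, and it down-mixes multi-channel audio by averaging channels.

```python
import array
import sys
import wave


def load_pcm16_wav(path: str, expected_rate: int = 16000) -> list[float]:
    """Read a 16-bit PCM WAV file into mono float samples in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is supported")
        if wav.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())

    samples = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":       # WAV sample data is little-endian
        samples.byteswap()

    # Average interleaved channels frame by frame, then scale to [-1.0, 1.0).
    mono = [
        sum(samples[i:i + channels]) / channels
        for i in range(0, len(samples), channels)
    ]
    return [s / 32768.0 for s in mono]
```

Usage: `audio_array = numpy.asarray(load_pcm16_wav("path/to/audio.wav"), dtype=numpy.float32)`, then pass it to the processor as shown above. For other sample rates or formats, a resampling loader such as librosa remains the simpler option.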
Infrastructure
Hardware
| Component | Specification |
| --- | --- |
| GPU | NVIDIA GeForce RTX 4090 |
| GPU Memory | 47.4 GB |
| GPU Count | 1 |
| CUDA Compute Capability | 8.9 |
Environment
| Package | Version |
| --- | --- |
| Python | 3.12.12 |
| PyTorch | 2.8.0+cu128 |
| CUDA | 12.8 |
| Transformers | 4.57.3 |
| Datasets | 2.21.0 |
| Evaluate | 0.4.6 |
Training Time
| Metric | Value |
| --- | --- |
| Total training time | 4:45:41 (h:mm:ss) |
| Training started | 2026-03-02 09:51:00 |
| Training completed | 2026-03-02 14:36:42 |
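The reported duration (4:45:41.352520 in the original run log) agrees with the start and end timestamps, which are rounded to the second. Dividing it by the step count and the effective batch size from the hyperparameter table gives rough per-step and per-sample averages. These are averages over the whole run and likely include the periodic evaluation and checkpoint saving, so they probably understate pure training throughput.

```python
from datetime import datetime, timedelta

# Values copied from the Training Time and hyperparameter tables.
TOTAL = timedelta(hours=4, minutes=45, seconds=41.352520)
MAX_STEPS = 5000
EFFECTIVE_BATCH = 32

start = datetime.fromisoformat("2026-03-02 09:51:00")
end = datetime.fromisoformat("2026-03-02 14:36:42")
wall_clock = end - start

seconds = TOTAL.total_seconds()
print(f"wall clock from timestamps: {wall_clock}")  # wall clock from timestamps: 4:45:42
print(f"seconds per optimizer step: {seconds / MAX_STEPS:.2f}")
print(f"training samples per second: {EFFECTIVE_BATCH * MAX_STEPS / seconds:.1f}")
```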
Experiment Details
| Property | Value |
| --- | --- |
| Experiment ID | whisper-small-001 |
| WandB Project | ukr-dialects-stt |
| WandB Run | whisper-small-001 |
Citation
If you use this model, please cite:
```bibtex
@misc{KSE-RESEARCH-Group-whisper-small-ukr-dialects,
  author    = {KSE-RESEARCH-Group},
  title     = {whisper-small - Fine-tuned for Ukrainian ASR},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/KSE-RESEARCH-Group/whisper-small-ukr-dialects}
}
```
License
This model is released under the Apache 2.0 license.
Acknowledgements