whisper-small - Fine-tuned for Ukrainian ASR

This model is a fine-tuned version of openai/whisper-small, trained on KSE-RESEARCH-Group/ukr-dialects-audio-dataset for Ukrainian automatic speech recognition (ASR).

Model Description

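The model is openai/whisper-small fine-tuned on KSE-RESEARCH-Group/ukr-dialects-audio-dataset. The sections below cover the training setup, evaluation results, and usage.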

Training Details

Training Data

| Property | Value |
|---|---|
| Dataset | KSE-RESEARCH-Group/ukr-dialects-audio-dataset |
| Training samples | 27673 |
| Validation samples | 3365 |
| Test samples | 3451 |
| Language | Ukrainian |
| Max token length | 448 |

Training Hyperparameters

| Parameter | Value |
|---|---|
| Base model | openai/whisper-small |
| Learning rate | 1e-05 |
| Warmup steps | 500 |
| Max steps | 5000 |
| Batch size (per device) | 16 |
| Gradient accumulation steps | 2 |
| Effective batch size | 32 |
| FP16 | True |
| Gradient checkpointing | False |
| Eval strategy | steps |
| Eval/Save steps | 500 |
| Metric for best model | cer |
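
As a rough sketch, the hyperparameters above can be collected into keyword arguments for `Seq2SeqTrainingArguments` from Transformers. The card does not include the training script, so the `output_dir` value and the `greater_is_better` and `load_best_model_at_end` settings are assumptions added for illustration; the other values are copied from the table.

```python
# Keyword arguments mirroring the hyperparameter table.
training_kwargs = {
    "output_dir": "./whisper-small-ukr-dialects",  # hypothetical path
    "learning_rate": 1e-5,
    "warmup_steps": 500,
    "max_steps": 5000,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "fp16": True,
    "gradient_checkpointing": False,
    "eval_strategy": "steps",
    "eval_steps": 500,
    "save_steps": 500,
    "metric_for_best_model": "cer",
    "greater_is_better": False,      # assumption: lower CER is better
    "load_best_model_at_end": True,  # assumption: not stated in the card
}

num_gpus = 1  # GPU count reported under Infrastructure
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_gpus
)
print(effective_batch_size)  # 32

# With Transformers installed and a CUDA GPU available (fp16 needs one):
# from transformers import Seq2SeqTrainingArguments
# args = Seq2SeqTrainingArguments(**training_kwargs)
```

The effective batch size of 32 is the per-device batch size times the gradient accumulation steps times the number of GPUs (16 x 2 x 1).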

Training Results

The model was trained for 5000 steps with evaluation every 500 steps. The best checkpoint was selected based on the lowest CER.

| Step | Train Loss | Eval Loss | Eval CER (%) | Eval WER (%) |
|---|---|---|---|---|
| 500 | 0.538 | 0.599 | 17.20 | 41.97 |
| 1000 | 0.374 | 0.518 | 19.39 | 42.48 |
| 1500 | 0.365 | 0.486 | 18.63 | 40.40 |
| 2000 | 0.230 | 0.482 | 15.01 | 34.86 |
| 2500 | 0.247 | 0.476 | 17.21 | 37.67 |
| 3000 | 0.137 | 0.498 | 16.12 | 35.50 |
| 3500 | 0.099 | 0.509 | 14.23 | 33.22 |
| 4000 | 0.101 | 0.523 | 13.91 | 33.04 |
| 4500 | 0.094 | 0.540 | 13.59 | 32.36 |
| 5000 | 0.081 | 0.544 | 13.73 | 32.45 |

Best Model Checkpoint: Step 4500
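
The selection rule can be illustrated with a minimal sketch that picks the row with the lowest eval CER from the logged results. The numbers are copied from the results table; the variable names are illustrative and not taken from the training code.

```python
# (step, eval loss, eval CER %, eval WER %) copied from the results table.
eval_history = [
    (500, 0.599, 17.20, 41.97),
    (1000, 0.518, 19.39, 42.48),
    (1500, 0.486, 18.63, 40.40),
    (2000, 0.482, 15.01, 34.86),
    (2500, 0.476, 17.21, 37.67),
    (3000, 0.498, 16.12, 35.50),
    (3500, 0.509, 14.23, 33.22),
    (4000, 0.523, 13.91, 33.04),
    (4500, 0.540, 13.59, 32.36),
    (5000, 0.544, 13.73, 32.45),
]

# Select by lowest CER, the metric named in the hyperparameters.
best_by_cer = min(eval_history, key=lambda row: row[2])
# For comparison: the checkpoint with the lowest eval loss.
best_by_loss = min(eval_history, key=lambda row: row[1])

print(best_by_cer[0])   # 4500
print(best_by_loss[0])  # 2500
```

Eval loss bottoms out at step 2500 while CER keeps improving until step 4500, so selecting on loss would have picked a checkpoint with a noticeably higher error rate.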

Final Evaluation Metrics

Validation Set

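| Metric | Value |
|---|---|
| CER | 13.73% |
| WER | 32.45% |
| Eval Loss | 0.544 |

These values match the step 5000 row of the training results table, not the step 4500 row (CER 13.59%, WER 32.36%) listed as the best checkpoint.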

Test Set

| Metric | Value |
|---|---|
| CER | 12.09% |
| WER | 30.44% |
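
For readers unfamiliar with the two metrics, here is a minimal pure-Python sketch of how CER and WER are defined: the edit distance between reference and hypothesis, divided by the reference length in characters or words. The Evaluate library listed under Environment provides `cer` and `wer` metrics, but the card does not state how the reported numbers were computed or which text normalization was applied, so this is an illustration of the definitions only. The example sentences are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference characters."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


reference = "це мій дім"    # made-up example
hypothesis = "це твій дім"  # one wrong word

print(f"CER: {cer(reference, hypothesis):.2%}")
print(f"WER: {wer(reference, hypothesis):.2%}")
```

A single wrong word counts as one full word error but only a couple of character edits, which is why WER is typically much higher than CER, as in the tables above.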

Usage

Using Pipeline (Recommended)

from transformers import pipeline
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="KSE-RESEARCH-Group/whisper-small-ukr-dialects",
    device=device,
)

result = pipe(
    "path/to/audio.wav",
    generate_kwargs={
        "task": "transcribe",
        "language": "ukrainian",
    },
    chunk_length_s=30,
)
print(result["text"])

Using Transformers Directly

from transformers import WhisperForConditionalGeneration, WhisperProcessor
import librosa  # assumption: any loader that yields 16 kHz mono audio works
import torch

model_id = "KSE-RESEARCH-Group/whisper-small-ukr-dialects"

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Move to GPU if available
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load audio as a mono float array resampled to 16 kHz
audio_array, _ = librosa.load("path/to/audio.wav", sr=16000)

# Convert the waveform to log-mel input features
input_features = processor(
    audio_array,
    sampling_rate=16000,
    return_tensors="pt",
).input_features.to(device)

# Generate transcription, setting language and task as in the pipeline example
predicted_ids = model.generate(
    input_features,
    language="ukrainian",
    task="transcribe",
)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)

Infrastructure

Hardware

| Component | Specification |
|---|---|
| GPU | NVIDIA GeForce RTX 4090 |
| GPU Memory | 47.4 GB |
| GPU Count | 1 |
| CUDA Compute Capability | 8.9 |

Environment

| Package | Version |
|---|---|
| Python | 3.12.12 |
| PyTorch | 2.8.0+cu128 |
| CUDA | 12.8 |
| Transformers | 4.57.3 |
| Datasets | 2.21.0 |
| Evaluate | 0.4.6 |

Training Time

| Metric | Value |
|---|---|
| Total training time | 4:45:41.352520 (h:mm:ss) |
| Training started | 2026-03-02 09:51:00 |
| Training completed | 2026-03-02 14:36:42 |
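
The timestamps above, combined with the step count and batch size from the hyperparameters, give a rough idea of throughput and of how many passes over the training set the run made. The sketch below is back-of-the-envelope arithmetic only: the wall-clock time includes the ten evaluation passes and checkpoint saves, so the per-step figure is an upper bound on pure training time.

```python
from datetime import datetime

started = datetime.fromisoformat("2026-03-02 09:51:00")
completed = datetime.fromisoformat("2026-03-02 14:36:42")
elapsed_s = (completed - started).total_seconds()

max_steps = 5000            # from the hyperparameters
effective_batch_size = 32   # from the hyperparameters
train_samples = 27673       # from the training data table

seconds_per_step = elapsed_s / max_steps
samples_seen = max_steps * effective_batch_size
approx_epochs = samples_seen / train_samples

print(f"{elapsed_s:.0f} s total, {seconds_per_step:.2f} s/step")
print(f"{samples_seen} samples seen, about {approx_epochs:.2f} epochs")
```

Under these figures the run processed 160000 samples, which corresponds to a little under six passes over the 27673 training samples.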

Experiment Details

| Property | Value |
|---|---|
| Experiment ID | whisper-small-001 |
| WandB Project | ukr-dialects-stt |
| WandB Run | whisper-small-001 |

Citation

If you use this model, please cite:

@misc{KSE-RESEARCH-Group-whisper-small-ukr-dialects,
  author = {KSE-RESEARCH-Group},
  title = {whisper-small - Fine-tuned for Ukrainian ASR},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/KSE-RESEARCH-Group/whisper-small-ukr-dialects}
}

License

This model is released under the Apache 2.0 license.

Acknowledgements
