whisper-small - Fine-tuned for Ukrainian ASR

This model is openai/whisper-small fine-tuned on KSE-RESEARCH-Group/ukr-dialects-audio-dataset for Ukrainian automatic speech recognition (ASR).

Model Description

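openai/whisper-small fine-tuned for 5000 steps on ukr-dialects-audio-dataset, with the best checkpoint selected by character error rate (CER) on the validation split.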

Training Details

Training Data

Property	Value
Dataset	KSE-RESEARCH-Group/ukr-dialects-audio-dataset
Training samples	27673
Validation samples	3365
Test samples	3451
Language	Ukrainian
Max token length	448

Training Hyperparameters

Parameter	Value
Base model	openai/whisper-small
Learning rate	1e-05
Warmup steps	500
Max steps	5000
Batch size (per device)	16
Gradient accumulation steps	2
Effective batch size	32
FP16	True
Gradient checkpointing	False
Eval strategy	steps
Eval/Save steps	500
Metric for best model	cer
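
The settings above could map onto Hugging Face training arguments roughly as follows. This is a sketch, not the authors' training script: the key names follow transformers.Seq2SeqTrainingArguments, while greater_is_better=False and all variable names are assumptions the card does not state. The arithmetic at the end shows how many passes over the 27673 training samples the 5000 steps amount to.

```python
# Hyperparameters from the table, keyed by the matching
# transformers.Seq2SeqTrainingArguments parameter names.
training_kwargs = {
    "learning_rate": 1e-5,
    "warmup_steps": 500,
    "max_steps": 5000,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "fp16": True,
    "gradient_checkpointing": False,
    "eval_strategy": "steps",
    "eval_steps": 500,
    "save_steps": 500,
    "metric_for_best_model": "cer",
    "greater_is_better": False,  # assumption: lower CER is better
}

TRAIN_SAMPLES = 27673  # from the Training Data table
GPU_COUNT = 1          # from the Hardware table

# Effective batch size = per-device batch * accumulation steps * devices
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * GPU_COUNT
)
samples_seen = effective_batch * training_kwargs["max_steps"]
epochs = samples_seen / TRAIN_SAMPLES

print(f"effective batch size: {effective_batch}")
print(f"samples seen: {samples_seen}")
print(f"approximate epochs: {epochs:.2f}")
```

With these numbers the run covers the training split a little under six times.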

Training Results

The model was trained for 5000 steps with evaluation every 500 steps. The best checkpoint was selected based on the lowest CER.

Step	Train Loss	Eval Loss	Eval CER (%)	Eval WER (%)
500	0.538	0.599	17.20	41.97
1000	0.374	0.518	19.39	42.48
1500	0.365	0.486	18.63	40.40
2000	0.230	0.482	15.01	34.86
2500	0.247	0.476	17.21	37.67
3000	0.137	0.498	16.12	35.50
3500	0.099	0.509	14.23	33.22
4000	0.101	0.523	13.91	33.04
4500	0.094	0.540	13.59	32.36
5000	0.081	0.544	13.73	32.45

Best Model Checkpoint: Step 4500 (Eval CER 13.59%, Eval WER 32.36%)
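
The selection rule can be reproduced from the logged numbers. The sketch below only re-applies "lowest Eval CER" to the rows of the table; the history list and variable names are illustrative, and during training this choice would normally be made by the trainer itself.

```python
# (step, train_loss, eval_loss, eval_cer_percent, eval_wer_percent)
# copied from the results table.
history = [
    (500, 0.538, 0.599, 17.20, 41.97),
    (1000, 0.374, 0.518, 19.39, 42.48),
    (1500, 0.365, 0.486, 18.63, 40.40),
    (2000, 0.230, 0.482, 15.01, 34.86),
    (2500, 0.247, 0.476, 17.21, 37.67),
    (3000, 0.137, 0.498, 16.12, 35.50),
    (3500, 0.099, 0.509, 14.23, 33.22),
    (4000, 0.101, 0.523, 13.91, 33.04),
    (4500, 0.094, 0.540, 13.59, 32.36),
    (5000, 0.081, 0.544, 13.73, 32.45),
]

# Pick the checkpoint with the lowest validation CER.
best_by_cer = min(history, key=lambda row: row[3])
# For comparison: the checkpoint with the lowest validation loss.
best_by_loss = min(history, key=lambda row: row[2])

print("best by CER: step", best_by_cer[0])
print("best by eval loss: step", best_by_loss[0])
```

The two criteria disagree: eval loss bottoms out at step 2500 and rises afterwards, while CER and WER keep improving until step 4500, which is why the metric used for selection matters here.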

Final Evaluation Metrics

Validation Set

These values match the step 5000 row of the training results table rather than the step 4500 row.

Metric	Value
CER	13.73%
WER	32.45%
Eval Loss	0.544

Test Set

Metric	Value
CER	12.09%
WER	30.44%
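
For readers unfamiliar with the two metrics, here is a minimal pure-Python sketch of their standard definitions: edit distance divided by reference length, counted over characters for CER and over words for WER. It is not the evaluation code behind the numbers above, and it applies no text normalization; the example sentence pair is hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical pair: one wrong character in the second word.
reference = "привіт світ"
hypothesis = "привіт свит"

print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

A single wrong character makes a whole word count as an error, which is one reason WER is much higher than CER in the tables above.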

Usage

Using Pipeline (Recommended)

from transformers import pipeline
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="KSE-RESEARCH-Group/whisper-small-ukr-dialects",
    device=device,
)

result = pipe(
    "path/to/audio.wav",
    generate_kwargs={
        "task": "transcribe",
        "language": "ukrainian",
    },
    chunk_length_s=30,
)
print(result["text"])

Using Transformers Directly

import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "KSE-RESEARCH-Group/whisper-small-ukr-dialects"

processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Move to GPU if available
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load the audio as a mono float numpy array resampled to 16 kHz
# (librosa is an assumption here; any loader that yields 16 kHz mono audio works)
audio_array, _ = librosa.load("path/to/audio.wav", sr=16000)

# Convert the waveform to log-mel input features
input_features = processor(
    audio_array,
    sampling_rate=16000,
    return_tensors="pt",
).input_features.to(device)

# Generate the transcription, fixing language and task as in the pipeline example
predicted_ids = model.generate(
    input_features,
    language="ukrainian",
    task="transcribe",
)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)

Infrastructure

Hardware

Component	Specification
GPU	NVIDIA GeForce RTX 4090
GPU Memory	47.4 GB
GPU Count	1
CUDA Compute Capability	8.9

Environment

Package	Version
Python	3.12.12
PyTorch	2.8.0+cu128
CUDA	12.8
Transformers	4.57.3
Datasets	2.21.0
Evaluate	0.4.6

Training Time

Metric	Value
Total training time	4:45:41.352520
Training started	2026-03-02 09:51:00
Training completed	2026-03-02 14:36:42
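
As a rough throughput estimate, the reported duration can be divided by the step count. This is back-of-the-envelope arithmetic under the assumption that the total includes the periodic evaluation passes, so pure training steps were somewhat faster; the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Values from the Training Time table.
reported = timedelta(hours=4, minutes=45, seconds=41, microseconds=352520)
started = datetime(2026, 3, 2, 9, 51, 0)
completed = datetime(2026, 3, 2, 14, 36, 42)

# Values from the Training Hyperparameters table.
MAX_STEPS = 5000
EFFECTIVE_BATCH = 32

# Sanity check: the timestamps should agree with the reported duration.
wall_clock = completed - started

seconds_per_step = reported.total_seconds() / MAX_STEPS
samples_per_second = MAX_STEPS * EFFECTIVE_BATCH / reported.total_seconds()

print(f"wall clock from timestamps: {wall_clock}")
print(f"seconds per optimizer step: {seconds_per_step:.2f}")
print(f"training samples per second: {samples_per_second:.1f}")
```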

Experiment Details

Property	Value
Experiment ID	whisper-small-001
WandB Project	ukr-dialects-stt
WandB Run	whisper-small-001

Citation

If you use this model, please cite:

@misc{KSE-RESEARCH-Group-whisper-small-ukr-dialects,
  author = {KSE-RESEARCH-Group},
  title = {whisper-small - Fine-tuned for Ukrainian ASR},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/KSE-RESEARCH-Group/whisper-small-ukr-dialects}
}

License

This model is released under the Apache 2.0 license.

Acknowledgements

Base model: openai/whisper-small
Dataset: KSE-RESEARCH-Group/ukr-dialects-audio-dataset
Training infrastructure: NVIDIA GeForce RTX 4090
