Whisper Small Hy 2 - Erik Mkrtchyan

This model is a fine-tuned version of openai/whisper-small on the Hy Generated Audio Data dataset. It achieves the following results on the evaluation set:

Loss: 0.0999
Wer: 22.7854
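
The card does not say how the Wer figure was computed. The value reads as a percentage, i.e. word errors per 100 reference words. As a minimal sketch of the standard definition, assuming plain whitespace tokenization and no text normalization (both are assumptions, not taken from the card), word error rate is the word-level edit distance divided by the number of reference words:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    Assumes whitespace tokenization and no normalization; the card does
    not state which normalization (if any) was applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Two-row dynamic-programming Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Under this definition, a Wer of 22.7854 corresponds to roughly 23 word errors per 100 reference words.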

Model description

This model is based on OpenAI's Whisper Small and fine-tuned for Armenian using a combination of real and synthetic audio data. It is designed to transcribe Armenian speech into text.
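
A minimal transcription sketch, assuming the `transformers` library and an audio decoder such as ffmpeg are installed. The repository name is the one shown on the model page. The decoding options (`language`, `task`) and the 30-second chunking are assumptions for illustration, not settings documented by the card.

```python
MODEL_ID = "ErikMkrtchyan/whisper-small-hy-2"  # repository name from the model page


def generation_options() -> dict:
    """Decoding hints for Whisper; these values are assumptions, not from the card."""
    return {"language": "armenian", "task": "transcribe"}


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # split long audio into 30-second windows
    )
    result = asr(audio_path, generate_kwargs=generation_options())
    return result["text"]


# Example call ("sample.wav" is a hypothetical path):
# print(transcribe("sample.wav"))
```

The first call downloads the model weights, so it needs network access and some disk space.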

Intended uses & limitations

Intended Uses:

Armenian speech-to-text applications
Research on ASR for low-resource languages
Educational and experimental projects involving Whisper models

Limitations:

May not generalize well to accents or noisy audio not represented in the training set
The model may hallucinate text or produce inaccurate transcriptions, especially on unusual or out-of-distribution inputs, due to the inclusion of TTS-generated synthetic data in training.

Training and evaluation data

The dataset contains both real and high-quality synthetic Armenian speech clips.

Split(1)	# Clips	Duration (hours)
`train`	9,300	13.53
`test`	5,818	9.16
`eval`	5,856	8.76
`generated`	100,000	113.61
`generated[2]`	137,419	173.76

Total duration: ~318 hours
Train set duration (train + generated#1 + generated#2): ~300 hours
Test set duration (test + eval): ~18 hours
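
The rounded totals can be checked against the per-split durations in the table. The sketch below only re-adds the published numbers. Treating the two `generated` splits as the synthetic portion is an assumption based on their names.

```python
# Durations in hours, copied from the split table.
durations_h = {
    "train": 13.53,
    "test": 9.16,
    "eval": 8.76,
    "generated": 113.61,
    "generated[2]": 173.76,
}

# Assumption: the two "generated" splits hold the synthetic (TTS) audio.
synthetic_h = durations_h["generated"] + durations_h["generated[2]"]

train_h = durations_h["train"] + synthetic_h
test_h = durations_h["test"] + durations_h["eval"]
total_h = sum(durations_h.values())
synthetic_share = synthetic_h / train_h

print(f"train: {train_h:.2f} h, test: {test_h:.2f} h, total: {total_h:.2f} h")
print(f"synthetic share of training audio: {synthetic_share:.1%}")
```

The sums come to about 300.9, 17.9 and 318.8 hours, consistent with the rounded figures. Under the stated assumption, roughly 95% of the training audio is synthetic.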

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 3
mixed_precision_training: Native AMP
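
For readers who want to set up a comparable run, the values above map onto `Seq2SeqTrainingArguments` roughly as sketched here. This is not the author's training script. The `output_dir` path is hypothetical, the per-device batch sizes assume a single GPU without gradient accumulation, and reading "Native AMP" as `fp16=True` is an assumption. The `linear_lr` helper is a plain-Python illustration of a linear schedule with warmup.

```python
# Keyword arguments mirroring the hyperparameters listed above.
TRAINING_KWARGS = {
    "output_dir": "./whisper-small-hy",  # hypothetical path
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,   # assumes one device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "num_train_epochs": 3,
    "fp16": True,                        # assumed meaning of "Native AMP"
}


def build_training_args():
    """Construct the arguments object (needs transformers; fp16 needs a GPU)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**TRAINING_KWARGS)


def linear_lr(step: int, total_steps: int,
              base_lr: float = 1e-5, warmup: int = 500) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    remaining = (total_steps - step) / max(1, total_steps - warmup)
    return base_lr * max(0.0, remaining)
```

With these settings the learning rate rises from 0 to 1e-05 over the first 500 steps and then falls linearly to 0 at the last optimizer step.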

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.0516	0.4999	7709	0.1417	33.0858
0.0366	0.9999	15418	0.1139	27.4340
0.0275	1.4998	23127	0.1057	25.0415
0.0308	1.9997	30836	0.0981	23.7545
0.017	2.4997	38545	0.1016	23.2408
0.019	2.9996	46254	0.0999	22.7854
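
A short sketch for reading the table: it finds the checkpoint with the lowest validation loss, the one with the lowest Wer, and the relative Wer reduction between the first and last evaluation. It uses only the numbers shown above.

```python
# (step, validation loss, Wer %) rows copied from the table above.
rows = [
    (7709, 0.1417, 33.0858),
    (15418, 0.1139, 27.4340),
    (23127, 0.1057, 25.0415),
    (30836, 0.0981, 23.7545),
    (38545, 0.1016, 23.2408),
    (46254, 0.0999, 22.7854),
]

best_loss_step = min(rows, key=lambda r: r[1])[0]
best_wer_step = min(rows, key=lambda r: r[2])[0]

first_wer, last_wer = rows[0][2], rows[-1][2]
relative_reduction = 100.0 * (first_wer - last_wer) / first_wer

print(f"lowest validation loss at step {best_loss_step}")
print(f"lowest Wer at step {best_wer_step}")
print(f"relative Wer reduction, first to last eval: {relative_reduction:.1f}%")
```

Validation loss is lowest at step 30836, while Wer keeps improving through the final step, so the two metrics point to different best checkpoints.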

Framework versions

Transformers 4.51.3
Pytorch 2.7.0+cu126
Datasets 3.6.0
Tokenizers 0.21.1


Safetensors

Model size: 0.2B params
Tensor type: F32

Model tree for ErikMkrtchyan/whisper-small-hy-2

Base model: openai/whisper-small


Evaluation results

Wer on Hy Generated Audio Data with CV 20.0 (self-reported): 22.785