whisper-lg-el-intlv-xs-2

This model is a fine-tuned version of farsipal/whisper-lg-el-intlv-xs on the mozilla-foundation/common_voice_11_0 (el) and google/fleurs (el_gr) datasets. It achieves the following results on the evaluation set:

Loss: 0.2872
Wer: 9.5004

Model description

The model was trained for transcription of Greek speech on two interleaved datasets: Common Voice 11.0 (el) and FLEURS (el_gr).

Intended uses & limitations

The model is intended for transcription of Greek speech.
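A minimal usage sketch, assuming a recent Transformers release in which the automatic-speech-recognition pipeline accepts `language` and `task` through `generate_kwargs`. The audio file name is a placeholder, and the 30-second chunk length is an assumption that mirrors `--max_duration_in_seconds 30` from the training arguments rather than a documented requirement.

```python
MODEL_ID = "farsipal/whisper-lg-el-intlv-xs-2"

# Decoding options matching the --language / --task training arguments.
GENERATE_KWARGS = {"language": "greek", "task": "transcribe"}


def transcribe(audio_path: str) -> str:
    """Transcribe a Greek audio file and return the recognized text."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        chunk_length_s=30,  # assumption: process long audio in 30 s windows
    )
    result = asr(audio_path, generate_kwargs=GENERATE_KWARGS)
    return result["text"]


# Example call ("sample_el.wav" is a hypothetical file name):
# print(transcribe("sample_el.wav"))
```

The first call downloads the model weights, so it needs network access and enough memory for a large Whisper checkpoint.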

Training and evaluation data

Training used the train and validation splits of both datasets, interleaved. Evaluation used only the Common Voice 11.0 (el) test split.
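The card does not say how the two datasets were interleaved. As an illustration only, the sketch below shows one common reading: take one example from each source in turn and stop when the shorter source runs out, similar in spirit to `datasets.interleave_datasets` with its default settings. The `interleave` helper and the toy lists are hypothetical stand-ins, not the training code.

```python
from itertools import chain
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def interleave(*sources: Iterable[T]) -> Iterator[T]:
    """Yield one item from each source in turn; stop when any source runs out."""
    # zip() stops at the shortest source, so the alternation stays strict.
    return chain.from_iterable(zip(*sources))


# Toy stand-ins for the two corpora (not real data).
common_voice = ["cv_0", "cv_1", "cv_2"]
fleurs = ["fl_0", "fl_1"]

mixed = list(interleave(common_voice, fleurs))
print(mixed)  # ['cv_0', 'fl_0', 'cv_1', 'fl_1']
```

With real corpora of unequal size, a strict alternation like this drops the tail of the larger dataset on each pass; sampling with probabilities is a common alternative.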

Training procedure

The training script (not named in this card) was run with the following arguments:

                --model_name_or_path   'farsipal/whisper-lg-el-intlv-xs' \
                --model_revision   main \
                --do_train   True \
                --do_eval   True \
                --use_auth_token   False \
                --freeze_feature_encoder   False \
                --freeze_encoder   False \
                --model_index_name   'whisper-lg-el-intlv-xs-2' \
                --dataset_name 'mozilla-foundation/common_voice_11_0,google/fleurs' \
                --dataset_config_name 'el,el_gr' \
                --train_split_name  'train+validation,train+validation' \
                --eval_split_name   'test,-' \
                --text_column_name  'sentence,transcription' \
                --audio_column_name 'audio,audio' \
                --streaming   False \
                --max_duration_in_seconds   30 \
                --do_lower_case   False \
                --do_remove_punctuation   False \
                --do_normalize_eval   True \
                --language   greek \
                --task transcribe \
                --shuffle_buffer_size   500 \
                --output_dir   './data/finetuningRuns/whisper-lg-el-intlv-xs-2' \
                --overwrite_output_dir   True \
                --per_device_train_batch_size   8 \
                --gradient_accumulation_steps  4 \
                --learning_rate   3.5e-6 \
                --dropout         0.15 \
                --attention_dropout 0.05 \
                --warmup_steps   500 \
                --max_steps   5000 \
                --eval_steps   1000 \
                --gradient_checkpointing   True \
                --cache_dir   '~/.cache' \
                --fp16   True \
                --evaluation_strategy   steps \
                --per_device_eval_batch_size   8 \
                --predict_with_generate   True \
                --generation_max_length   225 \
                --save_steps   1000 \
                --logging_steps   25 \
                --report_to   tensorboard \
                --load_best_model_at_end   True \
                --metric_for_best_model   wer \
                --greater_is_better   False \
                --push_to_hub   False  \
                --dataloader_num_workers 6

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3.5e-06
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 5000
mixed_precision_training: Native AMP
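To make the numbers above concrete, the sketch below recomputes the effective batch size and evaluates a linear schedule with warmup at a few steps. It assumes a single training device (inferred from 8 × 4 = 32) and the usual shape of a linear warmup schedule: a ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the final step. The function names are illustrative.

```python
LEARNING_RATE = 3.5e-6
WARMUP_STEPS = 500
TRAINING_STEPS = 5000
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # assumption: inferred from total_train_batch_size = 32


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay: fall linearly from the peak to 0 at the last training step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


print(effective_batch_size())  # 32
for step in (0, 250, 500, 2750, 5000):
    print(step, lr_at(step))
```

The peak rate of 3.5e-06 is reached at step 500 and has halved by step 2750, the midpoint of the decay phase.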

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.0813	2.49	1000	0.2147	10.8284
0.0379	4.98	2000	0.2439	10.0111
0.0195	7.46	3000	0.2767	9.8811
0.0126	9.95	4000	0.2872	9.5004
0.0103	12.44	5000	0.3021	9.6954
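The reported loss and WER match the step-4000 row, which is consistent with `--load_best_model_at_end True`, `--metric_for_best_model wer`, and `--greater_is_better False`. The sketch below reproduces that selection from the table; `best_checkpoint` is an illustrative helper, not part of the training script.

```python
# (step, validation_loss, wer) rows copied from the results table.
RESULTS = [
    (1000, 0.2147, 10.8284),
    (2000, 0.2439, 10.0111),
    (3000, 0.2767, 9.8811),
    (4000, 0.2872, 9.5004),
    (5000, 0.3021, 9.6954),
]


def best_checkpoint(rows, metric_index=2, greater_is_better=False):
    """Return the row with the best value in the chosen metric column."""
    pick = max if greater_is_better else min
    return pick(rows, key=lambda row: row[metric_index])


best = best_checkpoint(RESULTS)
print(best)  # (4000, 0.2872, 9.5004)
```

Selecting on validation loss instead (`metric_index=1`) would pick step 1000, so the choice of metric matters here: validation loss rises throughout training while WER keeps improving until step 4000.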

Framework versions

Transformers 4.26.0.dev0
PyTorch 1.13.0+cu117
Datasets 2.8.1.dev0
Tokenizers 0.13.2

Evaluation results

WER on mozilla-foundation/common_voice_11_0 (el), test set, self-reported: 9.500