How to use farsipal/whisper-lg-el-intlv-xs-2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="farsipal/whisper-lg-el-intlv-xs-2")

# Load model directly
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
processor = AutoProcessor.from_pretrained("farsipal/whisper-lg-el-intlv-xs-2")
model = AutoModelForSpeechSeq2Seq.from_pretrained("farsipal/whisper-lg-el-intlv-xs-2")

This model is a fine-tuned version of farsipal/whisper-lg-el-intlv-xs on the mozilla-foundation/common_voice_11_0 (el) and google/fleurs (el_gr) datasets. Evaluation results for each checkpoint are listed in the training results table below.
The model was trained on two interleaved datasets for transcription in the Greek language.
Intended use: transcription of Greek speech.
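For the transcription use case, a minimal sketch of running the pipeline on one audio file might look like the following. The helper name `transcribe` and the file name in the comment are hypothetical. Passing `language` and `task` through `generate_kwargs` assumes a reasonably recent Transformers release, and decoding a file path assumes ffmpeg is available.

```python
MODEL_ID = "farsipal/whisper-lg-el-intlv-xs-2"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file to Greek text (downloads the model on first use)."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", model=MODEL_ID)
    # Request Greek transcription explicitly instead of relying on language detection.
    result = pipe(
        audio_path,
        generate_kwargs={"language": "greek", "task": "transcribe"},
    )
    return result["text"]


# Example call (hypothetical file name):
# print(transcribe("sample_el.wav"))
```

Building the pipeline inside the helper keeps the sketch short. For repeated calls it would be cheaper to create the pipeline once and reuse it.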
Training was performed on the two interleaved datasets. Evaluation used only the Common Voice 11.0 (el) test split.
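To illustrate what interleaving two datasets means, here is a small pure-Python sketch that alternates examples from two sources. It is only an illustration of the idea, not the training script's implementation. The item names are made up, and the stopping rule (end as soon as one source is exhausted) is an assumption. The Hugging Face `datasets` library offers `interleave_datasets` for real datasets.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def interleave(first: Iterable[T], second: Iterable[T]) -> Iterator[T]:
    """Alternate items from two sources until one of them is exhausted."""
    it_a, it_b = iter(first), iter(second)
    while True:
        for it in (it_a, it_b):
            try:
                yield next(it)
            except StopIteration:
                # One source ran out, so stop the mixed stream here.
                return


# Toy stand-ins for Common Voice and FLEURS examples (hypothetical names).
common_voice = ["cv_0", "cv_1", "cv_2"]
fleurs = ["fl_0", "fl_1"]
mixed = list(interleave(common_voice, fleurs))
print(mixed)  # ['cv_0', 'fl_0', 'cv_1', 'fl_1', 'cv_2']
```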
Training was run with the following arguments:
--model_name_or_path 'farsipal/whisper-lg-el-intlv-xs' \
--model_revision main \
--do_train True \
--do_eval True \
--use_auth_token False \
--freeze_feature_encoder False \
--freeze_encoder False \
--model_index_name 'whisper-lg-el-intlv-xs-2' \
--dataset_name 'mozilla-foundation/common_voice_11_0,google/fleurs' \
--dataset_config_name 'el,el_gr' \
--train_split_name 'train+validation,train+validation' \
--eval_split_name 'test,-' \
--text_column_name 'sentence,transcription' \
--audio_column_name 'audio,audio' \
--streaming False \
--max_duration_in_seconds 30 \
--do_lower_case False \
--do_remove_punctuation False \
--do_normalize_eval True \
--language greek \
--task transcribe \
--shuffle_buffer_size 500 \
--output_dir './data/finetuningRuns/whisper-lg-el-intlv-xs-2' \
--overwrite_output_dir True \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 4 \
--learning_rate 3.5e-6 \
--dropout 0.15 \
--attention_dropout 0.05 \
--warmup_steps 500 \
--max_steps 5000 \
--eval_steps 1000 \
--gradient_checkpointing True \
--cache_dir '~/.cache' \
--fp16 True \
--evaluation_strategy steps \
--per_device_eval_batch_size 8 \
--predict_with_generate True \
--generation_max_length 225 \
--save_steps 1000 \
--logging_steps 25 \
--report_to tensorboard \
--load_best_model_at_end True \
--metric_for_best_model wer \
--greater_is_better False \
--push_to_hub False \
--dataloader_num_workers 6
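The batch size and learning-rate arguments above can be turned into two derived quantities. The sketch below is a rough reading under stated assumptions: the number of devices is not given in the arguments (one is assumed by default), and no scheduler is specified, so the linear warmup and linear decay that the Hugging Face Trainer uses by default is assumed. The function names are hypothetical.

```python
# Values copied from the arguments above.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
LEARNING_RATE = 3.5e-6
WARMUP_STEPS = 500
MAX_STEPS = 5000


def effective_batch_size(num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (device count is an assumption)."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices


def linear_schedule_lr(step: int) -> float:
    """Learning rate at a step, assuming linear warmup then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - step)
    return LEARNING_RATE * remaining / (MAX_STEPS - WARMUP_STEPS)


print(effective_batch_size())  # 32
for step in (0, 250, 500, 2750, 5000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at 3.5e-6 at step 500 and falls back to zero at step 5000.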
The hyperparameters used during training are those set by the arguments above. Results at each evaluation checkpoint:
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 0.0813 | 2.49 | 1000 | 0.2147 | 10.8284 |
| 0.0379 | 4.98 | 2000 | 0.2439 | 10.0111 |
| 0.0195 | 7.46 | 3000 | 0.2767 | 9.8811 |
| 0.0126 | 9.95 | 4000 | 0.2872 | 9.5004 |
| 0.0103 | 12.44 | 5000 | 0.3021 | 9.6954 |
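With `--load_best_model_at_end True`, `--metric_for_best_model wer`, and `--greater_is_better False`, the checkpoint with the lowest WER would be expected to be the one kept. The sketch below picks that row from the table and estimates optimizer steps per epoch from the last row. The variable names are illustrative and the values are copied from the table.

```python
# (step, epoch, validation_loss, wer) rows copied from the table above.
rows = [
    (1000, 2.49, 0.2147, 10.8284),
    (2000, 4.98, 0.2439, 10.0111),
    (3000, 7.46, 0.2767, 9.8811),
    (4000, 9.95, 0.2872, 9.5004),
    (5000, 12.44, 0.3021, 9.6954),
]

# Lower WER is better, so the best checkpoint is the row with the minimum WER.
best = min(rows, key=lambda r: r[3])
best_step, best_wer = best[0], best[3]
print(best_step, best_wer)  # 4000 9.5004

# Rough optimizer steps per epoch, from the final row (5000 steps / 12.44 epochs).
last_step, last_epoch = rows[-1][0], rows[-1][1]
steps_per_epoch = last_step / last_epoch
print(round(steps_per_epoch))  # 402
```

Validation loss rises at every checkpoint while WER keeps improving until step 4000, which is why selecting by WER rather than by loss changes which checkpoint is chosen.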