Automatic Speech Recognition
Transformers
PyTorch
Serbian
wav2vec2
mozilla-foundation/common_voice_8_0
Generated from Trainer
robust-speech-event
xlsr-fine-tuning-week
hf-asr-leaderboard
Eval Results (legacy)
Instructions to use comodoro/wav2vec2-xls-r-300m-sr-cv8 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use comodoro/wav2vec2-xls-r-300m-sr-cv8 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("automatic-speech-recognition", model="comodoro/wav2vec2-xls-r-300m-sr-cv8")# Load model directly from transformers import AutoProcessor, AutoModelForCTC processor = AutoProcessor.from_pretrained("comodoro/wav2vec2-xls-r-300m-sr-cv8") model = AutoModelForCTC.from_pretrained("comodoro/wav2vec2-xls-r-300m-sr-cv8") - Notebooks
- Google Colab
- Kaggle
| language: | |
| - sr | |
| license: apache-2.0 | |
| tags: | |
| - automatic-speech-recognition | |
| - mozilla-foundation/common_voice_8_0 | |
| - generated_from_trainer | |
| - robust-speech-event | |
| - xlsr-fine-tuning-week | |
| - hf-asr-leaderboard | |
| datasets: | |
| - mozilla-foundation/common_voice_8_0 | |
| - name: Serbian comodoro Wav2Vec2 XLSR 300M CV8 | |
| results: | |
| - task: | |
| name: Automatic Speech Recognition | |
| type: automatic-speech-recognition | |
| dataset: | |
| name: Common Voice 8 | |
| type: mozilla-foundation/common_voice_8_0 | |
| args: sr | |
| metrics: | |
| - name: Test WER | |
| type: wer | |
| value: 48.5 | |
| - name: Test CER | |
| type: cer | |
| value: 18.4 | |
| model-index: | |
| - name: wav2vec2-xls-r-300m-sr-cv8 | |
| results: | |
| - task: | |
| name: Automatic Speech Recognition | |
| type: automatic-speech-recognition | |
| dataset: | |
| name: Common Voice 8.0 | |
| type: mozilla-foundation/common_voice_8_0 | |
| args: sr | |
| metrics: | |
| - name: Test WER | |
| type: wer | |
| value: 48.53 | |
| - task: | |
| name: Automatic Speech Recognition | |
| type: automatic-speech-recognition | |
| dataset: | |
| name: Robust Speech Event - Dev Data | |
| type: speech-recognition-community-v2/dev_data | |
| args: sr | |
| metrics: | |
| - name: Test WER | |
| type: wer | |
| value: 97.43 | |
| - task: | |
| name: Automatic Speech Recognition | |
| type: automatic-speech-recognition | |
| dataset: | |
| name: Robust Speech Event - Test Data | |
| type: speech-recognition-community-v2/eval_data | |
| args: sr | |
| metrics: | |
| - name: Test WER | |
| type: wer | |
| value: 96.69 | |
| # Serbian wav2vec2-xls-r-300m-sr-cv8 | |
| This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice dataset. | |
| It achieves the following results on the evaluation set: | |
| - Loss: 1.7302 | |
| - Wer: 0.4825 | |
| - Cer: 0.1847 | |
| Evaluation on mozilla-foundation/common_voice_8_0 gave the following results: | |
| - WER: 0.48530097993467103 | |
| - CER: 0.18413288165227845 | |
| Evaluation on speech-recognition-community-v2/dev_data gave the following results: | |
| - WER: 0.9718373107518604 | |
| - CER: 0.8302740620263108 | |
| The model can be evaluated using the attached `eval.py` script: | |
| ``` | |
| python eval.py --model_id comodoro/wav2vec2-xls-r-300m-sr-cv8 --dataset mozilla-foundation/common-voice_8_0 --split test --config sr | |
| ``` | |
| ### Training hyperparameters | |
| The following hyperparameters were used during training: | |
| - learning_rate: 0.0001 | |
| - train_batch_size: 16 | |
| - eval_batch_size: 8 | |
| - seed: 42 | |
| - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 | |
| - lr_scheduler_type: linear | |
| - lr_scheduler_warmup_steps: 300 | |
| - num_epochs: 800 | |
| - mixed_precision_training: Native AMP | |
| ### Training results | |
| | Training Loss | Epoch | Step | Validation Loss | Wer | Cer | | |
| |:-------------:|:-----:|:-----:|:---------------:|:------:|:------:| | |
| | 5.6536 | 15.0 | 1200 | 2.9744 | 1.0 | 1.0 | | |
| | 2.7935 | 30.0 | 2400 | 1.6613 | 0.8998 | 0.4670 | | |
| | 1.6538 | 45.0 | 3600 | 0.9248 | 0.6918 | 0.2699 | | |
| | 1.2446 | 60.0 | 4800 | 0.9151 | 0.6452 | 0.2398 | | |
| | 1.0766 | 75.0 | 6000 | 0.9110 | 0.5995 | 0.2207 | | |
| | 0.9548 | 90.0 | 7200 | 1.0273 | 0.5921 | 0.2149 | | |
| | 0.8919 | 105.0 | 8400 | 0.9929 | 0.5646 | 0.2117 | | |
| | 0.8185 | 120.0 | 9600 | 1.0850 | 0.5483 | 0.2069 | | |
| | 0.7692 | 135.0 | 10800 | 1.1001 | 0.5394 | 0.2055 | | |
| | 0.7249 | 150.0 | 12000 | 1.1018 | 0.5380 | 0.1958 | | |
| | 0.6786 | 165.0 | 13200 | 1.1344 | 0.5114 | 0.1941 | | |
| | 0.6432 | 180.0 | 14400 | 1.1516 | 0.5054 | 0.1905 | | |
| | 0.6009 | 195.0 | 15600 | 1.3149 | 0.5324 | 0.1991 | | |
| | 0.5773 | 210.0 | 16800 | 1.2468 | 0.5124 | 0.1903 | | |
| | 0.559 | 225.0 | 18000 | 1.2186 | 0.4956 | 0.1922 | | |
| | 0.5298 | 240.0 | 19200 | 1.4483 | 0.5333 | 0.2085 | | |
| | 0.5136 | 255.0 | 20400 | 1.2871 | 0.4802 | 0.1846 | | |
| | 0.4824 | 270.0 | 21600 | 1.2891 | 0.4974 | 0.1885 | | |
| | 0.4669 | 285.0 | 22800 | 1.3283 | 0.4942 | 0.1878 | | |
| | 0.4511 | 300.0 | 24000 | 1.4502 | 0.5002 | 0.1994 | | |
| | 0.4337 | 315.0 | 25200 | 1.4714 | 0.5035 | 0.1911 | | |
| | 0.4221 | 330.0 | 26400 | 1.4971 | 0.5124 | 0.1962 | | |
| | 0.3994 | 345.0 | 27600 | 1.4473 | 0.5007 | 0.1920 | | |
| | 0.3892 | 360.0 | 28800 | 1.3904 | 0.4937 | 0.1887 | | |
| | 0.373 | 375.0 | 30000 | 1.4971 | 0.4946 | 0.1902 | | |
| | 0.3657 | 390.0 | 31200 | 1.4208 | 0.4900 | 0.1821 | | |
| | 0.3559 | 405.0 | 32400 | 1.4648 | 0.4895 | 0.1835 | | |
| | 0.3476 | 420.0 | 33600 | 1.4848 | 0.4946 | 0.1829 | | |
| | 0.3276 | 435.0 | 34800 | 1.5597 | 0.4979 | 0.1873 | | |
| | 0.3193 | 450.0 | 36000 | 1.7329 | 0.5040 | 0.1980 | | |
| | 0.3078 | 465.0 | 37200 | 1.6379 | 0.4937 | 0.1882 | | |
| | 0.3058 | 480.0 | 38400 | 1.5878 | 0.4942 | 0.1921 | | |
| | 0.2987 | 495.0 | 39600 | 1.5590 | 0.4811 | 0.1846 | | |
| | 0.2931 | 510.0 | 40800 | 1.6001 | 0.4825 | 0.1849 | | |
| | 0.276 | 525.0 | 42000 | 1.7388 | 0.4942 | 0.1918 | | |
| | 0.2702 | 540.0 | 43200 | 1.7037 | 0.4839 | 0.1866 | | |
| | 0.2619 | 555.0 | 44400 | 1.6704 | 0.4755 | 0.1840 | | |
| | 0.262 | 570.0 | 45600 | 1.6042 | 0.4751 | 0.1865 | | |
| | 0.2528 | 585.0 | 46800 | 1.6402 | 0.4821 | 0.1865 | | |
| | 0.2442 | 600.0 | 48000 | 1.6693 | 0.4886 | 0.1862 | | |
| | 0.244 | 615.0 | 49200 | 1.6203 | 0.4765 | 0.1792 | | |
| | 0.2388 | 630.0 | 50400 | 1.6829 | 0.4830 | 0.1828 | | |
| | 0.2362 | 645.0 | 51600 | 1.8100 | 0.4928 | 0.1888 | | |
| | 0.2224 | 660.0 | 52800 | 1.7746 | 0.4932 | 0.1899 | | |
| | 0.2218 | 675.0 | 54000 | 1.7752 | 0.4946 | 0.1901 | | |
| | 0.2201 | 690.0 | 55200 | 1.6775 | 0.4788 | 0.1844 | | |
| | 0.2147 | 705.0 | 56400 | 1.7085 | 0.4844 | 0.1851 | | |
| | 0.2103 | 720.0 | 57600 | 1.7624 | 0.4848 | 0.1864 | | |
| | 0.2101 | 735.0 | 58800 | 1.7213 | 0.4783 | 0.1835 | | |
| | 0.1983 | 750.0 | 60000 | 1.7452 | 0.4848 | 0.1856 | | |
| | 0.2015 | 765.0 | 61200 | 1.7525 | 0.4872 | 0.1869 | | |
| | 0.1969 | 780.0 | 62400 | 1.7443 | 0.4844 | 0.1852 | | |
| | 0.2043 | 795.0 | 63600 | 1.7302 | 0.4825 | 0.1847 | | |
| ### Framework versions | |
| - Transformers 4.16.2 | |
| - Pytorch 1.10.1+cu102 | |
| - Datasets 1.18.3 | |
| - Tokenizers 0.11.0 | |