istomin9192/whisper-small-sr

Automatic Speech Recognition

istomin9192 committed on Feb 22

Commit 9ad9647 · verified · 1 Parent(s): dc34af0

Update README.md

Files changed (1)

README.md +13 -11

README.md CHANGED

@@ -26,7 +26,7 @@ model-index:
     metrics:
     - name: Wer
       type: wer
-      value: 0.0709
 library_name: transformers
 ---
@@ -53,11 +53,11 @@ This model was fine-tuned on a **mixture of publicly available Serbian speech co
 ## Training procedure
-- Epochs: 8
-- Batch size: 32
 - Optimizer: AdamW
 - LR: 6e-5 with warmup (50 steps) + cosine decay to min_lr = 1e-7
-- Mixed precision: bfloat16
 - SpecAugment: frequency + time masking
 - Sampling: weighted sampling across datasets
@@ -65,13 +65,15 @@ This model was fine-tuned on a **mixture of publicly available Serbian speech co
 | Epoch | Train loss | CV WER |
 |------:|------------------:|-------:|
-| 1 | 0.331 | 0.1562 |
-| 2 | 0.338 | 0.1202 |
-| 3 | 0.241 | 0.1062 |
-| 4 | 0.187 | 0.0913 |
-| 5 | 0.150 | 0.0853 |
-| 6 | 0.122 | 0.0745 |
-| 7 | 0.106 | 0.0709 |
 ## Evaluation Metrics

     metrics:
     - name: Wer
       type: wer
+      value: 0.065924219787
 library_name: transformers
 ---
 ## Training procedure
+- Epochs: 9
+- Batch size: 32 / 20
 - Optimizer: AdamW
 - LR: 6e-5 with warmup (50 steps) + cosine decay to min_lr = 1e-7
+- Mixed precision: bfloat16 (fp32 in the final epoch)
 - SpecAugment: frequency + time masking
 - Sampling: weighted sampling across datasets
 | Epoch | Train loss | CV WER |
 |------:|------------------:|-------:|
+| 1 | 0.333 | 0.1614 |
+| 2 | 0.344 | 0.1278 |
+| 3 | 0.251 | 0.1112 |
+| 4 | 0.202 | 0.1032 |
+| 5 | 0.167 | 0.0934 |
+| 6 | 0.138 | 0.0790 |
+| 7 | 0.118 | 0.0740 |
+| 8 | 0.103 | 0.0709 |
+| 9 | 0.096 | 0.0659 |
 ## Evaluation Metrics
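
The learning-rate line under Training procedure (peak 6e-5, 50 warmup steps, cosine decay to min_lr = 1e-7) can be illustrated with a small sketch. The README does not say how the warmup ramps or how many optimizer steps the run took, so the linear ramp, the single half-cosine, and the `total_steps` argument below are assumptions made for illustration, and `lr_at` is a hypothetical helper name rather than anything from the training code.

```python
import math

PEAK_LR = 6e-5      # peak learning rate, from the README
MIN_LR = 1e-7       # floor of the cosine decay, from the README
WARMUP_STEPS = 50   # warmup length, from the README


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at a 0-based optimizer step.

    Assumption: linear warmup followed by one half-cosine decay.
    total_steps is hypothetical; the README does not report it.
    """
    if step < WARMUP_STEPS:
        # Linear ramp that reaches PEAK_LR on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - WARMUP_STEPS)
    progress = min(1.0, (step - WARMUP_STEPS) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


if __name__ == "__main__":
    total = 1000  # placeholder value, not taken from the README
    for s in (0, 49, 50, 525, 1000):
        print(s, lr_at(s, total))
```

With this shape the rate climbs to 6e-5 by step 49, stays there at the start of the decay phase, and ends at 1e-7 once `step` reaches `total_steps`.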