How to use istomin9192/whisper-small-sr with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", model="istomin9192/whisper-small-sr")

    # Load model directly
    from transformers import AutoProcessor, AutoModelForMultimodalLM

    processor = AutoProcessor.from_pretrained("istomin9192/whisper-small-sr")
    model = AutoModelForMultimodalLM.from_pretrained("istomin9192/whisper-small-sr")
Update README.md
README.md (changed)
@@ -26,7 +26,7 @@ model-index:
 metrics:
 - name: Wer
 type: wer
-value: 0.
+value: 0.065924219787
 library_name: transformers
 ---
 
@@ -53,11 +53,11 @@ This model was fine-tuned on a **mixture of publicly available Serbian speech co
 
 ## Training procedure
 
-- Epochs:
-- Batch size: 32
+- Epochs: 9
+- Batch size: 32 / 20
 - Optimizer: AdamW
 - LR: 6e-5 with warmup (50 steps) + cosine decay to min_lr = 1e-7
-- Mixed precision: bfloat16
+- Mixed precision: bfloat16 (fp32 in the final epoch)
 - SpecAugment: frequency + time masking
 - Sampling: weighted sampling across datasets
 
@@ -65,13 +65,15 @@ This model was fine-tuned on a **mixture of publicly available Serbian speech co
 
 | Epoch | Train loss | CV WER |
 |------:|------------------:|-------:|
-| 1 | 0.
-| 2 | 0.
-| 3 | 0.
-| 4 | 0.
-| 5 | 0.
-| 6 | 0.
-| 7 | 0.
+| 1 | 0.333 | 0.1614 |
+| 2 | 0.344 | 0.1278 |
+| 3 | 0.251 | 0.1112 |
+| 4 | 0.202 | 0.1032 |
+| 5 | 0.167 | 0.0934 |
+| 6 | 0.138 | 0.0790 |
+| 7 | 0.118 | 0.0740 |
+| 8 | 0.103 | 0.0709 |
+| 9 | 0.096 | 0.0659 |
 
 ## Evaluation Metrics
 
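
The "LR" line of the training procedure in the diff (6e-5 peak, 50 warmup steps, cosine decay to min_lr = 1e-7) could be sketched as below. This is only an illustration under assumptions: the README does not say what shape the warmup has (linear is assumed here), and `total_steps` is a hypothetical parameter, since the total number of optimizer steps is not stated.

```python
import math


def lr_at_step(step, total_steps, peak_lr=6e-5, min_lr=1e-7, warmup_steps=50):
    """Learning rate at a given optimizer step.

    Assumptions (not confirmed by the README): warmup is linear from 0,
    and the cosine decay spans the steps from warmup_steps to total_steps.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine factor goes from 1 (start of decay) to 0 (end of decay).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


# Hypothetical run length, chosen only to make the example concrete.
schedule = [lr_at_step(s, total_steps=1000) for s in range(1001)]
```

With these assumptions the schedule peaks at step 50 and ends at min_lr on the final step; a real training script may instead rely on a scheduler from its framework.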
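
For readers unfamiliar with the "CV WER" column, the following is a minimal sketch of word error rate as word-level edit distance divided by the number of reference words. It is not the evaluation code behind the README (which is not shown in this diff) and it applies no text normalization; the helper names are hypothetical, and the only data used are the epoch 1 and epoch 9 values from the table.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before, after):
    """Relative WER reduction between two checkpoints."""
    return (before - after) / before


# CV WER at epoch 1 and epoch 9, taken from the table in the diff.
improvement = relative_reduction(0.1614, 0.0659)
```

On the table's numbers, `improvement` comes out at roughly 0.59, meaning the CV WER dropped by about 59% relative between the first and the ninth epoch.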