yuriyvnv/parakeet-tdt-0.6b-dutch

@@ -12,6 +12,9 @@ tags:
   - tdt
   - dutch
   - nvidia
 datasets:
   - fixie-ai/common_voice_17_0
   - yuriyvnv/synthetic_transcript_nl
@@ -24,7 +27,7 @@ model-index:
           type: automatic-speech-recognition
           name: Speech Recognition
         dataset:
-          name: Common Voice 17.0 (nl)
           type: fixie-ai/common_voice_17_0
           config: nl
           split: validation
@@ -32,6 +35,24 @@ model-index:
           - type: wer
             value: 3.73
             name: Val WER
 ---
 # Parakeet-TDT-0.6B Dutch
@@ -45,11 +66,19 @@ A Dutch automatic speech recognition (ASR) model fine-tuned from [nvidia/parakee
 | Base model | [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) |
 | Architecture | FastConformer-TDT (600M params) |
 | Language | Dutch (nl) |
-| Val WER | **3.73%** |
 | Input | 16 kHz mono audio |
 | Output | Dutch text with punctuation and capitalization |
 | License | CC-BY-4.0 |
 ## Training
 Fine-tuned on a combination of:
@@ -66,6 +95,7 @@ Fine-tuned on a combination of:
 | Warmup | 10% of total steps |
 | Batch size | 64 |
 | Precision | bf16-mixed |
 | Early stopping | 10 epochs patience on val WER |
 | Best epoch | 21 |
@@ -113,15 +143,15 @@ asr_model.change_attention_model(
 output = asr_model.transcribe(["long_audio.wav"])
 ```
-## Citation
-If you use this model, please cite the base Parakeet model:
-```bibtex
-@misc{parakeet-tdt-0.6b-v3,
-  title={Parakeet TDT 0.6B v3},
-  author={NVIDIA},
-  year={2025},
-  url={https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}
-}
-```

   - tdt
   - dutch
   - nvidia
+  - common-voice
+  - synthetic-speech
+  - fine-tuned
 datasets:
   - fixie-ai/common_voice_17_0
   - yuriyvnv/synthetic_transcript_nl
           type: automatic-speech-recognition
           name: Speech Recognition
         dataset:
+          name: Common Voice 17.0 (nl) - Validation
           type: fixie-ai/common_voice_17_0
           config: nl
           split: validation
           - type: wer
             value: 3.73
             name: Val WER
+          - type: cer
+            value: 1.02
+            name: Val CER
+      - task:
+          type: automatic-speech-recognition
+          name: Speech Recognition
+        dataset:
+          name: Common Voice 17.0 (nl) - Test
+          type: fixie-ai/common_voice_17_0
+          config: nl
+          split: test
+        metrics:
+          - type: wer
+            value: 5.33
+            name: Test WER
+          - type: cer
+            value: 1.46
+            name: Test CER
 ---
 # Parakeet-TDT-0.6B Dutch
 | Base model | [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) |
 | Architecture | FastConformer-TDT (600M params) |
 | Language | Dutch (nl) |
 | Input | 16 kHz mono audio |
 | Output | Dutch text with punctuation and capitalization |
 | License | CC-BY-4.0 |
+## Evaluation Results
+Evaluated on [Common Voice 17.0](https://huggingface.co/datasets/fixie-ai/common_voice_17_0) Dutch splits (raw text, no normalization):
+| Split | WER | CER | Samples |
+|---|---|---|---|
+| Validation | **3.73%** | 1.02% | 9,062 |
+| Test | **5.33%** | 1.46% | 11,266 |
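Because the scores above are computed on raw text with no normalization, casing and punctuation differences count as errors. The following is a minimal sketch of how corpus-level WER and CER can be computed; it is not the evaluation script used for this model. The helper names `edit_distance` and `corpus_error_rate` are hypothetical, and the choices to count spaces as characters and to pool errors over the whole split are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_error_rate(refs, hyps, unit="word"):
    """Total edits divided by total reference units over all utterances.

    unit="word" gives WER, unit="char" gives CER (spaces are counted as
    characters here, which is an assumption).
    """
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_units = ref.split() if unit == "word" else list(ref)
        hyp_units = hyp.split() if unit == "word" else list(hyp)
        errors += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return errors / total


# Made-up sentences for illustration only, not taken from Common Voice.
refs = ["De kat zit op de mat.", "Goedemorgen!"]
hyps = ["De kat zit op de mat", "Goedemorgen!"]

wer = corpus_error_rate(refs, hyps, unit="word")
cer = corpus_error_rate(refs, hyps, unit="char")
print(f"WER: {wer:.2%}  CER: {cer:.2%}")
```

In this toy example the only difference is a missing full stop, yet it costs one full word error ("mat" vs "mat."). This is why raw-text WER tends to be higher than WER computed after stripping punctuation and lowercasing.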
 ## Training
 Fine-tuned on a combination of:
 | Warmup | 10% of total steps |
 | Batch size | 64 |
 | Precision | bf16-mixed |
+| Gradient clipping | 1.0 |
 | Early stopping | 10 epochs patience on val WER |
 | Best epoch | 21 |
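The early-stopping row above can be read as: stop once validation WER has failed to improve for 10 consecutive epochs, and keep the checkpoint from the best epoch. Below is a minimal sketch of that patience rule in plain Python. It is not the trainer callback used for this model; the function name `best_epoch_with_patience` is hypothetical, epochs are numbered from 1 as an assumption, and the WER values are invented for illustration.

```python
def best_epoch_with_patience(val_wers, patience=10):
    """Return (best_epoch, stopped_epoch) under a simple patience rule.

    Training stops after `patience` consecutive epochs without a strictly
    lower validation WER. Epochs are 1-indexed (an assumption).
    """
    best_wer = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, wer in enumerate(val_wers, start=1):
        if wer < best_wer:
            best_wer, best_epoch = wer, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Ran out of epochs before the patience budget was exhausted.
    return best_epoch, len(val_wers)


# Invented validation WER curve, using a short patience to keep it small.
toy_val_wers = [9.0, 7.5, 6.1, 6.3, 6.2, 6.4, 5.0]
result = best_epoch_with_patience(toy_val_wers, patience=3)
print(result)
```

With a patience of 3, the toy run stops at epoch 6 and never sees the better score at epoch 7. A longer patience such as the 10 epochs listed above makes this kind of premature stop less likely, at the cost of extra training time.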
 output = asr_model.transcribe(["long_audio.wav"])
 ```
+## Intended Use
+This model is designed for transcribing Dutch speech to text. It works best on:
+- Read speech and conversational Dutch
+- Audio recorded at 16 kHz or higher
+- Segments up to 24 minutes (or longer with local attention enabled)
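Before transcribing, it can help to check a file against the points above. The sketch below uses only the Python standard library to read a PCM WAV header and report its sample rate, channel count, and duration. The function name `inspect_wav` and the flag names are hypothetical, and the 24-minute threshold is taken directly from the list above rather than measured.

```python
import wave

TARGET_SAMPLE_RATE = 16_000           # model input is 16 kHz mono
MAX_FULL_ATTENTION_SECONDS = 24 * 60  # "up to 24 minutes" from the list above


def inspect_wav(path):
    """Read a PCM WAV header and flag mismatches with the expected input."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / sample_rate
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "duration_seconds": duration,
        # True when the file is not already at 16 kHz.
        "needs_resampling": sample_rate != TARGET_SAMPLE_RATE,
        # True when the file has more than one channel.
        "needs_downmix": channels != 1,
        # True when the file is longer than the full-attention guidance.
        "needs_local_attention": duration > MAX_FULL_ATTENTION_SECONDS,
    }


# Usage (the file name is a placeholder):
#   info = inspect_wav("long_audio.wav")
#   if info["needs_local_attention"]:
#       ...  # switch the model to local attention before transcribing
```

The standard `wave` module only handles uncompressed PCM WAV files, so other formats such as MP3 or FLAC would need a different reader.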
+## Limitations
+- Trained primarily on Dutch speech from Common Voice; performance may vary on regional dialects or heavily accented speech
+- Synthetic training data was generated with OpenAI TTS voices, which may not fully represent natural speech variability
+- Not suitable for real-time streaming without additional configuration