---
language:
- de
license: apache-2.0
tags:
- automatic-speech-recognition
- medical
- german
- asr
- nemo
- parakeet
- mlx
base_model: nvidia/parakeet-tdt-0.6b-v3
datasets:
- Mediform/medical_asr_de
---

# Parakeet Medical DE

German medical ASR model fine-tuned from [nvidia/parakeet-tdt-0.6b-v3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) on [Mediform/medical_asr_de](https://huggingface.co/datasets/Mediform/medical_asr_de).

This is the **best validation checkpoint (epoch 14, val_wer=11.55%)**. For the last epoch, see [Mediform/parakeet-medical-de-e20](https://huggingface.co/Mediform/parakeet-medical-de-e20).

## Performance

| Model | Test WER | Val WER |
|-------|----------|---------|
| Baseline (parakeet-tdt-0.6b-v3) | 26.17% | - |
| **This model (epoch 14, best val)** | **11.31%** | **11.55%** |
| Last epoch (epoch 20) | 11.24% | 11.59% |

**14.86 percentage-point absolute WER reduction** on the test set (26.17% to 11.31%) over the non-fine-tuned base model on German medical speech.

## Training Details

- **Base model:** nvidia/parakeet-tdt-0.6b-v3 (627M params, EncDecRNNTBPEModel)
- **Dataset:** Mediform/medical_asr_de (14,388 train / 799 val / 799 test samples, 117 h)
- **Hardware:** 4x NVIDIA A40 (48 GB)
- **Strategy:** DDP, bf16-mixed precision
- **Batch size:** 2 per GPU x 4 GPUs x 8 accumulation steps = 64 effective
- **Optimizer:** AdamW (lr=5e-5, cosine annealing, 200 warmup steps)
- **Epochs:** 20 (best at epoch 14)
- **SpecAugment:** freq_masks=2, time_masks=10

## Training Curve

| Epoch | Val WER |
|-------|---------|
| 0 | 16.56% |
| 1 | 13.21% |
| 2 | 13.01% |
| 3 | 12.80% |
| 4 | 11.95% |
| 8 | 11.82% |
| **14** | **11.55%** |
| 19 | 11.59% |

## Usage with MLX (Apple Silicon)

Weights are in SafeTensors format, compatible with [mlx-audio-swift](https://github.com/AIDevelopers/mlx-audio-swift) for on-device inference.

## Medical Domain Coverage

The training dataset covers 529 unique German medical terms, including drug names, conditions, procedures, symptoms, and anatomy.
Sources: VoxPopuli (EU Parliament medical debates), MultiMed (patient-doctor dialogues), CommonVoice, Spoken Wikipedia, M-AILABS, TuDa, Voxforge.
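
## Checking the Reported Numbers

The WER figures in this card are word error rates. The sketch below shows one common way to compute such a number (word-level Levenshtein distance divided by the reference length) and how the absolute and relative reductions follow from the Performance table. It is an illustration only: the function names and the sentence pair are hypothetical, and the text normalization used for the actual evaluation is not documented here, so results from this snippet may not match the published scores exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline_wer: float, model_wer: float) -> tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    absolute = baseline_wer - model_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative


# Hypothetical sentence pair, not taken from the dataset.
reference = "der patient erhält täglich ibuprofen"
hypothesis = "der patient erhält täglich ibuprofin"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")

# Test WER values from the Performance table.
absolute, relative = wer_reduction(26.17, 11.31)
print(f"absolute: {absolute:.2f} points, relative: {relative:.1f}%")

# Effective batch size from Training Details:
# per-GPU batch x number of GPUs x gradient accumulation steps.
effective_batch = 2 * 4 * 8
print(f"effective batch size: {effective_batch}")
```

Reporting the relative reduction alongside the absolute one makes clear how much of the baseline's error the fine-tuning removed, which is easier to compare across datasets with different baseline error rates.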