---
license: mit
tags:
- speaker-diarization
- speaker-embedding
- pyannote
- indic-languages
- bengali
language:
- bn
- hi
- ta
- te
- mr
- gu
- kn
- ml
- or
- pa
datasets:
- ai4bharat/IndicVoices
- ai4bharat/Kathbath
base_model: pyannote/wespeaker-voxceleb-resnet34-LM
---

# Indic Speaker Embedding Model (Fine-tuned)

Fine-tuned speaker embedding model for Indian languages, based on `pyannote/wespeaker-voxceleb-resnet34-LM`.

## Model Description

This model was fine-tuned on 112K+ audio samples from:

- **IndicVoices**: 22 Indian languages, with a large and diverse speaker pool
- **Kathbath**: 12 Indian languages

## Training Details

- **Base Model**: pyannote/wespeaker-voxceleb-resnet34-LM
- **Embedding Dimension**: 256
- **Training Samples**: 84,741
- **Validation Samples**: 17,161
- **Held-out for EER**: 10,317
- **Total Speakers**: 3,975 (training) + 442 (held-out)

### Training Configuration

- Phase 1: 5 epochs with frozen backbone (head only)
- Phase 2: 15 epochs full fine-tuning
- Augmentations: 13 types (noise, reverb, pitch shift, etc.)
- Label smoothing: 0.1
- Dropout: 0.3

## Results

| Metric | Value |
|--------|-------|
| Best Val Accuracy | 91.4% |
| Best EER | 4.18% |

## Usage

```python
import torch
from pyannote.audio import Model

# Load the base model
model = Model.from_pretrained("pyannote/wespeaker-voxceleb-resnet34-LM")

# Load the fine-tuned weights onto the CPU
checkpoint = torch.load("checkpoint.pt", map_location="cpu")

# Note: this checkpoint includes a classification head for Indian languages
```

## Intended Use

- Speaker diarization for Indian language audio
- Speaker verification/identification
- Bengali speaker diarization (DLSPRINT challenge)
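
For the speaker verification use case, embeddings are usually compared with cosine similarity, and an equal error rate (EER) such as the one reported in Results is the operating point where the false-accept and false-reject rates meet. The sketch below is a minimal NumPy illustration of that idea, not the evaluation script used for this model. The helper names `cosine_similarity` and `equal_error_rate` are hypothetical, and the threshold sweep is a simple approximation that assumes trial labels of 1 (same speaker) and 0 (different speakers).

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (e.g. 256-dimensional)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def equal_error_rate(scores, labels):
    """Approximate EER from trial scores (hypothetical helper).

    scores: similarity score per trial, higher means "more likely same speaker".
    labels: 1 for same-speaker trials, 0 for different-speaker trials.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=bool)

    best_gap, eer = np.inf, 1.0
    # Sweep every observed score as a candidate decision threshold.
    for threshold in np.unique(scores):
        accepted = scores >= threshold
        far = np.mean(accepted[~labels])   # impostor trials accepted
        frr = np.mean(~accepted[labels])   # target trials rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return float(eer)


if __name__ == "__main__":
    # Toy trials only; real scores would come from pairs of model embeddings.
    toy_scores = [0.9, 0.3, 0.6, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(equal_error_rate(toy_scores, toy_labels))
```

In practice the scores would be cosine similarities between embeddings of held-out utterance pairs. An interpolated ROC-based estimate would give a smoother EER than this discrete sweep.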