---
license: mit
tags:
- speaker-diarization
- speaker-embedding
- pyannote
- indic-languages
- bengali
language:
- bn
- hi
- ta
- te
- mr
- gu
- kn
- ml
- or
- pa
datasets:
- ai4bharat/IndicVoices
- ai4bharat/Kathbath
base_model: pyannote/wespeaker-voxceleb-resnet34-LM
---

# Indic Speaker Embedding Model (Fine-tuned)

A speaker embedding model for Indian languages, fine-tuned from `pyannote/wespeaker-voxceleb-resnet34-LM`.

## Model Description

This model was fine-tuned on 112K+ audio samples (training, validation, and held-out splits combined) drawn from two datasets:
- **IndicVoices**: 22 Indian languages, with a large and diverse speaker pool
- **Kathbath**: 12 Indian languages

## Training Details

- **Base Model**: pyannote/wespeaker-voxceleb-resnet34-LM
- **Embedding Dimension**: 256
- **Training Samples**: 84,741
- **Validation Samples**: 17,161
- **Held-out Samples (EER evaluation)**: 10,317
- **Total Speakers**: 3,975 (training) + 442 (held-out)

### Training Configuration
- Phase 1: 5 epochs with frozen backbone (head only)
- Phase 2: 15 epochs full fine-tuning
- Augmentations: 13 types (noise, reverb, pitch shift, etc.)
- Label smoothing: 0.1
- Dropout: 0.3
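
The two-phase schedule above can be sketched in PyTorch as follows. This is a minimal illustration, not the training script used for this model: the toy `backbone` and `head` modules, the input feature size of 80, the Adam optimizer, and the learning rates are all assumptions. Only the phase structure, the dropout of 0.3, the label smoothing of 0.1, the embedding dimension of 256, and the 3,975 training speakers come from this card.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the real network: a tiny "backbone" that maps
# 80-dim features to 256-dim embeddings, and a speaker-classification head.
backbone = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Dropout(p=0.3))
head = nn.Linear(256, 3975)  # one logit per training speaker
model = nn.Sequential(backbone, head)

# Cross-entropy with label smoothing of 0.1, as listed in the configuration.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)


def set_backbone_trainable(trainable: bool) -> None:
    """Freeze or unfreeze every backbone parameter."""
    for param in backbone.parameters():
        param.requires_grad = trainable


def count_trainable() -> int:
    """Number of parameters that will receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Phase 1: frozen backbone, only the head is optimized.
set_backbone_trainable(False)
phase1_trainable = count_trainable()
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3  # assumed LR
)

# One dummy optimization step on random data to show the loop shape.
features = torch.randn(8, 80)
labels = torch.randint(0, 3975, (8,))
loss = criterion(model(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Phase 2: unfreeze everything and rebuild the optimizer over all parameters.
set_backbone_trainable(True)
phase2_trainable = count_trainable()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed LR

print(f"phase 1 trainable params: {phase1_trainable}")
print(f"phase 2 trainable params: {phase2_trainable}")
```

Rebuilding the optimizer at the start of phase 2 is one simple way to make sure the newly unfrozen parameters are included. A lower learning rate for full fine-tuning is a common choice, but the values shown here are placeholders.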

## Results

| Metric | Value |
|--------|-------|
| Best validation accuracy | 91.4% |
| Best EER (held-out set) | 4.18% |
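
For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate over a set of verification trials. The sketch below shows one simple way to estimate it with NumPy. It is not necessarily the evaluation code behind the number above, and `compute_eer` is a hypothetical helper name.

```python
import numpy as np


def compute_eer(scores, labels):
    """Estimate the EER from trial scores and same-speaker labels.

    scores: similarity score per trial (higher means more likely same speaker)
    labels: 1/True for same-speaker trials, 0/False for different-speaker trials
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.unique(scores)

    # False acceptance: different-speaker trials scored at or above the threshold.
    far = np.array([np.mean(scores[~labels] >= t) for t in thresholds])
    # False rejection: same-speaker trials scored below the threshold.
    frr = np.array([np.mean(scores[labels] < t) for t in thresholds])

    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)


# Toy example: one of two target trials scores below one of two impostor trials.
print(compute_eer([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.5
```

With only a handful of trials the estimate is coarse. On a large trial list, interpolating between neighbouring thresholds gives a smoother value.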

## Usage

```python
import torch
from pyannote.audio import Model

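# Load the base model (architecture and pretrained weights)
model = Model.from_pretrained("pyannote/wespeaker-voxceleb-resnet34-LM")

# Load the fine-tuned checkpoint on CPU
checkpoint = torch.load("checkpoint.pt", map_location="cpu")

# Assumption: the weights are either the checkpoint itself or stored under a
# "state_dict" key; adjust this to match the actual checkpoint layout.
state_dict = checkpoint.get("state_dict", checkpoint)

# Note: the checkpoint includes a classification head for Indian languages,
# so strict=False skips keys that the base model does not define.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
model.eval()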
```

## Intended Use

- Speaker diarization for Indian language audio
- Speaker verification/identification
- Bengali speaker diarization (DLSPRINT challenge)
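
For the verification use case listed above, a common approach is to compare two embeddings by cosine similarity and accept the pair as the same speaker when the score exceeds a threshold. The sketch below assumes 256-dimensional embeddings have already been extracted as NumPy arrays. The helper names and the threshold of 0.5 are hypothetical, and a real threshold should be tuned on held-out trials.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def same_speaker(a, b, threshold=0.5):
    """Accept the pair when similarity reaches the (hypothetical) threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy 256-dim vectors standing in for real embeddings.
rng = np.random.default_rng(0)
enrolled = rng.normal(size=256)
# A slightly perturbed copy simulates a second utterance of the same speaker.
test_same = enrolled + 0.1 * rng.normal(size=256)

print(f"similarity: {cosine_similarity(enrolled, test_same):.3f}")
print(f"same speaker: {same_speaker(enrolled, test_same)}")
```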