---
language: bn
tags:
- pyannote
- speaker-diarization
- segmentation
- bengali
- displace
license: mit
datasets:
- displace2024
- displace2026
base_model: pyannote/segmentation-3.0
---

# Bengali Speaker Segmentation Model (Robust)

A pyannote/segmentation-3.0 model fine-tuned for Bengali speaker diarization.

## Training Data

- DISPLACE 2024 (35 files, ~20h)
- DISPLACE 2026 (78 files, ~15h)
- Total: 113 files, ~35 hours

## Training Configuration

- Base model: pyannote/segmentation-3.0
- Epochs: 26 (stopped early; 30 scheduled)
- Learning rate: OneCycleLR, max_lr=5e-5
- Label smoothing: 0.1
- Batch size: 32
- Gradient clipping: max_norm=1.0
- Samples per file: 50 (training), 20 (validation)

## Results

- **Best validation loss**: 1.2473
- **Best validation accuracy**: 55.46%
- **Final train accuracy**: 60.83%

## Usage

```python
import torch
from pyannote.audio import Model

# Option 1: load the fine-tuned model directly from the Hugging Face Hub
model = Model.from_pretrained("smam/pyannote-segmentation-bengali-displace")

# Option 2: load the weights manually into the base architecture
# (the base model is gated on the Hub and may require an access token)
model = Model.from_pretrained("pyannote/segmentation-3.0")
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
```

## Architecture

- SincNet frontend
- Bidirectional LSTM
- Powerset classification (7 classes for 3 speakers, max 2 simultaneous)
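
The class count given for the powerset classification under Architecture can be checked by enumeration: with 3 speakers and at most 2 active at once, the classes are the empty set (non-speech), 3 single-speaker sets, and 3 speaker pairs, which gives 1 + 3 + 3 = 7. The sketch below is illustrative only. The helper name `powerset_classes` is hypothetical, and the class order it produces is an assumption that may not match the model's internal ordering.

```python
from itertools import combinations


def powerset_classes(num_speakers: int, max_simultaneous: int) -> list[tuple[int, ...]]:
    """Enumerate every speaker subset of size 0..max_simultaneous, smallest first.

    Hypothetical helper for illustration; not part of pyannote.audio.
    """
    classes: list[tuple[int, ...]] = []
    for size in range(max_simultaneous + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


# Values taken from the Architecture section: 3 speakers, max 2 simultaneous.
classes = powerset_classes(num_speakers=3, max_simultaneous=2)

for index, speakers in enumerate(classes):
    # The empty subset stands for frames in which nobody is speaking.
    label = " + ".join(f"speaker {s}" for s in speakers) or "non-speech"
    print(index, label)

print("number of classes:", len(classes))
```

Each frame is assigned exactly one of these classes, so overlapping speech is predicted as a speaker-pair class rather than as two independent per-speaker decisions.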