Diarization

by cospaia - opened Apr 6, 2025

Apr 6, 2025

Hello,

Thanks for providing this! Are there any plans to support diarization (speaker change recognition)?

I tried the latest GGML checkpoint with Whisper.cpp, which has a -tdrz flag that should enable diarization, but I think the model itself needs to support it.

fredde1

Apr 13, 2025

I would also be very interested in this! (Y)

Lauler

National Library of Sweden / KBLab org Apr 22, 2025

edited Apr 22, 2025

We will not support diarization as part of the Whisper model's inherent functionality. As far as I can see, the support in Whisper.cpp is quite experimental and limited to the English version of whisper.small.

If you want to run our models with diarization, I recommend a multi-step pipeline. WhisperX's README has an example of how to achieve this with the WhisperX library. Load our Whisper and Wav2vec2 models for the first two steps (transcription and alignment), and then run the diarization step as described in WhisperX.
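
To illustrate how the last step of such a pipeline fits together, here is a minimal sketch of merging diarization output with timestamped transcript segments by picking the speaker turn with the largest time overlap. It is not WhisperX's actual implementation: the `Turn` type, the `assign_speakers` function, and the sample data are all hypothetical, and it assumes the transcription and alignment steps have already produced segments with `start` and `end` times in seconds.

```python
from typing import NamedTuple, Optional


class Turn(NamedTuple):
    """One speaker turn from a diarization step (hypothetical structure)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[dict], turns: list[Turn]) -> list[dict]:
    """Label each segment with the speaker whose turn overlaps it the most.

    Segments that overlap no turn get speaker=None.
    """
    labeled = []
    for seg in segments:
        best: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            ov = overlap(seg["start"], seg["end"], turn.start, turn.end)
            if ov > best_overlap:
                best, best_overlap = turn.speaker, ov
        labeled.append({**seg, "speaker": best})
    return labeled


# Made-up sample data standing in for real pipeline output.
segments = [
    {"start": 0.0, "end": 2.5, "text": "Hej, hur mår du?"},
    {"start": 2.6, "end": 5.0, "text": "Bara bra, tack."},
]
turns = [
    Turn(0.0, 2.4, "SPEAKER_00"),
    Turn(2.4, 5.2, "SPEAKER_01"),
]

labeled = assign_speakers(segments, turns)
for seg in labeled:
    print(f'{seg["speaker"]}: {seg["text"]}')
```

Assigning at the segment level is the simplest choice. With word-level timestamps from the alignment step, the same overlap rule could be applied per word, which handles speaker changes in the middle of a segment better.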

Lauler changed discussion status to closed Apr 22, 2025

tophee

Oct 29, 2025

I've been working on a pipeline that combines kb-whisper-large with diarization using pyannote. Feel free to try it out: https://github.com/papatistos/swhisper
