sentis-silero-vad

Silero voice activity detection (VAD) model, converted to the Unity Inference Engine (formerly Sentis) format with FP16 weights.

Files

silero_vad_fp16.sentis  # converted Silero VAD model (FP16)
SileroVad.cs            # self-contained Unity Sentis inference

Inference

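The model takes one 512-sample frame (32 ms of 16 kHz mono audio) plus the recurrent LSTM state (h, c), and returns a speech probability together with the updated state. Feed the returned h/c into the next frame's call, and zero them to reset the detector (for example, when a new audio stream starts).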
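The frame and state contract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in SileroVad.cs: `VadStep`, `FrameRunner`, and the `stateLength` parameter are hypothetical, the model call is abstracted as a delegate so the example runs without Unity, and the real h/c tensor shapes are not given in this card.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one model call: takes a 512-sample frame plus the
// current h/c state and returns a speech probability and the updated state.
// In a Unity project this would wrap the Sentis worker (not shown here).
public delegate (float prob, float[] h, float[] c) VadStep(float[] frame, float[] h, float[] c);

public sealed class FrameRunner
{
    public const int FrameSize = 512;          // 512 samples at 16 kHz = 32 ms

    private readonly VadStep _step;
    private readonly int _stateLength;         // assumption: flattened h/c length
    private readonly float[] _frame = new float[FrameSize]; // reused frame buffer
    private int _filled;                       // samples currently buffered
    private float[] _h, _c;

    public FrameRunner(VadStep step, int stateLength)
    {
        _step = step;
        _stateLength = stateLength;
        Reset();
    }

    // Zero the recurrent state and drop any partially filled frame.
    public void Reset()
    {
        _h = new float[_stateLength];
        _c = new float[_stateLength];
        _filled = 0;
    }

    // Buffer input of any length; run the model once per complete frame and
    // return the speech probabilities produced by this call.
    public List<float> Push(float[] samples, int count)
    {
        var probs = new List<float>();
        for (int i = 0; i < count; i++)
        {
            _frame[_filled++] = samples[i];
            if (_filled == FrameSize)
            {
                var (p, h, c) = _step(_frame, _h, _c);
                _h = h;                        // carry state into the next frame
                _c = c;
                probs.Add(p);
                _filled = 0;
            }
        }
        return probs;
    }
}
```

Buffering sample by sample keeps any leftover partial frame for the next `Push`, so microphone callbacks of arbitrary size map onto exact 512-sample frames while h/c flow from one frame to the next.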

A complete, self-contained implementation lives in SileroVad.cs. It handles frame buffering, LSTM state carry, start/end thresholding with a silence timeout, and the SpeechStarted/SpeechEnded events. Minimal usage:
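The start/end thresholding with a silence timeout can be sketched as a small hysteresis gate over the per-frame probabilities. The class name `SpeechGate`, its parameters, and the thresholds in the usage note are illustrative assumptions, not the structure or values used in SileroVad.cs.

```csharp
using System;

// Hypothetical hysteresis gate driven by per-frame speech probabilities.
public sealed class SpeechGate
{
    private readonly float _start;        // prob >= start opens a speech segment
    private readonly float _end;          // prob < end counts as a silent frame
    private readonly int _timeoutFrames;  // consecutive silent frames that close it
    private int _silentFrames;

    public bool InSpeech { get; private set; }
    public event Action SpeechStarted;
    public event Action SpeechEnded;

    public SpeechGate(float startThreshold, float endThreshold, int silenceTimeoutFrames)
    {
        if (endThreshold > startThreshold)
            throw new ArgumentException("endThreshold must not exceed startThreshold");
        _start = startThreshold;
        _end = endThreshold;
        _timeoutFrames = silenceTimeoutFrames;
    }

    public void Process(float prob)
    {
        if (!InSpeech)
        {
            if (prob >= _start)
            {
                InSpeech = true;
                _silentFrames = 0;
                SpeechStarted?.Invoke();
            }
            return;
        }

        // While in speech, only a sustained run of silent frames ends the segment.
        if (prob < _end)
        {
            if (++_silentFrames >= _timeoutFrames)
            {
                InSpeech = false;
                _silentFrames = 0;
                SpeechEnded?.Invoke();
            }
        }
        else
        {
            _silentFrames = 0;            // any non-silent frame restarts the timeout
        }
    }
}
```

A lower end threshold than start threshold keeps brief dips from splitting one utterance. Because each frame is 32 ms, a timeout in milliseconds converts to frames as ceil(ms / 32); for example, a 500 ms timeout is 16 frames.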

using Unity.InferenceEngine;          // BackendType (Unity.Sentis on older Sentis versions)

var vad = new SileroVad(BackendType.CPU);
vad.Load(modelRoot);                 // folder holding silero_vad_fp16.sentis
vad.SpeechStarted += () => { /* ... */ };
vad.SpeechEnded += () => { /* ... */ };
vad.StartListening();
// each frame: push new 16 kHz mono samples, then call Pump() to drain them
vad.PushSamples(micSamples, micSampleCount);
vad.Pump();                          // call regularly (e.g. once per frame)

License & attribution

MIT. Converted from snakers4/silero-vad (MIT).
