---
language: it
license: apache-2.0
library_name: onnxruntime
pipeline_tag: voice-activity-detection
tags:
- turn-detection
- end-of-utterance
- distilbert
- onnx
- quantized
- conversational-ai
- voice-assistant
- real-time
base_model: distilbert-base-multilingual-cased
datasets:
- videosdk-live/Namo-Turn-Detector-v1-Train
model-index:
- name: Namo Turn Detector v1 - Italian
  results:
  - task:
      type: text-classification
      name: Turn Detection
    dataset:
      name: Namo Turn Detector v1 Test - Italian
      type: videosdk-live/Namo-Turn-Detector-v1-Test
      split: train
    metrics:
    - type: accuracy
      value: 0.868286
      name: Accuracy
    - type: f1
      value: 0.880925
      name: F1 Score
    - type: precision
      value: 0.80042
      name: Precision
    - type: recall
      value: 0.979434
      name: Recall
---
| |
| # 🎯 Namo Turn Detector v1 - Italian |
|
|
| <div align="center"> |
|
|
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[ONNX](https://onnx.ai/)
[Model on Hugging Face](https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Italian)
|
|
| **🚀 Namo Turn Detection Model for Italian** |
|
|
| </div> |
|
|
| --- |
|
|
| ## 📋 Overview |
|
|
| The **Namo Turn Detector** is a specialized AI model designed to solve one of the most challenging problems in conversational AI: **knowing when a user has finished speaking**. |
|
|
This Italian-specific model classifies a speech transcript as one of two classes:
- ✅ **Complete utterances** (user is done speaking)
- 🔄 **Incomplete utterances** (user will continue speaking)
|
|
Built on the DistilBERT architecture (base model: `distilbert-base-multilingual-cased`) and exported to a quantized ONNX model, it runs inference in under 13 ms, fast enough for real-time voice pipelines.
|
|
| ## 🔑 Key Features |
|
|
- **Turn Detection Specialist**: Detects end-of-turn vs. continuation in Italian speech transcripts.
- **Low Latency**: Optimized with **quantized ONNX** for <13 ms inference.
- **Robust Performance**: 86.8% accuracy on a test set of 700+ Italian utterances.
- **Easy Integration**: Compatible with Python, ONNX Runtime, and the VideoSDK Agents SDK.
- **Enterprise Ready**: Supports real-time conversational AI and voice assistants.
|
|
| ## 📊 Performance Metrics |
| <div> |
|
|
| Metric | Score |
|--------|-------|
| **🎯 Accuracy** | **86.83%** |
| **📈 F1-Score** | **88.09%** |
| **🎪 Precision** | **80.04%** |
| **🎭 Recall** | **97.94%** |
| **⚡ Latency** | **<13 ms** |
| **💾 Model Size** | **~135 MB** |
|
|
| </div> |
<img src="./confusion_matrices.png" alt="Confusion matrices" width="600" height="400"/>
|
|
| > 📊 *Evaluated on 700+ Italian utterances from diverse conversational contexts* |
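
The F1 score in the table follows directly from the precision and recall values. A minimal sketch of that arithmetic, using the metric values from the model card metadata (the function name `f1_from_precision_recall` is illustrative and not part of any library):

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 0.80042   # from the model card metadata
recall = 0.979434     # from the model card metadata

f1 = f1_from_precision_recall(precision, recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.8809
```

Assuming "End of Turn" is the positive class, the gap between recall (97.94%) and precision (80.04%) suggests the model rarely misses a finished turn but sometimes marks an unfinished utterance as complete.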
|
|
| ## ⚡️ Speed Analysis |
|
|
<img src="./performance_analysis.png" alt="Performance analysis" width="600" height="400"/>
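
Latency figures such as <13 ms depend on hardware, input length, and runtime settings, so it can be worth measuring on your own machine. The sketch below is one possible way to time any prediction callable using only the standard library. The helper name `measure_latency_ms` and its `warmup` and `runs` parameters are assumptions made for illustration, and the lambda is a stand-in for a real predictor such as the `predict` method from the Quick Start section.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(predict: Callable[[str], object], text: str,
                       warmup: int = 5, runs: int = 50) -> dict:
    """Time repeated calls to predict(text) and summarize them in milliseconds."""
    for _ in range(warmup):
        predict(text)  # warm-up calls are not timed

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(text)
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
    }


# Stand-in predictor; replace with detector.predict to time the real model.
stats = measure_latency_ms(lambda s: len(s.split()), "Ciao, come stai?")
print(stats)
```

Reporting the median and 95th percentile alongside the mean helps show whether occasional slow calls would affect a real-time pipeline.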
|
|
| ## 🔧 Train & Test Scripts |
|
|
| <div align="center"> |
|
|
[Open in Colab: notebook 1](https://colab.research.google.com/drive/1DqSUYfcya0r2iAEZB9fS4mfrennubduV) [Open in Colab: notebook 2](https://colab.research.google.com/drive/19ZOlNoHS2WLX2V4r5r492tsCUnYLXnQR)
|
|
| </div> |
|
|
| ## 🛠️ Installation |
|
|
| To use this model, you will need to install the following libraries. |
|
|
| ```bash |
| pip install onnxruntime transformers huggingface_hub |
| ``` |
|
|
| ## 🚀 Quick Start |
|
|
You can run inference directly from the Hugging Face repository.
|
|
```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download


class TurnDetector:
    def __init__(self, repo_id="videosdk-live/Namo-Turn-Detector-v1-Italian"):
        """
        Initializes the detector by downloading the model and tokenizer
        from the Hugging Face Hub.
        """
        print(f"Loading model from repo: {repo_id}")

        # Download the model and tokenizer from the Hub
        # Authentication is handled automatically if you are logged in
        model_path = hf_hub_download(repo_id=repo_id, filename="model_quant.onnx")
        self.tokenizer = AutoTokenizer.from_pretrained(repo_id)

        # Set up the ONNX Runtime inference session
        self.session = ort.InferenceSession(model_path)
        self.max_length = 512
        print("✅ Model and tokenizer loaded successfully.")

    def predict(self, text: str) -> tuple:
        """
        Predicts if a given text utterance is the end of a turn.
        Returns (predicted_label, confidence) where:
            - predicted_label: 0 for "Not End of Turn", 1 for "End of Turn"
            - confidence: confidence score between 0 and 1
        """
        # Tokenize the input text
        inputs = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            return_tensors="np",
        )

        # Prepare the feed dictionary for the ONNX model
        feed_dict = {
            "input_ids": inputs["input_ids"],
            "attention_mask": inputs["attention_mask"],
        }

        # Run inference; the first output holds the logits, shape (batch, 2)
        outputs = self.session.run(None, feed_dict)
        logits = outputs[0]

        # Convert the logits of the single input to probabilities
        probabilities = self._softmax(logits[0])
        predicted_label = int(np.argmax(probabilities))  # plain int, not np.int64
        confidence = float(np.max(probabilities))

        return predicted_label, confidence

    def _softmax(self, x, axis=-1):
        # Subtract the max before exponentiating for numerical stability
        exp_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
        return exp_x / np.sum(exp_x, axis=axis, keepdims=True)


# --- Example Usage ---
if __name__ == "__main__":
    detector = TurnDetector()

    sentences = [
        "È stato spesso visto tirar fuori i favi dai grandi nidi di vespe e di api, così come dai nidi più piccoli dei calabroni.",  # Expected: End of Turn
        "L'opera che ne falsi del tutto lo spirito è?",  # Expected: Not End of Turn
    ]

    for sentence in sentences:
        predicted_label, confidence = detector.predict(sentence)
        result = "End of Turn" if predicted_label == 1 else "Not End of Turn"
        print(f"'{sentence}' -> {result} (confidence: {confidence:.3f})")
        print("-" * 50)
```
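
In a voice pipeline, the `(predicted_label, confidence)` pair returned by `predict` usually has to be turned into a yes/no decision about whether to respond. One possible approach, sketched below, is to require a minimum confidence before treating the turn as finished. The helper `is_end_of_turn` and the default threshold of 0.7 are assumptions made for illustration; they are not part of this model or recommended values from its authors.

```python
def is_end_of_turn(predicted_label: int, confidence: float,
                   threshold: float = 0.7) -> bool:
    """Treat the utterance as finished only if label 1 is predicted
    with at least `threshold` confidence (hypothetical default)."""
    return predicted_label == 1 and confidence >= threshold


print(is_end_of_turn(1, 0.93))  # True: confident end of turn
print(is_end_of_turn(1, 0.55))  # False: end of turn, but below the threshold
print(is_end_of_turn(0, 0.99))  # False: model predicts the user will continue
```

Raising the threshold trades fewer premature responses for a slightly longer wait on borderline utterances; a suitable value is best chosen by testing on your own transcripts.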
|
|
|
|
| ## 🤖 VideoSDK Agents Integration |
|
|
| Integrate this turn detector directly with VideoSDK Agents for production-ready conversational AI applications. |
|
|
```python
from videosdk_agents import NamoTurnDetectorV1, pre_download_namo_turn_v1_model

# Download the model
pre_download_namo_turn_v1_model(language="it")

# Initialize Italian turn detector for VideoSDK Agents
turn_detector = NamoTurnDetectorV1(language="it")
```
|
|
| > 📚 [**Complete Integration Guide**](https://docs.videosdk.live/ai_agents/plugins/namo-turn-detector) - Learn how to use `NamoTurnDetectorV1` with VideoSDK Agents |
|
|
| ## 📖 Citation |
|
|
```bibtex
@misc{namo_turn_detector_it_2025,
  title={Namo Turn Detector v1: Italian},
  author={VideoSDK Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/videosdk-live/Namo-Turn-Detector-v1-Italian},
  note={ONNX-optimized DistilBERT for turn detection in Italian}
}
```
|
|
| ## 📄 License |
|
|
| This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details. |
|
|
| <div align="center"> |
|
|
| **Made with ❤️ by the VideoSDK Team** |
|
|
[VideoSDK](https://videosdk.live)
|
|
| </div> |