---
license: cc-by-4.0
language:
- en
pipeline_tag: automatic-speech-recognition
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- Transducer
- TDT
- FastConformer
- Conformer
- pytorch
- NeMo
- hf-asr-leaderboard
- OpenVINO
widget:
- example_title: Librispeech sample 1
  src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
- example_title: Librispeech sample 2
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
base_model:
- nvidia/parakeet-tdt-0.6b-v2
---

# Parakeet TDT 0.6B V2 - OpenVINO

[![Discord](https://img.shields.io/badge/Discord-Join%20Chat-7289da.svg)](https://discord.gg/WNsvaCtmDe)
[![GitHub Repo stars](https://img.shields.io/github/stars/FluidInference/eddy?style=flat&logo=github)](https://github.com/FluidInference/eddy)

OpenVINO-optimized version of NVIDIA's Parakeet TDT 0.6B V2 model for high-performance automatic speech recognition on Intel NPUs and CPUs.

## Benchmark Results

- **Hardware**: Intel Core Ultra 7 155H (Meteor Lake) with Intel AI Boost NPU
- **Dataset**: LibriSpeech test-clean (2,620 files, 5.4 hours)
- **Software**: OpenVINO 2025.x

| Metric | Value |
|--------|-------|
| **Average WER** | 2.87% |
| **Median WER** | 0.00% |
| **Average CER** | 1.07% |
| **RTFx (NPU)** | 37.8× |
| **RTFx (CPU)** | 5-8× |
| **Total processing time** | 514.7s |

## Performance Comparison

| Implementation | Device | RTFx |
|----------------|--------|------|
| **eddy (OpenVINO)** | Intel Core Ultra 7 155H NPU | **37.8×** |
| Parakeet (PyTorch) | Intel Arc 140V GPU | 19.8× |
| **eddy (OpenVINO)** | Intel Core Ultra 7 155H CPU | **5-8×** |

> **Note**: Benchmarked on HP EliteBook Ultra G1i. eddy NPU is 1.9× faster than PyTorch on Intel Arc GPU, with lower power consumption.

## Usage

Python usage via ctypes available - see [eddy repository](https://github.com/FluidInference/eddy) for details.

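
When reproducing the benchmark, it may help to check how the reported figures relate to each other. The sketch below recomputes RTFx (audio duration divided by processing time) from the numbers in the benchmark section. It assumes that the 514.7 s total processing time refers to the NPU run, which the arithmetic is consistent with. The variable names are illustrative and not part of eddy.

```python
# Figures copied from the benchmark tables in this README.
audio_hours = 5.4            # LibriSpeech test-clean duration
processing_seconds = 514.7   # total processing time (assumed to be the NPU run)

# RTFx: seconds of audio transcribed per second of wall-clock time.
audio_seconds = audio_hours * 3600
rtfx = audio_seconds / processing_seconds
print(f"RTFx (NPU): {rtfx:.1f}x")

# Relative speed of the NPU run versus the PyTorch run on the Arc GPU.
npu_rtfx, gpu_rtfx = 37.8, 19.8
speedup = npu_rtfx / gpu_rtfx
print(f"NPU vs. PyTorch on Arc GPU: {speedup:.1f}x")
```

Both values round to the figures quoted in the tables and the note (37.8× and 1.9×).
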
## Model Details

- **Parameters**: 600M
- **Architecture**: FastConformer-RNNT (4-model pipeline)
- **Language**: English only
- **Blank token ID**: 1024
- **Context window**: 10s chunks with 3s overlap
- **Features**: LSTM state continuity, token deduplication, per-token timestamps

## License

CC-BY-4.0 - See [LICENSE](LICENSE) for details.

## Links

- **GitHub**: [FluidInference/eddy](https://github.com/FluidInference/eddy)
- **Base Model**: [nvidia/parakeet-tdt-0.6b-v2](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2)
- **Documentation**: [Benchmark Results](https://github.com/FluidInference/eddy/blob/main/BENCHMARK_RESULTS.md)

## Acknowledgments

Based on NVIDIA's Parakeet TDT model. OpenVINO conversion and optimization by the FluidInference team.
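
As a rough illustration of the "10s chunks with 3s overlap" context window listed under Model Details, the sketch below computes chunk boundaries for a long recording. It is not eddy's implementation: the function name `chunk_windows`, the 7 s stride derived from chunk length minus overlap, and the handling of the shorter final chunk are all assumptions made for illustration.

```python
from typing import List, Tuple


def chunk_windows(total_seconds: float,
                  chunk: float = 10.0,
                  overlap: float = 3.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping chunks.

    Hypothetical helper: consecutive chunks start `chunk - overlap`
    seconds apart, and the last chunk is clipped to the audio length.
    """
    if overlap >= chunk:
        raise ValueError("overlap must be smaller than chunk length")
    if total_seconds <= 0:
        return []

    stride = chunk - overlap
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk, total_seconds)
        windows.append((start, end))
        if end >= total_seconds:
            break
        start += stride
    return windows


for start, end in chunk_windows(25.0):
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

With these assumptions, each pair of neighboring chunks shares 3 s of audio. Tokens decoded twice in a shared region are presumably what the token deduplication feature removes, though the exact merging rule is defined in the eddy repository rather than here.
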