---
license: apache-2.0
tags:
  - text-to-speech
  - onnx
  - voice-cloning
  - cpu-inference
  - qwen3-tts
pipeline_tag: text-to-speech
library_name: onnxruntime
base_model: Qwen/Qwen3-TTS-12Hz-1.7B-Base
---

# 🎙️ Qwen3-TTS-12Hz-1.7B-Base (ONNX)

## 🚀 Overview
**Qwen3-TTS-12Hz-1.7B-Base-ONNX** is an ONNX export of `Qwen/Qwen3-TTS-12Hz-1.7B-Base`, optimized for inference with ONNX Runtime. The model implements a discrete multi-codec language model (LM) architecture capable of **3-second rapid voice cloning** with enhanced prosody and vocal fidelity.

The ONNX conversion enables low-latency, cross-platform deployment on both high-end CPUs and NVIDIA GPUs.
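
As a minimal sketch of targeting both CPUs and NVIDIA GPUs, the snippet below picks an ONNX Runtime execution provider at load time. The helper names `pick_providers` and `make_session` and the GPU-first preference order are assumptions made for illustration; they are not part of this repository.

```python
from typing import List, Sequence

# Preference order is an assumption of this sketch: GPU first, CPU as fallback.
PREFERRED = ("CUDAExecutionProvider", "CPUExecutionProvider")


def pick_providers(available: Sequence[str]) -> List[str]:
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in PREFERRED if p in available]
    # Fall back to the CPU provider if nothing from the preferred list was reported.
    return chosen or ["CPUExecutionProvider"]


def make_session(model_path: str):
    """Create an InferenceSession on the best available provider (hypothetical helper)."""
    # Imported lazily so pick_providers() stays usable without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)
```

On a machine without CUDA, `pick_providers` reduces to the CPU provider, so the same loading code should work in both environments.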

## 💎 Key Features
* **Zero-Shot Voice Cloning**: High-similarity cloning (>97%) using only 3 seconds of reference audio.
* **Ultra-Low Latency**: End-to-end streaming generation as low as **97 ms**.
* **Decoupled Architecture**: Separate components for text processing, token generation, and speech synthesis.
* **Multilingual Excellence**: Native-level pronunciation for 10 major global languages.
* **Vocal Richness**: 2048-dimensional speaker embeddings for superior similarity.

## 🏗️ Model Architecture
The pipeline is modular and consists of four components:
* **Talker (Transformer)**: 28 layers (Hidden Size: 2048, 8 KV Heads).
* **Code Predictor**: 5-layer Transformer for multi-codec resolution.
* **Vocoder**: BigVGAN-based high-fidelity speech decoder.
* **Speaker Encoder**: ECAPA-TDNN for embedding extraction.

## 📦 Model Components (Modular Specs)
| Component | File | Description | Output |
| :--- | :--- | :--- | :--- |
| **Talker Prefill** | `talker_prefill.onnx` | Initial text processing & KV Cache setup. | Logits & Hidden states. |
| **Talker Decode** | `talker_decode.onnx` | Iterative token generation logic. | New KV Cache. |
| **Code Predictor** | `code_predictor.onnx` | Multi-codec prediction (12Hz). | Multi-codebook codes. |
| **Vocoder** | `vocoder.onnx` | Final waveform synthesis. | 24kHz Audio. |
| **Speaker Enc.** | `speaker_encoder.onnx` | Reference audio analysis. | 2048-dim Embedding. |
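
The table above can be turned into a small loading check. The sketch below resolves the five file names from the table inside a local model directory, reports which are missing, and prints each graph's input and output signatures. The dictionary keys, the function names, and the assumption that all files sit in one flat directory are illustrative; the actual tensor names are not documented here, which is why the sketch inspects them rather than assuming them.

```python
from pathlib import Path
from typing import Dict, List

# File names come from the component table; the keys are labels for this sketch only.
COMPONENT_FILES: Dict[str, str] = {
    "speaker_encoder": "speaker_encoder.onnx",
    "talker_prefill": "talker_prefill.onnx",
    "talker_decode": "talker_decode.onnx",
    "code_predictor": "code_predictor.onnx",
    "vocoder": "vocoder.onnx",
}


def resolve_components(model_dir: str) -> Dict[str, Path]:
    """Map each component label to its expected path under model_dir."""
    root = Path(model_dir)
    return {name: root / fname for name, fname in COMPONENT_FILES.items()}


def missing_components(model_dir: str) -> List[str]:
    """List the component labels whose ONNX file is not present on disk."""
    paths = resolve_components(model_dir)
    return [name for name, path in paths.items() if not path.is_file()]


def describe_io(model_dir: str) -> None:
    """Print the input/output signature of every component graph."""
    import onnxruntime as ort  # lazy import: only needed when graphs are loaded

    for name, path in resolve_components(model_dir).items():
        sess = ort.InferenceSession(str(path), providers=["CPUExecutionProvider"])
        print(name)
        for inp in sess.get_inputs():
            print("  in ", inp.name, inp.shape, inp.type)
        for out in sess.get_outputs():
            print("  out", out.name, out.shape, out.type)
```

Calling `missing_components("path/to/model")` before `describe_io` gives a clearer error than a failed session load when a file was not downloaded.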

## 🛠️ Installation
```bash
pip install onnxruntime-gpu librosa soundfile numpy torch transformers