---
license: apache-2.0
base_model: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
pipeline_tag: text-to-speech
tags:
- text-to-speech
- tts
- qwen3-tts
- tensorrt
- tensorrt-edge-llm
- onnx
language:
- en
- zh
- ja
- ko
- de
- fr
- ru
- pt
- es
- it
---

# Qwen3-TTS-12Hz-1.7B-CustomVoice: TensorRT-Edge-LLM ONNX (FP16)

ONNX export of **[Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice](https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice)** for use with **[NVIDIA TensorRT-Edge-LLM](https://github.com/NVIDIA/TensorRT-Edge-LLM) v0.7.1**. FP16.

> ⚠️ **These files are NOT a plug-and-play model.**
> They are the *intermediate ONNX* consumed by Edge-LLM's engine builder. The Talker/CodePredictor
> graphs contain Edge-LLM's custom `AttentionPlugin` op and a runtime-bound `lm_head`, so they **will
> not run in stock ONNX Runtime or generic TensorRT**. You must use TensorRT-Edge-LLM: build engines
> on your device, then run its `qwen3_tts_inference`. See **Usage**.

## Contents

```
llm/             # Talker (TalkerCausalLM): model.onnx + model.onnx.data (FP16, ~2.83 GB) + sidecars
code_predictor/  # CodePredictor (residual RVQ codebooks): model.onnx + .data + lm_heads / codec_embeddings / ...
code2wav/        # Code2Wav vocoder: model.onnx + .data
```

Keep each `model.onnx` next to its `model.onnx.data` (external weights) and the sidecar `*.safetensors` / `tokenizer.json` in the same directory.

## Requirements

- TensorRT-Edge-LLM **v0.7.1** (version matters: the ONNX op/loader conventions are version-specific).
- An NVIDIA GPU with CUDA/TensorRT supported by Edge-LLM.

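Before building engines, it may help to confirm that the download matches the layout described under **Contents**. The following is a minimal sketch, not part of TensorRT-Edge-LLM: `check_onnx_layout` is a hypothetical helper name, and it only checks that each of the three component directories contains `model.onnx` together with its `model.onnx.data` file. It does not validate the sidecar files or the weights themselves.

```bash
# Hypothetical helper (not an Edge-LLM tool): check that every component
# directory holds model.onnx next to its external-weights file.
# Usage: check_onnx_layout [repo_root]   (repo_root defaults to ".")
check_onnx_layout() {
  col_root="${1:-.}"
  col_missing=0
  for col_dir in llm code_predictor code2wav; do
    for col_file in model.onnx model.onnx.data; do
      if [ ! -f "$col_root/$col_dir/$col_file" ]; then
        # Report each missing file on stderr and remember the failure.
        echo "missing: $col_root/$col_dir/$col_file" >&2
        col_missing=1
      fi
    done
  done
  # Exit status 0 when everything is present, 1 otherwise.
  return "$col_missing"
}
```

Run from the repository root, `check_onnx_layout . && echo "layout OK"` prints `layout OK` only when all six files are present; otherwise each missing path is listed on stderr.
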
## Usage (build engines on your device, then run)

```bash
# Build TensorRT-Edge-LLM v0.7.1, then point at its plugin:
export EDGELLM_PLUGIN_PATH=$PWD/build/libNvInfer_edgellm_plugin.so

# Build the 3 engines from this ONNX (per-GPU; ~5 min):
./build/examples/llm/llm_build --onnxDir llm --engineDir engines/talker --maxInputLen 4096 --maxKVCacheCapacity 4096 --maxBatchSize 1
./build/examples/llm/llm_build --onnxDir code_predictor --engineDir engines/code_predictor --maxInputLen 4096 --maxKVCacheCapacity 4096 --maxBatchSize 1
./build/examples/multimodal/audio_build --onnxDir code2wav --engineDir engines/code2wav

# Run inference (input.json: speaker + messages; see Edge-LLM TTS docs):
./build/examples/omni/qwen3_tts_inference \
  --talkerEngineDir engines/talker --code2wavEngineDir engines/code2wav/code2wav \
  --tokenizerDir llm --inputFile input.json --outputAudioDir out
# -> out/audio_req0.wav (24 kHz)
```

**Speakers:** `ryan, serena, aiden, vivian, dylan, eric, uncle_fu, ono_anna, sohee`

## Notes

- These are regenerable from the base model: `python -m llm_loader.export_all_cli Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice ` (Edge-LLM export tools).
- Some GPU architectures may require runtime/kernel adjustments in Edge-LLM for correct output. Verify your generated audio (e.g. transcribe it) before relying on it.

## License & attribution

Apache-2.0, inherited from the base model **Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice** (© Alibaba / Qwen). This repository redistributes an ONNX conversion of those weights. Please cite Qwen.