Qwen3-TTS-12Hz-1.7B-CustomVoice — TensorRT-Edge-LLM ONNX (FP16)

FP16 ONNX export of Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice for use with NVIDIA TensorRT-Edge-LLM v0.7.1.

⚠️ These files are NOT a plug-and-play model. They are the intermediate ONNX consumed by Edge-LLM's engine builder. The Talker/CodePredictor graphs contain Edge-LLM's custom AttentionPlugin op and a runtime-bound lm_head, so they will not run in stock ONNX Runtime or generic TensorRT. You must use TensorRT-Edge-LLM: build engines on your device, then run its qwen3_tts_inference. See Usage.

Contents

llm/             # Talker (TalkerCausalLM) — model.onnx + model.onnx.data (FP16, ~2.83 GB) + sidecars
code_predictor/  # CodePredictor (residual RVQ codebooks) — model.onnx + .data + lm_heads / codec_embeddings / ...
code2wav/        # Code2Wav vocoder — model.onnx + .data

Keep each model.onnx next to its model.onnx.data (external weights) and the sidecar *.safetensors / tokenizer.json in the same directory.
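A quick pre-build check along these lines can catch a missing external-weights file before the engine builder does. This is a minimal sketch, not part of Edge-LLM: the function name check_onnx_dir is hypothetical, and it only looks for the two files named above (model.onnx and model.onnx.data). It does not check the sidecars, since their exact names differ per directory.

```shell
# Hypothetical helper (not part of Edge-LLM): verify that a directory holds
# model.onnx together with its external-weights file model.onnx.data.
check_onnx_dir() {
  dir=$1
  rc=0
  for f in model.onnx model.onnx.data; do
    if [ ! -f "$dir/$f" ]; then
      # Report every missing file, not just the first one.
      echo "missing: $dir/$f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Run from the repository root, it could be called once per directory, for example: check_onnx_dir llm && check_onnx_dir code_predictor && check_onnx_dir code2wav.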

Requirements

  • TensorRT-Edge-LLM v0.7.1 (version matters — the ONNX op/loader conventions are version-specific).
  • An NVIDIA GPU with CUDA/TensorRT supported by Edge-LLM.

Usage (build engines on your device, then run)

# Build TensorRT-Edge-LLM v0.7.1, then point at its plugin:
export EDGELLM_PLUGIN_PATH=$PWD/build/libNvInfer_edgellm_plugin.so

# Build the 3 engines from this ONNX (engines are GPU-specific, so build on the target device; ~5 min):
./build/examples/llm/llm_build          --onnxDir llm            --engineDir engines/talker         --maxInputLen 4096 --maxKVCacheCapacity 4096 --maxBatchSize 1
./build/examples/llm/llm_build          --onnxDir code_predictor --engineDir engines/code_predictor --maxInputLen 4096 --maxKVCacheCapacity 4096 --maxBatchSize 1
./build/examples/multimodal/audio_build --onnxDir code2wav       --engineDir engines/code2wav

# Run inference (input.json: speaker + messages; see Edge-LLM TTS docs):
./build/examples/omni/qwen3_tts_inference \
  --talkerEngineDir engines/talker --code2wavEngineDir engines/code2wav/code2wav \
  --tokenizerDir llm --inputFile input.json --outputAudioDir out
# -> out/audio_req0.wav (24 kHz)

Speakers: ryan, serena, aiden, vivian, dylan, eric, uncle_fu, ono_anna, sohee

Notes

  • These files can be regenerated from the base model with the Edge-LLM export tools: python -m llm_loader.export_all_cli Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice <out_dir>
  • Some GPU architectures may require runtime/kernel adjustments in Edge-LLM for correct output — verify your generated audio (e.g. transcribe it) before relying on it.
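Before transcribing, a cheap structural check can rule out obviously broken output. The sketch below is an assumption-laden illustration, not an Edge-LLM tool: the function names wav_sample_rate and check_wav are hypothetical, and it assumes a canonical WAV header (sample rate stored as a 4-byte field at byte offset 24) and a little-endian host. It only confirms that the file is non-empty and reports the 24 kHz rate mentioned in Usage; it says nothing about intelligibility, so transcription is still needed.

```shell
# Hypothetical helper (not part of Edge-LLM): print the sample rate stored in
# a WAV header. Assumes the canonical layout (fmt chunk right after the RIFF
# header, so the rate sits at byte offset 24) and a little-endian host.
wav_sample_rate() {
  od -An -t u4 -j 24 -N 4 "$1" | tr -d '[:space:]'
}

# Succeed only if the file is non-empty and reports the expected rate
# (defaults to 24000 Hz).
check_wav() {
  file=$1
  expected=${2:-24000}
  [ -s "$file" ] || { echo "empty or missing: $file" >&2; return 1; }
  rate=$(wav_sample_rate "$file")
  [ "$rate" = "$expected" ] || { echo "unexpected sample rate: $rate" >&2; return 1; }
}
```

For example, check_wav out/audio_req0.wav returns a non-zero status if the output is empty or was not written at 24 kHz.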

License & attribution

Apache-2.0, inherited from the base model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice (© Alibaba / Qwen). This repository redistributes an ONNX conversion of those weights. Please cite Qwen.
