metadata
license: apache-2.0
base_model: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
pipeline_tag: text-to-speech
tags:
  - text-to-speech
  - tts
  - qwen3-tts
  - tensorrt
  - tensorrt-edge-llm
  - onnx
language:
  - en
  - zh
  - ja
  - ko
  - de
  - fr
  - ru
  - pt
  - es
  - it

Qwen3-TTS-12Hz-1.7B-CustomVoice — TensorRT-Edge-LLM ONNX (FP16)

FP16 ONNX export of Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice for use with NVIDIA TensorRT-Edge-LLM v0.7.1.

⚠️ These files are NOT a plug-and-play model. They are the intermediate ONNX consumed by Edge-LLM's engine builder. The Talker/CodePredictor graphs contain Edge-LLM's custom AttentionPlugin op and a runtime-bound lm_head, so they will not run in stock ONNX Runtime or generic TensorRT. You must use TensorRT-Edge-LLM: build engines on your device, then run its qwen3_tts_inference. See Usage.

Contents

llm/             # Talker (TalkerCausalLM) — model.onnx + model.onnx.data (FP16, ~2.83 GB) + sidecars
code_predictor/  # CodePredictor (residual RVQ codebooks) — model.onnx + .data + lm_heads / codec_embeddings / ...
code2wav/        # Code2Wav vocoder — model.onnx + .data

Keep each model.onnx in the same directory as its model.onnx.data (external weights) and its sidecar files (*.safetensors, tokenizer.json).
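Before building engines, it can help to confirm that the download kept this layout. The following is a minimal sketch, not part of Edge-LLM: check_onnx_layout is a hypothetical helper name, and it only checks for model.onnx and model.onnx.data in the three directories listed under Contents (sidecar files are not checked).

```shell
# Hypothetical helper (not an Edge-LLM tool): verify that each component
# directory holds model.onnx together with its external-weights file.
check_onnx_layout() {
  root="${1:-.}"
  status=0
  for dir in llm code_predictor code2wav; do
    for f in model.onnx model.onnx.data; do
      if [ ! -f "$root/$dir/$f" ]; then
        # Report every missing file instead of stopping at the first one.
        echo "missing: $root/$dir/$f" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```

Calling check_onnx_layout . from the repository root returns 0 when all six files are present and prints each missing path to stderr otherwise.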

Requirements

  • TensorRT-Edge-LLM v0.7.1 (version matters — the ONNX op/loader conventions are version-specific).
  • An NVIDIA GPU with CUDA/TensorRT supported by Edge-LLM.

Usage (build engines on your device, then run)

# Build TensorRT-Edge-LLM v0.7.1, then point at its plugin:
export EDGELLM_PLUGIN_PATH=$PWD/build/libNvInfer_edgellm_plugin.so

# Build the 3 engines from this ONNX (engines are tied to the GPU they are built on; ~5 min):
./build/examples/llm/llm_build          --onnxDir llm            --engineDir engines/talker         --maxInputLen 4096 --maxKVCacheCapacity 4096 --maxBatchSize 1
./build/examples/llm/llm_build          --onnxDir code_predictor --engineDir engines/code_predictor --maxInputLen 4096 --maxKVCacheCapacity 4096 --maxBatchSize 1
./build/examples/multimodal/audio_build --onnxDir code2wav       --engineDir engines/code2wav

# Run inference (input.json: speaker + messages; see Edge-LLM TTS docs):
./build/examples/omni/qwen3_tts_inference \
  --talkerEngineDir engines/talker --code2wavEngineDir engines/code2wav/code2wav \
  --tokenizerDir llm --inputFile input.json --outputAudioDir out
# -> out/audio_req0.wav (24 kHz)

Speakers: ryan, serena, aiden, vivian, dylan, eric, uncle_fu, ono_anna, sohee
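A typo in the speaker name is easy to make, so a quick check against the list above may save an inference run. This is only a sketch: is_supported_speaker is a hypothetical helper, the names are copied from this card, and the exact-match (case-sensitive) comparison is an assumption about how the runtime treats them.

```shell
# Hypothetical helper: succeed only if the name is one of the speakers
# listed in this card. Matching is exact and case-sensitive (an assumption).
is_supported_speaker() {
  case "$1" in
    ryan|serena|aiden|vivian|dylan|eric|uncle_fu|ono_anna|sohee) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_supported_speaker serena succeeds, while is_supported_speaker Serena fails under the exact-match assumption.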

Notes

  • These files can be regenerated from the base model with the Edge-LLM export tools: python -m llm_loader.export_all_cli Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice <out_dir>
  • Some GPU architectures may require runtime/kernel adjustments in Edge-LLM for correct output — verify your generated audio (e.g. transcribe it) before relying on it.
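As a first, cheap sanity check before transcribing, the sample rate in the output header can be compared with the expected 24 kHz. The sketch below is not an Edge-LLM tool: wav_sample_rate is a hypothetical helper, and it assumes a canonical RIFF/WAVE header in which the fmt chunk comes first, so the sample rate sits at byte offset 24 as a little-endian 32-bit integer. It says nothing about whether the audio content is correct.

```shell
# Hypothetical helper: print the sample rate stored in a WAV header.
# Assumes a canonical header (fmt chunk first), where bytes 24-27 hold
# the sample rate as a little-endian unsigned 32-bit integer.
wav_sample_rate() {
  # Read the four bytes as separate decimal values so the result does not
  # depend on the byte order of the host.
  set -- $(od -An -tu1 -j24 -N4 "$1")
  [ "$#" -eq 4 ] || return 1
  echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}
```

With the output path from Usage, wav_sample_rate out/audio_req0.wav should print 24000 if the header matches the rate stated above; a different value or a failure suggests the file is worth inspecting more closely.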

License & attribution

Apache-2.0, inherited from the base model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice (© Alibaba / Qwen). This repository redistributes an ONNX conversion of those weights. Please cite Qwen.