Qwen3-TTS-12Hz-1.7B-CustomVoice — TensorRT-Edge-LLM ONNX (FP16)
FP16 ONNX export of Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice for use with NVIDIA TensorRT-Edge-LLM v0.7.1.
⚠️ These files are NOT a plug-and-play model. They are the intermediate ONNX consumed by Edge-LLM's engine builder. The Talker/CodePredictor graphs contain Edge-LLM's custom AttentionPlugin op and a runtime-bound lm_head, so they will not run in stock ONNX Runtime or generic TensorRT. You must use TensorRT-Edge-LLM: build engines on your device, then run its qwen3_tts_inference. See Usage.
Contents
llm/ # Talker (TalkerCausalLM) — model.onnx + model.onnx.data (FP16, ~2.83 GB) + sidecars
code_predictor/ # CodePredictor (residual RVQ codebooks) — model.onnx + .data + lm_heads / codec_embeddings / ...
code2wav/ # Code2Wav vocoder — model.onnx + .data
Keep each model.onnx in the same directory as its model.onnx.data (external weights) and its sidecar files (*.safetensors, tokenizer.json).
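As a quick check of the layout described above before building, a minimal Python sketch can report any model.onnx or model.onnx.data missing from the three component directories. The helper name `missing_files` is hypothetical and not part of Edge-LLM. It does not check the sidecar files, since their exact names differ per directory.

```python
from pathlib import Path

# Directory names are taken from the Contents listing above.
COMPONENT_DIRS = ("llm", "code_predictor", "code2wav")
# Each graph ships with an external-weights file that must sit beside it.
REQUIRED_FILES = ("model.onnx", "model.onnx.data")


def missing_files(repo_root):
    """Return the relative paths of required files absent under repo_root."""
    root = Path(repo_root)
    missing = []
    for directory in COMPONENT_DIRS:
        for filename in REQUIRED_FILES:
            path = root / directory / filename
            if not path.is_file():
                missing.append(path.relative_to(root).as_posix())
    return missing
```

Calling `missing_files(".")` from the root of the downloaded repository returns an empty list when all six files are present.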
Requirements
- TensorRT-Edge-LLM v0.7.1 (version matters — the ONNX op/loader conventions are version-specific).
- An NVIDIA GPU with CUDA/TensorRT supported by Edge-LLM.
Usage (build engines on your device, then run)
# Build TensorRT-Edge-LLM v0.7.1, then point at its plugin:
export EDGELLM_PLUGIN_PATH=$PWD/build/libNvInfer_edgellm_plugin.so
# Build the 3 engines from this ONNX (per-GPU; ~5 min):
./build/examples/llm/llm_build --onnxDir llm --engineDir engines/talker --maxInputLen 4096 --maxKVCacheCapacity 4096 --maxBatchSize 1
./build/examples/llm/llm_build --onnxDir code_predictor --engineDir engines/code_predictor --maxInputLen 4096 --maxKVCacheCapacity 4096 --maxBatchSize 1
./build/examples/multimodal/audio_build --onnxDir code2wav --engineDir engines/code2wav
# Run inference (input.json: speaker + messages; see Edge-LLM TTS docs):
./build/examples/omni/qwen3_tts_inference \
--talkerEngineDir engines/talker --code2wavEngineDir engines/code2wav/code2wav \
--tokenizerDir llm --inputFile input.json --outputAudioDir out
# -> out/audio_req0.wav (24 kHz)
Speakers: ryan, serena, aiden, vivian, dylan, eric, uncle_fu, ono_anna, sohee
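Because building engines and running inference takes a while, it can help to catch a mistyped speaker name before writing input.json. The sketch below checks a name against the list above. `check_speaker` is a hypothetical helper, and exact, case-sensitive matching is an assumption; Edge-LLM may treat names differently.

```python
import difflib

# Speaker names copied from the list above.
SPEAKERS = (
    "ryan", "serena", "aiden", "vivian", "dylan",
    "eric", "uncle_fu", "ono_anna", "sohee",
)


def check_speaker(name):
    """Return name if it is a listed speaker, otherwise raise ValueError."""
    if name in SPEAKERS:
        return name
    # Suggest the closest listed name, if any is reasonably similar.
    close = difflib.get_close_matches(name, SPEAKERS, n=1)
    hint = f" Did you mean {close[0]!r}?" if close else ""
    valid = ", ".join(SPEAKERS)
    raise ValueError(f"unknown speaker {name!r}.{hint} Valid speakers: {valid}")
```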
Notes
- These are regenerable from the base model (Edge-LLM export tools):
  python -m llm_loader.export_all_cli Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice <out_dir>
- Some GPU architectures may require runtime/kernel adjustments in Edge-LLM for correct output. Verify your generated audio (e.g. transcribe it) before relying on it.
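Before transcribing, a cheap first pass over the generated audio can catch obviously broken output such as silence or a wrong sample rate. The following is a minimal sketch using only the Python standard library. It assumes the output is 16-bit PCM WAV, which this README does not state, and the thresholds and function names are illustrative rather than part of Edge-LLM. Passing this check does not show the speech is intelligible, so transcription is still the stronger test.

```python
import array
import math
import sys
import wave

EXPECTED_RATE = 24000  # the Usage section reports 24 kHz output


def audio_stats(path):
    """Return (sample_rate, duration_s, peak, rms) for a 16-bit PCM WAV file.

    peak and rms are normalized so that full scale is 1.0.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        n_frames = wav.getnframes()
        samples = array.array("h", wav.readframes(n_frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return rate, 0.0, 0.0, 0.0
    peak = max(abs(s) for s in samples) / 32768.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
    return rate, n_frames / rate, peak, rms


def looks_plausible(path, min_seconds=0.2, min_rms=0.001):
    """Check sample rate, minimum length, and that the signal is not silent."""
    rate, duration, _peak, rms = audio_stats(path)
    return rate == EXPECTED_RATE and duration >= min_seconds and rms >= min_rms
```

For example, `looks_plausible("out/audio_req0.wav")` returning False is a signal to inspect the engine build or the runtime before going further.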
License & attribution
Apache-2.0, inherited from the base model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice (© Alibaba / Qwen). This repository redistributes an ONNX conversion of those weights. Please cite Qwen.