---
base_model: Qwen/Qwen3-TTS-12Hz-0.6B-Base
library_name: mlx
tags:
- mlx
- tts
- mxfp4
- apple-silicon
- qwen3-tts
- microscaling
---

# Qwen3-TTS-12Hz-0.6B-Base — MXFP4 (MLX)

MXFP4-quantized version of [Qwen/Qwen3-TTS-12Hz-0.6B-Base](https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base) for Apple Silicon.

Converted with [mlx-audio](https://github.com/Blaizzy/mlx-audio) using native MXFP4 (Microscaling Float 4-bit, as defined in the OCP MX specification).

## Benchmark (M2 Ultra 128GB)

| Quant | Size | Avg. time (3 runs) |
|---|---|---|
| 8bit | 1.9 GB | 8.50 s |
| **mxfp4** | **1.6 GB** | **7.77 s (~8.6% less time)** |

Audio quality was verified by listening: voice cloning works, and long German texts with direct speech render cleanly.

## Conversion

```bash
python -m mlx_audio.convert \
  --hf-path Qwen/Qwen3-TTS-12Hz-0.6B-Base \
  --mlx-path ./Qwen3-TTS-0.6B-Base-mxfp4 \
  --quantize \
  --q-mode mxfp4
```

## Usage

```python
from mlx_audio.tts.utils import load_model
from mlx_audio.tts.generate import generate_audio

# Load the quantized weights from the Hugging Face Hub.
model = load_model("mpe74/Qwen3-TTS-12Hz-0.6B-Base-mxfp4")

generate_audio(
    model=model,
    text="Hello, this is a test.",
    ref_audio="reference.wav",  # recording of the voice to clone
    temperature=0.3,
    repetition_penalty=1.1,
)
```

## CLI

```bash
python -m mlx_audio.tts.generate \
  --model mpe74/Qwen3-TTS-12Hz-0.6B-Base-mxfp4 \
  --text "Dies ist ein Test." \
  --ref_audio reference.wav \
  --ref_text "Transkript der Reference Audio" \
  --temperature 0.3 \
  --repetition_penalty 1.1 \
  --play
```
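
## Benchmark arithmetic

The percentage quoted in the benchmark table can be reproduced from the two averaged timings. The sketch below is only a worked calculation on the numbers reported above, not a benchmark harness; the helper name `percent_reduction` is made up for illustration. It also separates two figures that are easy to conflate: the relative reduction in generation time and the speedup ratio.

```python
def percent_reduction(baseline: float, candidate: float) -> float:
    """Return how much smaller `candidate` is than `baseline`, in percent."""
    return (baseline - candidate) / baseline * 100.0


# Values copied from the benchmark table (M2 Ultra, average of 3 runs).
time_8bit_s, time_mxfp4_s = 8.50, 7.77
size_8bit_gb, size_mxfp4_gb = 1.9, 1.6

# Relative reduction in wall-clock time and on-disk size versus 8bit.
time_saved_pct = percent_reduction(time_8bit_s, time_mxfp4_s)
size_saved_pct = percent_reduction(size_8bit_gb, size_mxfp4_gb)

# Speedup ratio: how many times faster mxfp4 finishes the same job.
speedup = time_8bit_s / time_mxfp4_s

print(f"generation time reduced by {time_saved_pct:.1f}%")
print(f"model size reduced by {size_saved_pct:.1f}%")
print(f"speedup ratio: {speedup:.3f}x")
```

The time reduction is measured relative to the 8bit baseline, so it comes out slightly lower than the speedup ratio expressed as a percentage; both describe the same two measurements.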