---
language: en
library_name: mlx
tags:
- quantized
- mlx
- mtp
- speculative-decoding
- draft-model
base_model:
- Qwen/Qwen3.6-35B-A3B
pipeline_tag: image-text-to-text
---

# Qwen3.6-35B-A3B MTP

See Qwen3.6-35B-A3B with MTP in action: [demonstration videos](https://youtube.com/xcreate)

This draft model contains the **Multi-Token Prediction (MTP)** layers extracted from **[Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)**. It is intended for use alongside the [Qwen3.6-35B-A3B-MLX](https://huggingface.co/models?search=inferencerlabs/qwen3.6-35B-A3B-mlx) model as a speculative decoder to improve generation throughput.

In our coding test, the Q4.5-bit quant typically achieves high throughput with no loss in quality and lower RAM usage.

#### Tested on an M3 Ultra with 512 GB RAM using [Inferencer app v1.11.5](https://inferencer.com)
| Configuration | Throughput | Memory |
|---|---|---|
| Without decoder | ~46.4 tokens/s | ~36.4 GiB (debug build) |
| With decoder | ~71.5 tokens/s | ~37.0 GiB (debug build) |
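
To put the two rows in perspective, the short sketch below derives the relative speedup and the extra memory from the approximate figures in the table above. The variable names are illustrative only, and the results inherit the "~" and debug-build caveats of the measurements, so treat them as rough estimates rather than guarantees.

```python
# Approximate figures copied from the table above (debug build).
baseline_tps = 46.4      # tokens/s without the MTP draft model
speculative_tps = 71.5   # tokens/s with the MTP draft model
baseline_gib = 36.4      # memory in GiB without the draft model
speculative_gib = 37.0   # memory in GiB with the draft model

# Relative throughput gain from speculative decoding.
speedup = speculative_tps / baseline_tps

# Additional memory used when the draft model is loaded.
extra_gib = speculative_gib - baseline_gib
extra_pct = 100 * extra_gib / baseline_gib

print(f"Speedup: {speedup:.2f}x")                               # Speedup: 1.54x
print(f"Extra memory: {extra_gib:.1f} GiB ({extra_pct:.1f}%)")  # Extra memory: 0.6 GiB (1.6%)
```

On this test machine, that works out to roughly a 1.5x throughput gain for about 0.6 GiB of additional memory.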