---
language: en
library_name: mlx
tags:
- quantized
- mlx
- mtp
- speculative-decoding
- draft-model
base_model:
- Qwen/Qwen3.6-27B
pipeline_tag: image-text-to-text
---

# Qwen3.6-27B MTP

See Qwen3.6-27B with MTP in action: [demonstration videos](https://youtube.com/xcreate)

This draft model contains the **Multi-Token Prediction (MTP)** layers extracted from **[Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B)**. Use it alongside the [Qwen3.6-27B-MLX](https://huggingface.co/models?search=inferencerlabs/qwen3.6-27b-mlx) model as a speculative decoder to speed up token generation.

#### Tested on an M3 Ultra with 512 GB RAM using [Inferencer app v1.11.5](https://inferencer.com)
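| Configuration | Speed | Memory |
|---|---|---|
| Without decoder | ~17.1 tokens/s | ~28.36 GiB (debug build) |
| With decoder | ~30.08 tokens/s | ~28.99 GiB (debug build) |

With the MTP decoder, generation is about 1.76× faster for roughly 0.63 GiB of extra memory.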
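
The MTP layers act as a speculative decoder: a cheap draft proposes several tokens and the full model verifies them. As a rough illustration of that general draft-and-verify idea (not Inferencer's or MLX's actual implementation), here is a minimal greedy sketch in Python. `speculative_generate`, `greedy_generate`, and the toy `target`/`draft` functions are hypothetical stand-ins: in the real setup the draft tokens would come from the MTP layers and the target would be the full Qwen3.6-27B model, which scores all proposed positions in one forward pass rather than one call per token.

```python
from typing import Callable, List, Tuple

# Hypothetical "model": maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_generate(target: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens[len(prompt):]


def speculative_generate(target: NextToken, draft: NextToken, prompt: List[int],
                         max_new_tokens: int, k: int = 3) -> Tuple[List[int], int]:
    """Greedy draft-and-verify decoding; returns (new tokens, target passes)."""
    tokens = list(prompt)
    passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The cheap draft proposes k tokens autoregressively.
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            ctx.append(draft(ctx))
            proposal.append(ctx[-1])
        # 2. One target "pass" checks every proposed position. A real model does
        #    this in a single batched forward pass; here we loop to emulate it.
        passes += 1
        ctx = list(tokens)
        for tok in proposal:
            expected = target(ctx)
            ctx.append(expected)  # the target's token is always kept
            if expected != tok:
                break  # first mismatch: discard the remaining draft tokens
        else:
            # All k drafts accepted: the same pass yields one bonus token.
            ctx.append(target(ctx))
        tokens = ctx
    return tokens[len(prompt):len(prompt) + max_new_tokens], passes


# Toy models over digits 0-9: the draft guesses wrong whenever the last token is 5.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


out, passes = speculative_generate(target, draft, [0], max_new_tokens=12, k=3)
assert out == greedy_generate(target, [0], 12)
print(out, passes)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2] 4
```

Because a token is only kept when it matches what the target would have produced, the output is identical to plain greedy decoding; the gain comes from needing fewer target passes (4 instead of 12 in this toy run). Sampling-based decoding needs a rejection-sampling acceptance rule instead of exact matching, which this sketch omits.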