inferencerlabs/Qwen3.6-35B-A3B-MTP-MLX-4.5bit

Image-Text-to-Text

speculative-decoding

inferencerlabs committed 14 days ago

Commit 4e96025 · verified · 1 Parent(s): 3f971b8

Upload model file

Files changed (1)

README.md +4 -4


@@ -8,13 +8,13 @@ tags:
 - speculative-decoding
 - draft-model
 base_model:
-- Qwen/Qwen3.6-27B
 pipeline_tag: image-text-to-text
 ---
-# Qwen3.6-27B MTP
-See Qwen3.6-27B with MTP in action: [demonstration videos](https://youtube.com/xcreate)
-This draft model contains the extracted **Multi-Token Prediction (MTP)** layers from **[Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B)** for use alongside the [Qwen3.6-27B-MLX](https://huggingface.co/models?search=inferencerlabs/qwen3.6-27b-mlx) model as a speculative decoder for improved performance.
 #### Tested on a M3 Ultra 512GB RAM using [Inferencer app v1.11.5](https://inferencer.com)
 <table style="border-collapse: collapse; border: none; text-align:left; margin-top:10px; margin-bottom:0px;">

 - speculative-decoding
 - draft-model
 base_model:
+- Qwen/Qwen3.6-35B-A3B
 pipeline_tag: image-text-to-text
 ---
+# Qwen3.6-35B-A3B MTP
+See Qwen3.6-35B-A3B with MTP in action: [demonstration videos](https://youtube.com/xcreate)
+This draft model contains the extracted **Multi-Token Prediction (MTP)** layers from **[Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)** for use alongside the [Qwen3.6-35B-A3B-MLX](https://huggingface.co/models?search=inferencerlabs/qwen3.6-35B-A3B-mlx) model as a speculative decoder for improved performance.
 #### Tested on a M3 Ultra 512GB RAM using [Inferencer app v1.11.5](https://inferencer.com)
 <table style="border-collapse: collapse; border: none; text-align:left; margin-top:10px; margin-bottom:0px;">