Qwen3.6-27B Hybrid-Optimized Quantization for 16 GB of VRAM

At ggufbench.com, we are always looking to advance local AI on average consumer hardware. We would love it if you took a minute to submit your performance results and launch arguments for this model, and others, on our website!

Quick Specs

  • File Size: 12.576 GiB
  • Avg Bits/Weight: 4.01 bpw
  • Target VRAM: 16 GB GPUs
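
The three figures above are linked by simple arithmetic. As a rough sketch (the helper names are hypothetical, and treating the 16 GB target as 16 GiB and the bpw figure as an average over the whole file are both assumptions), the file size and bits per weight imply a weight count, and subtracting the file size from the VRAM budget gives the space left for KV cache and compute buffers:

```python
GIB = 2**30  # bytes per GiB (binary convention, assumed here)

def implied_weights(file_size_gib: float, bits_per_weight: float) -> float:
    """Weight count implied by a file size and an average bits-per-weight."""
    return file_size_gib * GIB * 8 / bits_per_weight

def headroom_gib(vram_gib: float, file_size_gib: float) -> float:
    """Space left after the weights are loaded, before any cache or buffers."""
    return vram_gib - file_size_gib

weights = implied_weights(12.576, 4.01)
print(f"implied weights: {weights / 1e9:.1f} B")
print(f"headroom on a 16 GiB budget: {headroom_gib(16.0, 12.576):.3f} GiB")
```

With the Quick Specs values this comes out near 26.9 B weights, consistent with a 27B model, and leaves about 3.4 GiB before context and runtime overhead are counted.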

Architecture-Aware Quant Strategy

Qwen3.6-27B is a hybrid Mamba/Transformer model. Not all layers serve the same purpose, and not all tensors tolerate quantization equally. This layout respects the architecture by:

  1. Protecting pure-attention layers (blk.3, 7, 11, ..., 63, i.e. every fourth block) with higher precision for global reasoning and long-range focus.
  2. Compressing SSM-dominated hybrid layers aggressively where the recurrent state carries the sequential load.
  3. Preserving critical routing & projection tensors at native or near-native precision to prevent error compounding.
  4. Downgrading resilient tensors (embeddings, FFN gate/up) where KLD sensitivity is flat and quality loss is imperceptible.
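
The four rules above can be read as a per-tensor lookup. The sketch below is illustrative only: the tier labels, the `precision_tier` helper, the `critical` set, and the rule ordering are hypothetical, and the actual quant type assigned to each tensor in this file is not stated here. It assumes tensor names follow the usual GGUF `blk.N.*` pattern and that the pure-attention blocks are every fourth block starting at index 3.

```python
import re

def is_pure_attention(block: int) -> bool:
    # blk.3, 7, 11, ..., 63: every fourth block, starting at index 3
    return block % 4 == 3

def precision_tier(name: str, critical: frozenset = frozenset()) -> str:
    """Map a tensor name to a hypothetical precision tier."""
    # Rule 3: caller-supplied critical tensors stay at (near-)native precision.
    if name in critical:
        return "near-native"
    # Rule 4: embeddings and FFN gate/up are downgraded (assumed to apply
    # in every block, including the pure-attention ones).
    if name.startswith("token_embd.") or ".ffn_gate." in name or ".ffn_up." in name:
        return "reduced"
    match = re.match(r"blk\.(\d+)\.", name)
    if match is None:
        return "default"  # tensors outside the block stack are not covered
    # Rules 1 and 2: decide by block type.
    return "high" if is_pure_attention(int(match.group(1))) else "aggressive"

attention_blocks = [b for b in range(64) if is_pure_attention(b)]
print(attention_blocks[:3], attention_blocks[-1], len(attention_blocks))
print(precision_tier("blk.7.attn_q.weight"))
print(precision_tier("blk.8.ssm_out.weight"))
```

Under the every-fourth-block assumption, 16 of the 64 blocks fall into the protected tier.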

Benchmark Summary (WikiText-2, 580 chunks)

| Metric | This | sokann (4.256 bpw) | bartowski Q3_K_M | mradermacher i1.IQ4_XS | bartowski IQ4_XS |
|---|---|---|---|---|---|
| Size (BPW) | 4.01 | 4.256 | 4.270 | 4.483 | 4.556 |
| Size (GiB) | 12.576 | 13.327 | 13.370 | 14.036 | 14.266 |
| Mean PPL(Q) | 7.128552 ± 0.046785 | 7.098696 ± 0.047344 | 6.993009 ± 0.046208 | 7.020660 ± 0.046587 | 6.996323 ± 0.046332 |
| Mean PPL(base) | 6.900925 ± 0.045382 | 6.908506 ± 0.045543 | 6.908506 ± 0.045543 | 6.908506 ± 0.045543 | 6.908506 ± 0.045543 |
| Cor(ln(PPL(Q)), ln(PPL(base))) | 98.76% | 99.19% | 98.52% | 99.30% | 99.32% |
| Mean KLD | 0.049767 ± 0.000745 | 0.033452 ± 0.000723 | 0.058818 ± 0.000881 | 0.046348 ± 0.000841 | 0.026270 ± 0.000653 |
| Maximum KLD | 22.599598 | 23.255085 | 24.616274 | 24.175169 | 22.992002 |
| 99.9% KLD | 3.240748 | 2.907350 | 3.986622 | 3.614290 | 2.385293 |
| RMS Δp | 6.167 ± 0.054 % | 4.936 ± 0.054 % | 6.690 ± 0.059 % | 5.867 ± 0.060 % | 4.352 ± 0.057 % |
| Same top p | 91.146 ± 0.074 % | 92.427 ± 0.069 % | 90.350 ± 0.077 % | 93.903 ± 0.062 % | 93.888 ± 0.062 % |
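
  • Efficiency-First Parity: Achieves competitive quality at ~6% smaller size. Mean PPL(Q) is within about 0.4% of sokann (4.256 bpw), and mean KLD (0.0498) is lower than bartowski Q3_K_M (0.0588), although PPL(Q) is about 1.9% higher than that quant. The file is roughly 0.75 to 0.8 GiB smaller than either, leaving VRAM for larger context or higher batch sizes.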
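
The relative figures in this summary can be rechecked from the table itself. The sketch below copies the Size (GiB), Mean PPL(Q), and Mean KLD values for this quant and the two closest-sized alternatives, then prints each difference relative to the alternative; the dictionary layout and the `percent_diff` helper are illustrative names, not part of any benchmark tool.

```python
# Values copied from the benchmark table: (size GiB, Mean PPL(Q), Mean KLD)
RESULTS = {
    "this": (12.576, 7.128552, 0.049767),
    "sokann (4.256 bpw)": (13.327, 7.098696, 0.033452),
    "bartowski Q3_K_M": (13.370, 6.993009, 0.058818),
}

def percent_diff(ours: float, theirs: float) -> float:
    """Signed difference of ours relative to theirs, in percent."""
    return (ours - theirs) / theirs * 100.0

ours = RESULTS["this"]
for name, theirs in RESULTS.items():
    if name == "this":
        continue
    size_d, ppl_d, kld_d = (percent_diff(o, t) for o, t in zip(ours, theirs))
    saved_mib = (theirs[0] - ours[0]) * 1024  # GiB difference expressed in MiB
    print(f"{name}: size {size_d:+.1f}%, PPL {ppl_d:+.2f}%, "
          f"KLD {kld_d:+.1f}%, saves {saved_mib:.0f} MiB")
```

Negative values mean this quant is smaller or lower on that metric. Lower is better for all three columns, so a negative KLD difference against a larger file is the favorable case.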

Acknowledgments

  • Special thanks to unsloth for their 9 TB of Qwen3.5 GGUF Benchmarks, which were instrumental in selecting the optimal quantization strategy for this model.
  • Thanks to bartowski for providing the calibration data used in this process.