---
base_model: Qwen/Qwen3.6-27B
base_model_relation: quantized
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3.6-27B/blob/main/LICENSE
---
# Qwen3.6-27B Hybrid-Optimized Quantization for 16 GB of VRAM

At [ggufbench.com](https://ggufbench.com), we are always looking to advance local AI on average consumer hardware. We would love it if you took a minute to submit your performance results and launch arguments for this model, and others, on our website!

## Quick Specs

- **File Size:** 12.576 GiB 
- **Avg Bits/Weight:** 4.01 bpw
- **Target VRAM:** 16 GB GPUs
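
As a quick sanity check, the file size and average bits per weight listed above should imply a weight count close to the model's nominal 27B. The sketch below does that arithmetic. It assumes the whole file is weight data (GGUF metadata overhead is ignored), and `implied_weight_count` is a helper name made up for this example.

```python
GIB = 2**30  # bytes per GiB

def implied_weight_count(file_size_gib: float, bits_per_weight: float) -> float:
    """Estimate the number of weights from file size and average bpw.

    Assumption: the entire file is weight data, so the result slightly
    overestimates the true count.
    """
    total_bits = file_size_gib * GIB * 8
    return total_bits / bits_per_weight

n_weights = implied_weight_count(12.576, 4.01)
print(f"Implied weight count: {n_weights / 1e9:.1f} B")
```

The result lands near 27 billion, which is consistent with the two figures in the spec list describing the same file.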

---

## Architecture-Aware Quant Strategy

Qwen3.6-27B is a **hybrid Mamba/Transformer** model. Not all layers serve the same purpose, and not all tensors tolerate quantization equally. This layout respects the architecture by:

1. **Protecting pure-attention layers** (every fourth block: `blk.3`, `blk.7`, `blk.11`, ..., `blk.63`) with higher precision to preserve global reasoning and long-range focus.
2. **Compressing SSM-dominated hybrid layers** aggressively where the recurrent state carries the sequential load.
3. **Preserving critical routing & projection tensors** at native or near-native precision to prevent error compounding.
4. **Downgrading resilient tensors** (embeddings, FFN gate/up) where KLD sensitivity is flat and quality loss is imperceptible.
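
The four rules above can be sketched as a function that maps a GGUF tensor name to a precision tier. This is only an illustration, not the recipe used to build this file: the tier labels, the order in which the rules are applied, and the `CRITICAL_PATTERNS` tuple are all assumptions, since this card does not list the exact per-tensor types.

```python
import re

# Hypothetical name fragments for "critical routing & projection" tensors.
# The card does not say which tensors these are; adjust as needed.
CRITICAL_PATTERNS = (".attn_output.", ".ssm_out.")

# Tensors the card describes as resilient: embeddings and FFN gate/up.
RESILIENT_PATTERNS = ("token_embd.", ".ffn_gate.", ".ffn_up.")

_BLOCK_RE = re.compile(r"^blk\.(\d+)\.")

def is_pure_attention_block(index: int) -> bool:
    """True for blk.3, blk.7, blk.11, ..., blk.63 (every fourth block)."""
    return index % 4 == 3

def pick_tier(tensor_name: str) -> str:
    """Return an illustrative precision tier for one tensor name.

    Rule precedence (critical > resilient > attention block > default)
    is an assumption made for this sketch.
    """
    if any(p in tensor_name for p in CRITICAL_PATTERNS):
        return "preserve"    # rule 3: native or near-native precision
    if any(p in tensor_name for p in RESILIENT_PATTERNS):
        return "downgrade"   # rule 4: resilient tensors
    match = _BLOCK_RE.match(tensor_name)
    if match and is_pure_attention_block(int(match.group(1))):
        return "protect"     # rule 1: pure-attention layers
    return "compress"        # rule 2: everything else

for name in ("blk.3.attn_q.weight", "blk.4.attn_qkv.weight",
             "blk.7.ffn_up.weight", "token_embd.weight"):
    print(f"{name:28s} -> {pick_tier(name)}")
```

Keeping the rules in one small function makes it easy to try a different precedence, for example protecting FFN tensors inside pure-attention blocks, and to compare the resulting size and KLD.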

---

## Benchmark Summary (WikiText-2, 580 chunks)

| Metric | This | sokann (4.256 bpw) | bartowski Q3_K_M | mradermacher i1.IQ4_XS | bartowski IQ4_XS |
|--------|------|-------------------|------------------|------------------------|------------------|
| **Size (BPW)** | 4.01 | 4.256 | 4.270 | 4.483 | 4.556 |
| **Size (GiB)** | 12.576 | 13.327 | 13.370 | 14.036 | 14.266 |
| **Mean PPL(Q)** | 7.128552 ± 0.046785 | 7.098696 ± 0.047344 | 6.993009 ± 0.046208 | 7.020660 ± 0.046587 | 6.996323 ± 0.046332 |
| **Mean PPL(base)** | 6.900925 ± 0.045382 | 6.908506 ± 0.045543 | 6.908506 ± 0.045543 | 6.908506 ± 0.045543 | 6.908506 ± 0.045543 |
| **Cor(ln(PPL(Q)), ln(PPL(base)))** | 98.76% | 99.19% | 98.52% | 99.30% | 99.32% |
| **Mean KLD** | 0.049767 ± 0.000745 | 0.033452 ± 0.000723 | 0.058818 ± 0.000881 | 0.046348 ± 0.000841 | 0.026270 ± 0.000653 |
| **Maximum KLD** | 22.599598 | 23.255085 | 24.616274 | 24.175169 | 22.992002 |
| **99.9% KLD** | 3.240748 | 2.907350 | 3.986622 | 3.614290 | 2.385293 |
| **RMS Δp** | 6.167 ± 0.054 % | 4.936 ± 0.054 % | 6.690 ± 0.059 % | 5.867 ± 0.060 % | 4.352 ± 0.057 % |
| **Same top p** | 91.146 ± 0.074 % | 92.427 ± 0.069 % | 90.350 ± 0.077 % | 93.903 ± 0.062 % | 93.888 ± 0.062 % |
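
For readers unfamiliar with the divergence metrics in the table, here is a minimal sketch of how KLD and top-token agreement are typically computed for a single token position. It assumes the usual definitions (KL divergence from the base model's distribution to the quantized model's, and a match of the most likely token); the logits are made-up toy values, not data from this benchmark.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities, shifting by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_base, p_quant):
    """KL(base || quant) in nats for one token position."""
    return sum(p * math.log(p / q) for p, q in zip(p_base, p_quant) if p > 0)

def same_top(p_base, p_quant):
    """True when both distributions rank the same token first."""
    top_base = max(range(len(p_base)), key=p_base.__getitem__)
    top_quant = max(range(len(p_quant)), key=p_quant.__getitem__)
    return top_base == top_quant

# Toy three-token vocabulary; the values are illustrative only.
base = softmax([2.0, 1.0, 0.1])
quant = softmax([1.9, 1.1, 0.1])

print("KLD:", kl_divergence(base, quant))
print("Same top token:", same_top(base, quant))
```

Averaging the per-position KLD over every evaluated token gives a figure comparable to the Mean KLD row, and the share of positions where `same_top` is true corresponds to the Same top p row.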

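- **Efficiency-First Parity:** At 12.576 GiB, this quant is about 5.6% smaller than `sokann` (13.327 GiB), while its Mean PPL(Q) is only about 0.42% higher (7.1286 vs. 7.0987). Its Mean KLD (0.0498) is lower than that of `bartowski Q3_K_M` (0.0588) but higher than that of `sokann` (0.0335). The roughly 0.75 GiB saved relative to `sokann` can go toward a larger context or higher batch sizes.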
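
The relative differences behind this comparison can be reproduced from the table. The snippet below copies the relevant rows by hand; `pct_change` is a helper defined only for this example.

```python
# Values copied from the benchmark table above.
rows = {
    "this":             {"gib": 12.576, "ppl": 7.128552, "kld": 0.049767},
    "sokann":           {"gib": 13.327, "ppl": 7.098696, "kld": 0.033452},
    "bartowski Q3_K_M": {"gib": 13.370, "ppl": 6.993009, "kld": 0.058818},
}

def pct_change(new: float, ref: float) -> float:
    """Signed percentage difference of `new` relative to `ref`."""
    return (new - ref) / ref * 100.0

this, sokann, q3km = rows["this"], rows["sokann"], rows["bartowski Q3_K_M"]

size_delta = pct_change(this["gib"], sokann["gib"])   # negative = smaller
ppl_delta = pct_change(this["ppl"], sokann["ppl"])    # positive = worse
kld_delta = pct_change(this["kld"], q3km["kld"])      # negative = better
saved_gib = sokann["gib"] - this["gib"]

print(f"Size vs sokann:       {size_delta:+.2f}%")
print(f"PPL(Q) vs sokann:     {ppl_delta:+.2f}%")
print(f"Mean KLD vs Q3_K_M:   {kld_delta:+.2f}%")
print(f"Saved vs sokann:      {saved_gib:.3f} GiB")
```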

---

## Acknowledgments

- Special thanks to **unsloth** for their 9 TB of Qwen3.5 GGUF Benchmarks, which were instrumental in selecting the optimal quantization strategy for this model.
- Thanks to **bartowski** for providing the calibration data used in this process.