--- base_model: Qwen/Qwen3.6-27B base_model_relation: quantized license: apache-2.0 license_link: https://huggingface.co/Qwen/Qwen3.6-27B/blob/main/LICENSE --- # Qwen3.6-27B Hybrid-Optimized Quantization for 16 GB of VRAM At [ggufbench.com](https://ggufbench.com), we are always looking to advance local AI on average consumer hardware. We would love it if you took a minute to submit your performance results and launch arguments for this model, and others, on our website! ## Quick Specs - **File Size:** 12.576 GiB - **Avg Bits/Weight:** 4.01 bpw - **Target VRAM:** 16 GB GPUs --- ## Architecture-Aware Quant Strategy Qwen3.6-27B is a **hybrid Mamba/Transformer** model. Not all layers serve the same purpose, and not all tensors tolerate quantization equally. This layout respects the architecture by: 1. **Protecting pure-attention layers** (`blk.3,7,11...63`) with higher precision for global reasoning and long-range focus. 2. **Compressing SSM-dominated hybrid layers** aggressively where the recurrent state carries the sequential load. 3. **Preserving critical routing & projection tensors** at native or near-native precision to prevent error compounding. 4. **Downgrading resilient tensors** (embeddings, FFN gate/up) where KLD sensitivity is flat and quality loss is imperceptible. --- ## Benchmark Summary (WikiText-2, 580 chunks) | Metric | This | sokann (4.256 bpw) | bartowski Q3_K_M | mradermacher i1.IQ4_XS | bartowski IQ4_XS | |--------|------|-------------------|------------------|------------------------|------------------| | **Size (BPW)** | 4.01 | 4.256 | 4.270 | 4.483 | 4.556 | | **Size (GiB)** | 12.576 | 13.327 | 13.370 | 14.036 | 14.266 | | **Mean PPL(Q)** | 7.128552 ± 0.046785 | 7.098696 ± 0.047344 | 6.993009 ± 0.046208 | 7.020660 ± 0.046587 | 6.996323 ± 0.046332 | | **Mean PPL(base)** | 6.900925 ± 0.045382 | 6.908506 ± 0.045543 | 6.908506 ± 0.045543 | 6.908506 ± 0.045543 | 6.908506 ± 0.045543 | | **Cor(ln(PPL(Q)), ln(PPL(base)))** | 98.76% | 99.19% | 98.52% | 99.30% | 99.32% | | **Mean KLD** | 0.049767 ± 0.000745 | 0.033452 ± 0.000723 | 0.058818 ± 0.000881 | 0.046348 ± 0.000841 | 0.026270 ± 0.000653 | | **Maximum KLD** | 22.599598 | 23.255085 | 24.616274 | 24.175169 | 22.992002 | | **99.9% KLD** | 3.240748 | 2.907350 | 3.986622 | 3.614290 | 2.385293 | | **RMS Δp** | 6.167 ± 0.054 % | 4.936 ± 0.054 % | 6.690 ± 0.059 % | 5.867 ± 0.060 % | 4.352 ± 0.057 % | | **Same top p** | 91.146 ± 0.074 % | 92.427 ± 0.069 % | 90.350 ± 0.077 % | 93.903 ± 0.062 % | 93.888 ± 0.062 % | - **Efficiency-First Parity:** Achieves competitive quality at ~6% smaller size — PPL(Q) within 0.4% of `sokann` (4.256 bpw) and on par with `bartowski Q3_K_M` on KLD, all while saving ~750 MB of VRAM for larger context or higher batch sizes. --- ## Acknowledgments - Special thanks to **unsloth** for their 9 TB of Qwen3.5 GGUF Benchmarks, which were instrumental in selecting the optimal quantization strategy for this model. - Thanks to **bartowski** for providing the calibration data used in this process.