---
base_model: ManniX-ITA/Qwen3.6-27B-Omnimerge-v4
base_model_relation: quantized
license: apache-2.0
language:
- en
tags:
- gguf
- imatrix
- quantized
- merge
- mergekit
- qwen3_5
- reasoning
- code
pipeline_tag: image-text-to-text
library_name: gguf
---

# Qwen3.6-27B-Omnimerge-v4-GGUF

GGUF quantizations of [`ManniX-ITA/Qwen3.6-27B-Omnimerge-v4`](https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4), the **MLP-passthrough** variant that defends against the Qwen3.6 think-policy fragility we discovered. Source dtype is BF16; this repo provides the standard bartowski quant ladder (F16 → IQ2_XXS) for `llama.cpp`.

> **Source model:** [`ManniX-ITA/Qwen3.6-27B-Omnimerge-v4`](https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4) (BF16 weights, model card with full benchmarks and methodology).
>
> **NOT** a quant of clean Qwen/Qwen3.6-27B: these GGUFs contain the v4 merge.

All quants made using imatrix with [calibration data v5](https://gist.github.com/bartowski1182/82ae9b520227f57d79ba04add13d0d0d), the same calibration set bartowski uses for the Qwen3.6 base release, so quality fingerprints are directly comparable to bartowski's `Qwen_Qwen3.6-27B-GGUF` repo.

## Why this merge exists

Same-base DARE-TIES (Omnimerge_v2 method) merge of Qwen/Qwen3.6-27B + 3 Qwen3.6 fine-tunes. Direct successor to [`ManniX-ITA/Qwen3.5-27B-Omnimerge-v2`](https://huggingface.co/ManniX-ITA/Qwen3.5-27B-Omnimerge-v2) on the newer Qwen3.6 base, with `mlp.{gate,up,down}_proj` copied verbatim from clean Qwen3.6 (the "MLP-passthrough" surgery) to defend against a Qwen3.6-specific reasoning-tag fragility we found during forensic delta inspection. See the [v4 model card](https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4) for the full story, scripts, and benchmark methodology.

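To make the MLP-passthrough idea concrete, here is a minimal sketch of the tensor-selection step. It is not the actual `v4_mlp_passthrough.py` script: the function name `mlp_passthrough`, the regex, and the toy tensor names are assumptions for illustration, and plain dicts with string values stand in for real state dicts so the example runs without any ML libraries.

```python
import re

# Assumed naming: per-layer MLP projections look like
# "model.layers.0.mlp.gate_proj.weight". Real checkpoints may differ.
MLP_PROJ = re.compile(r"\.mlp\.(gate|up|down)_proj\.")


def mlp_passthrough(merged, base):
    """Return a copy of `merged` with MLP projection tensors taken from `base`."""
    out = dict(merged)
    for name in merged:
        if MLP_PROJ.search(name):
            if name not in base:
                raise KeyError(f"base checkpoint is missing {name}")
            # Copy the base tensor verbatim; every other tensor keeps its merged value.
            out[name] = base[name]
    return out


# Toy stand-ins for state dicts (values would be tensors in practice).
merged = {
    "model.layers.0.self_attn.q_proj.weight": "merged-q",
    "model.layers.0.mlp.gate_proj.weight": "merged-gate",
    "model.layers.0.mlp.up_proj.weight": "merged-up",
    "model.layers.0.mlp.down_proj.weight": "merged-down",
}
base = {name: value.replace("merged", "base") for name, value in merged.items()}

result = mlp_passthrough(merged, base)
for name, value in result.items():
    print(name, "->", value)
```

In this sketch the attention weight keeps its merged value while the three MLP projections revert to the base values, which is the behaviour the paragraph above describes.
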
## Benchmark headline (Q6_K, head-to-head vs Qwen3.6 base + Omnimerge-v2)

All scored under identical llama.cpp + lm_eval conditions (`--reasoning-format deepseek --reasoning-budget 8192 --parallel 2`, raw `/v1/completions`, no chat template).

| Benchmark | Qwen3.6 base Q6_K (bartowski) | Omnimerge-v2 (Qwen3.5 base) | **Omnimerge-v4-MLP (this)** | Δ vs base | Δ vs v2 |
|---|---|---|---|---|---|
| HumanEval pass@1 (164q) | **84.76%** | 79.27% | **84.76%** | **0.00 pp** | **+5.49 pp** |
| MBPP pass@1 (500q), corrected\* | 57.60% | 74.60% | **73.40%** | **+15.80 pp** | −1.20 pp |
| GPQA Diamond pass@1 (flex) | not measured | 69.19% (full 198q) | **≈ 84.75%** (partial 177q‡) | n/a | **≈ +15.5 pp** |

\* MBPP scores are computed after stripping `<think>` blocks (lm_eval's raw scorer raises a `SyntaxError` on a literal `<` in `exec(prompt+completion+tests)`). See the [v4 model card](https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4) for the per-model recovery breakdown.

‡ GPQA crashed on the at-budget reasoning tail (aiohttp lifecycle bug in lm_eval); 192/198 cached, 177 matched, headline expected to land in the 82-86% band.

## Available Quantizations

All 27 files (F16 + 26 imatrix-quantized tiers, ~417 GB total) are uploaded and ready. `imatrix.dat` (used for every quant) is in the repo root for audit and reproduction.

| Quantization | File size | Use case |
|---|---|---|
| F16 (full precision) | 50.11 GB | Conversion source / lossless reference |
| Q8_0 | 26.63 GB | Highest fidelity, large |
| Q6_K_L | 21.14 GB | Q6_K with embed/output at Q8_0 |
| Q6_K | 20.57 GB | **Recommended high tier**; eval methodology used this |
| Q5_K_L | 18.64 GB | Q5_K_M with embed/output at Q8_0 |
| Q5_K_M | 17.91 GB | Strong fidelity, balanced |
| Q5_K_S | 17.40 GB | Slightly smaller K-mix |
| Q4_K_L | 16.29 GB | Q4_K_M with embed/output at Q8_0 |
| Q4_1 | 15.91 GB | Legacy 4-bit, dense |
| Q4_K_M | 15.41 GB | **Recommended balanced tier** for most users |
| IQ4_NL | 14.72 GB | Importance-aware 4-bit non-linear |
| Q4_K_S | 14.52 GB | K-mix small variant |
| Q4_0 | 14.41 GB | Legacy 4-bit |
| IQ4_XS | 14.05 GB | IQ4 extra-small |
| Q3_K_XL | 13.42 GB | Q3_K_L with embed/output at Q8_0 |
| Q3_K_L | 13.36 GB | 3-bit K-mix large |
| Q3_K_M | 12.39 GB | 3-bit K-mix medium |
| IQ3_M | 11.72 GB | Importance-aware 3-bit medium |
| Q3_K_S | 11.24 GB | 3-bit K-mix small |
| IQ3_XS | 11.15 GB | IQ3 extra-small |
| Q2_K_L | 11.13 GB | Q2_K with embed/output at Q8_0 |
| IQ3_XXS | 10.42 GB | IQ3 extra-extra-small |
| Q2_K | 9.98 GB | 2-bit K-mix |
| IQ2_M | 9.32 GB | Importance-aware 2-bit medium |
| IQ2_S | 8.72 GB | IQ2 small |
| IQ2_XS | 8.47 GB | IQ2 extra-small |
| IQ2_XXS | 7.85 GB | IQ2 extra-extra-small (smallest) |

## How to Use

With [llama.cpp](https://github.com/ggml-org/llama.cpp):

```bash
# Recommended args for reasoning-tag-emitting models (matches the eval methodology):
llama-server \
  -m Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf \
  -c 32768 -ngl 99 -t 12 --no-warmup \
  --reasoning-format deepseek --reasoning-budget 8192
```

Swap `Q4_K_M` for any tier from the table above. **`Q6_K`** matches the methodology used in our published evals; **`Q4_K_M`** is the typical "balanced" choice for most users.

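If it helps when choosing a tier, the selection can be sketched as a small lookup over the file sizes listed in the table. This is only an illustration: the helper names `pick_quant` and `gguf_filename` are hypothetical, the 4 GB default headroom for KV cache and runtime buffers is an arbitrary assumption rather than a measured figure, and only a subset of tiers is included.

```python
# File sizes in GB, copied from the quantization table (subset for brevity).
QUANT_SIZES_GB = {
    "Q8_0": 26.63,
    "Q6_K": 20.57,
    "Q5_K_M": 17.91,
    "Q4_K_M": 15.41,
    "IQ4_XS": 14.05,
    "Q3_K_M": 12.39,
    "IQ3_XXS": 10.42,
    "Q2_K": 9.98,
    "IQ2_XXS": 7.85,
}


def pick_quant(memory_gb, headroom_gb=4.0):
    """Return the largest listed tier whose file fits in memory_gb minus headroom.

    headroom_gb is an assumed allowance for KV cache and buffers; tune it for
    your context length. Returns None when nothing fits.
    """
    budget = memory_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def gguf_filename(tier):
    """Build the file name following the pattern used in the llama-server example."""
    return f"Qwen3.6-27B-Omnimerge-v4-{tier}.gguf"


tier = pick_quant(24.0)
if tier is not None:
    print(gguf_filename(tier))
```

The result is only a starting point: actual memory use depends on context size, offload settings, and backend, so a tier that fits on paper may still need a smaller `-c` value.
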
For multimodal (vision) inference: the `mmproj` projector is in [`bartowski/Qwen_Qwen3.6-27B-GGUF`](https://huggingface.co/bartowski/Qwen_Qwen3.6-27B-GGUF) and works with this model unchanged (vision tower is preserved verbatim from the base).

With [ollama](https://ollama.ai): use a Modelfile pointing to one of the GGUFs above, or HF direct load.

## imatrix.dat

The `imatrix.dat` (~14 MB) used to generate every quant in this repo is uploaded alongside the GGUFs at the repo root. Reproducible, auditable.

## Reproducing

See [`scripts/`](https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4/tree/main/scripts) on the source v4 model repo:

- `dare_ties_merge.py`: main merger (auto-detects Qwen3.6 base via `output_gate_type` and applies MLP-skip)
- `v4_mlp_passthrough.py`: post-process step that rebuilds the merged dir with MLP layers from base
- `quantize_gguf.py`: the script that built this repo

For dense (non-Gemma-4-MoE) models, pass `--exclude CD-Q6_K,CD-Q5_K_M,CD-Q4_K_M,CD-Q3_K_M,CD-Q2_K` to skip ContribDynamic tiers (those require Gemma 4 expert-contribution maps).

## License

Apache-2.0 (inherited from Qwen/Qwen3.6-27B and the fine-tune sources).

## Acknowledgements

- [Qwen team](https://huggingface.co/Qwen) for the Qwen3.6 base
- [rico03](https://huggingface.co/rico03), [ValiantLabs](https://huggingface.co/ValiantLabs), [kai-os](https://huggingface.co/kai-os) for the fine-tunes
- [bartowski](https://huggingface.co/bartowski) for the calibration_datav5.txt set used here
- DARE / TIES / DARE-TIES authors and the [arcee-ai/mergekit](https://github.com/arcee-ai/mergekit) community