---
base_model: ManniX-ITA/Qwen3.6-27B-Omnimerge-v4
base_model_relation: quantized
license: apache-2.0
language:
- en
tags:
- gguf
- imatrix
- quantized
- merge
- mergekit
- qwen3_5
- reasoning
- code
pipeline_tag: image-text-to-text
library_name: gguf
---
# Qwen3.6-27B-Omnimerge-v4-GGUF
GGUF quantizations of [`ManniX-ITA/Qwen3.6-27B-Omnimerge-v4`](https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4), the **MLP-passthrough** variant that defends against the Qwen3.6 think-policy fragility we discovered. Source dtype is BF16; this repo provides the standard bartowski quant ladder (F16 → IQ2_XXS) for `llama.cpp`.
> **Source model:** [`ManniX-ITA/Qwen3.6-27B-Omnimerge-v4`](https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4) (BF16 weights, model card with full benchmarks and methodology).
> **NOT** a quant of clean Qwen/Qwen3.6-27B: these GGUFs contain the v4 merge.
>
> **MTP companion (2× decode speedup):** weight-identical GGUFs with the MTP head retained for `llama.cpp --spec-type draft-mtp` self-speculative decoding are at [`ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MTP-GGUF`](https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-MTP-GGUF). Quality is statistically indistinguishable from this repo (HE 137/164 → 137/164, GPQA 155/198 → 154/198); aggregate decode is 2.0-2.3× faster on a single 24 GB GPU. Use that repo for interactive / single-request workloads where latency matters.
All quants were made using imatrix with [calibration data v5](https://gist.github.com/bartowski1182/82ae9b520227f57d79ba04add13d0d0d), the same calibration set bartowski uses for the Qwen3.6 base release, so quality fingerprints are directly comparable to bartowski's `Qwen_Qwen3.6-27B-GGUF` repo.
## Why this merge exists
This is a same-base DARE-TIES (Omnimerge_v2 method) merge of Qwen/Qwen3.6-27B and three Qwen3.6 fine-tunes. It is the direct successor to [`ManniX-ITA/Qwen3.5-27B-Omnimerge-v2`](https://huggingface.co/ManniX-ITA/Qwen3.5-27B-Omnimerge-v2) on the newer Qwen3.6 base, with `mlp.{gate,up,down}_proj` copied verbatim from clean Qwen3.6 (the "MLP-passthrough" surgery) to defend against a Qwen3.6-specific reasoning-tag fragility we found during forensic delta inspection. See the [v4 model card](https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4) for the full story, scripts, and benchmark methodology.
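
To make the "MLP-passthrough" step concrete, here is a minimal sketch of the idea. It is not the author's `v4_mlp_passthrough.py`. It assumes checkpoints are exposed as plain name-to-tensor mappings and that MLP weights follow a `...mlp.gate_proj...` style key layout; both the helper name `mlp_passthrough` and the key names are illustrative assumptions, and strings stand in for tensors so the example runs without any ML library.

```python
import re

# Assumed key layout, e.g. "model.layers.12.mlp.gate_proj.weight".
MLP_KEY = re.compile(r"\.mlp\.(gate|up|down)_proj\.")


def mlp_passthrough(merged: dict, base: dict) -> dict:
    """Return a copy of `merged` whose MLP projections come from `base`."""
    out = dict(merged)
    for name in merged:
        if MLP_KEY.search(name):
            if name not in base:
                raise KeyError(f"base checkpoint is missing {name}")
            out[name] = base[name]  # verbatim copy, no blending
    return out


# Toy checkpoints: strings stand in for weight tensors.
merged = {
    "model.layers.0.self_attn.q_proj.weight": "merged-attn",
    "model.layers.0.mlp.gate_proj.weight": "merged-gate",
    "model.layers.0.mlp.up_proj.weight": "merged-up",
    "model.layers.0.mlp.down_proj.weight": "merged-down",
}
base = {name: "base" for name in merged}

patched = mlp_passthrough(merged, base)
```

In this sketch only the three MLP projections are replaced; attention weights keep their merged values, and the input mapping is left unmodified.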
## Benchmark headline (Q6_K, head-to-head vs Qwen3.6 base + Omnimerge-v2)
All scored under identical llama.cpp + lm_eval conditions (`--reasoning-format deepseek --reasoning-budget 8192 --parallel 2`, raw `/v1/completions`, no chat template).
| Benchmark | Qwen3.6 base Q6_K (bartowski) | Omnimerge-v2 (Qwen3.5 base) | **Omnimerge-v4-MLP (this)** | Δ vs base | Δ vs v2 |
|---|---|---|---|---|---|
| HumanEval pass@1 (164q) | **84.76%** | 79.27% | **83.54%** (137/164) | −1.22 pp | **+4.27 pp** |
| MBPP pass@1 (500q), corrected\* | 57.60% | 74.60% | **73.00%** (365/500) | **+15.40 pp** | −1.60 pp |
| GPQA Diamond pass@1 (flex), full greedy§ | not measured | 69.19% (full 198q) | **78.28%** (155/198) | n/a | **+9.09 pp** |
\* MBPP scores are computed after stripping `<think>` blocks: lm_eval's raw scorer raises a `SyntaxError` on the literal `<` when it runs `exec(prompt+completion+tests)`. See the [v4 model card](https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4) for the per-model recovery breakdown.
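
The stripping step can be illustrated with a short sketch. This is neither lm_eval code nor the recovery script used for the scores above; `strip_think` is a hypothetical helper, and it assumes reasoning is wrapped in literal `<think>...</think>` tags.

```python
import re

# Non-greedy match so multiple reasoning blocks are removed one by one.
THINK_BLOCK = re.compile(r"<think>.*?</think>\n?", re.DOTALL)


def strip_think(completion: str) -> str:
    """Remove closed <think> blocks, plus any unclosed one and what follows it."""
    text = THINK_BLOCK.sub("", completion)
    # A leftover "<think>" means generation was cut off mid-reasoning,
    # so nothing after it can be treated as code.
    cut = text.find("<think>")
    return text if cut == -1 else text[:cut]


raw = "<think>\nNeed a < b check.\n</think>\ndef lt(a, b):\n    return a < b\n"
code = strip_think(raw)

# The stripped text parses as Python; the raw completion would not.
compile(code, "<completion>", "exec")
```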
§ **Canonical full-198q greedy GPQA result** measured 2026-05-22 on pod 37268930 (Vast.ai 3090) with the patched eval chain (lm-eval 0.4.11 + `max_length=32768` override + the api_models.py:545 UnboundLocalError patch + aiohttp lifecycle workaround). Sampler: `do_sample=False, temperature=0.0`, `max_gen_toks=8192`. Wall time 4 h 55 min. Companion strict-match (rigid `Answer: X` template) is 7.58 %; the model emits CoT verbosely rather than the strict template, so flex is the real quality signal. Earlier card revisions reported an `≈ 84.75 %` partial result (177/198 sampled at `T=0.6`, `budget=16384`); that number is superseded by this canonical greedy measurement on the full bench. The 6.5 pp difference is driven by the methodology change (sampler / budget / completeness), not by a model change.
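
As a quick sanity check, the percentages and percentage-point deltas in the table can be recomputed from the raw counts. The sketch below uses only numbers quoted in this card; `pass_at_1` is an illustrative helper name.

```python
def pass_at_1(correct: int, total: int) -> float:
    """pass@1 as a percentage, rounded to two decimals like the table."""
    return round(100.0 * correct / total, 2)


humaneval = pass_at_1(137, 164)
mbpp = pass_at_1(365, 500)
gpqa = pass_at_1(155, 198)

# Deltas in percentage points against the scores quoted in the table.
he_vs_base = round(humaneval - 84.76, 2)
he_vs_v2 = round(humaneval - 79.27, 2)
mbpp_vs_base = round(mbpp - 57.60, 2)
mbpp_vs_v2 = round(mbpp - 74.60, 2)
gpqa_vs_v2 = round(gpqa - 69.19, 2)

# Gap between the superseded partial GPQA figure and the greedy result.
gpqa_method_gap = round(84.75 - gpqa, 1)

print(humaneval, mbpp, gpqa)
print(he_vs_base, he_vs_v2, mbpp_vs_base, mbpp_vs_v2, gpqa_vs_v2, gpqa_method_gap)
```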
## Available Quantizations
All 27 files (F16 + 26 imatrix-quantized tiers, ~417 GB total) are uploaded and ready. `imatrix.dat` (used for every quant) is in the repo root for audit and reproduction.
| Quantization | File size | Use case |
|---|---|---|
| F16 (full precision) | 50.11 GB | Conversion source / lossless reference |
| Q8_0 | 26.63 GB | Highest fidelity, large |
| Q6_K_L | 21.14 GB | Q6_K with embed/output at Q8_0 |
| Q6_K | 20.57 GB | **Recommended high tier**; used for the published evals |
| Q5_K_L | 18.64 GB | Q5_K_M with embed/output at Q8_0 |
| Q5_K_M | 17.91 GB | Strong fidelity, balanced |
| Q5_K_S | 17.40 GB | Slightly smaller K-mix |
| Q4_K_L | 16.29 GB | Q4_K_M with embed/output at Q8_0 |
| Q4_1 | 15.91 GB | Legacy 4-bit, dense |
| Q4_K_M | 15.41 GB | **Recommended balanced tier** for most users |
| IQ4_NL | 14.72 GB | Importance-aware 4-bit non-linear |
| Q4_K_S | 14.52 GB | K-mix small variant |
| Q4_0 | 14.41 GB | Legacy 4-bit |
| IQ4_XS | 14.05 GB | IQ4 extra-small |
| Q3_K_XL | 13.42 GB | Q3_K_L with embed/output at Q8_0 |
| Q3_K_L | 13.36 GB | 3-bit K-mix large |
| Q3_K_M | 12.39 GB | 3-bit K-mix medium |
| IQ3_M | 11.72 GB | Importance-aware 3-bit medium |
| Q3_K_S | 11.24 GB | 3-bit K-mix small |
| IQ3_XS | 11.15 GB | IQ3 extra-small |
| Q2_K_L | 11.13 GB | Q2_K with embed/output at Q8_0 |
| IQ3_XXS | 10.42 GB | IQ3 extra-extra-small |
| Q2_K | 9.98 GB | 2-bit K-mix |
| IQ2_M | 9.32 GB | Importance-aware 2-bit medium |
| IQ2_S | 8.72 GB | IQ2 small |
| IQ2_XS | 8.47 GB | IQ2 extra-small |
| IQ2_XXS | 7.85 GB | IQ2 extra-extra-small (smallest) |
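
One rough way to choose a tier is to take the largest file that fits the available memory after reserving some headroom. The sketch below uses a subset of the sizes from the table; the helper name `largest_fitting_quant` and the 3 GB default headroom are assumptions for illustration, not measured requirements. Actual memory use also depends on context length, KV cache type, and whether the vision projector is loaded.

```python
# File sizes in GB copied from the table above (subset for brevity).
QUANT_SIZES_GB = {
    "Q8_0": 26.63,
    "Q6_K": 20.57,
    "Q5_K_M": 17.91,
    "Q4_K_M": 15.41,
    "IQ4_XS": 14.05,
    "Q3_K_M": 12.39,
    "IQ3_XXS": 10.42,
    "Q2_K": 9.98,
    "IQ2_XXS": 7.85,
}


def largest_fitting_quant(memory_gb: float, headroom_gb: float = 3.0):
    """Return the largest listed quant that fits, or None if nothing does.

    headroom_gb is an assumed reserve for KV cache and runtime buffers.
    """
    budget = memory_gb - headroom_gb
    fitting = {q: size for q, size in QUANT_SIZES_GB.items() if size <= budget}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


choice_24 = largest_fitting_quant(24.0)  # 21 GB budget
choice_16 = largest_fitting_quant(16.0)  # 13 GB budget
choice_8 = largest_fitting_quant(8.0)    # 5 GB budget, nothing fits
```

Treat the result as a starting point only: a long context can need considerably more headroom than the default assumed here.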
## How to Use
With [llama.cpp](https://github.com/ggml-org/llama.cpp):
```bash
# Recommended args for reasoning-tag-emitting models (matches the eval methodology):
llama-server \
-m Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf \
-c 32768 -ngl 99 -t 12 --no-warmup \
--reasoning-format deepseek --reasoning-budget 8192
```
Swap `Q4_K_M` for any tier from the table above. **`Q6_K`** matches the methodology used in our published evals; **`Q4_K_M`** is the typical "balanced" choice for most users.
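
Once the server is running, it can be queried over its OpenAI-compatible HTTP API. The following is a minimal client sketch using only the Python standard library. It assumes the server listens on llama-server's default `http://localhost:8080` and that, with `--reasoning-format deepseek`, the reasoning text arrives in the message's `reasoning_content` field; the helper names are illustrative. It uses the chat endpoint for convenience, whereas the published evals used raw `/v1/completions`.

```python
import json
import urllib.request

# Assumption: llama-server is listening on its default address.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build a POST request for the chat-completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def split_reply(response: dict) -> tuple:
    """Return (reasoning, answer) from a chat-completions response body."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content") or "", message.get("content") or ""


def ask(prompt: str) -> tuple:
    """Send the prompt to the running server (requires llama-server to be up)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return split_reply(json.load(resp))


# Offline check with a hand-written response in the assumed shape.
sample = {"choices": [{"message": {"reasoning_content": "2+2 is 4.", "content": "4"}}]}
reasoning, answer = split_reply(sample)
```

Calling `ask("What is 2+2?")` against a live server returns the same `(reasoning, answer)` pair; keeping the two apart makes it easy to log the reasoning without showing it to end users.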
For multimodal (vision) inference: the `mmproj` projector is in [`bartowski/Qwen_Qwen3.6-27B-GGUF`](https://huggingface.co/bartowski/Qwen_Qwen3.6-27B-GGUF) and works with this model unchanged (vision tower is preserved verbatim from the base).
With [ollama](https://ollama.ai): use a Modelfile pointing to one of the GGUFs above, or load directly from Hugging Face with `ollama run hf.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4-GGUF:Q4_K_M`.
## imatrix.dat
The `imatrix.dat` (~14 MB) used to generate every quant in this repo is uploaded alongside the GGUFs at the repo root. Reproducible, auditable.
## Reproducing
See [`scripts/`](https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4/tree/main/scripts) on the source v4 model repo:
- `dare_ties_merge.py`: main merger (auto-detects Qwen3.6 base via `output_gate_type` and applies MLP-skip)
- `v4_mlp_passthrough.py`: post-process step that rebuilds the merged dir with MLP layers from base
- `quantize_gguf.py`: the script that built this repo
For dense (non-Gemma-4-MoE) models, pass `--exclude CD-Q6_K,CD-Q5_K_M,CD-Q4_K_M,CD-Q3_K_M,CD-Q2_K` to skip ContribDynamic tiers (those require Gemma 4 expert-contribution maps).
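
For readers who want to reproduce a single tier by hand rather than run the full script, the sketch below assembles a `llama-quantize` command line that applies the published `imatrix.dat`. It is not taken from `quantize_gguf.py`; the helper `quantize_command` is hypothetical, and the `q8_embed_output` option is an assumption about how the `_L` variants (embed/output at Q8_0) could be produced with llama.cpp's `--token-embedding-type` and `--output-tensor-type` flags.

```python
import shlex


def quantize_command(src, dst, quant, imatrix="imatrix.dat", q8_embed_output=False):
    """Build an argv list for llama-quantize; nothing is executed here."""
    cmd = ["llama-quantize", "--imatrix", imatrix]
    if q8_embed_output:
        # Assumed recipe for the "_L" tiers: keep embeddings and output at Q8_0.
        cmd += ["--token-embedding-type", "q8_0", "--output-tensor-type", "q8_0"]
    cmd += [src, dst, quant]
    return cmd


cmd = quantize_command(
    "Qwen3.6-27B-Omnimerge-v4-F16.gguf",
    "Qwen3.6-27B-Omnimerge-v4-Q4_K_M.gguf",
    "Q4_K_M",
)
print(shlex.join(cmd))
```

Pass the resulting list to `subprocess.run(cmd, check=True)` to run the quantization, assuming `llama-quantize` is on `PATH` and both the F16 GGUF and `imatrix.dat` are in the working directory.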
## License
Apache-2.0 (inherited from Qwen/Qwen3.6-27B and the fine-tune sources).
## Acknowledgements
- [Qwen team](https://huggingface.co/Qwen) for the Qwen3.6 base
- [rico03](https://huggingface.co/rico03), [ValiantLabs](https://huggingface.co/ValiantLabs), [kai-os](https://huggingface.co/kai-os) for the fine-tunes
- [bartowski](https://huggingface.co/bartowski) for the calibration_datav5.txt set used here
- DARE / TIES / DARE-TIES authors and the [arcee-ai/mergekit](https://github.com/arcee-ai/mergekit) community