Qwen3.6-35B-A3B — IQ2_M (domain imatrix)

Custom IQ2_M quantization of Qwen/Qwen3.6-35B-A3B with a domain-mixed importance matrix calibrated on code + agentic + CLI traces.

  • Size: 11.1 GB
  • BPW: 2.69
  • Architecture: Hybrid GatedDeltaNet + MoE (256 experts, 8+1 active per token, 35B total / 3B active)
  • Calibration: domain-mixed imatrix (45% code, 45% agentic/tool-use, 10% general)
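
The 45/45/10 calibration mix can be pictured with a short sketch. The card does not say how the corpus was assembled, so this is only an illustration: the function name, the `pools` structure, and the choice to split by sample count (rather than by tokens) are all assumptions.

```python
import random

# Mixing ratios taken from the card; everything else below is hypothetical.
MIX = {"code": 0.45, "agentic": 0.45, "general": 0.10}


def build_calibration_set(pools, total, mix=MIX, seed=0):
    """Draw `total` text samples from per-domain pools according to `mix`.

    pools: dict mapping a domain name to a list of text samples.
    The split is by sample count, which is an assumption; a token-based
    split would be an equally plausible reading of the card.
    """
    rng = random.Random(seed)
    counts = {name: int(round(total * share)) for name, share in mix.items()}
    # Hand any rounding remainder to the largest domain so counts sum to `total`.
    counts[max(mix, key=mix.get)] += total - sum(counts.values())

    picked = []
    for name, n in counts.items():
        if n > len(pools[name]):
            raise ValueError(f"pool '{name}' has fewer than {n} samples")
        picked.extend(rng.sample(pools[name], n))
    rng.shuffle(picked)  # interleave domains instead of concatenating them
    return picked, counts
```

The resulting samples would typically be written to a single text file and passed to llama.cpp's imatrix tool as calibration data.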

Benchmark comparison — Cross-model, cross-architecture

First public HumanEval+/MBPP+/BFCL evaluation of Qwen3.6-35B-A3B quantizations.

Note on BFCL scores: All BFCL scores reported here use our internal simplified evaluation (single-function-call subset with custom prompt/scoring), NOT the official Berkeley Function Calling Leaderboard methodology. Our scores are not directly comparable to the official leaderboard. We are working on running the official BFCL evaluation for comparable numbers.
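
To make the caveat concrete, here is a minimal sketch of what a single-function-call scorer could look like. The card does not publish its prompt or scoring rules, so this is not the evaluation used here: the JSON call format, the exact-match rule, and both function names are assumptions chosen for illustration.

```python
import json


def score_single_call(predicted_json, expected):
    """Return True if the model's call matches the expected name and arguments.

    Assumes (hypothetically) that the model emits one JSON object of the form
    {"name": ..., "arguments": {...}} and that scoring is exact match.
    """
    try:
        call = json.loads(predicted_json)
    except json.JSONDecodeError:
        return False  # unparseable output counts as a miss
    if not isinstance(call, dict):
        return False
    return (
        call.get("name") == expected["name"]
        and call.get("arguments") == expected["arguments"]
    )


def accuracy(predictions, references):
    """Percentage of predictions that match their reference call."""
    hits = sum(score_single_call(p, r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)
```

A scorer of this shape only ever sees one call per prompt, which is one reason numbers produced this way should not be lined up against the official leaderboard.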

| Model | Size | Active | HumanEval+ | MBPP+ | BFCL v3 | NL2Bash F1 |
|---|---|---|---|---|---|---|
| Gemma 4 31B IQ2_M (sibling) | 10.4 GB | 31B | 88.41% | 82.01% | 92.25% | 84.71% |
| Qwen3.6 Q8_0 (≈f16 baseline) | 35.2 GB | 3B | 81.10% | 82.80% | 95.25% | - |
| Qwen3.6 Unsloth UD-IQ2_M | 11.0 GB | 3B | 82.32% | 79.37% | 94.25% | - |
| Qwen3.6 IQ2_M (this repo) | 11.1 GB | 3B | 80.49% | 78.31% | 94.75% | 81.63% |
| Gemma 4 E4B Q8_0 | 7.8 GB | 4.5B | 73.78% | 73.28% | 93.75% | 79.75% |
| Gemma 4 31B IQ1_M | 7.4 GB | 31B | 21.34% | 40.50% | 86.75% | 53.89% |

Key findings

  • This repo edges out Unsloth UD-IQ2_M on BFCL (+0.5 pt, 94.75% vs 94.25%), which suggests that domain-mixed imatrix calibration with 45% agentic traces pays off for tool-calling. Unsloth UD-IQ2_M still leads on HumanEval+ (82.32% vs 80.49%) and MBPP+ (79.37% vs 78.31%)
  • Gemma 4 31B IQ2_M remains the strongest option for pure code at about 10 GB: its HumanEval+ score of 88.41% is the highest in the table
  • Qwen3.6 MoE efficiency is remarkable: with 3B active params at 11 GB it scores 94.75% on BFCL, ahead of the 31B dense Gemma 4 quantizations listed here
  • Sub-8 GB tier: Gemma 4 E4B Q8_0 (7.8 GB) beats the 31B IQ1_M (7.4 GB) on every metric. At this size, ultra-aggressive quantization of a large model loses to a small model at near-full precision
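
The comparisons in these findings are simple differences between table rows. A small sketch, using only scores copied from the table above (the dictionary layout and the `deltas` helper are illustrative, not part of the evaluation code):

```python
# Scores in percent, copied from the benchmark table.
SCORES = {
    "Qwen3.6 Q8_0": {"HumanEval+": 81.10, "MBPP+": 82.80, "BFCL": 95.25},
    "Qwen3.6 Unsloth UD-IQ2_M": {"HumanEval+": 82.32, "MBPP+": 79.37, "BFCL": 94.25},
    "Qwen3.6 IQ2_M (this repo)": {"HumanEval+": 80.49, "MBPP+": 78.31, "BFCL": 94.75},
}


def deltas(model, reference):
    """Per-metric difference in percentage points (model minus reference)."""
    return {
        metric: round(SCORES[model][metric] - SCORES[reference][metric], 2)
        for metric in SCORES[model]
    }


this_repo = "Qwen3.6 IQ2_M (this repo)"
vs_q8 = deltas(this_repo, "Qwen3.6 Q8_0")
vs_unsloth = deltas(this_repo, "Qwen3.6 Unsloth UD-IQ2_M")
```

Against the Q8_0 baseline this repo loses 0.61 pt on HumanEval+, 4.49 pt on MBPP+, and 0.50 pt on BFCL. Against Unsloth UD-IQ2_M it is 1.83 pt and 1.06 pt behind on the two code benchmarks and 0.50 pt ahead on BFCL.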

Recommendation by use case

  • Best for code (any size): Gemma 4 31B IQ2_M (10.4 GB)
  • Best for agentic/tool-calling: This repo — Qwen3.6 IQ2_M (11.1 GB)
  • Best sub-8 GB all-rounder: Gemma 4 E4B Q8_0 (7.8 GB)

Quickstart

huggingface-cli download KikoCis/Qwen3.6-35B-A3B-IQ2_M-GGUF qwen36-35b-a3b-IQ2_M.gguf --local-dir .

llama-cli -m qwen36-35b-a3b-IQ2_M.gguf -ngl 99 --ctx-size 8192 --temp 0.1 \
  -p "Write a Python function to find the longest palindromic substring"
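
For scripted use, the same invocation can be wrapped from Python. This is a minimal sketch that only reuses the flags shown above; the helper names are hypothetical, and it assumes `llama-cli` is on the PATH and the GGUF file is in the working directory.

```python
import subprocess

MODEL = "qwen36-35b-a3b-IQ2_M.gguf"


def llama_cli_args(prompt, model=MODEL, ctx_size=8192, temp=0.1, gpu_layers=99):
    """Build the argument list for the llama-cli command shown above."""
    return [
        "llama-cli",
        "-m", model,
        "-ngl", str(gpu_layers),      # number of layers to offload to the GPU
        "--ctx-size", str(ctx_size),
        "--temp", str(temp),
        "-p", prompt,
    ]


def run_prompt(prompt, **kwargs):
    """Run llama-cli once and return its stdout (raises if the command fails)."""
    result = subprocess.run(
        llama_cli_args(prompt, **kwargs),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling `run_prompt("Write a Python function to find the longest palindromic substring")` would then mirror the command above; lowering `gpu_layers` is the usual adjustment when the model does not fit entirely in VRAM.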

Files

  • qwen36-35b-a3b-IQ2_M.gguf — quantized weights (11.1 GB)
  • qwen36-35b-a3b-domain.imatrix — importance matrix used for calibration

License

Apache 2.0.


Real-World Agent Test Warning (April 2026)

Benchmark scores do not predict agent capability. In Docker-based autonomous testing, fine-tuned E4B models (95% BFCL) scored 0/10 while the unfine-tuned base scored 6/10. Fine-tuning for BFCL destroyed general reasoning (error recovery, strategy adaptation, anti-repetition). Fine-tuned E4B models have been withdrawn.

For autonomous agent tasks, use the base Gemma 4 model or a larger model at higher BPW. See: The Benchmark Trap — Full Study
