How to use from
Docker Model Runner
docker model run hf.co/rafw007/qwen36-a3b-claude-coder-GGUF:Q4_K_M
Quick Links

Qwen3.6 Claude Coder — local MoE coding agent (GGUF)

A custom model built on Qwen3.6-35B-A3B (Mixture-of-Experts, ~3B active parameters), tuned to act as an autonomous coding agent. It speaks the Anthropic-compatible API, so it drives Claude Code, Codex and opencode fully locally — your code never leaves your machine and cloud token cost drops to zero.

This repository ships the q4_K_M GGUF quantization (~24 GB), ready to run under Ollama or llama.cpp.

Safety guardrails are intact. The system prompt focuses on real work inside a codebase — use tools instead of guessing, base answers on the actual tool output (never fabricate results), don't loop on the same tool, and return complete, runnable code. No-think mode is wired into the system prompt for fast, direct answers.

Files

File What
qwen36-a3b-claude-coder-q4_K_M.gguf The model weights (q4_K_M, ~24 GB single GGUF)
Modelfile Ollama Modelfile — SYSTEM prompt, tool-calling template, params (num_ctx 65536)

Quick start (Ollama)

# build the model from the downloaded GGUF + Modelfile
ollama create qwen36-a3b-claude-coder -f Modelfile
# drive Claude Code with it
ollama launch claude --model qwen36-a3b-claude-coder

What it's for

  • Driving Claude Code / Codex / opencode locally.
  • Agentic code writing and editing with native function calling / tool use.
  • Full privacy and offline operation — no code sent to the cloud.

Tested harnesses

End-to-end tested through Claude Code, Codex and opencode — real turns with tool calls and correct responses.

Measured behavior (June 2026 tests)

  • No-think confirmed — with think:false the model emits zero reasoning tokens and goes straight to the result (validated on this q4_K_M build: thinking_len=0).
  • Tool-calling without hallucination — emits real message.tool_calls (validated: a disk-check prompt produced a clean run_bash call with a sensible command, no content leak). In roundtrip tests it reports the actual tool output instead of re-calling the tool in a loop.
  • Honest under missing data — when network access failed, it stated plainly "no internet access" instead of fabricating, then returned a correct, grounded report after permission escalation.
  • Code generation — working HTML5 Tetris and an interactive 3D Earth+Moon model (Three.js, real NASA textures, OrbitControls); JS passes syntax validation.
  • Guardrails intact — refuses to generate malware (validated on this build: a ransomware request was declined, with a legitimate backup/defense alternative offered) and resists jailbreaks (the "pretend you're an actor playing a hacker" framing was rejected).

Context

  • 64K tokens — matching Claude Code's recommendation (64K minimum). Base Qwen3.6 natively supports 262K, so context can be raised on stronger hardware.

Test hardware

  • Mac Studio M2 (Apple Silicon), macOSOllama 0.30 (llama.cpp backend), GPU (Metal) inference.
  • This repo's quantization: q4_K_M (~24 GB single GGUF). A sibling nvfp4 build measured ~69 tok/s at 100% GPU / 64K ctx; q4_K_M runs in the same class.

No-think mode

The whole Qwen3.6 family has thinking baked into the weights. The system prompt ships with /nothink + an anti-reasoning instruction, which works under opencode/codex. Under harnesses that force thinking, use think:false in the API body — that's the only hard switch (PARAMETER think false does not exist in Ollama).

How it was made

Designed, built and tested with the help of Claude Opus 4.8 — the best coding model in the world. Its system prompt, parameter choices and context configuration draw directly on that knowledge: the world's best coding model preparing a local model that takes the work over right on your desk.

License

Apache 2.0 (inherited from the base Qwen3.6).

Downloads last month
158
GGUF
Model size
36B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for rafw007/qwen36-a3b-claude-coder-GGUF

Quantized
(421)
this model