Qwen3.5-9B-MTP-SWE-Agent-GGUF

Agent architecture

A 9B Qwen3.5 merge tuned for SWE-agent style workflows: multi-turn tool use, debugging, structured code generation, and reasoning-heavy instruction following. The model keeps MTP draft layers for speculative decoding in llama.cpp.

Overview

Item	Value
Model family	Qwen3.5-9B MTP merge
Training focus	SWE workflows, tool calling, concise instruction following, reasoning traces
Primary runtime	llama.cpp OpenAI-compatible API
Recommended quant	`Qwen3.5-9B-MTP-SWE-Agent-GGUF-Q4_K_M.gguf`
Size target	9B class
Typical use	Agentic coding, debugging, tool planning, structured outputs

Benchmark Snapshot

Measured locally against a live llama.cpp server with temperature 0.

Metric	Result
Tests	93 / 93 passed
Pass rate	100.0%
Weighted score	100.0%
Avg latency	1.41 s
Median latency	0.95 s
Avg generation speed	95.2 tok/s

Category Breakdown

Category	Tests	Passed	Pass %	Score %	Avg latency	Avg gen tok/s
Debug	15	15	100.0	100.0	2.54 s	94.1
Tool plan	12	12	100.0	100.0	0.60 s	96.3
Tool call	15	15	100.0	100.0	0.39 s	96.2
Code fix	15	15	100.0	100.0	2.14 s	94.5
Workflow	9	9	100.0	100.0	1.49 s	94.8
Discipline	12	12	100.0	100.0	0.58 s	96.6
Patch	6	6	100.0	100.0	3.00 s	94.3
Reasoning	9	9	100.0	100.0	1.00 s	94.6

Capability Matrix

Capability	Score
Algorithm implementation	100.0%
Complexity analysis	100.0%
Concurrency debugging	100.0%
Config inspection	100.0%
Defensive None-guard	100.0%
Dependency debugging	100.0%
Exception handling	100.0%
Format compliance (no markdown)	100.0%
Git knowledge	100.0%
Incident analysis	100.0%
Incident response	100.0%
Instruction following (short reply)	100.0%
Memory profiling knowledge	100.0%
No thinking tag leak	100.0%
Off-by-one fix	100.0%
PR workflow knowledge	100.0%
Patch generation	100.0%
Patch generation (docstring)	100.0%
Python knowledge	100.0%
Refactor planning	100.0%
Root-cause analysis	100.0%
Security fix (SQL injection)	100.0%
Security review knowledge	100.0%
Test execution planning	100.0%
Token limit following	100.0%
Tool call – exec_shell_command	100.0%
Tool call – grep_search	100.0%
Tool call – list_directory	100.0%
Tool call – read_file	100.0%
Tool call – write_file	100.0%
Tool-use planning	100.0%

What the Benchmark Covers

Area	Examples
Debugging	NoneType errors, connection pools, missing dependencies, race conditions, memory leaks
Tool planning	`grep_search`, `read_file`, `write_file`, `exec_shell_command`, `list_directory`
Tool calls	Structured OpenAI-style function calls with argument validation
Code repair	Python bug fixes, guards, binary search, SQL injection mitigation, exception wrapping
Workflow	PR checklists, incident response, code review checklists
Discipline	Exact replies, no fake turns, no markdown, token-limit compliance
Patch literacy	Unified diff generation and docstring edits
Reasoning	Complexity analysis and conflict resolution

Representative SWE / Agentic Cases

ID	What it validates
`swe_debug_plan`	Numbered debug plan for a `NoneType.get` error on `auth.py:42`
`swe_pool_exhausted`	Root cause and remediation for connection pool exhaustion
`swe_missing_module`	Fix workflow for `ModuleNotFoundError: requests`
`agent_tool_plan`	Ordered multi-tool plan using repo search and file reads
`tool_read`	Correct `read_file` tool call
`tool_grep`	Correct `grep_search` tool call
`tool_pytest`	Correct `exec_shell_command` tool call

GGUF Files

Quantization	File
Q4_K_M	`Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-Q4_K_M.gguf`
Q8_0	`Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-Q8_0.gguf`
BF16	`Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-BF16.gguf`
mmproj	`Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-BF16-mmproj.gguf`

Training Mix

Dataset	Weight	Purpose
nebius/SWE-agent-trajectories	35%	Real SWE agent traces
vsamuel/verbosity-control-training	22%	Conciseness control
teknium/OpenHermes-2.5	20%	Instruction quality
Jackrong/Claude-opus-4.7-TraceInversion-5000x	8%	Reasoning traces
Jackrong/Claude-opus-4.6-TraceInversion-9000x	7%	Reasoning traces

Notes

Topic	Note
Benchmark style	Local API runs with fixed prompts and deterministic decoding
Output handling	Some backends split `content` and `reasoning_content`; clients should merge carefully if needed
Safety	Generated code should be reviewed before execution
SWE-bench	This page describes the project’s local benchmark suite, not the SWE-bench Verified leaderboard

Links

Resource	Link
Model repository	https://huggingface.co/raicoon2k/Qwen3.5-9B-MTP-SWE-Agent-GGUF
llama.cpp	https://github.com/ggerganov/llama.cpp
SWE-bench	https://www.swebench.com/

Downloads last month: 902

GGUF

Hardware compatibility

4-bit

8-bit

16-bit

Inference Providers NEW

Image-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for raicoon2k/Qwen3.5-9B-MTP-SWE-Agent-GGUF

Base model

Qwen/Qwen3.5-9B-Base

Finetuned

trohrbaugh/Qwen3.5-9B-heretic-v2

Quantized

Crownelius/Crow-9B-HERETIC-4.6