Qwen3.5-9B-MTP-SWE-Agent-GGUF

A 9B Qwen3.5 merge tuned for SWE-agent style workflows: multi-turn tool use, debugging, structured code generation, and reasoning-heavy instruction following. The model keeps MTP draft layers for speculative decoding in llama.cpp.
Overview
| Item |
Value |
| Model family |
Qwen3.5-9B MTP merge |
| Training focus |
SWE workflows, tool calling, concise instruction following, reasoning traces |
| Primary runtime |
llama.cpp OpenAI-compatible API |
| Recommended quant |
Qwen3.5-9B-MTP-SWE-Agent-GGUF-Q4_K_M.gguf |
| Size target |
9B class |
| Typical use |
Agentic coding, debugging, tool planning, structured outputs |
Benchmark Snapshot
Measured locally against a live llama.cpp server with temperature 0.
| Metric |
Result |
| Tests |
93 / 93 passed |
| Pass rate |
100.0% |
| Weighted score |
100.0% |
| Avg latency |
1.41 s |
| Median latency |
0.95 s |
| Avg generation speed |
95.2 tok/s |
Category Breakdown
| Category |
Tests |
Passed |
Pass % |
Score % |
Avg latency |
Avg gen tok/s |
| Debug |
15 |
15 |
100.0 |
100.0 |
2.54 s |
94.1 |
| Tool plan |
12 |
12 |
100.0 |
100.0 |
0.60 s |
96.3 |
| Tool call |
15 |
15 |
100.0 |
100.0 |
0.39 s |
96.2 |
| Code fix |
15 |
15 |
100.0 |
100.0 |
2.14 s |
94.5 |
| Workflow |
9 |
9 |
100.0 |
100.0 |
1.49 s |
94.8 |
| Discipline |
12 |
12 |
100.0 |
100.0 |
0.58 s |
96.6 |
| Patch |
6 |
6 |
100.0 |
100.0 |
3.00 s |
94.3 |
| Reasoning |
9 |
9 |
100.0 |
100.0 |
1.00 s |
94.6 |
Capability Matrix
| Capability |
Score |
| Algorithm implementation |
100.0% |
| Complexity analysis |
100.0% |
| Concurrency debugging |
100.0% |
| Config inspection |
100.0% |
| Defensive None-guard |
100.0% |
| Dependency debugging |
100.0% |
| Exception handling |
100.0% |
| Format compliance (no markdown) |
100.0% |
| Git knowledge |
100.0% |
| Incident analysis |
100.0% |
| Incident response |
100.0% |
| Instruction following (short reply) |
100.0% |
| Memory profiling knowledge |
100.0% |
| No thinking tag leak |
100.0% |
| Off-by-one fix |
100.0% |
| PR workflow knowledge |
100.0% |
| Patch generation |
100.0% |
| Patch generation (docstring) |
100.0% |
| Python knowledge |
100.0% |
| Refactor planning |
100.0% |
| Root-cause analysis |
100.0% |
| Security fix (SQL injection) |
100.0% |
| Security review knowledge |
100.0% |
| Test execution planning |
100.0% |
| Token limit following |
100.0% |
| Tool call – exec_shell_command |
100.0% |
| Tool call – grep_search |
100.0% |
| Tool call – list_directory |
100.0% |
| Tool call – read_file |
100.0% |
| Tool call – write_file |
100.0% |
| Tool-use planning |
100.0% |
What the Benchmark Covers
| Area |
Examples |
| Debugging |
NoneType errors, connection pools, missing dependencies, race conditions, memory leaks |
| Tool planning |
grep_search, read_file, write_file, exec_shell_command, list_directory |
| Tool calls |
Structured OpenAI-style function calls with argument validation |
| Code repair |
Python bug fixes, guards, binary search, SQL injection mitigation, exception wrapping |
| Workflow |
PR checklists, incident response, code review checklists |
| Discipline |
Exact replies, no fake turns, no markdown, token-limit compliance |
| Patch literacy |
Unified diff generation and docstring edits |
| Reasoning |
Complexity analysis and conflict resolution |
Representative SWE / Agentic Cases
| ID |
What it validates |
swe_debug_plan |
Numbered debug plan for a NoneType.get error on auth.py:42 |
swe_pool_exhausted |
Root cause and remediation for connection pool exhaustion |
swe_missing_module |
Fix workflow for ModuleNotFoundError: requests |
agent_tool_plan |
Ordered multi-tool plan using repo search and file reads |
tool_read |
Correct read_file tool call |
tool_grep |
Correct grep_search tool call |
tool_pytest |
Correct exec_shell_command tool call |
GGUF Files
| Quantization |
File |
| Q4_K_M |
Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-Q4_K_M.gguf |
| Q8_0 |
Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-Q8_0.gguf |
| BF16 |
Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-BF16.gguf |
| mmproj |
Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-BF16-mmproj.gguf |
Training Mix
Notes
| Topic |
Note |
| Benchmark style |
Local API runs with fixed prompts and deterministic decoding |
| Output handling |
Some backends split content and reasoning_content; clients should merge carefully if needed |
| Safety |
Generated code should be reviewed before execution |
| SWE-bench |
This page describes the project’s local benchmark suite, not the SWE-bench Verified leaderboard |
Links