Qwen3.5-9B-MTP-SWE-Agent-GGUF

Hugging Face GGUF Benchmark

Agent architecture

A 9B Qwen3.5 merge tuned for SWE-agent style workflows: multi-turn tool use, debugging, structured code generation, and reasoning-heavy instruction following. The model keeps MTP draft layers for speculative decoding in llama.cpp.

Overview

Item Value
Model family Qwen3.5-9B MTP merge
Training focus SWE workflows, tool calling, concise instruction following, reasoning traces
Primary runtime llama.cpp OpenAI-compatible API
Recommended quant Qwen3.5-9B-MTP-SWE-Agent-GGUF-Q4_K_M.gguf
Size target 9B class
Typical use Agentic coding, debugging, tool planning, structured outputs

Benchmark Snapshot

Measured locally against a live llama.cpp server with temperature 0.

Metric Result
Tests 93 / 93 passed
Pass rate 100.0%
Weighted score 100.0%
Avg latency 1.41 s
Median latency 0.95 s
Avg generation speed 95.2 tok/s

Category Breakdown

Category Tests Passed Pass % Score % Avg latency Avg gen tok/s
Debug 15 15 100.0 100.0 2.54 s 94.1
Tool plan 12 12 100.0 100.0 0.60 s 96.3
Tool call 15 15 100.0 100.0 0.39 s 96.2
Code fix 15 15 100.0 100.0 2.14 s 94.5
Workflow 9 9 100.0 100.0 1.49 s 94.8
Discipline 12 12 100.0 100.0 0.58 s 96.6
Patch 6 6 100.0 100.0 3.00 s 94.3
Reasoning 9 9 100.0 100.0 1.00 s 94.6

Capability Matrix

Capability Score
Algorithm implementation 100.0%
Complexity analysis 100.0%
Concurrency debugging 100.0%
Config inspection 100.0%
Defensive None-guard 100.0%
Dependency debugging 100.0%
Exception handling 100.0%
Format compliance (no markdown) 100.0%
Git knowledge 100.0%
Incident analysis 100.0%
Incident response 100.0%
Instruction following (short reply) 100.0%
Memory profiling knowledge 100.0%
No thinking tag leak 100.0%
Off-by-one fix 100.0%
PR workflow knowledge 100.0%
Patch generation 100.0%
Patch generation (docstring) 100.0%
Python knowledge 100.0%
Refactor planning 100.0%
Root-cause analysis 100.0%
Security fix (SQL injection) 100.0%
Security review knowledge 100.0%
Test execution planning 100.0%
Token limit following 100.0%
Tool call – exec_shell_command 100.0%
Tool call – grep_search 100.0%
Tool call – list_directory 100.0%
Tool call – read_file 100.0%
Tool call – write_file 100.0%
Tool-use planning 100.0%

What the Benchmark Covers

Area Examples
Debugging NoneType errors, connection pools, missing dependencies, race conditions, memory leaks
Tool planning grep_search, read_file, write_file, exec_shell_command, list_directory
Tool calls Structured OpenAI-style function calls with argument validation
Code repair Python bug fixes, guards, binary search, SQL injection mitigation, exception wrapping
Workflow PR checklists, incident response, code review checklists
Discipline Exact replies, no fake turns, no markdown, token-limit compliance
Patch literacy Unified diff generation and docstring edits
Reasoning Complexity analysis and conflict resolution

Representative SWE / Agentic Cases

ID What it validates
swe_debug_plan Numbered debug plan for a NoneType.get error on auth.py:42
swe_pool_exhausted Root cause and remediation for connection pool exhaustion
swe_missing_module Fix workflow for ModuleNotFoundError: requests
agent_tool_plan Ordered multi-tool plan using repo search and file reads
tool_read Correct read_file tool call
tool_grep Correct grep_search tool call
tool_pytest Correct exec_shell_command tool call

GGUF Files

Quantization File
Q4_K_M Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-Q4_K_M.gguf
Q8_0 Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-Q8_0.gguf
BF16 Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-BF16.gguf
mmproj Qwen3.5-9B-MTP-SWE-Agentic-Reasoning-GGUF-BF16-mmproj.gguf

Training Mix

Dataset Weight Purpose
nebius/SWE-agent-trajectories 35% Real SWE agent traces
vsamuel/verbosity-control-training 22% Conciseness control
teknium/OpenHermes-2.5 20% Instruction quality
Jackrong/Claude-opus-4.7-TraceInversion-5000x 8% Reasoning traces
Jackrong/Claude-opus-4.6-TraceInversion-9000x 7% Reasoning traces

Notes

Topic Note
Benchmark style Local API runs with fixed prompts and deterministic decoding
Output handling Some backends split content and reasoning_content; clients should merge carefully if needed
Safety Generated code should be reviewed before execution
SWE-bench This page describes the project’s local benchmark suite, not the SWE-bench Verified leaderboard

Links

Downloads last month
902
GGUF
Hardware compatibility
Log In to add your hardware

4-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for raicoon2k/Qwen3.5-9B-MTP-SWE-Agent-GGUF

Quantized
(20)
this model

Datasets used to train raicoon2k/Qwen3.5-9B-MTP-SWE-Agent-GGUF