CHADROCK3.6 35B Uncensored Strix Lean MTP

CHADROCK3.6 35B Uncensored Strix Lean MTP

CHADROCK3.6 35B Uncensored Strix Lean MTP is a ROCmFP4/MTP GGUF for AMD Ryzen AI Max+ 395 / Strix Halo systems.

The behavior comes from HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive, based on Qwen/Qwen3.6-35B-A3B. This release turns that model into a Strix Lean ROCmFP4 GGUF with Qwen3.6 MTP speculative decoding enabled for high-throughput local serving.

This GGUF will not run correctly with stock llama.cpp. You need the custom charlie12345/rocmfp4-llama build because this file uses ROCmFP4 tensor types and MTP runtime paths that upstream llama.cpp does not currently understand.

The model file is provided here. You do not need to rebuild or quantize the model.

This is an uncensored local-assistant build. It is intended for users who explicitly want that behavior on their own hardware.

Why This Build

This build is for Strix Halo owners who want the uncensored HauhauCS Qwen3.6 35B-A3B behavior, but with the local serving speed and coding strength that CHADROCK/ROCmFP4 and MTP can unlock on AMD unified-memory hardware.

The mix is:

  • HauhauCS uncensored/aggressive Qwen3.6 35B-A3B behavior
  • Qwen3.6 35B-A3B MoE efficiency, with roughly 3B active parameters per token
  • Qwen3.6 MTP speculative decoding
  • ROCmFP4 STRIX_LEAN GGUF conversion
  • Strix Halo tuned f16/f16 KV, b2048/u512, Vulkan0, one-slot serving
  • 262k context public profile with MTP enabled
  • a 157/164 HumanEval base result with fast HumanEval generation

Technical Metadata

Hugging Face may round the parsed GGUF tensor count to 36B in its automatic badge. This release is the Qwen3.6 35B-A3B MoE family: about 35B-class total parameters with roughly 3B active parameters per token.

Field Value
model family Qwen3.6 35B-A3B
architecture qwen35moe
active parameters ~3B class
direct source HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
base family Qwen/Qwen3.6-35B-A3B
runtime format ROCmFP4 STRIX_LEAN GGUF
target hardware AMD Ryzen AI Max+ 395 / Strix Halo
backend device Vulkan0
context 262144
max tokens 65536
serving slots 1
batch / ubatch 2048 / 512
target KV f16 / f16
draft KV f16 / f16
MTP draft depth --spec-draft-n-max 4
vision text-only profile, --no-mmproj

Model Tree

Qwen/Qwen3.6-35B-A3B
  -> HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
    -> CHADROCK3.6 35B Uncensored Strix Lean ROCmFP4
      -> CHADROCK3.6 35B Uncensored Strix Lean ROCmFP4 MTP

Headline Benchmarks

All local numbers below were measured on AMD Ryzen AI Max+ 395 / Strix Halo with the public MTP profile.

HumanEval

Model / row HumanEval base HumanEval+
CHADROCK3.6 35B Uncensored Strix Lean MTP 157/164 = 95.73% 150/164 = 91.46%

This is a strong HumanEval result for a local uncensored ROCmFP4/MTP GGUF run.

HumanEval Speed

Metric CHADROCK3.6 35B Uncensored MTP
HumanEval tasks 164
total tokens processed 75,223
completion tokens generated 46,360
codegen wall time 488.0s
cumulative request latency 484.95s
mean request latency 2.96s
total-token throughput, prompt + completion 154.15 tok/s
completion-token generation throughput 95.60 tok/s
median per-request completion-token speed 95.21 tok/s

The total-token number counts prompt plus completion tokens over the full codegen wall time. The completion-token number counts generated completion tokens over request latency. The same EvalPlus HumanEval run produced the score table above and generated the full 164-task workload in about eight minutes of codegen wall time.

Run With llama-server

Build Charlie's custom llama.cpp once, download this GGUF, then run:

/path/to/rocmfp4-llama/build-strix-rocmfp4/bin/llama-server \
  -m CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN.gguf \
  --alias CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN \
  --host 127.0.0.1 \
  --port 8080 \
  --jinja \
  -c 262144 \
  --reasoning off \
  --reasoning-format none \
  --reasoning-budget 0 \
  --no-context-shift \
  -sm row \
  -ngl 999 \
  -fa on \
  -b 2048 \
  -ub 512 \
  -dev Vulkan0 \
  -t 16 \
  -tb 32 \
  -ctk f16 \
  -ctv f16 \
  --temp 0.2 \
  --min-p 0.0 \
  --top-p 0.9 \
  --top-k 20 \
  --repeat-penalty 1.0 \
  --seed 123 \
  --parallel 1 \
  --no-mmproj \
  --metrics \
  --cache-ram 0 \
  --spec-type draft-mtp \
  --spec-draft-device Vulkan0 \
  --spec-draft-ngl all \
  --spec-draft-threads 16 \
  --spec-draft-threads-batch 32 \
  --spec-draft-type-k f16 \
  --spec-draft-type-v f16 \
  --spec-draft-n-max 4 \
  --spec-draft-n-min 0 \
  --spec-draft-p-min 0.0 \
  --poll 100 \
  --poll-batch 1 \
  --spec-draft-poll 1 \
  --spec-draft-poll-batch 1

Use --parallel 1 for this MTP profile. One slot is part of the intended MTP serving setup.

Text Only

This release is served as text-only. The public Strix Lean profile uses --no-mmproj.

The upstream HauhauCS repo includes multimodal metadata and a matching projector exists locally, but the June 4, 2026 Ciru real-image gate failed for this model with MTP on and with MTP off. The clean non-MTP Hauhau ROCmFP4 path and the original Hauhau Q8 path also failed that gate. Because of that, this release does not advertise or recommend vision use.

Build The Required llama.cpp

The GGUF is already provided. You only need to build the custom llama.cpp server once:

git clone https://github.com/charlie12345/rocmfp4-llama.git
cd rocmfp4-llama
git checkout mtp-rocmfp4-strix
env JOBS=16 scripts/build-strix-rocmfp4-mtp.sh

The server binary will be here:

build-strix-rocmfp4/bin/llama-server

Charlie12345, also known as @Italianclownz, added the ROCmFP4 llama.cpp path this GGUF needs. The method adds custom ROCmFP4 GGUF tensor types and AMD-focused backend support so Strix Halo systems can run these very compact high-throughput builds.

File

File Size SHA256
CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN.gguf 18G 32f40ebf853ee081b1e33b0104b384654266037fbf61d6ed07bece2a0560b238

Credits

  • HauhauCS: uncensored/aggressive Qwen3.6 35B-A3B source model.
  • Qwen: base Qwen3.6-35B-A3B model family.
  • charlie12345 / @Italianclownz: ROCmFP4 llama.cpp fork and AMD-focused MTP runtime path.

Notes

This is an experimental AMD ROCmFP4/MTP build. Performance depends on driver version, clocks, prompt shape, MTP acceptance, and serving flags. The numbers above are local reproducible measurements on Strix Halo, not universal llama.cpp claims.

Downloads last month
210
GGUF
Model size
36B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN