CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN

Instructions to use Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN",
	filename="CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN.gguf",
)

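response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])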

Local Apps

llama.cpp

How to use Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN
# Run inference directly in the terminal:
llama-cli -hf Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN
# Run inference directly in the terminal:
llama-cli -hf Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN
# Run inference directly in the terminal:
./llama-cli -hf Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN

Use Docker

docker model run hf.co/Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN

vLLM

How to use Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN

Ollama
How to use Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN with Ollama:
```
ollama run hf.co/Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN
```

Unsloth Studio

How to use Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN to start chatting

Pi

How to use Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN

Run Hermes

hermes

Docker Model Runner
How to use Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN with Docker Model Runner:
```
docker model run hf.co/Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN
```

Lemonade

How to use Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Vegss/CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN

Run and chat with the model

lemonade run user.CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN-{{QUANT_TAG}}

List all available models

lemonade list

CHADROCK3.6 35B Uncensored Strix Lean MTP

CHADROCK3.6 35B Uncensored Strix Lean MTP is a ROCmFP4/MTP GGUF for AMD Ryzen AI Max+ 395 / Strix Halo systems.

The behavior comes from HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive, based on Qwen/Qwen3.6-35B-A3B. This release turns that model into a Strix Lean ROCmFP4 GGUF with Qwen3.6 MTP speculative decoding enabled for high-throughput local serving.

This GGUF will not run correctly with stock llama.cpp. You need the custom charlie12345/rocmfp4-llama build because this file uses ROCmFP4 tensor types and MTP runtime paths that upstream llama.cpp does not currently understand.

The model file is provided here. You do not need to rebuild or quantize the model.

This is an uncensored local-assistant build. It is intended for users who explicitly want that behavior on their own hardware.

Why This Build

This build is for Strix Halo owners who want the uncensored HauhauCS Qwen3.6 35B-A3B behavior, but with the local serving speed and coding strength that CHADROCK/ROCmFP4 and MTP can unlock on AMD unified-memory hardware.

The mix is:

HauhauCS uncensored/aggressive Qwen3.6 35B-A3B behavior
Qwen3.6 35B-A3B MoE efficiency, with roughly 3B active parameters per token
Qwen3.6 MTP speculative decoding
ROCmFP4 STRIX_LEAN GGUF conversion
Strix Halo tuning: f16/f16 KV cache, batch 2048 / ubatch 512, Vulkan0 device, one-slot serving
262k context public profile with MTP enabled
a 157/164 HumanEval base result with fast HumanEval generation

Technical Metadata

Hugging Face may show the parameter count parsed from the GGUF as 36B in its automatic badge. This release belongs to the Qwen3.6 35B-A3B MoE family: about 35B total parameters, with roughly 3B active per token.

| Field | Value |
| --- | --- |
| model family | `Qwen3.6 35B-A3B` |
| architecture | `qwen35moe` |
| active parameters | `~3B` class |
| direct source | `HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive` |
| base family | `Qwen/Qwen3.6-35B-A3B` |
| runtime format | ROCmFP4 `STRIX_LEAN` GGUF |
| target hardware | AMD Ryzen AI Max+ 395 / Strix Halo |
| backend device | `Vulkan0` |
| context | `262144` |
| max tokens | `65536` |
| serving slots | `1` |
| batch / ubatch | `2048 / 512` |
| target KV | `f16 / f16` |
| draft KV | `f16 / f16` |
| MTP draft depth | `--spec-draft-n-max 4` |
| vision | text-only profile, `--no-mmproj` |

Model Tree

Qwen/Qwen3.6-35B-A3B
  -> HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
    -> CHADROCK3.6 35B Uncensored Strix Lean ROCmFP4
      -> CHADROCK3.6 35B Uncensored Strix Lean ROCmFP4 MTP

Headline Benchmarks

All local numbers below were measured on AMD Ryzen AI Max+ 395 / Strix Halo with the public MTP profile.

HumanEval

| Model | HumanEval base | HumanEval+ |
| --- | --- | --- |
| CHADROCK3.6 35B Uncensored Strix Lean MTP | `157/164 = 95.73%` | `150/164 = 91.46%` |

This is a strong HumanEval result for a local uncensored ROCmFP4/MTP GGUF run.

HumanEval Speed

| Metric | CHADROCK3.6 35B Uncensored MTP |
| --- | --- |
| HumanEval tasks | `164` |
| total tokens processed | `75,223` |
| completion tokens generated | `46,360` |
| codegen wall time | `488.0s` |
| cumulative request latency | `484.95s` |
| mean request latency | `2.96s` |
| total-token throughput, prompt + completion | `154.15 tok/s` |
| completion-token generation throughput | `95.60 tok/s` |
| median per-request completion-token speed | `95.21 tok/s` |

The total-token number divides prompt plus completion tokens (75,223) by the full codegen wall time (488.0 s). The completion-token number divides generated completion tokens (46,360) by the cumulative request latency (484.95 s). The same EvalPlus HumanEval run produced the score table above; the full 164-task workload took about eight minutes of codegen wall time.
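The throughput and latency rows can be re-derived from the raw counts in the HumanEval Speed table. The short sketch below only repeats that arithmetic; the constant and function names are illustrative and are not part of any benchmark tooling.

```python
# Figures copied from the HumanEval Speed table.
TASKS = 164
TOTAL_TOKENS = 75_223        # prompt + completion tokens
COMPLETION_TOKENS = 46_360   # generated tokens only
WALL_TIME_S = 488.0          # codegen wall time
REQUEST_LATENCY_S = 484.95   # cumulative request latency


def per_second(tokens: int, seconds: float) -> float:
    """Return a tokens-per-second rate."""
    return tokens / seconds


# Total-token throughput is measured against wall time.
total_tps = per_second(TOTAL_TOKENS, WALL_TIME_S)
# Completion throughput is measured against summed request latency.
completion_tps = per_second(COMPLETION_TOKENS, REQUEST_LATENCY_S)
# Mean latency is the summed latency spread over all tasks.
mean_latency_s = REQUEST_LATENCY_S / TASKS

print(f"total-token throughput: {total_tps:.2f} tok/s")
print(f"completion throughput:  {completion_tps:.2f} tok/s")
print(f"mean request latency:   {mean_latency_s:.2f} s")
```

The median per-request speed cannot be recomputed this way because it needs the per-request timings, which are not listed here.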

Run With llama-server

Build the custom charlie12345/rocmfp4-llama fork once, download this GGUF, then run:

/path/to/rocmfp4-llama/build-strix-rocmfp4/bin/llama-server \
  -m CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN.gguf \
  --alias CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN \
  --host 127.0.0.1 \
  --port 8080 \
  --jinja \
  -c 262144 \
  --reasoning off \
  --reasoning-format none \
  --reasoning-budget 0 \
  --no-context-shift \
  -sm row \
  -ngl 999 \
  -fa on \
  -b 2048 \
  -ub 512 \
  -dev Vulkan0 \
  -t 16 \
  -tb 32 \
  -ctk f16 \
  -ctv f16 \
  --temp 0.2 \
  --min-p 0.0 \
  --top-p 0.9 \
  --top-k 20 \
  --repeat-penalty 1.0 \
  --seed 123 \
  --parallel 1 \
  --no-mmproj \
  --metrics \
  --cache-ram 0 \
  --spec-type draft-mtp \
  --spec-draft-device Vulkan0 \
  --spec-draft-ngl all \
  --spec-draft-threads 16 \
  --spec-draft-threads-batch 32 \
  --spec-draft-type-k f16 \
  --spec-draft-type-v f16 \
  --spec-draft-n-max 4 \
  --spec-draft-n-min 0 \
  --spec-draft-p-min 0.0 \
  --poll 100 \
  --poll-batch 1 \
  --spec-draft-poll 1 \
  --spec-draft-poll-batch 1

Use `--parallel 1` for this MTP profile; single-slot serving is part of the intended MTP setup.
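Once the server is running, it can be queried from Python. The following is a minimal sketch, assuming the fork keeps upstream llama-server's OpenAI-compatible `/v1/chat/completions` route; the host, port, and alias mirror the flags above, while `build_chat_payload`, `ask`, and the `max_tokens` default of 512 are illustrative choices rather than part of this release.

```python
import json
import urllib.request

# Host, port, and alias mirror the llama-server flags above.
BASE_URL = "http://127.0.0.1:8080/v1"
MODEL_ALIAS = "CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN"


def build_chat_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": MODEL_ALIAS,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # assumed default, well under the 65536 cap
    }


def ask(prompt: str, timeout_s: float = 600.0) -> str:
    """POST one chat request and return the assistant's reply text."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Write a Python function that reverses a string."))` sends a single request. Because the profile serves one slot, requests are best issued one at a time.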

Text Only

This release is served as text-only. The public Strix Lean profile uses --no-mmproj.

The upstream HauhauCS repo includes multimodal metadata and a matching projector exists locally, but the June 4, 2026 Ciru real-image gate failed for this model with MTP on and with MTP off. The clean non-MTP Hauhau ROCmFP4 path and the original Hauhau Q8 path also failed that gate. Because of that, this release does not advertise or recommend vision use.

Build The Required llama.cpp

The GGUF is already provided. You only need to build the custom llama.cpp server once:

git clone https://github.com/charlie12345/rocmfp4-llama.git
cd rocmfp4-llama
git checkout mtp-rocmfp4-strix
env JOBS=16 scripts/build-strix-rocmfp4-mtp.sh

The server binary will be here:

build-strix-rocmfp4/bin/llama-server

charlie12345, also known as @Italianclownz, added the ROCmFP4 llama.cpp path this GGUF needs. It adds custom ROCmFP4 GGUF tensor types and AMD-focused backend support so Strix Halo systems can run these compact, high-throughput builds.

File

| File | Size | SHA256 |
| --- | --- | --- |
| `CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN.gguf` | `18G` | `32f40ebf853ee081b1e33b0104b384654266037fbf61d6ed07bece2a0560b238` |
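After downloading, the file can be checked against the published SHA256. This is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify` are illustrative, and the file path is whatever location you downloaded the GGUF to.

```python
import hashlib
from pathlib import Path

# Digest copied from the File table.
EXPECTED_SHA256 = "32f40ebf853ee081b1e33b0104b384654266037fbf61d6ed07bece2a0560b238"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so an 18G file never sits in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN.gguf"))` should return `True` for an intact download.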

Credits

HauhauCS: uncensored/aggressive Qwen3.6 35B-A3B source model.
Qwen: base Qwen3.6-35B-A3B model family.
charlie12345 / @Italianclownz: ROCmFP4 llama.cpp fork and AMD-focused MTP runtime path.

Notes

This is an experimental AMD ROCmFP4/MTP build. Performance depends on driver version, clocks, prompt shape, MTP acceptance, and serving flags. The numbers above are local reproducible measurements on Strix Halo, not universal llama.cpp claims.
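Since results vary with these factors, it can help to read the server's own counters on your machine. The launch command passes `--metrics`, which in upstream llama-server enables a Prometheus-format `/metrics` endpoint; the sketch below assumes the fork keeps that endpoint and makes no assumption about which metric names it reports. `parse_prometheus_text` and `fetch_metrics` are illustrative helpers.

```python
import urllib.request


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse 'name value' lines, skipping comments and blank lines.

    Simplification: assumes no trailing timestamps on sample lines.
    """
    metrics: dict[str, float] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        if not name:
            continue
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return metrics


def fetch_metrics(url: str = "http://127.0.0.1:8080/metrics") -> dict[str, float]:
    """Fetch and parse the metrics page (assumed endpoint, see lead-in)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_prometheus_text(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` before and after a workload and comparing the two dictionaries gives a rough view of how your own setup behaves.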
