CHADROCK3.6 35B Uncensored Strix Lean MTP
CHADROCK3.6 35B Uncensored Strix Lean MTP is a ROCmFP4/MTP GGUF for AMD Ryzen AI Max+ 395 / Strix Halo systems.
The behavior comes from HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive, based on Qwen/Qwen3.6-35B-A3B. This release turns that model into a Strix Lean ROCmFP4 GGUF with Qwen3.6 MTP speculative decoding enabled for high-throughput local serving.
This GGUF will not run correctly with stock llama.cpp. You need the custom charlie12345/rocmfp4-llama build because this file uses ROCmFP4 tensor types and MTP runtime paths that upstream llama.cpp does not currently understand.
The model file is provided here. You do not need to rebuild or quantize the model.
This is an uncensored local-assistant build. It is intended for users who explicitly want that behavior on their own hardware.
Why This Build
This build is for Strix Halo owners who want the uncensored HauhauCS Qwen3.6 35B-A3B behavior combined with the local serving speed and coding strength that CHADROCK/ROCmFP4 and MTP can unlock on AMD unified-memory hardware.
The mix is:
- HauhauCS uncensored/aggressive Qwen3.6 35B-A3B behavior
- Qwen3.6 35B-A3B MoE efficiency, with roughly 3B active parameters per token
- Qwen3.6 MTP speculative decoding
- ROCmFP4 STRIX_LEAN GGUF conversion
- Strix Halo tuned f16/f16 KV, b2048/u512, Vulkan0, one-slot serving
- 262k context public profile with MTP enabled
- a 157/164 HumanEval base result with fast HumanEval generation
Technical Metadata
Hugging Face may round the parsed GGUF parameter count to 36B in its automatic badge. This release belongs to the Qwen3.6 35B-A3B MoE family: about 35B total parameters, with roughly 3B active per token.
| Field | Value |
|---|---|
| model family | Qwen3.6 35B-A3B |
| architecture | qwen35moe |
| active parameters | ~3B class |
| direct source | HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive |
| base family | Qwen/Qwen3.6-35B-A3B |
| runtime format | ROCmFP4 STRIX_LEAN GGUF |
| target hardware | AMD Ryzen AI Max+ 395 / Strix Halo |
| backend device | Vulkan0 |
| context | 262144 |
| max tokens | 65536 |
| serving slots | 1 |
| batch / ubatch | 2048 / 512 |
| target KV | f16 / f16 |
| draft KV | f16 / f16 |
| MTP draft depth | --spec-draft-n-max 4 |
| vision | text-only profile, --no-mmproj |
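Most rows in this table correspond directly to flags in the public llama-server profile given in the Run With llama-server section. The Python sketch below spells out that mapping. `PROFILE_FLAGS` and `flatten_flags` are illustrative names, the flag spellings are copied from that command, and the max tokens row is left out because the command has no matching flag.

```python
# Map metadata rows to the llama-server flags used in the public profile.
# Flag spellings are copied from the command in "Run With llama-server";
# PROFILE_FLAGS and flatten_flags are illustrative names, not part of the release.
PROFILE_FLAGS = {
    "backend device": ["-dev", "Vulkan0"],
    "context": ["-c", "262144"],
    "serving slots": ["--parallel", "1"],
    "batch / ubatch": ["-b", "2048", "-ub", "512"],
    "target KV": ["-ctk", "f16", "-ctv", "f16"],
    "draft KV": ["--spec-draft-type-k", "f16", "--spec-draft-type-v", "f16"],
    "MTP draft depth": ["--spec-draft-n-max", "4"],
    "vision": ["--no-mmproj"],
}


def flatten_flags(profile: dict[str, list[str]]) -> list[str]:
    """Concatenate the per-row flag lists into one argument list."""
    return [arg for flags in profile.values() for arg in flags]


# Print each metadata row next to the flags that implement it.
for row, flags in PROFILE_FLAGS.items():
    print(f"{row:<16} {' '.join(flags)}")
```

The flattened list covers only the rows of the table. The sampling, thread, and polling flags in the full command are not described by this metadata.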
Model Tree
Qwen/Qwen3.6-35B-A3B
-> HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive
-> CHADROCK3.6 35B Uncensored Strix Lean ROCmFP4
-> CHADROCK3.6 35B Uncensored Strix Lean ROCmFP4 MTP
Headline Benchmarks
All local numbers below were measured on AMD Ryzen AI Max+ 395 / Strix Halo with the public MTP profile.
HumanEval
| Model / row | HumanEval base | HumanEval+ |
|---|---|---|
| CHADROCK3.6 35B Uncensored Strix Lean MTP | 157/164 = 95.73% | 150/164 = 91.46% |
This is a strong HumanEval result for a local uncensored ROCmFP4/MTP GGUF run.
HumanEval Speed
| Metric | CHADROCK3.6 35B Uncensored MTP |
|---|---|
| HumanEval tasks | 164 |
| total tokens processed | 75,223 |
| completion tokens generated | 46,360 |
| codegen wall time | 488.0s |
| cumulative request latency | 484.95s |
| mean request latency | 2.96s |
| total-token throughput, prompt + completion | 154.15 tok/s |
| completion-token generation throughput | 95.60 tok/s |
| median per-request completion-token speed | 95.21 tok/s |
The total-token number counts prompt plus completion tokens over the full codegen wall time. The completion-token number counts generated completion tokens over request latency. The same EvalPlus HumanEval run produced the score table above and generated the full 164-task workload in about eight minutes of codegen wall time.
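The two throughput rows and the mean latency can be re-derived from the raw counts in the table. The short Python sketch below is only a sanity check on that arithmetic. The inputs are copied from the table and the variable names are illustrative.

```python
# Re-derive the throughput rows from the raw counts in the speed table.
# All inputs are copied from the table; the variable names are illustrative.

TASKS = 164
TOTAL_TOKENS = 75_223        # prompt + completion tokens
COMPLETION_TOKENS = 46_360   # generated tokens only
CODEGEN_WALL_S = 488.0       # full codegen wall time, seconds
REQUEST_LATENCY_S = 484.95   # cumulative request latency, seconds

# Total-token throughput: prompt + completion tokens over codegen wall time.
total_tps = TOTAL_TOKENS / CODEGEN_WALL_S

# Completion-token throughput: generated tokens over cumulative request latency.
completion_tps = COMPLETION_TOKENS / REQUEST_LATENCY_S

# Mean request latency: cumulative latency spread over the 164 tasks.
mean_latency_s = REQUEST_LATENCY_S / TASKS

print(f"total-token throughput:      {total_tps:.2f} tok/s")
print(f"completion-token throughput: {completion_tps:.2f} tok/s")
print(f"mean request latency:        {mean_latency_s:.2f} s")
```

The median per-request speed cannot be recomputed this way, because it depends on per-request data that the table does not list.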
Run With llama-server
Build Charlie's custom llama.cpp once, download this GGUF, then run:
/path/to/rocmfp4-llama/build-strix-rocmfp4/bin/llama-server \
-m CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN.gguf \
--alias CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN \
--host 127.0.0.1 \
--port 8080 \
--jinja \
-c 262144 \
--reasoning off \
--reasoning-format none \
--reasoning-budget 0 \
--no-context-shift \
-sm row \
-ngl 999 \
-fa on \
-b 2048 \
-ub 512 \
-dev Vulkan0 \
-t 16 \
-tb 32 \
-ctk f16 \
-ctv f16 \
--temp 0.2 \
--min-p 0.0 \
--top-p 0.9 \
--top-k 20 \
--repeat-penalty 1.0 \
--seed 123 \
--parallel 1 \
--no-mmproj \
--metrics \
--cache-ram 0 \
--spec-type draft-mtp \
--spec-draft-device Vulkan0 \
--spec-draft-ngl all \
--spec-draft-threads 16 \
--spec-draft-threads-batch 32 \
--spec-draft-type-k f16 \
--spec-draft-type-v f16 \
--spec-draft-n-max 4 \
--spec-draft-n-min 0 \
--spec-draft-p-min 0.0 \
--poll 100 \
--poll-batch 1 \
--spec-draft-poll 1 \
--spec-draft-poll-batch 1
Use --parallel 1 for this MTP profile. One slot is part of the intended MTP serving setup.
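Once the server is up, it can be queried over HTTP. The sketch below assumes the custom build keeps upstream llama-server's OpenAI-compatible `/v1/chat/completions` route, and it reuses the host, port, and alias from the command above. `build_chat_request` and `chat` are hypothetical helper names, and the `max_tokens` default is an arbitrary example value.

```python
import json
import urllib.request

# Host, port, and alias are taken from the llama-server command above.
BASE_URL = "http://127.0.0.1:8080"
MODEL_ALIAS = "CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN"


def build_chat_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style chat completions route.

    Assumption: the custom build exposes /v1/chat/completions like upstream
    llama-server does. Sampling settings are left to the server-side flags.
    """
    payload = {
        "model": MODEL_ALIAS,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout_s: float = 600.0) -> str:
    """Send one prompt to the running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout_s) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Write a Python function that reverses a string."))` would print the reply. Since the profile serves a single slot, sending requests one after another matches the intended setup.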
Text Only
This release is served as text-only. The public Strix Lean profile uses --no-mmproj.
The upstream HauhauCS repo includes multimodal metadata and a matching projector exists locally, but the June 4, 2026 Ciru real-image gate failed for this model with MTP on and with MTP off. The clean non-MTP Hauhau ROCmFP4 path and the original Hauhau Q8 path also failed that gate. Because of that, this release does not advertise or recommend vision use.
Build The Required llama.cpp
The GGUF is already provided. You only need to build the custom llama.cpp server once:
git clone https://github.com/charlie12345/rocmfp4-llama.git
cd rocmfp4-llama
git checkout mtp-rocmfp4-strix
env JOBS=16 scripts/build-strix-rocmfp4-mtp.sh
The server binary will be here:
build-strix-rocmfp4/bin/llama-server
Charlie12345, also known as @Italianclownz, added the ROCmFP4 llama.cpp path this GGUF needs. The method adds custom ROCmFP4 GGUF tensor types and AMD-focused backend support so Strix Halo systems can run these very compact high-throughput builds.
File
| File | Size | SHA256 |
|---|---|---|
| CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN.gguf | 18G | 32f40ebf853ee081b1e33b0104b384654266037fbf61d6ed07bece2a0560b238 |
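After downloading, the file can be checked against the SHA256 listed above. This is a minimal sketch using only the Python standard library. `sha256_of` and `verify` are hypothetical helper names, and the script assumes the GGUF sits in the current directory.

```python
import hashlib
from pathlib import Path

# File name and digest are copied from the File table.
GGUF_NAME = "CHADROCK3.6-35B-UNCENSORED-MTP-STRIX-LEAN.gguf"
EXPECTED_SHA256 = "32f40ebf853ee081b1e33b0104b384654266037fbf61d6ed07bece2a0560b238"


def sha256_of(path: Path, chunk_bytes: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so the 18G GGUF is never fully loaded into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_bytes), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 matches the expected digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    gguf = Path(GGUF_NAME)  # assumption: the file sits in the current directory
    if gguf.exists():
        print("checksum OK" if verify(gguf) else "checksum MISMATCH")
    else:
        print(f"{gguf} not found; download it first")
```

A mismatch usually points to an incomplete or corrupted download, so re-download before debugging the server.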
Credits
- HauhauCS: uncensored/aggressive Qwen3.6 35B-A3B source model.
- Qwen: base Qwen3.6-35B-A3B model family.
- charlie12345 / @Italianclownz: ROCmFP4 llama.cpp fork and AMD-focused MTP runtime path.
Notes
This is an experimental AMD ROCmFP4/MTP build. Performance depends on driver version, clocks, prompt shape, MTP acceptance, and serving flags. The numbers above are local reproducible measurements on Strix Halo, not universal llama.cpp claims.
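To see why MTP acceptance matters, a common back-of-the-envelope model for speculative decoding helps. This is not a measurement from this release. It assumes each drafted token is accepted independently with probability `p`, which real workloads only approximate. Under that assumption, a draft depth of `n` yields an expected `(1 - p^(n+1)) / (1 - p)` tokens per verification pass of the main model. `expected_tokens_per_pass` is a hypothetical helper name, and the acceptance rates in the loop are arbitrary examples, not measured values.

```python
def expected_tokens_per_pass(p_accept: float, draft_depth: int) -> float:
    """Expected tokens emitted per main-model pass under i.i.d. acceptance.

    Simplified model (an assumption, not this build's measured behavior):
    each of the `draft_depth` drafted tokens is accepted independently with
    probability `p_accept`, and the first rejection ends the accepted run.
    The main model always contributes one token of its own, so the value
    ranges from 1 (nothing accepted) to draft_depth + 1 (all accepted).
    """
    if not 0.0 <= p_accept <= 1.0:
        raise ValueError("p_accept must be in [0, 1]")
    if p_accept == 1.0:
        return float(draft_depth + 1)
    return (1.0 - p_accept ** (draft_depth + 1)) / (1.0 - p_accept)


DRAFT_DEPTH = 4  # matches --spec-draft-n-max 4 in the public profile

for p in (0.0, 0.5, 0.8, 1.0):  # illustrative acceptance rates only
    print(f"p={p:.1f}: {expected_tokens_per_pass(p, DRAFT_DEPTH):.2f} tokens/pass")
```

The figure counts tokens per pass, not wall-clock speedup. Drafting itself costs time, so the real gain is smaller and depends on the other factors listed above.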