Primal (Uncensored)
Collection
3 items • Updated
This model is a modified version of Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive grafted with the Multi-Token Prediction (MTP) module using the MTP donor from the Qwen 3.6-35B-A3B-MTP-GGUF series by Unsloth.
This modification aims to provide faster inference speeds via MTP-based speculative decoding without sacrificing the base model's original quality or capabilities.
To utilize the MTP features, you need an inference engine that supports MTP speculative decoding, such as the latest versions of llama.cpp, Unsloth Studio, or SGLang.
llama.cpp (Server CLI)
Run the server with the following arguments to enable the MTP draft module:
llama-cli -m Qwen3.6-35B-A3B-Uncensored-HauhauCS-MTP-Q8_K_P.gguf.gguf \
--mmproj mmproj-Qwen3.6-35B-A3B-Uncensored-HauhauCS-MTP-f16.gguf \
--jinja -c 131072 -ngl 99
Notes:
-ngl (GPU offload layers) based on your system's VRAM capacity.--spec-type draft-mtp and --spec-draft-n-max 2 (can be configured up to 6 on capable systems) enable the MTP drafting mechanism.-np > 1) or concurrent multimodal inputs (--mmproj).--jinja flag in llama.cpp to parse instructions with the correct format. If you prefer to disable the built-in thinking mode, you can pass {"enable_thinking": false} in your template configuration.Base model
Qwen/Qwen3.6-35B-A3B