THIS MODEL IS MEANT TO BE A MEME

Qwen3.6-35B-A3B-REAP-90pct-GGUF

GGUF builds of DJLougen/Qwen3.6-35B-A3B-REAP-90pct — a REAP keep-26 (90% routed-expert) prune of Qwen/Qwen3.6-35B-A3B, 6.15B params.

For the hardware-strapped: a "35B" that fits on a GPU you found in a drawer. It pruned 230 of every 256 experts to get there, so it runs on almost nothing and answers with roughly the same amount of conviction. Free to download, free of most of the original parameters, and largely free of meaning.

Files

File Size bpw Notes
Qwen3.6-35B-A3B-REAP-90pct-Q4_K_M.gguf 3.4 GB 4.99 the sensible one
Qwen3.6-35B-A3B-REAP-90pct-IQ1_S.gguf 1.6 GB ~2.25 eff the "I have 4 GB of VRAM and a dream" one (imatrix-quantized)

IQ1_S is the smallest valid GGUF for this model — llama.cpp keeps the 248k-token embedding and the untied lm_head at ≥2-bit (forcing them to 1-bit aborts), so 1.6 GB is the floor. Below that you have to leave the format (see the 0.78 GB sign-packed 1-bit checkpoint).

Run

llama-cli -m Qwen3.6-35B-A3B-REAP-90pct-Q4_K_M.gguf -ngl 99 -p "Hello"
# or serve it
llama-server -m Qwen3.6-35B-A3B-REAP-90pct-IQ1_S.gguf -ngl 99 -c 8192

Requires a llama.cpp build with qwen3_5_moe / QWEN35MOE support (hybrid linear-attention

  • fused-expert MoE arch).

Honest note

This is a 90%-expert-pruned model. It loads, it streams tokens, and the tokens are mostly incoherent — more so at IQ1_S than Q4_K_M. It is a compression demonstration, not a model you should rely on for anything.

Downloads last month
2,495
GGUF
Model size
6B params
Architecture
qwen35moe
Hardware compatibility
Log In to add your hardware

1-bit

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for DJLougen/Qwen3.6-35B-A3B-REAP-90pct-GGUF

Quantized
(1)
this model