Huihui-Qwen3.6-35B-A3B-abliterated EXL3
Collection
EXL3 quantizations of Huihui-Qwen3.6-35B-A3B-abliterated • 2 items • Updated
EXL3 · 4.5 bpw · 21.6 GB · Mixture‑of‑Experts · 40 layers × 256 experts
An ExLlamaV3 build of
huihui-ai/Huihui-Qwen3.6-35B-A3B-abliteratedat 4.5 bits per weight — quality-leaning build — comfortable on a 24 GB card with usable context (~22 GB weights). See Quants for sibling repos at other bit‑widths or browse the collection.
| BPW | Head bits | Calibration rows | Size | Status |
|---|---|---|---|---|
| 4.0 | 8 | 128 | 19.5 GB | link |
| 4.5 | 8 | 250 | 21.6 GB | this repo |
| Loader | Use it for |
|---|---|
| TabbyAPI | OpenAI‑compatible HTTP server. Drop‑in for OpenAI clients. |
| text‑generation‑webui | Local chat UI. Pick the ExLlamaV3 loader from the model dropdown. |
| ExLlamaV3 | Direct Python API for embedding the model in your own code or pipeline. |
VRAM at 4.5 bpw: ~21.6 GB weights + ~2 GB context. Comfortable on a single 24 GB card; tight on 16 GB with reduced context.
pip install -U huggingface_hub
hf download \
blockblockblock/Huihui-Qwen3.6-35B-A3B-abliterated-exl3-4.5bpw \
--local-dir ./Huihui-Qwen3.6-35B-A3B-abliterated-exl3-4.5bpw
quantization_config.json)| Setting | Value |
|---|---|
| Format | EXL3 |
| Bits per weight | 4.5 |
| Head bits | 8 |
| Calibration rows | 250 |
| Codebook | MCG |
| Out‑scales | always |
| Parallel mode | enabled (MoE expert batching) |
Loaded automatically by every ExLlamaV3 loader; reproduced here for searchability.
Use and license follow the base model. Quantization adds no additional restrictions. Refer to the upstream repository for terms, citation, and safety documentation.
{org}/{model}-exl3-{bpw}bpw
Base model
Qwen/Qwen3.6-35B-A3B