---
license: apache-2.0
base_model: huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated
base_model_relation: quantized
quantized_by: blockblockblock
library_name: exllamav3
pipeline_tag: text-generation
tags:
- exl3
- exllamav3
- quantized
- mixture-of-experts
- qwen
quantization_format: exl3
bits_per_weight: 4.5
---
# Huihui · Qwen3.6 · 35B-A3B · abliterated
EXL3 · 4.5 bpw · 21.6 GB · Mixture‑of‑Experts · 40 layers × 256 experts
[](https://github.com/turboderp-org/exllamav3)
[](#quants)
[](#quants)
[](https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated)
[](https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated)
[](https://huggingface.co/blockblockblock)
[](https://huggingface.co/collections/blockblockblock/huihui-qwen36-35b-a3b-abliterated-exl3-69ef8197714f5baf5d4b802d)
---
> [!NOTE]
> An [ExLlamaV3](https://github.com/turboderp-org/exllamav3) build of [`huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated`](https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated) at **4.5 bits per weight** — quality-leaning build — comfortable on a 24 GB card with usable context (~22 GB weights). See [Quants](#quants) for sibling repos at other bit‑widths or browse the [collection](https://huggingface.co/collections/blockblockblock/huihui-qwen36-35b-a3b-abliterated-exl3-69ef8197714f5baf5d4b802d).
## Quants
| BPW | Head bits | Calibration rows | Size | Status |
| :---: | :---: | :---: | ---: | :--- |
| 4.0 | 8 | 128 | 19.5 GB | [link](https://huggingface.co/blockblockblock/Huihui-Qwen3.6-35B-A3B-abliterated-exl3-4.0bpw) |
| **4.5** | 8 | 250 | **21.6 GB** | this repo |
## Inference
| Loader |
Use it for |
| TabbyAPI |
OpenAI‑compatible HTTP server. Drop‑in for OpenAI clients. |
| text‑generation‑webui |
Local chat UI. Pick the ExLlamaV3 loader from the model dropdown. |
| ExLlamaV3 |
Direct Python API for embedding the model in your own code or pipeline. |
> [!TIP]
> **VRAM at 4.5 bpw:** ~21.6 GB weights + ~2 GB context. Comfortable on a single 24 GB card; tight on 16 GB with reduced context.
## Download
```bash
pip install -U huggingface_hub
hf download \
blockblockblock/Huihui-Qwen3.6-35B-A3B-abliterated-exl3-4.5bpw \
--local-dir ./Huihui-Qwen3.6-35B-A3B-abliterated-exl3-4.5bpw
```
Quantization recipe (advanced — embedded in quantization_config.json)
| Setting | Value |
| :--- | :--- |
| Format | `EXL3` |
| Bits per weight | `4.5` |
| Head bits | `8` |
| Calibration rows | `250` |
| Codebook | `MCG` |
| Out‑scales | `always` |
| Parallel mode | `enabled` (MoE expert batching) |
Loaded automatically by every ExLlamaV3 loader; reproduced here for searchability.
## License & use
> [!IMPORTANT]
> Use and license **follow the [base model](https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated)**.
> Quantization adds no additional restrictions. Refer to the upstream repository for terms, citation, and safety documentation.
---
Quantized with BlockQuant · convention {org}/{model}-exl3-{bpw}bpw