--- license: apache-2.0 base_model: huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated base_model_relation: quantized quantized_by: blockblockblock library_name: exllamav3 pipeline_tag: text-generation tags: - exl3 - exllamav3 - quantized - mixture-of-experts - qwen quantization_format: exl3 bits_per_weight: 4.5 ---
# Huihui · Qwen3.6 · 35B-A3B · abliterated EXL3  ·  4.5 bpw  ·  21.6 GB  ·  Mixture‑of‑Experts  ·  40 layers × 256 experts
[![format](https://img.shields.io/badge/format-EXL3-c63010?style=for-the-badge&labelColor=14120e)](https://github.com/turboderp-org/exllamav3) [![bpw](https://img.shields.io/badge/bpw-4.5-6b8a76?style=for-the-badge&labelColor=14120e)](#quants) [![size](https://img.shields.io/badge/size-21.6_GB-6b8a76?style=for-the-badge&labelColor=14120e)](#quants) [![arch](https://img.shields.io/badge/arch-MoE_36B--A3B-c63010?style=for-the-badge&labelColor=14120e)](https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated) [![base model](https://img.shields.io/badge/Base-huihui--ai%2FHuihui--Qwen3.6--35B--A3B--abliterated-2a2620?style=flat-square&logo=huggingface&logoColor=white)](https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated) [![quantized by](https://img.shields.io/badge/Quantized_by-blockblockblock-2a2620?style=flat-square&logo=huggingface&logoColor=white)](https://huggingface.co/blockblockblock) [![collection](https://img.shields.io/badge/All_bpws-Collection-c63010?style=flat-square&logo=huggingface&logoColor=white)](https://huggingface.co/collections/blockblockblock/huihui-qwen36-35b-a3b-abliterated-exl3-69ef8197714f5baf5d4b802d)
--- > [!NOTE] > An [ExLlamaV3](https://github.com/turboderp-org/exllamav3) build of [`huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated`](https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated) at **4.5 bits per weight** — quality-leaning build — comfortable on a 24 GB card with usable context (~22 GB weights). See [Quants](#quants) for sibling repos at other bit‑widths or browse the [collection](https://huggingface.co/collections/blockblockblock/huihui-qwen36-35b-a3b-abliterated-exl3-69ef8197714f5baf5d4b802d). ## Quants
| BPW   |   Head bits   |   Calibration rows   |   Size   |   Status | | :---: | :---: | :---: | ---: | :--- | | 4.0 | 8 | 128 | 19.5 GB | [link](https://huggingface.co/blockblockblock/Huihui-Qwen3.6-35B-A3B-abliterated-exl3-4.0bpw) | | **4.5** | 8 | 250 | **21.6 GB** | this repo |
## Inference
Loader Use it for
TabbyAPI OpenAI‑compatible HTTP server. Drop‑in for OpenAI clients.
text‑generation‑webui Local chat UI. Pick the ExLlamaV3 loader from the model dropdown.
ExLlamaV3 Direct Python API for embedding the model in your own code or pipeline.
> [!TIP] > **VRAM at 4.5 bpw:** ~21.6 GB weights + ~2 GB context. Comfortable on a single 24 GB card; tight on 16 GB with reduced context. ## Download ```bash pip install -U huggingface_hub hf download \ blockblockblock/Huihui-Qwen3.6-35B-A3B-abliterated-exl3-4.5bpw \ --local-dir ./Huihui-Qwen3.6-35B-A3B-abliterated-exl3-4.5bpw ```
Quantization recipe  (advanced — embedded in quantization_config.json)
| Setting | Value | | :--- | :--- | | Format | `EXL3` | | Bits per weight | `4.5` | | Head bits | `8` | | Calibration rows | `250` | | Codebook | `MCG` | | Out‑scales | `always` | | Parallel mode | `enabled` (MoE expert batching) | Loaded automatically by every ExLlamaV3 loader; reproduced here for searchability.
## License & use > [!IMPORTANT] > Use and license **follow the [base model](https://huggingface.co/huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated)**. > Quantization adds no additional restrictions. Refer to the upstream repository for terms, citation, and safety documentation. ---
Quantized with BlockQuant  ·  convention {org}/{model}-exl3-{bpw}bpw