Qwen3.6 · 35B-A3B · Claude 4.7 Opus Reasoning Distilled EXL3
Collection
ExLlamav3 quantizations of lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled • 5 items • Updated
EXL3 · 6.0 bpw · 27.9 GB · Mixture‑of‑Experts · 48 layers × 256 experts
An ExLlamaV3 build of
lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilledat 6.0 bits per weight — near-lossless reference quality — designed for 32 GB+ cards (V100, A100, RTX 6000). See Quants for sibling repos at other bit‑widths or browse the collection.
| Loader | Use it for |
|---|---|
| TabbyAPI | OpenAI‑compatible HTTP server. Drop‑in for OpenAI clients. |
| text‑generation‑webui | Local chat UI. Pick the ExLlamaV3 loader from the model dropdown. |
| ExLlamaV3 | Direct Python API for embedding the model in your own code or pipeline. |
VRAM at 6.0 bpw: weights on disk + ~2 GB context overhead. Best on 32 GB+ cards (V100, A100, RTX 6000) where there's room for long context.
pip install -U huggingface_hub
hf download \
blockblockblock/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-exl3-6.0bpw \
--local-dir ./Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-exl3-6.0bpw
quantization_config.json)| Setting | Value |
|---|---|
| Format | EXL3 |
| Bits per weight | 6.0 |
| Head bits | 8 |
| Calibration rows | 250 |
| Codebook | MCG |
| Out‑scales | always |
| Parallel mode | enabled (MoE expert batching) |
Loaded automatically by every ExLlamaV3 loader; reproduced here for searchability.
Use and license follow the base model. Quantization adds no additional restrictions. Refer to the upstream repository for terms, citation, and safety documentation.
{org}/{model}-exl3-{bpw}bpw
Base model
Qwen/Qwen3.6-35B-A3B