Qwen3.6 · 35B-A3B · Claude 4.7 Opus Reasoning Distilled

_{EXL3 · 6.0 bpw · 27.9 GB · Mixture‑of‑Experts · 48 layers × 256 experts}

An ExLlamaV3 build of lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled at 6.0 bits per weight — near-lossless reference quality — designed for 32 GB+ cards (V100, A100, RTX 6000). See Quants for sibling repos at other bit‑widths or browse the collection.

Quants

BPW	Head bits	Calibration rows	Size	Status
3.0	8	128	15.3 GB	link
4.0	8	128	19.5 GB	link
4.5	8	250	21.6 GB	link
5.0	8	250	23.7 GB	link
6.0	8	250	27.9 GB	`this repo`

Inference

Loader	Use it for
TabbyAPI	OpenAI‑compatible HTTP server. Drop‑in for OpenAI clients.
text‑generation‑webui	Local chat UI. Pick the ExLlamaV3 loader from the model dropdown.
ExLlamaV3	Direct Python API for embedding the model in your own code or pipeline.

VRAM at 6.0 bpw: weights on disk + ~2 GB context overhead. Best on 32 GB+ cards (V100, A100, RTX 6000) where there's room for long context.

Download

pip install -U huggingface_hub

hf download \
  blockblockblock/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-exl3-6.0bpw \
  --local-dir ./Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-exl3-6.0bpw

Quantization recipe _{(advanced — embedded in quantization_config.json)}

Setting	Value
Format	`EXL3`
Bits per weight	`6.0`
Head bits	`8`
Calibration rows	`250`
Codebook	`MCG`
Out‑scales	`always`
Parallel mode	`enabled` (MoE expert batching)

Loaded automatically by every ExLlamaV3 loader; reproduced here for searchability.

License & use

Use and license follow the base model. Quantization adds no additional restrictions. Refer to the upstream repository for terms, citation, and safety documentation.

_{Quantized with BlockQuant · convention {org}/{model}-exl3-{bpw}bpw}