Qwen3.6 · 35B-A3B · Claude 4.7 Opus Reasoning Distilled

EXL3  ·  6.0 bpw  ·  27.9 GB  ·  Mixture‑of‑Experts  ·  48 layers × 256 experts


format bpw size arch

base model quantized by collection


An ExLlamaV3 build of lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled at 6.0 bits per weight — near-lossless reference quality — designed for 32 GB+ cards (V100, A100, RTX 6000). See Quants for sibling repos at other bit‑widths or browse the collection.

Quants

BPW     Head bits     Calibration rows     Size     Status
3.0 8 128 15.3 GB link
4.0 8 128 19.5 GB link
4.5 8 250 21.6 GB link
5.0 8 250 23.7 GB link
6.0 8 250 27.9 GB this repo

Inference

Loader Use it for
TabbyAPI OpenAI‑compatible HTTP server. Drop‑in for OpenAI clients.
text‑generation‑webui Local chat UI. Pick the ExLlamaV3 loader from the model dropdown.
ExLlamaV3 Direct Python API for embedding the model in your own code or pipeline.

VRAM at 6.0 bpw: weights on disk + ~2 GB context overhead. Best on 32 GB+ cards (V100, A100, RTX 6000) where there's room for long context.

Download

pip install -U huggingface_hub

hf download \
  blockblockblock/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-exl3-6.0bpw \
  --local-dir ./Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-exl3-6.0bpw
Quantization recipe  (advanced — embedded in quantization_config.json)
Setting Value
Format EXL3
Bits per weight 6.0
Head bits 8
Calibration rows 250
Codebook MCG
Out‑scales always
Parallel mode enabled (MoE expert batching)

Loaded automatically by every ExLlamaV3 loader; reproduced here for searchability.

License & use

Use and license follow the base model. Quantization adds no additional restrictions. Refer to the upstream repository for terms, citation, and safety documentation.


Quantized with BlockQuant  ·  convention {org}/{model}-exl3-{bpw}bpw
Downloads last month
101
Safetensors
Model size
14B params
Tensor type
F16
·
I16
·
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for blockblockblock/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-exl3-6.0bpw

Collection including blockblockblock/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled-exl3-6.0bpw