DBMe/Qwen3.5-4B-heretic-exl3

EXL3 (ExLlamaV3) quantizations of coder3101/Qwen3.5-4B-heretic. All credit for the original model goes to the original authors.

📊 Available Quantizations & VRAM

The model weights are stored in separate branches. Please switch to a branch to download. Note: VRAM estimates include PyTorch context overhead (~0.8GB) and assume an unquantized FP16 KV cache.

Target BPW	Head BPW	Branch (Download Link)	WikiText-2 PPL (512 ctx)¹	2K ctx	4K ctx	8K ctx	16K ctx	32K ctx
4.0	h6	4.0bpw_h6	10.2665	~4.83 GB	~4.9 GB	~5.02 GB	~5.27 GB	~5.77 GB
5.0	h6	5.0bpw_h6	10.1381	~5.25 GB	~5.31 GB	~5.44 GB	~5.69 GB	~6.19 GB
6.0	h6	6.0bpw_h6	10.1020	~5.66 GB	~5.73 GB	~5.85 GB	~6.1 GB	~6.6 GB
8.0	h8	8.0bpw_h8	10.1099	~6.64 GB	~6.7 GB	~6.83 GB	~7.08 GB	~7.58 GB

¹ Evaluated against WikiText-2 with ExLlamaV3 using a strided 512-token context window (-c 512) in llama.cpp parity mode (-g). Lower is better. (Higher BPW = higher quality, lower BPW = fits in less VRAM).

📥 How to Download

It's recommended to use the huggingface-cli to download specific branches. (Do not use git clone as it will download all branches!)

Ensure you have the CLI installed:

pip install -U "huggingface_hub[cli]"

Download a specific branch (e.g., 4.0bpw_h6):

# Example: Downloading the 4.0bpw_h6 branch
huggingface-cli download DBMe/Qwen3.5-4B-heretic-exl3 --revision 4.0bpw_h6 --local-dir Qwen3.5-4B-heretic-exl3-4.0bpw_h6

💻 Supported Engines

These models are highly optimized for modern GPUs and can be run using:

TabbyAPI: A fast, OpenAI-compatible API server. (Set model_name: "Qwen3.5-4B-heretic-exl3-<BranchName>" in your config)
Text-Generation-WebUI: A local web interface. (Select the exllamav3 loader)
ExLlamaV3 (Native): Python library for custom integration.

📈 Perplexity Degradation Curve

(Lower is better)

⚙️ Advanced: Quantization Environment & Settings

🔬 Quantization Settings

Codebook: mcg
Output Scales: always
Calibration Rows: 250
Calibration Cols: 2048
Calibration Dataset: ExLlamaV3 Default (Wiki/C4/Code)
High Quality (HQ) Mode: False
ExLlamaV3: 0.0.29 (Commit: cb1a436)
Hardware: NVIDIA RTX PRO 6000 Blackwell Server Edition

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for DBMe/Qwen3.5-4B-heretic-exl3

Base model

Qwen/Qwen3.5-4B-Base

Finetuned

Qwen/Qwen3.5-4B

Finetuned

coder3101/Qwen3.5-4B-heretic

Quantized

(8)

this model