How to use keithnull/Qwen3.6-35B-A3B-REAM-192-heretic with libraries and local apps.

Libraries

How to use keithnull/Qwen3.6-35B-A3B-REAM-192-heretic with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="keithnull/Qwen3.6-35B-A3B-REAM-192-heretic")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Print the result; a bare call discards it when run as a script
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("keithnull/Qwen3.6-35B-A3B-REAM-192-heretic")
model = AutoModelForImageTextToText.from_pretrained("keithnull/Qwen3.6-35B-A3B-REAM-192-heretic")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use keithnull/Qwen3.6-35B-A3B-REAM-192-heretic with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "keithnull/Qwen3.6-35B-A3B-REAM-192-heretic"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "keithnull/Qwen3.6-35B-A3B-REAM-192-heretic",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
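
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000 and returns the usual OpenAI-style chat completion body; `build_chat_request` and `send` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request

MODEL = "keithnull/Qwen3.6-35B-A3B-REAM-192-heretic"


def build_chat_request(base_url, prompt, image_url):
    """Build (without sending) the POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumes the OpenAI-style response layout: choices -> message -> content.
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# print(send(request))  # uncomment once the server is running
```

Passing a different base URL (for example http://localhost:30000) should let the same helper target another OpenAI-compatible server.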

Use Docker

docker model run hf.co/keithnull/Qwen3.6-35B-A3B-REAM-192-heretic

SGLang

How to use keithnull/Qwen3.6-35B-A3B-REAM-192-heretic with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "keithnull/Qwen3.6-35B-A3B-REAM-192-heretic" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "keithnull/Qwen3.6-35B-A3B-REAM-192-heretic",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "keithnull/Qwen3.6-35B-A3B-REAM-192-heretic" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as in the pip-install section above (same host and port).

Docker Model Runner
How to use keithnull/Qwen3.6-35B-A3B-REAM-192-heretic with Docker Model Runner:
```
docker model run hf.co/keithnull/Qwen3.6-35B-A3B-REAM-192-heretic
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Qwen3.6-35B-A3B-REAM-192-heretic

A decensored variant of keithnull/Qwen3.6-35B-A3B-REAM-192 (the REAM-merged 27.05B-parameter Qwen3.6-VL with 192 routed experts), produced with Heretic v1.3.0 using a Magnitude-Preserving Orthogonal Ablation (MPOA) variant — Heretic's row_normalization = FULL mode.

Method

Built with the patched fork goblincore/heretic@qwen3_5-packed-experts-v2, which adds Qwen3.5/3.6-specific MoE coverage on top of upstream Heretic:

Tier 1 — mlp.shared_expert.down_proj selector branch (a layer of dense MLP capacity that stock Heretic's selector misses on hybrid Qwen3.5/3.6 MoE blocks).
Tier 2 — per-trial direct-tensor ablation of the packed routed experts. Qwen3.5/3.6 stores 192 experts per layer as a single packed nn.Parameter of shape [num_experts, hidden, intermediate] that PEFT/LoRA cannot wrap; the fork applies the same rank-1 directional ablation in-place to each expert slice and snapshots/restores between Optuna trials. Approach derived from Sehyo's PR #207.
Tier 3 — separate optimization-component keys for attn.o_proj (standard self-attention) and attn.out_proj (GatedDeltaNet linear-attention) so Optuna learns independent ablation kernels for the two attention variants in Qwen3.6's hybrid 1:3 layer mix.
Auxiliary safetensors shard preservation on save (relevant for bases that ship MTP / draft heads; a no-op for this REAM-192 base, which has no MTP). Approach derived from timrohrbaugh's PR #317.
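
The Tier 2 idea (one rank-1 directional ablation applied to every slice of a packed expert tensor, with a snapshot restored between trials) can be illustrated with a toy example. This is only a sketch under stated assumptions: plain nested lists stand in for the packed nn.Parameter, the update is taken to be W ← W − s · r (rᵀW) for a unit direction r in the hidden dimension, and the row-normalization step is omitted. The function names are hypothetical and are not the fork's code.

```python
import copy
import math


def unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def ablate_slice(weight, direction, scale):
    """Return W - scale * r (r^T W) for one [hidden][intermediate] expert slice."""
    r = unit(direction)
    hidden, inter = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    along_r = [sum(r[h] * weight[h][i] for h in range(hidden)) for i in range(inter)]
    return [[weight[h][i] - scale * r[h] * along_r[i] for i in range(inter)]
            for h in range(hidden)]


def ablate_packed_experts(packed, direction, scale):
    """Apply the same rank-1 update in place to every expert slice."""
    for e in range(len(packed)):
        packed[e] = ablate_slice(packed[e], direction, scale)


# Toy packed tensor: 2 experts, hidden=3, intermediate=2 (made-up values).
packed = [
    [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]],
    [[0.0, 1.0], [2.0, 2.0], [-1.0, 4.0]],
]
direction = [1.0, 0.0, 1.0]

snapshot = copy.deepcopy(packed)      # taken once, before the first trial
ablate_packed_experts(packed, direction, scale=1.0)
ablated = copy.deepcopy(packed)       # what a trial would evaluate
packed[:] = copy.deepcopy(snapshot)   # restore before the next trial
```

With scale 1.0 the ablated slices no longer write anything along the direction, and restoring the snapshot returns the tensor to its original values so that each trial starts from the same weights.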

Abliteration parameters (Trial 67)

Parameter	Value
direction_index	17.12
attn.o_proj.max_weight	1.49
attn.o_proj.max_weight_position	29.14
attn.o_proj.min_weight	0.77
attn.o_proj.min_weight_distance	5.21
attn.out_proj.max_weight	0.95
attn.out_proj.max_weight_position	26.92
attn.out_proj.min_weight	0.78
attn.out_proj.min_weight_distance	22.12
mlp.down_proj.max_weight	1.50
mlp.down_proj.max_weight_position	26.81
mlp.down_proj.min_weight	1.11
mlp.down_proj.min_weight_distance	22.33
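
To make the four per-component values easier to read, here is a hedged sketch of how they could translate into a per-layer ablation weight. It assumes the linear falloff described for upstream Heretic (full max_weight at max_weight_position, dropping linearly to min_weight at min_weight_distance, and no ablation beyond that) and 0-based layer indices; `layer_weight` is a hypothetical name, and the fork's actual kernel may differ.

```python
def layer_weight(layer_index, max_weight, max_weight_position,
                 min_weight, min_weight_distance):
    """Ablation weight for one layer, or None if the layer is left untouched."""
    distance = abs(layer_index - max_weight_position)
    if distance > min_weight_distance:
        return None
    # Linear falloff from max_weight (at the peak) to min_weight (at the edge).
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)


# attn.o_proj values from the table above.
o_proj = dict(max_weight=1.49, max_weight_position=29.14,
              min_weight=0.77, min_weight_distance=5.21)

for layer in (23, 24, 29, 34, 35):
    print(layer, layer_weight(layer, **o_proj))
```

Under these assumptions, the attn.o_proj kernel touches only layers 24 through 34 (29.14 ± 5.21), while the much larger min_weight_distance values for attn.out_proj and mlp.down_proj spread their kernels over far more layers.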

Targeted components

attn.o_proj — standard self-attention output projection (47 modules, one per self-attention layer)
attn.out_proj — GatedDeltaNet linear-attention output projection (separate kernel via the Tier 3 split, on the layers that use linear attention rather than standard self-attention)
mlp.down_proj — shared dense expert per layer (PEFT/LoRA-wrapped via the Tier 1 selector branch)
Fused expert parameters: 192 packed routed experts × 47 layers ≈ 9,000 expert slices, ablated via per-trial direct weight modification (Tier 2)

Performance

Metric	This model	Source (REAM-192)
Refusals	✅ 10/100	❌ ~80/100
KL divergence	0.0008	0 (by definition)

Fewer refusals indicate fewer content restrictions; a lower KL divergence indicates an output distribution closer to the source model's. KL divergence is computed from first-token probabilities on Heretic's evaluation set (a 400-train / 100-test prompt split). Academic benchmarks (MMLU, GSM8K, IFEval) are pending.
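
For readers unfamiliar with the metric, the sketch below shows how a mean first-token KL divergence could be computed from per-prompt probability distributions. The distributions are made-up toy values over a three-token vocabulary, the direction KL(source ‖ ablated) is an assumption, and `kl_divergence` and `mean_kl` are hypothetical helpers rather than Heretic's evaluation code.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kl(source_dists, ablated_dists):
    """Average the per-prompt KL over the evaluation prompts."""
    pairs = list(zip(source_dists, ablated_dists))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


# Toy first-token distributions for two prompts (made-up values).
source = [[0.70, 0.20, 0.10], [0.50, 0.25, 0.25]]
ablated = [[0.68, 0.21, 0.11], [0.50, 0.25, 0.25]]

print(mean_kl(source, ablated))
```

Identical distributions give exactly 0, which is why the source model's own entry in the table is 0 by definition.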

Lineage

Model size: 27B params (Safetensors)
Tensor type: BF16

Model tree for keithnull/Qwen3.6-35B-A3B-REAM-192-heretic

Base model: Qwen/Qwen3.6-35B-A3B
Finetuned: keithnull/Qwen3.6-35B-A3B-REAM-192
Finetuned: this model