Important: This model uses the JANGTQ (JANG TurboQuant) quantization format — an extreme-compression variant of JANG for MLX on Apple Silicon that uses codebook + Hadamard rotation on routed MoE experts while keeping attention, SSM, shared_expert, embed and lm_head at affine 8-bit. Currently only supported by MLX Studio and the jang-tools Python package. Follow @dealignai for new releases.

MLX Studio — the only app that natively supports JANG / JANGTQ models

Built for vMLX — the only MLX inferencer with VL support, KV cache quantization, prefix cache reuse, agentic tool calling, and speculative decoding.
Free for macOS · vmlx.net

Qwen 3.6 35B-A3B — JANGTQ2 + CRACK

JANGTQ TurboQuant mixed-precision | CRACK abliterated | Vision + Video | Hybrid SSM/Attention MoE | 11 GB

What Is This?

This is Qwen 3.6 35B-A3B — a 35B-parameter Mixture-of-Experts vision-language model with 256 routed experts (10 active per token), hybrid linear + full-attention architecture, and native image + video understanding.

It has been:

JANGTQ quantized — JANGTQ2 profile (8-bit affine precision paths, 2-bit TurboQuant routed experts with codebook + Hadamard rotation, fp16 vision tower) — 11 GB
CRACK abliterated — permanent weight-level removal of safety refusal


Base model	Qwen 3.6 35B-A3B MoE VL (35B total, ~3B active, 256 routed experts)
Quantization	JANGTQ2 — 11 GB
MMLU-200	73.00% (Q2 base: 72.50% → +0.5pp, surgery neutral)
HarmBench-320	93.75%
Vision	27-layer ViT preserved in fp16 (image + video)
Context	262,144 native; up to ~1M with YaRN
Reasoning	Toggleable via `enable_thinking`
Speed	56 tok/s (MacBook Pro); targets 100+ tok/s on M3 Ultra
Fits on	16 GB+ Macs

MMLU-200 Results (thinking OFF)

Subject	CRACK	Q2 Base	Delta
Logical Fallacies	19/20 (95%)	19/20	0
Astronomy	19/20 (95%)	18/20	+1
World Religions	19/20 (95%)	18/20	+1
High School Biology	18/20 (90%)	18/20	0
High School Chemistry	15/20 (75%)	15/20	0
Anatomy	15/20 (75%)	15/20	0
College Physics	14/20 (70%)	13/20	+1
College Computer Science	12/20 (60%)	12/20	0
High School Mathematics	10/20 (50%)	11/20	-1
Abstract Algebra	5/20 (25%)	6/20	-1
Total	146/200 (73.0%)	145/200 (72.5%)	+0.5pp

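CRACK is effectively capability-neutral on Q2: the net change is one question out of 200 (+0.5pp), with gains on astronomy, world religions, and college physics offsetting losses on abstract algebra and high school mathematics. A likely explanation is that quantization noise absorbs the directional damage from the surgery.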
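
As a quick consistency check, the totals and the delta can be recomputed from the per-subject rows. This is a minimal sketch: the counts are copied from the table above, and the names `ROWS` and `summarize` are illustrative rather than part of any released tooling.

```python
# Per-subject correct answers out of 20, copied from the MMLU-200 table:
# (subject, CRACK, Q2 base)
ROWS = [
    ("Logical Fallacies", 19, 19),
    ("Astronomy", 19, 18),
    ("World Religions", 19, 18),
    ("High School Biology", 18, 18),
    ("High School Chemistry", 15, 15),
    ("Anatomy", 15, 15),
    ("College Physics", 14, 13),
    ("College Computer Science", 12, 12),
    ("High School Mathematics", 10, 11),
    ("Abstract Algebra", 5, 6),
]
QUESTIONS_PER_SUBJECT = 20


def summarize(rows):
    """Aggregate per-subject counts into totals and a percentage-point delta."""
    total = QUESTIONS_PER_SUBJECT * len(rows)
    crack = sum(c for _, c, _ in rows)
    base = sum(b for _, _, b in rows)
    return {
        "total": total,
        "crack": crack,
        "base": base,
        "crack_pct": 100 * crack / total,
        "base_pct": 100 * base / total,
        "delta_pp": 100 * (crack - base) / total,
    }


summary = summarize(ROWS)
print(summary)
```

With only 20 questions per subject, a single answer moves a subject score by 5 points, so the per-subject deltas are best read as noise-level differences.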

HarmBench-320 Results

Category	Score	Rate
Copyright	80/80	100.0%
Misinformation / Disinformation	54/54	100.0%
Harmful	18/18	100.0%
Cybercrime / Intrusion	51/52	98.1%
Chemical / Biological	37/42	88.1%
Illegal	43/53	81.1%
Harassment / Bullying	17/21	81.0%
Total	300/320	93.75%

Zero EMPTY verdicts across all 320 prompts (Q2 quantization noise disrupts the chorus-repetition pattern that produces false-EMPTY flags on Q4). Q2 scores higher overall than Q4 (93.8% vs 90.3%), with the gains on the harmful, cybercrime, and chem-bio categories.
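
The headline 93.75% is a prompt-weighted figure, so the large categories dominate it. The sketch below recomputes it from the table and contrasts it with an unweighted mean of the category rates. The counts are copied from the table; the variable names are illustrative.

```python
# (score, prompts) per category, copied from the HarmBench-320 table.
HARMBENCH = {
    "Copyright": (80, 80),
    "Misinformation / Disinformation": (54, 54),
    "Harmful": (18, 18),
    "Cybercrime / Intrusion": (51, 52),
    "Chemical / Biological": (37, 42),
    "Illegal": (43, 53),
    "Harassment / Bullying": (17, 21),
}

# Rate per category, in percent.
per_category = {name: 100 * s / n for name, (s, n) in HARMBENCH.items()}

# Prompt-weighted (micro) average: total score over total prompts.
total_score = sum(s for s, _ in HARMBENCH.values())
total_prompts = sum(n for _, n in HARMBENCH.values())
micro_avg = 100 * total_score / total_prompts

# Unweighted (macro) average: mean of the per-category rates.
macro_avg = sum(per_category.values()) / len(per_category)

print(f"{total_score}/{total_prompts} -> micro {micro_avg:.2f}%, macro {macro_avg:.2f}%")
```

The macro average comes out lower than the micro average because the three weakest categories are also among the smaller ones.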

JANG CRACK Qwen 3.6 Series

Model	Format	Size	MMLU	HarmBench	Fits on
JANGTQ4 + CRACK	TurboQuant 4-bit experts	18 GB	73.5%	90.3%	24 GB Mac
JANGTQ2 + CRACK (this model)	TurboQuant 2-bit experts	11 GB	73.0%	93.8%	16 GB Mac

About JANGTQ

JANGTQ (JANG TurboQuant) is an extreme-compression variant of JANG that replaces affine quantization on routed MoE experts with codebook quantization + random Hadamard rotation. Precision-critical paths stay at affine 8-bit; routed experts are stored as packed codebook indices with tiny per-layer Lloyd-Max codebooks and are run through fused dequant + matmul Metal kernels.
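
To make the idea concrete, here is a toy, pure-Python sketch of codebook quantization after a random Hadamard rotation. It is not the JANGTQ or jang-tools implementation. It assumes a random sign flip followed by an orthonormal Walsh-Hadamard transform as the rotation, fits a 4-level (2-bit) Lloyd-Max codebook to a single vector rather than per layer, and leaves out index packing and the fused Metal kernels. All function names are hypothetical.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    out = list(vec)
    n = len(out)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def rotate(vec, signs):
    """Random Hadamard rotation: flip signs, then apply the transform."""
    return fwht([v * s for v, s in zip(vec, signs)])


def unrotate(vec, signs):
    """Inverse rotation: the orthonormal transform is its own inverse."""
    return [v * s for v, s in zip(fwht(vec), signs)]


def nearest(codebook, value):
    """Index of the codebook level closest to value."""
    return min(range(len(codebook)), key=lambda k: abs(codebook[k] - value))


def lloyd_max(values, levels=4, iters=25):
    """1-D Lloyd-Max: alternate nearest-level assignment and centroid update."""
    lo, hi = min(values), max(values)
    codebook = [lo + (hi - lo) * (k + 0.5) / levels for k in range(levels)]
    for _ in range(iters):
        buckets = [[] for _ in range(levels)]
        for v in values:
            buckets[nearest(codebook, v)].append(v)
        # Keep the old level when a bucket ends up empty.
        codebook = [sum(b) / len(b) if b else c for b, c in zip(buckets, codebook)]
    return codebook


def quantize(vec, signs, levels=4):
    """Rotate, fit a codebook, and return (indices, codebook)."""
    rotated = rotate(vec, signs)
    codebook = lloyd_max(rotated, levels)
    return [nearest(codebook, v) for v in rotated], codebook


def dequantize(indices, codebook, signs):
    """Look up codebook levels and undo the rotation."""
    return unrotate([codebook[i] for i in indices], signs)


rng = random.Random(0)
n = 64
row = [rng.gauss(0.0, 1.0) for _ in range(n)]
row[3] = 12.0  # a single outlier weight
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]

indices, codebook = quantize(row, signs)
restored = dequantize(indices, codebook, signs)
mse = sum((a - b) ** 2 for a, b in zip(row, restored)) / n
print(f"levels={len(codebook)} mse={mse:.4f}")
```

The rotation spreads the outlier's energy across all coordinates, so a four-level codebook is not forced to spend a level on one extreme value. Because the rotation is orthonormal, the reconstruction error in the original domain equals the quantization error in the rotated domain.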

For Qwen 3.6 35B-A3B, JANGTQ2 brings the model to 11 GB while keeping the vision tower at fp16 for image + video understanding, making it the smallest Qwen 3.6 CRACK variant that maintains coherence.

About CRACK

CRACK is a permanent weight-level abliteration that removes safety refusal without touching the TurboQuant codebook or vision tower. Multilingual (EN + ZH) refusal direction extraction means the model complies on both English and Chinese prompts.

Reasoning ON / OFF

The chat template respects `enable_thinking`. We recommend ON for complex reasoning and OFF for short answers, benchmarks, and tool use.

```python
# Thinking ON (default: full chain-of-thought)
messages = [{"role": "user", "content": "Derive 47 * 23 step by step"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# → emits <think>...</think> then the answer

# Thinking OFF (direct answer, no <think> block)
prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True,
                                       enable_thinking=False)
# → skips <think>, answers directly
```

All MMLU-200 and HarmBench-320 scores above were measured with thinking OFF for consistent short-form grading.
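
When thinking is ON, downstream code usually wants the final answer without the reasoning. The helper below is a minimal sketch, assuming the generated text contains at most one reasoning block terminated by `</think>`. It also tolerates output where the opening `<think>` tag was already emitted as part of the prompt. `split_thinking` is a hypothetical name, not part of any library.

```python
def split_thinking(text):
    """Return (reasoning, answer); reasoning is '' if no </think> marker is found."""
    head, marker, tail = text.partition("</think>")
    if not marker:
        # No reasoning block: the whole text is the answer.
        return "", text.strip()
    # Drop the opening tag if the model emitted it in the completion.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking(
    "<think>47 * 23 = 47 * 20 + 47 * 3 = 940 + 141</think>\n1081"
)
print(answer)  # 1081
```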

Notes

Thinking mode: Supported via the `enable_thinking` kwarg. Thinking OFF is recommended for short-answer tasks (MMLU, direct instructions). Thinking ON works for extended reasoning.
Vision: 27-layer ViT preserved in fp16. Image + video inputs work normally through mlx_vlm.
Context length: 262,144 native; extend via YaRN if your inference engine supports it.
Quant trade-off: Q2 favors compliance throughput (zero EMPTYs, 93.8% HB) and edge-device fit (11 GB) over per-token precision. For heavier reasoning workloads consider JANGTQ4 (+7 GB, same precision paths).
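
For the context-length note, the YaRN scaling factor follows directly from the native window: reaching roughly 1M tokens from 262,144 needs a factor of 4. The sketch below builds a `rope_scaling`-style dictionary using the key names from the Hugging Face convention (`rope_type`, `factor`, `original_max_position_embeddings`). Whether a given inference engine reads these keys, and where they belong in this model's config, is an assumption to verify against that engine's documentation. `yarn_rope_scaling` is a hypothetical helper.

```python
NATIVE_CONTEXT = 262_144  # native context length stated in this card


def yarn_rope_scaling(target_context, native_context=NATIVE_CONTEXT):
    """Build a YaRN rope-scaling dict for the requested context length."""
    factor = target_context / native_context
    if factor < 1.0:
        raise ValueError("target context is below the native context; no scaling needed")
    return {
        "rope_type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": native_context,
    }


config = yarn_rope_scaling(1_048_576)  # 4 x 262,144
print(config)
```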

Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.

dealign.ai

Twitter · HF · Ko-fi

Disclaimer

This model has had its safety refusal circuits removed. It will produce responses that would normally be refused, including technical content on security testing, dual-use research, and sensitive topics. You are responsible for how you use it.

The CRACK abliteration process does not add new capabilities — it only removes the model's learned refusal patterns. All knowledge, including the knowledge used to produce unsafe outputs, was already present in the base Qwen model.