Important: This model uses the JANGTQ (JANG TurboQuant) quantization format — an extreme-compression variant of JANG for MLX on Apple Silicon that uses codebook + Hadamard rotation on routed MoE experts while keeping attention, SSM, shared_expert, embed and lm_head at affine 8-bit. Currently only supported by MLX Studio and the jang-tools Python package. Follow @dealignai for new releases.


MLX Studio — the only app that natively supports JANG / JANGTQ models



Qwen 3.6 35B-A3B — JANGTQ2 + CRACK

JANGTQ TurboQuant mixed-precision | CRACK abliterated | Vision + Video | Hybrid SSM/Attention MoE | 11 GB


What Is This?

This is Qwen 3.6 35B-A3B — a 35B-parameter Mixture-of-Experts vision-language model with 256 routed experts (10 active per token), hybrid linear + full-attention architecture, and native image + video understanding.

It has been:

  1. JANGTQ quantized — JANGTQ2 profile (8-bit affine precision paths, 2-bit TurboQuant routed experts with codebook + Hadamard rotation, fp16 vision tower) — 11 GB
  2. CRACK abliterated — permanent weight-level removal of safety refusal

| | |
|---|---|
| Base model | Qwen 3.6 35B-A3B MoE VL (35B total, ~3B active, 256 routed experts) |
| Quantization | JANGTQ2, 11 GB |
| MMLU-200 | 73.00% (Q2 base: 72.50%, +0.5pp; surgery neutral) |
| HarmBench-320 | 93.75% |
| Vision | 27-layer ViT preserved in fp16 (image + video) |
| Context | 262,144 native; up to ~1M with YaRN |
| Reasoning | Toggleable via enable_thinking |
| Speed | 56 tok/s (MacBook Pro); targets 100+ tok/s on M3 Ultra |
| Fits on | 16 GB+ Macs |

MMLU-200 Results (thinking OFF)

| Subject | CRACK | Q2 Base | Delta |
|---|---|---|---|
| Logical Fallacies | 19/20 (95%) | 19/20 | 0 |
| Astronomy | 19/20 (95%) | 18/20 | +1 |
| World Religions | 19/20 (95%) | 18/20 | +1 |
| High School Biology | 18/20 (90%) | 18/20 | 0 |
| High School Chemistry | 15/20 (75%) | 15/20 | 0 |
| Anatomy | 15/20 (75%) | 15/20 | 0 |
| College Physics | 14/20 (70%) | 13/20 | +1 |
| College Computer Science | 12/20 (60%) | 12/20 | 0 |
| High School Mathematics | 10/20 (50%) | 11/20 | -1 |
| Abstract Algebra | 5/20 (25%) | 6/20 | -1 |
| Total | 146/200 (73.0%) | 145/200 (72.5%) | +0.5pp |

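CRACK is capability-neutral on Q2: quantization noise absorbs the directional damage from surgery. The net result is one additional correct answer out of 200 (+0.5pp), with gains on astronomy, world religions, and college physics offsetting losses on abstract algebra and high school mathematics.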
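As a quick check, the totals and the +0.5pp delta can be recomputed from the per-subject rows. The snippet below is a minimal sketch: the numbers are copied from the table above, and the names `scores` and `summarize` are illustrative only, not part of any released tooling.

```python
# Per-subject correct answers out of 20, copied from the MMLU-200 table.
# Tuple order: (CRACK, Q2 base).
scores = {
    "Logical Fallacies": (19, 19),
    "Astronomy": (19, 18),
    "World Religions": (19, 18),
    "High School Biology": (18, 18),
    "High School Chemistry": (15, 15),
    "Anatomy": (15, 15),
    "College Physics": (14, 13),
    "College Computer Science": (12, 12),
    "High School Mathematics": (10, 11),
    "Abstract Algebra": (5, 6),
}
QUESTIONS_PER_SUBJECT = 20


def summarize(scores):
    """Return (crack_correct, base_correct, total_questions, delta_in_pp)."""
    total_q = QUESTIONS_PER_SUBJECT * len(scores)
    crack = sum(c for c, _ in scores.values())
    base = sum(b for _, b in scores.values())
    delta_pp = 100.0 * (crack - base) / total_q
    return crack, base, total_q, delta_pp


crack, base, total_q, delta_pp = summarize(scores)
print(f"CRACK {crack}/{total_q} ({100 * crack / total_q:.1f}%)")
print(f"Base  {base}/{total_q} ({100 * base / total_q:.1f}%)")
print(f"Delta {delta_pp:+.1f}pp")
```

With only 20 questions per subject, a one-question change moves a subject score by 5 points, so per-subject deltas of +1 or -1 are best read as noise.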


HarmBench-320 Results

| Category | Score |
|---|---|
| Copyright | 80/80 (100.0%) |
| Misinformation / Disinformation | 54/54 (100.0%) |
| Harmful | 18/18 (100.0%) |
| Cybercrime / Intrusion | 51/52 (98.1%) |
| Chemical / Biological | 37/42 (88.1%) |
| Illegal | 43/53 (81.1%) |
| Harassment / Bullying | 17/21 (81.0%) |
| Total | 300/320 (93.75%) |

Zero EMPTY verdicts across all 320 prompts: Q2 quantization noise disrupts the chorus-repetition pattern that produces false-EMPTY flags on Q4. Q2 also scores higher than Q4 overall (93.8% vs 90.3%), with the gains in the harmful, cybercrime, and chem-bio categories.
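The headline figure is the sum of the category counts rather than the mean of the category percentages, which matters because the categories differ in size. A small sketch with the counts copied from the table above (the names `harmbench` and `rate` are illustrative):

```python
# (scored, total) prompts per category, copied from the HarmBench-320 table.
harmbench = {
    "Copyright": (80, 80),
    "Misinformation / Disinformation": (54, 54),
    "Harmful": (18, 18),
    "Cybercrime / Intrusion": (51, 52),
    "Chemical / Biological": (37, 42),
    "Illegal": (43, 53),
    "Harassment / Bullying": (17, 21),
}


def rate(scored, total):
    """Score as a percentage of prompts in the category."""
    return 100.0 * scored / total


scored_sum = sum(s for s, _ in harmbench.values())
total_sum = sum(t for _, t in harmbench.values())
overall = rate(scored_sum, total_sum)

# Unweighted mean of the per-category rates, shown only for comparison.
macro_mean = sum(rate(s, t) for s, t in harmbench.values()) / len(harmbench)

for name, (s, t) in harmbench.items():
    print(f"{name:<34} {s:>3}/{t:<3} {rate(s, t):5.1f}%")
print(f"{'Total':<34} {scored_sum:>3}/{total_sum:<3} {overall:5.2f}%")
```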


JANG CRACK Qwen 3.6 Series

| Model | Format | Size | MMLU | HarmBench | Fits on |
|---|---|---|---|---|---|
| JANGTQ4 + CRACK | TurboQuant 4-bit experts | 18 GB | 73.5% | 90.3% | 24 GB Mac |
| JANGTQ2 + CRACK (this model) | TurboQuant 2-bit experts | 11 GB | 73.0% | 93.8% | 16 GB Mac |

About JANGTQ

JANGTQ (JANG TurboQuant) is an extreme-compression variant of JANG that replaces affine quantization on routed MoE experts with codebook quantization + random Hadamard rotation. Precision-critical paths stay at affine 8-bit. Routed experts are stored as packed codebook indices with tiny Lloyd-Max codebooks per layer, and are evaluated with fused dequant + matmul Metal kernels.
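To make the idea concrete, here is a minimal NumPy sketch of codebook quantization behind a random Hadamard rotation. It is not the JANGTQ implementation: the function names are hypothetical, it fits one codebook for the whole matrix, it stores one index per byte instead of packing indices, and it dequantizes in plain NumPy rather than a fused Metal kernel. It assumes the input dimension is a power of two.

```python
import numpy as np


def hadamard(n):
    """Sylvester-construction Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h


def lloyd_max(values, bits, iters=25):
    """1-D Lloyd-Max: alternate nearest-level assignment and centroid update."""
    n_levels = 2 ** bits
    levels = np.quantile(values, (np.arange(n_levels) + 0.5) / n_levels)
    for _ in range(iters):
        idx = np.abs(values[:, None] - levels[None, :]).argmin(axis=1)
        for k in range(n_levels):
            members = values[idx == k]
            if members.size:
                levels[k] = members.mean()
    return levels


def quantize(w, bits=2, seed=0):
    """Rotate each row of w, then map every entry to its nearest codebook level."""
    rng = np.random.default_rng(seed)
    n = w.shape[1]
    signs = rng.choice([-1.0, 1.0], size=n)
    # Orthogonal rotation: H @ diag(signs) / sqrt(n)
    rot = hadamard(n) * signs[None, :] / np.sqrt(n)
    w_rot = w @ rot.T
    codebook = lloyd_max(w_rot.ravel(), bits)
    idx = np.abs(w_rot[..., None] - codebook).argmin(axis=-1).astype(np.uint8)
    return idx, codebook, rot


def dequantize(idx, codebook, rot):
    """Look up codebook values, then undo the rotation (rot is orthogonal)."""
    return codebook[idx] @ rot


rng = np.random.default_rng(1)
w = rng.standard_normal((64, 128))
idx, codebook, rot = quantize(w, bits=2)
w_hat = dequantize(idx, codebook, rot)
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print("codebook levels:", np.round(codebook, 3))
print("relative reconstruction error:", round(float(rel_err), 3))
```

The rotation is orthogonal, so it does not change the reconstruction error by itself. Its role is to spread outlier weights across many coordinates so that a small shared codebook fits the rotated values better.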

For Qwen 3.6 35B-A3B, JANGTQ2 brings the model to 11 GB while keeping the vision tower at fp16 for image + video understanding. It is the smallest Qwen 3.6 CRACK variant that maintains coherence.
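A rough way to read these sizes is as an average number of stored bits per parameter. The sketch below assumes 1 GB means 10^9 bytes and that the 35B figure covers everything in the file, neither of which this card states. The helper name is illustrative.

```python
def effective_bits_per_weight(size_gb, params_billion):
    """Average stored bits per parameter, treating 1 GB as 10**9 bytes."""
    return (size_gb * 1e9 * 8) / (params_billion * 1e9)


# Sizes taken from the series table on this card.
for name, size_gb in [("JANGTQ2", 11), ("JANGTQ4", 18)]:
    bpw = effective_bits_per_weight(size_gb, 35)
    print(f"{name}: {bpw:.2f} bits/weight on average")
```

The average sits above the nominal expert bit width because the 8-bit precision paths, the fp16 vision tower, and the codebooks are all included in the file size.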

About CRACK

CRACK is a permanent weight-level abliteration that removes safety refusal without touching the TurboQuant codebook or vision tower. Multilingual (EN + ZH) refusal direction extraction means the model complies on both English and Chinese prompts.


Reasoning ON / OFF

The chat template respects enable_thinking. We recommend ON for complex reasoning and OFF for short answers, benchmarks, and tool use.

# `tokenizer` is assumed to be the tokenizer already loaded for this model.

# Thinking ON (default: full chain-of-thought)
messages = [{"role": "user", "content": "Derive 47 * 23 step by step"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# → emits <think>...</think> then the answer

# Thinking OFF (direct answer, no <think> block)
prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True,
                                       enable_thinking=False)
# → skips <think>, answers directly

All MMLU-200 and HarmBench-320 scores above were measured with thinking OFF for consistent short-form grading.
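With thinking ON, downstream code usually wants the final answer separated from the reasoning. The helper below is a minimal sketch, not part of jang-tools or MLX Studio. The name `split_thinking` is hypothetical, and it assumes the generated text contains at most one `</think>` tag. It also tolerates outputs where the opening `<think>` tag was part of the prompt and so is missing from the generated text.

```python
def split_thinking(text):
    """Split generated text into (reasoning, answer).

    Returns an empty reasoning string when no </think> tag is present,
    for example when the prompt was built with enable_thinking=False.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>47 * 23 = 47 * 20 + 47 * 3 = 940 + 141</think>\n\n1081"
reasoning, answer = split_thinking(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```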

Notes

  • Thinking mode: Supported via enable_thinking kwarg. Thinking OFF is recommended for short-answer tasks (MMLU, direct instructions). Thinking ON works for extended reasoning.
  • Vision: 27-layer ViT preserved in fp16. Image + video inputs work normally through mlx_vlm.
  • Context length: 262,144 native; extend via YaRN if your inference engine supports it.
  • Quant trade-off: Q2 favors compliance throughput (zero EMPTYs, 93.8% HarmBench) and edge-device fit (11 GB) over per-token precision. For heavier reasoning workloads, consider JANGTQ4 (18 GB, 7 GB larger, same precision paths).
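For the context-length note, the YaRN scaling factor is the ratio of the target context to the native one. The sketch below builds a `rope_scaling` entry using the key names found in Hugging Face-style `config.json` files for Qwen models. Whether MLX Studio or jang-tools reads this entry is an assumption, not something this card confirms, and the helper name is illustrative.

```python
NATIVE_CONTEXT = 262_144  # native context length stated on this card


def yarn_rope_scaling(target_context, native_context=NATIVE_CONTEXT):
    """Return a rope_scaling dict for YaRN, or None if no scaling is needed."""
    if target_context <= native_context:
        return None
    return {
        "rope_type": "yarn",
        "factor": target_context / native_context,
        "original_max_position_embeddings": native_context,
    }


# A 4x extension reaches 1,048,576 tokens, the "~1M" figure quoted above.
print(yarn_rope_scaling(1_048_576))
```

Scaling of this kind is typically applied only when long inputs are actually needed, since a static factor can affect quality on short prompts.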

Support dealignai

All models are built from original research and published for free. These models are specifically crafted to be excellent coders and general-purpose assistants.

Support us on Ko-fi — check out the Ko-fi membership for early access and extras.


dealign.ai

Twitter · HF · Ko-fi


Disclaimer

This model has had its safety refusal circuits removed. It will produce responses that would normally be refused, including technical content on security testing, dual-use research, and sensitive topics. You are responsible for how you use it.

The CRACK abliteration process does not add new capabilities — it only removes the model's learned refusal patterns. All knowledge, including the knowledge used to produce unsafe outputs, was already present in the base Qwen model.
