Qwen3.6-27B JANG_4M

Qwen 3.6 27B dense VL — mixed 4/8-bit (8-bit critical, 4-bit FFN), 17.5 GB

The balanced JANG profile: 8-bit attention limits error compounding across layers, and 4-bit FFN keeps the bundle small.

⚠️ Recommended: Run in MLX Studio or Osaurus.

Follow development on Twitter: @jangq_ai

What is JANG_4M?

JANG_4M is the balanced JANG profile:

Critical layers (attention q/k/v/o, embedding, lm_head) at 8-bit affine
FFN layers (dense MLP gate/up/down, linear-attention projections, shared-expert MLP) at 4-bit affine

The 8-bit critical tier matters on this dense 27B because (a) the 16 full-attention layers carry most of the signal and (b) the q_proj is fused with a swish output gate — activations flowing through sigmoid(gate) are noise-sensitive near the transition zone.
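To make the precision argument concrete, here is a minimal sketch of an affine quantize/dequantize round trip on a single weight group. It is a simplified min/max scheme in plain Python, not the MLX kernel; the function names and the synthetic 64-value group are illustrative assumptions.

```python
import math


def affine_roundtrip(values, bits):
    """Quantize a group to `bits` with a min/max affine grid, then dequantize."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1               # 15 codes at 4-bit, 255 at 8-bit
    scale = (hi - lo) / levels or 1.0      # guard against a constant group
    codes = [round((v - lo) / scale) for v in values]
    return [lo + c * scale for c in codes]


def max_abs_error(values, bits):
    """Largest absolute reconstruction error over the group."""
    restored = affine_roundtrip(values, bits)
    return max(abs(a - b) for a, b in zip(values, restored))


# Synthetic stand-in for one 64-weight group (assumption, not real model weights).
group = [math.sin(i) for i in range(64)]
err_4bit = max_abs_error(group, 4)
err_8bit = max_abs_error(group, 8)
```

The worst-case error is bounded by half a quantization step, so moving from 4 to 8 bits shrinks that bound by roughly 17× (255/15) for the same group range.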

Pick JANG_4M when you want the smallest JANG bundle that still protects attention precision. Pick the MXFP4 build (14 GB) when you want uniform 4-bit and don't mind attention being 4-bit too. Pick JANG_Q8 when you need text output that is bit-identical to BF16.

JANG_4M vs sibling profiles

	MXFP4	JANG_4M	JANG_Q8	bf16
Disk	14 GB	17.5 GB	29 GB	52 GB
Attention bits	4 (uniform mxfp4)	8 (affine)	8	16
FFN bits	4 (uniform mxfp4)	4 (affine)	8	16
Text fidelity	coherent	coherent	bit-identical	baseline
VL image	✓ 4/4 colors	✓ 4/4 colors	✓ 4/4 colors	—
VL video	✓ coherent	✓ coherent	✓ coherent	—

Model Details

Metric	Value
Source	`Qwen/Qwen3.6-27B` (BF16)
Architecture	`qwen3_5` — 64 decoder layers: 48 `Gated DeltaNet` (linear-attn) + 16 full-attention with `swish` output gate
Total parameters	27.3 B (dense, no MoE)
Profile	JANG_4M
Format	native MLX affine (`nn.quantize(mode="affine")`) with per-module bit overrides
Avg bits/param	4.45
Disk	17.5 GB across 11 shards
Context	262 144 native; upstream card reports up to ~1 M with YaRN
Vision tower	27-layer ViT (hidden 1152, patch 16), temporal_patch 2, quantized at 4-bit with patch-embed axes pre-transposed to MLX layout
Chat template	Qwen `im_start`/`im_end` with `enable_thinking` toggle

JANG_4M Bit Allocation

Component	Bits	Format	Why
Full-attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) — 16 layers	8	affine	Precision-critical. `q_proj` is fused with a swish output gate (half queries / half gate).
Embed tokens, `lm_head`	8	affine	Input/output projections — precision bound
Dense FFN (`mlp.gate_proj`, `mlp.up_proj`, `mlp.down_proj`) — 64 layers	4	affine	Bulk of parameters; absorbs most of the compression
Linear-attention projections (`in_proj_qkv`, `in_proj_z`, `in_proj_b`, `in_proj_a`, `out_proj`) — 48 layers	4	affine
Vision tower (27 ViT layers)	4	affine	ViT is bounded in depth; 4-bit holds up
Norms, `A_log`, `dt_bias`, `conv1d`, MTP keys	bf16 / dropped	passthrough	MTP head weights are stripped — mlx_vlm doesn't use them
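The allocation table can be read as a rule from module path to bit width. The sketch below is one hypothetical way to express it, not the actual JANG converter: the leaf-name matching, the `vision` substring check, and the example paths are assumptions, and dropped MTP keys are not handled.

```python
# Leaf module names taken from the table above; the matching rule is an assumption.
EIGHT_BIT_LEAVES = {"q_proj", "k_proj", "v_proj", "o_proj", "embed_tokens", "lm_head"}
PASSTHROUGH_LEAVES = {"A_log", "dt_bias", "conv1d"}


def bits_for(path):
    """Return 8 or 4 for a quantized module, or None to leave it in bf16."""
    leaf = path.rsplit(".", 1)[-1]
    if "norm" in leaf or leaf in PASSTHROUGH_LEAVES:
        return None                    # passthrough tier
    if "vision" in path:
        return 4                       # the whole ViT is quantized at 4-bit
    if leaf in EIGHT_BIT_LEAVES:
        return 8                       # critical tier
    return 4                           # FFN and linear-attention projections
```

A rule of this shape could back the per-module bit overrides mentioned under Format, with everything not explicitly listed falling through to 4-bit.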

Important Settings

Qwen 3.6 is reasoning-optional. The chat template opens <think>\n only when enable_thinking=True.

Setting	Value	Notes
Temperature	0.0 – 0.7	Greedy OK
Top P	0.95
Top K	20
max_tokens	≥ 256	Reasoning preambles can be long
`enable_thinking`	`True` or `False`	Pass as a direct kwarg to `apply_chat_template`
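For readers unsure how the three sampling knobs interact, the following is a plain-Python sketch of temperature, top-k, and top-p applied to one logit vector. It shows one common ordering (top-k, then softmax, then nucleus cut); the runtime you use may order or implement these differently, and `sample_token` is a hypothetical helper, not part of any library.

```python
import math
import random


def sample_token(logits, temperature=0.7, top_k=20, top_p=0.95, rng=random):
    """Pick a token index from raw logits using temperature, top-k, then top-p."""
    if temperature == 0.0:
        # Greedy decode: take the arg-max directly.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Keep the top_k highest-logit candidates.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the survivors at the given temperature (shifted for stability).
    peak = logits[ranked[0]]
    weights = [math.exp((logits[i] - peak) / temperature) for i in ranked]
    total = sum(weights)
    # Nucleus cut: keep the smallest prefix whose probability mass reaches top_p.
    kept, mass = [], 0.0
    for idx, w in zip(ranked, weights):
        p = w / total
        kept.append((idx, p))
        mass += p
        if mass >= top_p:
            break
    # Draw from the kept set, renormalised by its total mass.
    r = rng.random() * mass
    acc = 0.0
    for idx, p in kept:
        acc += p
        if r <= acc:
            return idx
    return kept[-1][0]
```

With `temperature=0.0` the other two settings have no effect, which is why greedy decoding is listed as acceptable above.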

Set processor.video_processor.do_sample_frames = False for synthetic PIL-frame video tests so the processor uses every supplied frame as-is instead of resampling them.

Usage

Load in Osaurus or MLX Studio on Apple Silicon — zero Python setup, local chat + vision, single-click deploy.

Verified modalities

Test	Result
Chat template (with + without thinking)	✓ coherent
Text (enable_thinking=False): "The capital of France is" → "Paris"	✓
Text: "Translate to French: Hello, how are you?" → "Bonjour, comment allez-vous ?"	✓
VL image: solid red/green/blue/yellow 224×224 → correct color ID	✓ 4/4
VL video: 4-frame RGBY sequence → structurally coherent description ("orange/blue transition")	✓

The 4-frame RGBY video encodes as 2 temporal patches via temporal_patch_size=2, which the model perceives as a 2-region color composition — identical behavior to the BF16 source and JANG_Q8. This is not a quant artifact.
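The frame-to-patch arithmetic can be sketched as follows. This is an illustration of the grouping only, not the processor's code; `group_frames` is a hypothetical helper, and padding an uneven clip by repeating its last frame is an assumption.

```python
def group_frames(frames, temporal_patch_size=2):
    """Group consecutive frames into temporal patches of a fixed size."""
    frames = list(frames)
    # Assumption: pad by repeating the last frame when the count is not a multiple.
    while len(frames) % temporal_patch_size:
        frames.append(frames[-1])
    return [
        tuple(frames[i:i + temporal_patch_size])
        for i in range(0, len(frames), temporal_patch_size)
    ]


patches = group_frames(["red", "green", "blue", "yellow"])
```

Here `patches` holds two groups, (red, green) and (blue, yellow), so the model sees two temporal positions rather than four distinct frames.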

MMLU-200 (10 subjects × 20 questions, reasoning OFF)

Both quants evaluated on the same 200-question slice of MMLU with enable_thinking=False (direct answer, no <think> preamble). Same prompts, same greedy decode, same extraction.

Subject	MXFP4	JANG_4M	Δ (JANG − MXFP4)
abstract_algebra	12/20 (60.0%)	15/20 (75.0%)	+3
anatomy	18/20 (90.0%)	16/20 (80.0%)	-2
astronomy	20/20 (100.0%)	19/20 (95.0%)	-1
college_computer_science	16/20 (80.0%)	16/20 (80.0%)	0
college_physics	15/20 (75.0%)	15/20 (75.0%)	0
high_school_biology	19/20 (95.0%)	19/20 (95.0%)	0
high_school_chemistry	16/20 (80.0%)	15/20 (75.0%)	-1
high_school_mathematics	12/20 (60.0%)	14/20 (70.0%)	+2
logical_fallacies	20/20 (100.0%)	19/20 (95.0%)	-1
world_religions	19/20 (95.0%)	17/20 (85.0%)	-2
Total	167/200 (83.5%)	165/200 (82.5%)	−1.0 pp

Both quants are strong baselines on reasoning-OFF MMLU. MXFP4 edges ahead by 1 pp overall (167 vs 165 correct). JANG_4M wins on the harder math-heavy subjects (abstract_algebra +3, high_school_mathematics +2), plausibly because the 8-bit full-attention projections carry more signal on multi-step symbolic chains. MXFP4 wins by 2 questions each on the rote-recall subjects (anatomy, world_religions) and by 1 each on astronomy, high_school_chemistry, and logical_fallacies; the remaining three subjects are ties.
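The totals and the overall gap can be re-derived from the per-subject counts in the table. The snippet below is only a tally of those published numbers; the variable names are illustrative.

```python
# (MXFP4 correct, JANG_4M correct) out of 20 questions per subject, from the table.
SCORES = {
    "abstract_algebra": (12, 15),
    "anatomy": (18, 16),
    "astronomy": (20, 19),
    "college_computer_science": (16, 16),
    "college_physics": (15, 15),
    "high_school_biology": (19, 19),
    "high_school_chemistry": (16, 15),
    "high_school_mathematics": (12, 14),
    "logical_fallacies": (20, 19),
    "world_religions": (19, 17),
}

n_questions = 20 * len(SCORES)
mxfp4_total = sum(m for m, _ in SCORES.values())
jang_total = sum(j for _, j in SCORES.values())

# Per-subject delta in questions, and the overall gap in percentage points.
deltas = {name: j - m for name, (m, j) in SCORES.items()}
delta_pp = 100 * (jang_total - mxfp4_total) / n_questions
```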

Reasoning ON: not yet measured. Qwen 3.6 is a reasoning-optional model — with enable_thinking=True the model generates a <think>…</think> block before answering, which typically lifts MMLU significantly. Reasoning-ON benchmarks for both quants are planned as a follow-up.

Hardware notes

17.5 GB weights on disk; once loaded, expect ~18–22 GB resident plus KV cache.

Mac	Works?	Notes
24 GB unified	⚠️	Text + image tight; no video
32 GB unified	✅	Comfortable for text + image + short video
48 GB+ unified	✅	Full context + VL + video
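A rough way to budget the KV cache on top of the resident weights is sketched below. It assumes that only the 16 full-attention layers keep a cache that grows with context, while the linear-attention layers hold fixed-size state. The `n_kv_heads` and `head_dim` defaults are placeholders, not values read from this model's config; substitute the real ones before relying on the result.

```python
def kv_cache_gib(seq_len, n_full_attn_layers=16, n_kv_heads=4,
                 head_dim=256, bytes_per_elem=2):
    """Estimate KV cache size in GiB for a given context length.

    n_kv_heads and head_dim are hypothetical placeholders; bytes_per_elem=2
    assumes a 16-bit cache.
    """
    # K and V each store seq_len * n_kv_heads * head_dim elements per layer.
    elems = 2 * n_full_attn_layers * seq_len * n_kv_heads * head_dim
    return elems * bytes_per_elem / 2**30


short_chat = kv_cache_gib(8_192)
full_context = kv_cache_gib(262_144)
```

The estimate scales linearly with context length, which is why long-context use sits in the 48 GB+ row of the table above.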

Citation

@misc{qwen2026qwen36,
  title  = {Qwen3.6-Plus: Towards Real World Agents},
  author = {Qwen Team},
  year   = {2026},
  url    = {https://qwen.ai/blog?id=qwen3.6}
}

License

Apache 2.0 — inherits from the base model.
