MLX Studio

JANGQ

Qwen3.6-27B JANG_4M

Qwen 3.6 27B dense VL — mixed 4/8-bit (8-bit critical, 4-bit FFN), 17.5 GB

The balanced JANG profile — 8-bit attention protects compounding, 4-bit FFN keeps the bundle small.

⚠️ Recommended: Run in MLX Studio or Osaurus.

Follow development on Twitter: @jangq_ai


What is JANG_4M?

JANG_4M is the balanced JANG profile:

  • Critical layers (attention q/k/v/o, embedding, lm_head) at 8-bit affine
  • FFN layers (dense MLP gate/up/down, linear-attention projections, shared-expert MLP) at 4-bit affine

The 8-bit critical tier matters on this dense 27B because (a) the 16 full-attention layers carry most of the signal and (b) the q_proj is fused with a swish output gate — activations flowing through sigmoid(gate) are noise-sensitive near the transition zone.

Pick JANG_4M when you want the smallest JANG bundle that still protects attention precision. Pick the MXFP4 build (14 GB) when you want uniform 4-bit and don't mind attention being 4-bit too. Pick JANG_Q8 for bit-identical BF16 fidelity.

JANG_4M vs sibling profiles

MXFP4 JANG_4M JANG_Q8 bf16
Disk 14 GB 17.5 GB 29 GB 52 GB
Attention bits 4 (uniform mxfp4) 8 (affine) 8 16
FFN bits 4 (uniform mxfp4) 4 (affine) 8 16
Text fidelity coherent coherent bit-identical baseline
VL image ✓ 4/4 colors ✓ 4/4 colors ✓ 4/4 colors
VL video ✓ coherent ✓ coherent ✓ coherent

Model Details

Metric Value
Source Qwen/Qwen3.6-27B (BF16)
Architecture qwen3_5 — 64 decoder layers: 48 Gated DeltaNet (linear-attn) + 16 full-attention with swish output gate
Total parameters 27.3 B (dense, no MoE)
Profile JANG_4M
Format native MLX affine (nn.quantize(mode="affine")) with per-module bit overrides
Avg bits/param 4.45
Disk 17.5 GB across 11 shards
Context 262 144 native; upstream card reports up to ~1 M with YaRN
Vision tower 27-layer ViT (hidden 1152, patch 16), temporal_patch 2, quantized at 4-bit with patch-embed axes pre-transposed to MLX layout
Chat template Qwen im_start/im_end with enable_thinking toggle

JANG_4M Bit Allocation

Component Bits Format Why
Full-attention projections (q_proj, k_proj, v_proj, o_proj) — 16 layers 8 affine Precision-critical. q_proj is fused with a swish output gate (half queries / half gate).
Embed tokens, lm_head 8 affine Input/output projections — precision bound
Dense FFN (mlp.gate_proj, mlp.up_proj, mlp.down_proj) — 64 layers 4 affine Bulk of parameters; absorbs most of the compression
Linear-attention projections (in_proj_qkv, in_proj_z, in_proj_b, in_proj_a, out_proj) — 48 layers 4 affine
Vision tower (27 ViT layers) 4 affine ViT is bounded in depth; 4-bit holds up
Norms, A_log, dt_bias, conv1d, MTP keys bf16 / dropped passthrough MTP head weights are stripped — mlx_vlm doesn't use them

Important Settings

Qwen 3.6 is reasoning-optional. The chat template opens <think>\n only when enable_thinking=True.

Setting Value Notes
Temperature 0.0 – 0.7 Greedy OK
Top P 0.95
Top K 20
max_tokens ≥ 256 Reasoning preambles can be long
enable_thinking True or False Pass as a direct kwarg to apply_chat_template

Set processor.video_processor.do_sample_frames = False for synthetic PIL-frame video tests so each frame maps 1:1 to a patch.


Usage

Load in Osaurus or MLX Studio on Apple Silicon — zero Python setup, local chat + vision, single-click deploy.


Verified modalities

Test Result
Chat template (with + without thinking) ✓ coherent
Text (enable_thinking=False): "The capital of France is" → "Paris"
Text: "Translate to French: Hello, how are you?" → "Bonjour, comment allez-vous ?"
VL image: solid red/green/blue/yellow 224×224 → correct color ID ✓ 4/4
VL video: 4-frame RGBY sequence → structurally coherent description ("orange/blue transition")

The 4-frame RGBY video encodes as 2 temporal patches via temporal_patch_size=2, which the model perceives as a 2-region color composition — identical behavior to the BF16 source and JANG_Q8. This is not a quant artifact.



MMLU-200 (10 subjects × 20 questions, reasoning OFF)

Both quants evaluated on the same 200-question slice of MMLU with enable_thinking=False (direct answer, no <think> preamble). Same prompts, same greedy decode, same extraction.

Subject MXFP4 JANG_4M Δ (JANG − MXFP4)
abstract_algebra 12/20 (60.0%) 15/20 (75.0%) +3
anatomy 18/20 (90.0%) 16/20 (80.0%) -2
astronomy 20/20 (100.0%) 19/20 (95.0%) -1
college_computer_science 16/20 (80.0%) 16/20 (80.0%) 0
college_physics 15/20 (75.0%) 15/20 (75.0%) 0
high_school_biology 19/20 (95.0%) 19/20 (95.0%) 0
high_school_chemistry 16/20 (80.0%) 15/20 (75.0%) -1
high_school_mathematics 12/20 (60.0%) 14/20 (70.0%) +2
logical_fallacies 20/20 (100.0%) 19/20 (95.0%) -1
world_religions 19/20 (95.0%) 17/20 (85.0%) -2
Total 167/200 (83.5%) 165/200 (82.5%) −1.0 pp

Both quants are strong baselines on reasoning-OFF MMLU. MXFP4 edges ahead by 1 pp overall. JANG_4M wins on the harder math-heavy subjects (abstract_algebra +3, high_school_mathematics +2) — plausibly because the 8-bit full-attention projections carry more signal on multi-step symbolic chains. MXFP4 wins on rote-recall subjects (anatomy, world_religions) by ~2 each, closer to ties on factual/scientific subjects.

Reasoning ON: not yet measured. Qwen 3.6 is a reasoning-optional model — with enable_thinking=True the model generates a <think>…</think> block before answering, which typically lifts MMLU significantly. Reasoning-ON benchmarks for both quants are planned as a follow-up.


Hardware notes

17.5 GB weights on disk; once loaded, expect ~18–22 GB resident plus KV cache.

Mac Works? Notes
24 GB unified ⚠️ Text + image tight; no video
32 GB unified Comfortable for text + image + short video
48 GB+ unified Full context + VL + video

Citation

@misc{qwen2026qwen36,
  title  = {Qwen3.6-Plus: Towards Real World Agents},
  author = {Qwen Team},
  year   = {2026},
  url    = {https://qwen.ai/blog?id=qwen3.6}
}

License

Apache 2.0 — inherits from the base model.


Packaged on Apple Silicon by JANGQ AI.
© 2026 JANGQ AI — @jangq_ai

Downloads last month
115
Safetensors
Model size
5B params
Tensor type
F16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

Quantized

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for JANGQ-AI/Qwen3.6-27B-JANG_4M

Base model

Qwen/Qwen3.6-27B
Quantized
(485)
this model