Instructions to use JANGQ-AI/Qwen3.6-27B-JANG_4M with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use JANGQ-AI/Qwen3.6-27B-JANG_4M with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("JANGQ-AI/Qwen3.6-27B-JANG_4M") config = load_config("JANGQ-AI/Qwen3.6-27B-JANG_4M") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- Pi
How to use JANGQ-AI/Qwen3.6-27B-JANG_4M with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "JANGQ-AI/Qwen3.6-27B-JANG_4M"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "JANGQ-AI/Qwen3.6-27B-JANG_4M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use JANGQ-AI/Qwen3.6-27B-JANG_4M with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "JANGQ-AI/Qwen3.6-27B-JANG_4M"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default JANGQ-AI/Qwen3.6-27B-JANG_4M
Run Hermes
hermes
Qwen3.6-27B JANG_4M
Qwen 3.6 27B dense VL — mixed 4/8-bit (8-bit critical, 4-bit FFN), 17.5 GB
The balanced JANG profile — 8-bit attention protects compounding, 4-bit FFN keeps the bundle small.
⚠️ Recommended: Run in MLX Studio or Osaurus.
Follow development on Twitter: @jangq_ai
What is JANG_4M?
JANG_4M is the balanced JANG profile:
- Critical layers (attention
q/k/v/o, embedding,lm_head) at 8-bit affine - FFN layers (dense MLP
gate/up/down, linear-attention projections, shared-expert MLP) at 4-bit affine
The 8-bit critical tier matters on this dense 27B because (a) the 16 full-attention layers carry most of the signal and (b) the q_proj is fused with a swish output gate — activations flowing through sigmoid(gate) are noise-sensitive near the transition zone.
Pick JANG_4M when you want the smallest JANG bundle that still protects attention precision. Pick the MXFP4 build (14 GB) when you want uniform 4-bit and don't mind attention being 4-bit too. Pick JANG_Q8 for bit-identical BF16 fidelity.
JANG_4M vs sibling profiles
| MXFP4 | JANG_4M | JANG_Q8 | bf16 | |
|---|---|---|---|---|
| Disk | 14 GB | 17.5 GB | 29 GB | 52 GB |
| Attention bits | 4 (uniform mxfp4) | 8 (affine) | 8 | 16 |
| FFN bits | 4 (uniform mxfp4) | 4 (affine) | 8 | 16 |
| Text fidelity | coherent | coherent | bit-identical | baseline |
| VL image | ✓ 4/4 colors | ✓ 4/4 colors | ✓ 4/4 colors | — |
| VL video | ✓ coherent | ✓ coherent | ✓ coherent | — |
Model Details
| Metric | Value |
|---|---|
| Source | Qwen/Qwen3.6-27B (BF16) |
| Architecture | qwen3_5 — 64 decoder layers: 48 Gated DeltaNet (linear-attn) + 16 full-attention with swish output gate |
| Total parameters | 27.3 B (dense, no MoE) |
| Profile | JANG_4M |
| Format | native MLX affine (nn.quantize(mode="affine")) with per-module bit overrides |
| Avg bits/param | 4.45 |
| Disk | 17.5 GB across 11 shards |
| Context | 262 144 native; upstream card reports up to ~1 M with YaRN |
| Vision tower | 27-layer ViT (hidden 1152, patch 16), temporal_patch 2, quantized at 4-bit with patch-embed axes pre-transposed to MLX layout |
| Chat template | Qwen im_start/im_end with enable_thinking toggle |
JANG_4M Bit Allocation
| Component | Bits | Format | Why |
|---|---|---|---|
Full-attention projections (q_proj, k_proj, v_proj, o_proj) — 16 layers |
8 | affine | Precision-critical. q_proj is fused with a swish output gate (half queries / half gate). |
Embed tokens, lm_head |
8 | affine | Input/output projections — precision bound |
Dense FFN (mlp.gate_proj, mlp.up_proj, mlp.down_proj) — 64 layers |
4 | affine | Bulk of parameters; absorbs most of the compression |
Linear-attention projections (in_proj_qkv, in_proj_z, in_proj_b, in_proj_a, out_proj) — 48 layers |
4 | affine | |
| Vision tower (27 ViT layers) | 4 | affine | ViT is bounded in depth; 4-bit holds up |
Norms, A_log, dt_bias, conv1d, MTP keys |
bf16 / dropped | passthrough | MTP head weights are stripped — mlx_vlm doesn't use them |
Important Settings
Qwen 3.6 is reasoning-optional. The chat template opens <think>\n only when enable_thinking=True.
| Setting | Value | Notes |
|---|---|---|
| Temperature | 0.0 – 0.7 | Greedy OK |
| Top P | 0.95 | |
| Top K | 20 | |
| max_tokens | ≥ 256 | Reasoning preambles can be long |
enable_thinking |
True or False |
Pass as a direct kwarg to apply_chat_template |
Set processor.video_processor.do_sample_frames = False for synthetic PIL-frame video tests so each frame maps 1:1 to a patch.
Usage
Load in Osaurus or MLX Studio on Apple Silicon — zero Python setup, local chat + vision, single-click deploy.
Verified modalities
| Test | Result |
|---|---|
| Chat template (with + without thinking) | ✓ coherent |
| Text (enable_thinking=False): "The capital of France is" → "Paris" | ✓ |
| Text: "Translate to French: Hello, how are you?" → "Bonjour, comment allez-vous ?" | ✓ |
| VL image: solid red/green/blue/yellow 224×224 → correct color ID | ✓ 4/4 |
| VL video: 4-frame RGBY sequence → structurally coherent description ("orange/blue transition") | ✓ |
The 4-frame RGBY video encodes as 2 temporal patches via temporal_patch_size=2, which the model perceives as a 2-region color composition — identical behavior to the BF16 source and JANG_Q8. This is not a quant artifact.
MMLU-200 (10 subjects × 20 questions, reasoning OFF)
Both quants evaluated on the same 200-question slice of MMLU with enable_thinking=False (direct answer, no <think> preamble). Same prompts, same greedy decode, same extraction.
| Subject | MXFP4 | JANG_4M | Δ (JANG − MXFP4) |
|---|---|---|---|
| abstract_algebra | 12/20 (60.0%) | 15/20 (75.0%) | +3 |
| anatomy | 18/20 (90.0%) | 16/20 (80.0%) | -2 |
| astronomy | 20/20 (100.0%) | 19/20 (95.0%) | -1 |
| college_computer_science | 16/20 (80.0%) | 16/20 (80.0%) | 0 |
| college_physics | 15/20 (75.0%) | 15/20 (75.0%) | 0 |
| high_school_biology | 19/20 (95.0%) | 19/20 (95.0%) | 0 |
| high_school_chemistry | 16/20 (80.0%) | 15/20 (75.0%) | -1 |
| high_school_mathematics | 12/20 (60.0%) | 14/20 (70.0%) | +2 |
| logical_fallacies | 20/20 (100.0%) | 19/20 (95.0%) | -1 |
| world_religions | 19/20 (95.0%) | 17/20 (85.0%) | -2 |
| Total | 167/200 (83.5%) | 165/200 (82.5%) | −1.0 pp |
Both quants are strong baselines on reasoning-OFF MMLU. MXFP4 edges ahead by 1 pp overall. JANG_4M wins on the harder math-heavy subjects (abstract_algebra +3, high_school_mathematics +2) — plausibly because the 8-bit full-attention projections carry more signal on multi-step symbolic chains. MXFP4 wins on rote-recall subjects (anatomy, world_religions) by ~2 each, closer to ties on factual/scientific subjects.
Reasoning ON: not yet measured. Qwen 3.6 is a reasoning-optional model — with
enable_thinking=Truethe model generates a<think>…</think>block before answering, which typically lifts MMLU significantly. Reasoning-ON benchmarks for both quants are planned as a follow-up.
Hardware notes
17.5 GB weights on disk; once loaded, expect ~18–22 GB resident plus KV cache.
| Mac | Works? | Notes |
|---|---|---|
| 24 GB unified | ⚠️ | Text + image tight; no video |
| 32 GB unified | ✅ | Comfortable for text + image + short video |
| 48 GB+ unified | ✅ | Full context + VL + video |
Citation
@misc{qwen2026qwen36,
title = {Qwen3.6-Plus: Towards Real World Agents},
author = {Qwen Team},
year = {2026},
url = {https://qwen.ai/blog?id=qwen3.6}
}
License
Apache 2.0 — inherits from the base model.
Packaged on Apple Silicon by JANGQ AI.
© 2026 JANGQ AI — @jangq_ai
- Downloads last month
- 115
Quantized
Model tree for JANGQ-AI/Qwen3.6-27B-JANG_4M
Base model
Qwen/Qwen3.6-27B