---
license: other
license_name: modified-mit
tags:
  - moe
  - mixture-of-experts
  - jangtq
  - jangq-ai
  - reap
  - mlx
  - kimi
  - kimi-k2
  - apple-silicon
base_model: moonshotai/Kimi-K2.6
pipeline_tag: text-generation
library_name: mlx
---

# Kimi-K2.6-Med-JANGTQ

**~720B-A32B MoE, 167 GB on disk** (down from ~610 GB for the
~1T-parameter Kimi K2.6 base): a 35% routed-expert prune plus 2-bit JANGTQ
quantization, runnable on a 256 GB Apple Silicon Mac.

- **Source**: [moonshotai/Kimi-K2.6](https://huggingface.co/moonshotai/Kimi-K2.6)
  (Moonshot AI's MLA / MoE flagship, INT4 pack-quantized base)
- **Prune**: 35% of routed experts removed via REAP saliency ranking,
  computed over the v3 calibration corpus (24% code / 20% agentic /
  20% general / 13% cyber / 8% science / 8% CN / 5% systems / 2% longctx)
- **Quantization**: JANGTQ2, i.e. a 2-bit MXTQ codebook (Hadamard-rotated,
  Lloyd-Max optimized) on routed-expert weights, 8-bit affine on
  attention / dense MLP / shared experts / embed / lm_head, and fp16
  passthrough on norms, router gate, and biases
- **Bundle size**: **167 GB** across 178 shards
- **Runs on**: Mac Studio M3 Ultra 256 GB (recommended)
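
REAP ranks each MoE layer's routed experts by a saliency score computed on calibration data and drops the lowest-ranked ones. The sketch below shows that ranking step. It is based on our reading of the REAP criterion, not on this bundle's conversion code, and makes two assumptions: saliency is the mean of (router weight × expert-output L2 norm) over the calibration tokens routed to each expert, and experts are ranked per layer. `reap_saliency` and `experts_to_keep` are hypothetical names.

```python
import numpy as np


def reap_saliency(gates: np.ndarray, out_norms: np.ndarray) -> np.ndarray:
    """REAP-style saliency for one MoE layer (sketch).

    gates:     (tokens, experts) router weights, 0 where the expert was not in the token's top-k
    out_norms: (tokens, experts) L2 norm of each expert's output for each token
    Returns, per expert, the mean of gate * output norm over the tokens routed to it
    (0 for experts the calibration set never selected).
    """
    routed = gates > 0
    contrib = np.where(routed, gates * out_norms, 0.0).sum(axis=0)
    counts = routed.sum(axis=0)
    return np.divide(contrib, counts, out=np.zeros(gates.shape[1]), where=counts > 0)


def experts_to_keep(saliency: np.ndarray, prune_ratio: float) -> np.ndarray:
    """Indices of the highest-saliency experts that survive the prune, in ascending order."""
    n_keep = round(len(saliency) * (1 - prune_ratio))
    keep = np.argsort(saliency)[::-1][:n_keep]
    return np.sort(keep)


# Toy layer: 2 calibration tokens, 3 experts, top-2 routing, unit output norms.
gates = np.array([[0.6, 0.4, 0.0],
                  [0.0, 0.5, 0.5]])
sal = reap_saliency(gates, np.ones_like(gates))
print(sal.tolist())                          # [0.6, 0.45, 0.5]
print(experts_to_keep(sal, 1 / 3).tolist())  # [0, 2]
```

With 384 experts and `prune_ratio=0.35`, `experts_to_keep` returns 250 indices, which matches the Med variant's 250 of 384.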

## Variants

The Kimi K2.6 × JANGTQ line comes in three variants:

| Variant | Prune | Experts kept | Size | HF repo |
|---------|-------|--------------|------|-----|
| Kimi-K2.6-Small | 45% | 211 of 384 | 153 GB | `JANGQ-AI/Kimi-K2.6-Small-JANGTQ` |
| **Kimi-K2.6-Med** (this card) | **35%** | 250 of 384 | 167 GB | `JANGQ-AI/Kimi-K2.6-Med-JANGTQ` |
| Kimi-K2.6-Large | 25% | 288 of 384 | ~190 GB | *(building)* |
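
The "Experts kept" column follows from applying the prune ratio to Kimi K2.6's 384 routed experts per MoE layer. Here is a minimal sketch. It assumes the count is rounded to the nearest integer, which reproduces all three rows; `experts_kept` is a hypothetical helper.

```python
TOTAL_ROUTED_EXPERTS = 384  # routed experts per MoE layer in Kimi K2.6


def experts_kept(prune_pct: int, total: int = TOTAL_ROUTED_EXPERTS) -> int:
    """Routed experts left per layer after pruning prune_pct percent (nearest-integer rounding assumed)."""
    return round(total * (100 - prune_pct) / 100)


for variant, pct in [("Small", 45), ("Med", 35), ("Large", 25)]:
    print(f"{variant}: {experts_kept(pct)} of {TOTAL_ROUTED_EXPERTS}")
# Small: 211 of 384
# Med: 250 of 384
# Large: 288 of 384
```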

Keeping more experts means less pruning damage on long-tail and rare-domain
prompts, at the cost of more RAM. Med is the sweet spot if you have 256 GB.

## What's in the bundle

| Module | Bundle dtype |
|---|---|
| Routed experts (250 experts × 3 matrices × 60 MoE layers) | **2-bit MXTQ** + sidecar |
| Shared expert | 8-bit affine g=64 |
| Attention (q/k/v/o, MLA latents) | 8-bit affine g=64 |
| Dense MLP (layer 0, since `first_k_dense_replace=1`) | 8-bit affine g=64 |
| embed_tokens / lm_head | 8-bit affine g=64 |
| RMSNorms / router gate / e_score_correction_bias | fp16 / int64 passthrough |
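
One way to express the table above as rules is to match on parameter names. The sketch below assumes DeepSeek-V3-style names such as `model.layers.5.mlp.experts.17.down_proj.weight`, `mlp.shared_experts` and `mlp.gate.e_score_correction_bias`. `bundle_dtype` is a hypothetical helper, not part of jang-tools, and the conversion tooling's real matching logic is not described in this card.

```python
import re

# Hypothetical name-based policy mirroring the bundle table; the regexes assume
# DeepSeek-V3-style parameter names.
PASSTHROUGH = "fp16/int64 passthrough"
MXTQ2 = "2-bit MXTQ"
AFFINE8 = "8-bit affine g=64"

RULES = [
    (re.compile(r"norm\.weight$"), PASSTHROUGH),                                   # RMSNorms
    (re.compile(r"\.mlp\.gate\.(weight|e_score_correction_bias)$"), PASSTHROUGH),  # router gate + bias
    (re.compile(r"\.bias$"), PASSTHROUGH),                                         # other biases
    (re.compile(r"\.mlp\.experts\.\d+\."), MXTQ2),                                  # routed experts only
]


def bundle_dtype(name: str) -> str:
    """Return the bundle dtype for a parameter name; the first matching rule wins."""
    for pattern, dtype in RULES:
        if pattern.search(name):
            return dtype
    # Attention, shared expert, dense MLP, embed_tokens and lm_head fall through to 8-bit.
    return AFFINE8


for n in [
    "model.layers.5.mlp.experts.17.down_proj.weight",
    "model.layers.5.mlp.shared_experts.up_proj.weight",
    "model.layers.5.mlp.gate.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.layers.5.input_layernorm.weight",
    "lm_head.weight",
]:
    print(f"{n:50s} -> {bundle_dtype(n)}")
```

Rule order matters. The norm and router rules run before the expert rule, and `shared_experts` never matches `\.mlp\.experts\.`, so the shared expert falls through to 8-bit as the table specifies.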

The `jangtq_runtime.safetensors` sidecar (36 KB) is used by the Swift runtime.
It holds the codebook and sign-flip vectors for
`(in_features={2048,7168}, seed=42, bits=2)`.
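
The card does not specify the MXTQ codec in detail. What follows is only an illustrative sketch of the general recipe its description names:

1. Flip the signs of each weight row with a seeded random ±1 vector.
2. Apply an orthonormal Hadamard transform along `in_features`.
3. Scale each row to unit RMS.
4. Snap every value to a 4-entry (2-bit) Lloyd-Max codebook.

Several parts are assumptions: the per-row RMS scale, fitting the codebook to a unit Gaussian, and using NumPy's RNG for the sign vector. The resulting vectors will not match the sidecar's. This sketch's fast Walsh-Hadamard transform also only handles power-of-two widths such as 2048, not 7168. All function names are hypothetical.

```python
import numpy as np


def fwht(x: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform over the last axis (self-inverse).
    The last-axis length must be a power of two."""
    x = np.asarray(x, dtype=np.float64)
    n = x.shape[-1]
    if n < 1 or n & (n - 1):
        raise ValueError("last-axis length must be a power of two")
    h = 1
    while h < n:
        # Butterfly: combine entries h apart inside each block of 2h.
        y = x.reshape(*x.shape[:-1], n // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        x = np.stack((a + b, a - b), axis=-2).reshape(x.shape)
        h *= 2
    return x / np.sqrt(n)


def lloyd_max(samples: np.ndarray, levels: int = 4, iters: int = 100) -> np.ndarray:
    """Fit a scalar Lloyd-Max quantizer (1-D k-means) and return its sorted codebook."""
    c = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        idx = np.searchsorted((c[:-1] + c[1:]) / 2, samples)  # nearest-level assignment
        for k in range(levels):
            members = samples[idx == k]
            if members.size:
                c[k] = members.mean()  # move each level to its cell's centroid
    return np.sort(c)


def sign_vector(n: int, seed: int = 42) -> np.ndarray:
    """Seeded random +/-1 vector (illustrative; not the sidecar's generator)."""
    return np.random.default_rng(seed).choice(np.array([-1.0, 1.0]), size=n)


def quantize_2bit(w, signs, codebook):
    """Rotate rows of w (out, in), normalize each row to unit RMS, map to 2-bit indices."""
    rot = fwht(w * signs)
    scale = np.maximum(np.sqrt(np.mean(rot ** 2, axis=-1, keepdims=True)), 1e-12)
    idx = np.searchsorted((codebook[:-1] + codebook[1:]) / 2, rot / scale)
    return idx.astype(np.uint8), scale


def dequantize_2bit(idx, scale, signs, codebook):
    """Look up levels, undo the row scale, then invert the rotation (H is self-inverse)."""
    return signs * fwht(codebook[idx] * scale)


rng = np.random.default_rng(0)
codebook = lloyd_max(rng.standard_normal(200_000))  # roughly +/-0.45, +/-1.51 for N(0, 1)
w = 0.02 * rng.standard_normal((64, 2048))          # toy routed-expert weight (out, in)
signs = sign_vector(w.shape[1])
idx, scale = quantize_2bit(w, signs, codebook)
w_hat = dequantize_2bit(idx, scale, signs, codebook)
print(codebook, np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

The rotation is orthogonal, so reconstruction error in the rotated domain equals error on the original weights. For Gaussian toy weights the relative error should land around 0.34, the 2-bit Lloyd-Max level.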

## Loading

```bash
pip install jang-tools mlx-lm
```

```python
from jang_tools import load_jangtq
model, tokenizer = load_jangtq("JANGQ-AI/Kimi-K2.6-Med-JANGTQ")
```

vMLX (Swift) detects and loads the bundle automatically, using
`routed_expert_bits=2` and `mxtq_seed=42` in `config.json` plus the sidecar.
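
A Python script can run a similar check with the minimal sketch below. The key names and sidecar filename come from this card. Two things are assumptions, not vMLX's actual logic: that the keys sit at the top level of `config.json`, and the helper name `looks_like_jangtq`.

```python
import json
import tempfile
from pathlib import Path


def looks_like_jangtq(bundle_dir: str) -> bool:
    """Heuristic: 2-bit routed experts and an MXTQ seed in config.json, plus the runtime sidecar."""
    root = Path(bundle_dir)
    config = json.loads((root / "config.json").read_text())
    return (
        config.get("routed_expert_bits") == 2
        and "mxtq_seed" in config
        and (root / "jangtq_runtime.safetensors").is_file()
    )


# Toy check on a throwaway directory standing in for a downloaded bundle.
with tempfile.TemporaryDirectory() as d:
    Path(d, "config.json").write_text(json.dumps({"routed_expert_bits": 2, "mxtq_seed": 42}))
    without_sidecar = looks_like_jangtq(d)
    Path(d, "jangtq_runtime.safetensors").write_bytes(b"")
    with_sidecar = looks_like_jangtq(d)
print(without_sidecar, with_sidecar)  # False True
```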

## Credits

JANGTQ codec, REAP calibration, JANGQ-AI bundle, conversion tooling:
**Jinho Jang (eric@jangq.ai)**.

Source model and its license belong to Moonshot AI (modified-MIT).