---
license: other
license_name: modified-mit
tags:
- moe
- mixture-of-experts
- jangtq
- jangq-ai
- reap
- mlx
- kimi
- kimi-k2
- apple-silicon
base_model: moonshotai/Kimi-K2.6
pipeline_tag: text-generation
library_name: mlx
---

# Kimi-K2.6-Med-JANGTQ

**~720B-A32B MoE, 167 GB on disk** (down from Kimi K2.6's ~610 GB, ~1T-parameter base): a 35% routed-expert prune plus 2-bit JANGTQ quantization, runnable on a 256 GB Apple Silicon Mac.

- **Source**: [moonshotai/Kimi-K2.6](https://huggingface.co/moonshotai) (Moonshot AI's MLA/MoE flagship; INT4 pack-quantized base)
- **Prune**: 35% of routed experts removed by REAP saliency ranking, computed over the v3 calibration corpus (24% code / 20% agentic / 20% general / 13% cyber / 8% science / 8% CN / 5% systems / 2% long-context)
- **Quantization**: JANGTQ2, i.e. a 2-bit MXTQ codebook (Hadamard-rotated, Lloyd-Max-optimized) on routed-expert weights; 8-bit affine on attention, dense MLP, shared experts, embed, and lm_head; fp16 passthrough on norms, router gate, and biases
- **Bundle size**: **167 GB** across 178 shards
- **Runs on**: Mac Studio M3 Ultra 256 GB (recommended)

## Variants

The Kimi K2.6 × JANGTQ line has three variants:

| Variant | Prune | Routed experts kept (per layer) | Size | HF repo |
|---------|-------|---------------------------------|------|---------|
| Kimi-K2.6-Small | 45% | 211 of 384 | 153 GB | `JANGQ-AI/Kimi-K2.6-Small-JANGTQ` |
| **Kimi-K2.6-Med** (this card) | **35%** | 250 of 384 | 167 GB | `JANGQ-AI/Kimi-K2.6-Med-JANGTQ` |
| Kimi-K2.6-Large | 25% | 288 of 384 | ~190 GB | *(building)* |

More experts means less pruning damage on long-tail and rare-domain prompts; the trade-off is RAM. Med is the sweet spot for a 256 GB machine.
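The expert counts in the Variants table follow from rounding 384 × (1 - prune fraction). The sketch below reproduces that arithmetic and illustrates a simplified REAP-style ranking: each expert's saliency is the mean of router gate weight × expert output norm over the calibration tokens routed to it, and the top-scoring experts in a layer are kept. This is an illustration under assumptions, not the JANGQ-AI pipeline: `experts_to_keep`, `reap_style_scores`, `select_experts`, and the `(expert_id, gate_weight, output_norm)` record format are hypothetical, and the production criterion (normalization, tie handling, how records are gathered) may differ.

```python
from collections import defaultdict


def experts_to_keep(n_experts: int, prune_frac: float) -> int:
    """Routed experts left in a layer after pruning, rounded to the nearest integer."""
    return round(n_experts * (1.0 - prune_frac))


def reap_style_scores(routing_records):
    """Simplified REAP-style saliency per expert (hypothetical record format).

    routing_records: iterable of (expert_id, gate_weight, output_norm), one per
    (token, selected expert) pair observed on the calibration corpus.
    Score = mean of gate_weight * output_norm over tokens routed to that expert.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for expert_id, gate, out_norm in routing_records:
        total[expert_id] += gate * out_norm
        count[expert_id] += 1
    return {e: total[e] / count[e] for e in total}


def select_experts(scores, n_experts, prune_frac):
    """Return sorted ids of the highest-scoring experts; never-routed experts score 0."""
    n_keep = experts_to_keep(n_experts, prune_frac)
    ranked = sorted(range(n_experts), key=lambda e: scores.get(e, 0.0), reverse=True)
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    for name, p in [("Small", 0.45), ("Med", 0.35), ("Large", 0.25)]:
        print(name, experts_to_keep(384, p))  # Small 211, Med 250, Large 288

    # Toy layer with 4 experts; expert 3 is never routed to.
    toy = [(0, 0.5, 2.0), (0, 0.3, 1.0), (1, 0.9, 0.1), (2, 0.2, 5.0)]
    print(select_experts(reap_style_scores(toy), 4, 0.5))  # [0, 2]
```

Ranking on a router-weighted output norm, rather than on routing frequency alone, keeps experts that fire rarely but contribute strongly when they do.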
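The JANGTQ2 bullet names two ingredients of the routed-expert codec: a Hadamard rotation and a Lloyd-Max-optimized 2-bit codebook. As a rough, generic illustration (not the MXTQ codec), the sketch below applies a seeded random sign flip plus an orthonormal Walsh-Hadamard transform, which spreads an outlier weight's energy across every coordinate, then fits a 4-level (2-bit) Lloyd-Max codebook to the rotated values and reconstructs. Assumptions: the per-vector codebook, the `random.Random(42)` sign generator (chosen only to echo `mxtq_seed=42`), and every function name here are hypothetical; the real format stores its codebook and sign-flip vectors per input width in the runtime sidecar and has its own grouping and bit packing.

```python
import bisect
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; it is its own inverse. len must be a power of 2."""
    x = list(vec)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [v * s for v in x]


def rotate(w, signs):
    """Random sign flip, then Hadamard: y = H (s * w)."""
    return fwht([wi * si for wi, si in zip(w, signs)])


def unrotate(y, signs):
    """Inverse rotation: w = s * (H y), since H and the sign flip are both self-inverse."""
    return [vi * si for vi, si in zip(fwht(y), signs)]


def lloyd_max(samples, bits=2, iters=50):
    """Fit a 2**bits-level scalar codebook minimizing squared error (Lloyd's algorithm)."""
    levels = 2 ** bits
    data = sorted(samples)
    n = len(data)
    codebook = [data[int((k + 0.5) * n / levels)] for k in range(levels)]  # quantile init
    for _ in range(iters):
        bounds = [(codebook[k] + codebook[k + 1]) / 2 for k in range(levels - 1)]
        sums, counts = [0.0] * levels, [0] * levels
        for v in data:
            k = bisect.bisect_left(bounds, v)  # nearest-codeword cell
            sums[k] += v
            counts[k] += 1
        # Move each codeword to the mean of its cell; keep empty cells unchanged.
        codebook = sorted(sums[k] / counts[k] if counts[k] else codebook[k] for k in range(levels))
    return codebook


def quantize(values, codebook):
    """Map each value to the index of its nearest codeword (boundaries at midpoints)."""
    bounds = [(codebook[k] + codebook[k + 1]) / 2 for k in range(len(codebook) - 1)]
    return [bisect.bisect_left(bounds, v) for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    w = [rng.gauss(0.0, 1.0) for _ in range(256)]
    w[3] = 12.0  # an outlier weight
    sign_rng = random.Random(42)
    signs = [sign_rng.choice((-1.0, 1.0)) for _ in range(256)]

    y = rotate(w, signs)  # the outlier now adds only +/-12/16 to each coordinate
    codebook = lloyd_max(y, bits=2)
    codes = quantize(y, codebook)  # 2-bit indices in 0..3
    w_hat = unrotate([codebook[c] for c in codes], signs)
    rel_err = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_hat)) / sum(a * a for a in w))
    print("max |w| =", max(map(abs, w)), "max |y| =", round(max(map(abs, y)), 3))
    print("codebook:", [round(c, 3) for c in codebook], "relative error:", round(rel_err, 3))
```

Because the rotation is orthogonal, the reconstruction error measured after `unrotate` equals the quantization error in the rotated domain; the rotation's job is to make the values better suited to a small shared codebook, not to remove error by itself.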
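The non-expert weights (attention, dense MLP, shared experts, embed, lm_head) use 8-bit affine quantization with group size 64. The sketch below shows the generic per-group scheme w ≈ scale × q + bias with q in [0, 255], similar in spirit to the affine mode of MLX's `mx.quantize`; its exact rounding, bias convention, and storage dtypes are not reproduced here, and `affine_quantize` / `affine_dequantize` are hypothetical helpers.

```python
import random


def affine_quantize(w, bits=8, group_size=64):
    """Per-group affine quantization: w ~= scale * q + bias, with q in [0, 2**bits - 1]."""
    qmax = 2 ** bits - 1
    q, scales, biases = [], [], []
    for start in range(0, len(w), group_size):
        group = w[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax if hi > lo else 1.0  # constant group: any scale works
        scales.append(scale)
        biases.append(lo)
        # Round to the nearest level and clamp against floating-point overshoot.
        q.extend(min(qmax, max(0, round((v - lo) / scale))) for v in group)
    return q, scales, biases


def affine_dequantize(q, scales, biases, group_size=64):
    """Reconstruct weights from integer codes plus one (scale, bias) pair per group."""
    return [scales[i // group_size] * qi + biases[i // group_size] for i, qi in enumerate(q)]


if __name__ == "__main__":
    rng = random.Random(1)
    w = [rng.uniform(-1.0, 1.0) for _ in range(200)]
    q, scales, biases = affine_quantize(w)
    w_hat = affine_dequantize(q, scales, biases)
    worst = max(abs(a - b) for a, b in zip(w, w_hat))
    print(len(scales), "groups; worst-case error", worst, "<= max scale / 2 =", max(scales) / 2)
```

Each weight is off by at most half a quantization step of its group. Assuming fp16 scales and biases, the per-group metadata adds 32 bits per 64 weights, i.e. about 0.5 extra bits per weight on top of the 8-bit codes.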
## What's in the bundle

| Module | Bundle dtype |
|---|---|
| Routed experts (250 experts × 3 matrices × 60 MoE layers) | **2-bit MXTQ** + sidecar |
| Shared expert | 8-bit affine, g=64 |
| Attention (q/k/v/o, MLA latents) | 8-bit affine, g=64 |
| Dense MLP (first layer only; `first_k_dense_replace=1`) | 8-bit affine, g=64 |
| embed_tokens / lm_head | 8-bit affine, g=64 |
| RMSNorms / router gate / e_score_correction_bias | fp16 / int64 passthrough |

The `jangtq_runtime.safetensors` sidecar (36 KB) is used by the Swift runtime. It holds the codebook and sign-flip vectors for `(in_features={2048,7168}, seed=42, bits=2)`.

## Loading

```bash
pip install jang-tools mlx-lm
```

```python
from jang_tools import load_jangtq

# Load the 2-bit JANGTQ bundle by its Hugging Face repo id; returns (model, tokenizer).
model, tokenizer = load_jangtq("JANGQ-AI/Kimi-K2.6-Med-JANGTQ")
```

vMLX (Swift) loads the bundle automatically, reading `routed_expert_bits=2` and `mxtq_seed=42` from `config.json` together with the sidecar.

## Credits

JANGTQ codec, REAP calibration, JANGQ-AI bundle, and conversion tooling: **Jinho Jang (eric@jangq.ai)**. The source model and its license (modified MIT) belong to Moonshot AI.