---
language: en
license: apache-2.0
base_model: state-spaces/mamba2-1.3b
tags:
  - mamba2
  - ssm
  - state-space-model
  - compressed
  - hxq
  - helix-substrate
  - vector-quantization
  - helixcode
library_name: transformers
pipeline_tag: text-generation
model-index:
  - name: mamba2-1.3b-helix
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: WikiText-2
          type: wikitext
          config: wikitext-2-raw-v1
          split: test
        metrics:
          - type: perplexity
            value: 10.00
            name: Perplexity
            verified: true
---

# Mamba2-1.3B-HXQ

> **2.1x smaller. +8.0% PPL. Pure-SSM compression quality improves with scale.**
>
> Mamba2-1.3B compressed from 2.9 GB to 1.4 GB at +8.0% perplexity. That is down from +18.4% at 130M, which suggests SSM compression quality improves with model size. No calibration data. Just `pip install` and `from_pretrained()`.

## Install and Run

```bash
pip install "helix-substrate[hf]"
```

```python
import helix_substrate  # registers the HXQ quantizer with HuggingFace
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("EchoLabs33/mamba2-1.3b-helix")
tokenizer = AutoTokenizer.from_pretrained("EchoLabs33/mamba2-1.3b-helix")

inputs = tokenizer("The future of artificial intelligence", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

That's it. `import helix_substrate` registers the quantizer. `from_pretrained()` handles the rest automatically.

## Benchmark

| Metric | Dense (BF16) | HXQ |
|---|---|---|
| **Size** | 2.9 GB | **1.4 GB** |
| **Perplexity** (WikiText-2) | 9.26 | 10.00 **(+8.0%)** |
| **Compression ratio** | — | **2.1x** |
| **Compressed modules** | — | 98 (in_proj + out_proj + embedding) |
| **Architecture** | Mamba2 (48 layers, pure SSM) | unchanged |
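
The headline ratio and perplexity delta follow directly from the table. As a quick sanity check, the sketch below recomputes them from the reported values; the helper names are illustrative and not part of `helix-substrate`.

```python
def compression_ratio(dense_gb: float, compressed_gb: float) -> float:
    """Size of the dense checkpoint divided by the compressed one."""
    return dense_gb / compressed_gb


def ppl_delta_pct(dense_ppl: float, compressed_ppl: float) -> float:
    """Relative perplexity increase, in percent."""
    return (compressed_ppl - dense_ppl) / dense_ppl * 100.0


# Values copied from the benchmark table above.
print(f"ratio: {compression_ratio(2.9, 1.4):.1f}x")       # ratio: 2.1x
print(f"PPL delta: {ppl_delta_pct(9.26, 10.00):+.1f}%")   # PPL delta: +8.0%
```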

## Good to Know

- **+8.0% PPL delta**: higher than the similarly sized transformers in the companion table, but down from +18.4% at 130M. Two data points are suggestive rather than conclusive, but SSM compression quality appears to improve with model size.
- **GPU and CPU supported**: runs on any CUDA GPU or on CPU via standard PyTorch. Fused kernels for additional speedup are in progress.
- **Fine-tunable via LoRA**: compressed weights remain frozen, while LoRA adapters attach to each `HelixLinear` layer via `HelixLinearSTE`. See `helix-substrate` for training infrastructure.
- **Requires `helix-substrate`**: the quantizer is not built into transformers, so you need `pip install "helix-substrate[hf]"`.
- **`mamba-ssm` recommended**: without it, inference falls back to a slower sequential code path.
- **Requires `transformers >= 4.45`**: for Mamba2 architecture support.
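
To check these requirements before loading the model, a small standard-library script along the following lines may help. It is a sketch: the function names are hypothetical, and the only facts taken from this card are the package names and the `4.45` minimum.

```python
import importlib.util
import re
from importlib import metadata


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '4.45.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment() -> dict:
    """Report which of the requirements listed above are satisfied."""
    report = {}
    try:
        installed = metadata.version("transformers")
        report["transformers>=4.45"] = parse_major_minor(installed) >= (4, 45)
    except metadata.PackageNotFoundError:
        report["transformers>=4.45"] = False
    # find_spec only looks the module up; it does not import it.
    report["helix_substrate"] = importlib.util.find_spec("helix_substrate") is not None
    report["mamba_ssm (optional)"] = importlib.util.find_spec("mamba_ssm") is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement:24s} {'ok' if ok else 'missing'}")
```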

## What is HelixCode?

HelixCode is a universal weight compression codec based on vector quantization:

- Each weight matrix is replaced by a **256-entry codebook** (float32) + **uint8 index matrix** + optional **sidecar corrections** for outlier values
- The compressed form *is* the executable — `HelixLinear` performs `codebook[indices] @ x` directly, no decompression step
- Works on any `nn.Linear` regardless of architecture (Transformer, Mamba, MLP, CNN)
- **No calibration data required** — unlike GPTQ/AWQ, codebooks are fit from the weights alone
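
The codebook-plus-indices idea can be illustrated with a few lines of NumPy. This is a minimal sketch, not the `helix-substrate` implementation: it assumes one scalar codebook entry per weight, fits the codebook with plain 1-D k-means (an assumed fitting procedure), and leaves out the sidecar corrections. The function names are hypothetical.

```python
import numpy as np


def fit_codebook(weight: np.ndarray, k: int = 256, iters: int = 10):
    """Fit a k-entry scalar codebook to one weight matrix with 1-D k-means."""
    flat = weight.astype(np.float32).ravel()
    # Start the centroids at evenly spaced quantiles of the weights.
    codebook = np.quantile(flat, np.linspace(0.0, 1.0, k)).astype(np.float32)
    for _ in range(iters):
        # Assignment step: nearest codebook entry for every weight.
        indices = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        # Update step: move each entry to the mean of its members.
        for j in range(k):
            members = flat[indices == j]
            if members.size:
                codebook[j] = members.mean()
    indices = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, indices.astype(np.uint8).reshape(weight.shape)


def vq_linear(codebook, indices, x):
    """Forward pass on the compressed form: gather from the codebook, then matmul."""
    return codebook[indices] @ x


rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 32)).astype(np.float32)
x = rng.standard_normal(32).astype(np.float32)

codebook, indices = fit_codebook(weight)  # uses the weights only, no calibration data
y_dense = weight @ x
y_vq = vq_linear(codebook, indices, x)
print("codebook:", codebook.shape, codebook.dtype)
print("indices: ", indices.shape, indices.dtype)
print("max |y_dense - y_vq|:", float(np.abs(y_dense - y_vq).max()))
```

Storing one `uint8` index per weight is what brings a 2-byte BF16 weight down to roughly 1 byte, with the 256-entry codebook adding a negligible 1 KB per matrix.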

## How It Works

1. `import helix_substrate` registers the `hxq` quantizer with HuggingFace
2. `from_pretrained()` reads `quantization_config.quant_method = "hxq"` from `config.json`
3. The quantizer replaces 98 modules with `HelixLinear` shells before weight loading
4. Safetensors populates the codebook, indices, and sidecar buffers directly
5. The model runs in compressed form — no decompression needed
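
The registration-and-dispatch pattern in steps 1 to 3 can be sketched in plain Python. This is a toy model of the flow, not the `transformers` or `helix-substrate` internals: the registry, the function names, and the module names are all hypothetical.

```python
import json

# Hypothetical registry mapping a quant_method string to a handler.
QUANTIZERS = {}


def register_quantizer(name):
    """Decorator that records a handler under the given quant_method name."""
    def decorator(fn):
        QUANTIZERS[name] = fn
        return fn
    return decorator


@register_quantizer("hxq")
def select_hxq_modules(module_names, suffixes=("in_proj", "out_proj", "embeddings")):
    """Return the modules that would be swapped for compressed shells."""
    return [name for name in module_names if name.endswith(suffixes)]


def plan_load(config_text, module_names):
    """Read quant_method from a config.json string and dispatch to its handler."""
    config = json.loads(config_text)
    method = config.get("quantization_config", {}).get("quant_method")
    if method is None:
        return []  # ordinary dense checkpoint, nothing to replace
    if method not in QUANTIZERS:
        raise ValueError(f"no quantizer registered for {method!r}")
    return QUANTIZERS[method](module_names)


config_text = '{"quantization_config": {"quant_method": "hxq"}}'
modules = [  # illustrative module names
    "backbone.embeddings",
    "backbone.layers.0.mixer.in_proj",
    "backbone.layers.0.mixer.conv1d",
    "backbone.layers.0.mixer.out_proj",
    "backbone.layers.0.norm",
]
print(plan_load(config_text, modules))
```

In this toy version, forgetting to register the handler raises the `ValueError`, which mirrors why `import helix_substrate` has to run before `from_pretrained()`.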

## Architecture Details

Mamba2-1.3B is a pure state-space model with:
- **48 Mamba2 layers** (SSD blocks with in_proj + out_proj linear layers)
- **hidden_size=2048**, **64 heads**, **head_dim=64**
- **state_size=128**, **conv_kernel=4**

Only the in_proj, out_proj, and embedding layers are VQ-compressed. Mamba2-specific parameters (A_log, D, dt_bias, conv1d, norms) are stored at full precision.
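
As a rough cross-check of these dimensions, the arithmetic below counts the weights in the compressed projections. It is a back-of-the-envelope sketch: the `in_proj` width formula and `n_groups = 1` are assumptions taken from the reference Mamba2 layout rather than from this card, and the embedding is left out because the vocabulary size is not stated here.

```python
# Figures stated above.
hidden_size, num_heads, head_dim = 2048, 64, 64
state_size, num_layers = 128, 48

# Assumption (not stated in this card): a single B/C group, as in the
# reference Mamba2 implementation's default.
n_groups = 1

d_inner = num_heads * head_dim  # 4096, i.e. 2 x hidden_size
# Assumed layout: in_proj emits z and x (d_inner each), B and C
# (n_groups * state_size each), and one dt value per head.
in_proj_out = 2 * d_inner + 2 * n_groups * state_size + num_heads

in_proj_weights = hidden_size * in_proj_out
out_proj_weights = d_inner * hidden_size
projection_weights = num_layers * (in_proj_weights + out_proj_weights)

print(f"d_inner = {d_inner}, in_proj output width = {in_proj_out}")
print(f"in_proj + out_proj weights over {num_layers} layers: {projection_weights:,}")
# At one uint8 index per weight, this is the index storage before the
# embedding, codebooks, sidecars, and full-precision tensors are added.
print(f"~{projection_weights / 1e9:.2f} GB of indices")
```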

## Compression Receipt

```
Compressed modules:  98  (in_proj + out_proj + embedding)
Exact tensors:       337 (A_log, D, dt_bias, conv1d, norms)
Total keys:          736
Output size:         1,388 MB
Weight ratio:        2.1x
PPL delta:           +8.0% (10.00 vs 9.26 dense)
Eval:                WikiText-2 test, 2048 tokens, stride=512
```
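
The receipt gives only the evaluation settings (2048 tokens, stride=512). Assuming these refer to the usual strided sliding-window perplexity procedure, the window bookkeeping looks roughly like the sketch below. The model forward pass is omitted and the function names are illustrative.

```python
import math


def sliding_windows(seq_len, max_length=2048, stride=512):
    """Yield (begin, end, n_scored) for each evaluation window.

    Each window sees up to max_length tokens of context, but only the
    n_scored tokens not covered by the previous window count toward the loss.
    """
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        yield begin, end, end - prev_end
        prev_end = end
        if end == seq_len:
            break


def perplexity(total_nll, total_tokens):
    """exp of the mean per-token negative log-likelihood (in nats)."""
    return math.exp(total_nll / total_tokens)


windows = list(sliding_windows(seq_len=3000))
for begin, end, n_scored in windows:
    print(f"tokens [{begin}, {end}) -> score last {n_scored}")
```

Every token is scored exactly once, and all but the first window give each scored token at least 1536 tokens of preceding context.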

## Companion Models

Same codec, same `pip install`, multiple architectures:

| Model | Architecture | Ratio | PPL Delta |
|-------|-------------|-------|-----------|
| [qwen2.5-14b-instruct-helix](https://huggingface.co/EchoLabs33/qwen2.5-14b-instruct-helix) | Transformer | 3.4x | pending |
| [qwen2.5-7b-instruct-helix](https://huggingface.co/EchoLabs33/qwen2.5-7b-instruct-helix) | Transformer | 2.2x | +6.34% |
| [qwen2.5-3b-instruct-helix](https://huggingface.co/EchoLabs33/qwen2.5-3b-instruct-helix) | Transformer | 1.6x | +0.69% |
| [qwen2.5-coder-3b-helix](https://huggingface.co/EchoLabs33/qwen2.5-coder-3b-helix) | Transformer (code) | 1.6x | +1.92% |
| [qwen2.5-coder-1.5b-instruct-helix](https://huggingface.co/EchoLabs33/qwen2.5-coder-1.5b-instruct-helix) | Transformer (code) | 2.4x | +1.63% |
| [tinyllama-1.1b-helix](https://huggingface.co/EchoLabs33/tinyllama-1.1b-helix) | Transformer | 4.0x | +0.78% |
| [zamba2-2.7b-instruct-helix](https://huggingface.co/EchoLabs33/zamba2-2.7b-instruct-helix) | Hybrid (Mamba2+Transformer) | 1.8x | +6.59% |
| [zamba2-1.2b-helix](https://huggingface.co/EchoLabs33/zamba2-1.2b-helix) | Hybrid (Mamba2+Transformer) | 1.7x | +2.90% |
| [mamba-130m-helix](https://huggingface.co/EchoLabs33/mamba-130m-helix) | Pure SSM | 3.8x | +18.4% |

## Citation

```bibtex
@software{helix_substrate_2026,
  title={Helix Substrate: Universal Weight Compression via HelixCode},
  author={EchoLabs},
  year={2026},
  url={https://github.com/echo313unfolding/helix-substrate}
}
```

## License

Apache 2.0 (inherited from [state-spaces/mamba2-1.3b](https://huggingface.co/state-spaces/mamba2-1.3b)).