Instructions for using EchoLabs33/mamba2-1.3b-hxq with libraries and local apps.

Libraries

How to use EchoLabs33/mamba2-1.3b-hxq with Transformers:

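# Importing helix_substrate registers the quantizer (pip install "helix-substrate[hf]")
import helix_substrate  # noqa: F401

# Use a pipeline as a high-level helper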
from transformers import pipeline

pipe = pipeline("text-generation", model="EchoLabs33/mamba2-1.3b-hxq")

# Load model directly (Mamba2 is a text-only causal language model)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EchoLabs33/mamba2-1.3b-hxq")
model = AutoModelForCausalLM.from_pretrained("EchoLabs33/mamba2-1.3b-hxq")

Local Apps

vLLM

How to use EchoLabs33/mamba2-1.3b-hxq with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EchoLabs33/mamba2-1.3b-hxq"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EchoLabs33/mamba2-1.3b-hxq",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
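
The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and it assumes a server is already listening at the given base URL with the OpenAI-compatible `/v1/completions` route shown above.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request mirroring the curl call above (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


if __name__ == "__main__":
    # Assumes the server from the steps above is running locally on port 8000.
    print(complete("http://localhost:8000", "EchoLabs33/mamba2-1.3b-hxq", "Once upon a time,"))
```

Passing a different base URL (for example one ending in port 30000) reuses the same helper against any other server that exposes the same route.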

Use Docker

docker model run hf.co/EchoLabs33/mamba2-1.3b-hxq

SGLang

How to use EchoLabs33/mamba2-1.3b-hxq with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "EchoLabs33/mamba2-1.3b-hxq" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EchoLabs33/mamba2-1.3b-hxq",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "EchoLabs33/mamba2-1.3b-hxq" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EchoLabs33/mamba2-1.3b-hxq",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
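
The curl calls above return JSON in the OpenAI completions format, where the generated text sits in `choices[0].text` and `finish_reason` is `"length"` when generation stopped at `max_tokens`. A minimal Python sketch for reading such a response follows; the function name `extract_completion` is hypothetical, and the `sample` payload is hand-written for illustration, not real output from this model.

```python
import json


def extract_completion(raw_json):
    """Return (text, truncated) from an OpenAI-style /v1/completions response body."""
    body = json.loads(raw_json)
    choice = body["choices"][0]
    # finish_reason == "length" means the max_tokens limit cut the output short.
    truncated = choice.get("finish_reason") == "length"
    return choice["text"], truncated


# Hypothetical response body, hand-written for illustration only.
sample = json.dumps({
    "object": "text_completion",
    "model": "EchoLabs33/mamba2-1.3b-hxq",
    "choices": [{"index": 0, "text": " there was a model.", "finish_reason": "length"}],
})

text, truncated = extract_completion(sample)
print(repr(text), truncated)
```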

Docker Model Runner
How to use EchoLabs33/mamba2-1.3b-hxq with Docker Model Runner:
```
docker model run hf.co/EchoLabs33/mamba2-1.3b-hxq
```

voidstream committed on Apr 18

Commit 6d43155 (verified) · 1 parent: d582e02

Update model card: LoRA fine-tuning now supported via HelixLinearSTE

Files changed (1)

README.md +1 -1

@@ -71,7 +71,7 @@ That's it. `import helix_substrate` registers the quantizer. `from_pretrained()`
 - **+8.0% PPL delta** — higher than transformers at this scale, but down from +18.4% at 130M. SSM compression quality scales with model size.
 - **GPU and CPU supported** — runs on any CUDA GPU or CPU via standard PyTorch. Fused kernels for additional speedup are in progress.
-- **Not fine-tunable** — compressed weights are read-only (`is_trainable = False`).
+- **Fine-tunable via LoRA** — compressed weights remain frozen, but LoRA adapters attach to each `HelixLinear` layer via `HelixLinearSTE`. See `helix-substrate` for training infrastructure.
 - **Requires `helix-substrate`** — the quantizer is not built into transformers. You need `pip install "helix-substrate[hf]"`.
 - **`mamba-ssm` recommended** — without it, falls back to a slower sequential code path.
 - **Requires `transformers >= 4.45`** — for Mamba2 architecture support.
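
The new bullet describes frozen base weights with LoRA adapters attached to each layer. The sketch below illustrates only the generic LoRA idea in plain Python: the output is the frozen layer's output plus a scaled low-rank update `B(Ax)`, with `B` initialised to zero so the adapter starts as a no-op. It is not the `helix-substrate` implementation, and `LoRALinear`, `matvec`, and all parameters are hypothetical names chosen for illustration.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight plus a low-rank adapter (generic LoRA, hypothetical sketch)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_features, in_features = len(weight), len(weight[0])
        self.weight = weight          # frozen: stands in for the compressed weights
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A (rank x in) gets small random values; B (out x rank) starts at zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_features)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_features)]

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path
        update = matvec(self.B, matvec(self.A, x))  # trainable low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))  # equals the frozen layer's output while B is zero

# Simulate a trained adapter by setting A and B by hand; the base weight is untouched.
layer.A = [[1.0, 0.0]]
layer.B = [[1.0], [0.0]]
print(layer.forward([1.0, 1.0]))
```

Only `A` and `B` would receive gradient updates during training, which is why the base weights can stay read-only.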