Supra 1.5 50M Instruct
How to use sahilchachra/supra-1.5-50m-instruct-exp-mxfp8-mlx with MLX:
# Download the model from the Hub
# (the quotes keep zsh, the default macOS shell, from treating the brackets as a glob)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir supra-1.5-50m-instruct-exp-mxfp8-mlx sahilchachra/supra-1.5-50m-instruct-exp-mxfp8-mlx
MLX quantization of SupraLabs/Supra-1.5-50M-Instruct-exp for Apple Silicon.
Variant: Block float MX FP8
Disk size: 53 MB
Quantized by: sahilchachra
Evaluated on Apple M4 Pro with MLX. Model loaded once; performance and quality measured in a single pass.
| Metric | This model | FP16 baseline |
|---|---|---|
| Decode tok/s (avg, long traces) | 670.29 | 1025.59 |
| Peak memory (GB) | 0.152 | 0.223 |
| Disk size (MB) | 53 | 101 |
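
To make the trade-off in the table above easier to read, the figures can be expressed as ratios against the FP16 baseline. This is a minimal sketch: the numbers are copied from the table, while the dictionary keys and the helper name `ratio_to_baseline` are illustrative and not part of any library.

```python
# Figures copied from the table above as (this model, FP16 baseline).
# Key names are illustrative labels, not an official schema.
rows = {
    "decode_tok_s": (670.29, 1025.59),
    "peak_memory_gb": (0.152, 0.223),
    "disk_size_mb": (53, 101),
}


def ratio_to_baseline(this_model, baseline):
    """Return this model's value as a fraction of the FP16 value."""
    return this_model / baseline


ratios = {name: ratio_to_baseline(*values) for name, values in rows.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x of FP16")
```

On these figures the MX FP8 variant takes roughly half the disk space and about two thirds of the peak memory of FP16, while its measured decode rate is lower than the FP16 baseline in this run.
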
| Benchmark | This model | FP16 baseline | n |
|---|---|---|---|
| IFEval (instruction following) | 22.7% | 15.9% | 44 |
| Alpaca-cleaned (instruct F1 vs reference) | 41.0 | 40.9 | 50 |
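
With n = 44, the IFEval percentages above correspond to only a handful of prompts, so a rough uncertainty estimate helps when comparing the two columns. The sketch below recovers the implied pass counts by rounding and computes a 95% Wilson score interval for each. The choice of the Wilson interval is an assumption made here for illustration; the card does not state how, or whether, uncertainty was estimated.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


n = 44  # IFEval sample size from the table above

# Pass counts implied by the reported percentages (rounded to whole prompts).
implied_passes = {
    "this_model": round(0.227 * n),
    "fp16_baseline": round(0.159 * n),
}

intervals = {name: wilson_interval(k, n) for name, k in implied_passes.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: {implied_passes[name]}/{n}, 95% interval {low:.1%} to {high:.1%}")
```

Under this assumption the two intervals overlap, so the IFEval gap at this sample size is best read as indicative rather than conclusive.
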
| Context length | Decode tok/s |
|---|---|
| ~128 tokens | 659.6 |
| ~256 tokens | 644.8 |
| ~512 tokens | 683.7 |
| ~1024 tokens | 693.0 |
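
As a quick consistency check, the per-context figures above can be averaged and compared with the headline decode rate of 670.29 tok/s. This is a sketch that assumes the headline number is a plain mean over these four context lengths, which the card does not state explicitly.

```python
from statistics import mean

# Decode throughput by approximate context length, copied from the table above.
decode_tok_s = {128: 659.6, 256: 644.8, 512: 683.7, 1024: 693.0}

average = mean(decode_tok_s.values())

# Relative spread between the fastest and slowest context length.
spread = (max(decode_tok_s.values()) - min(decode_tok_s.values())) / average

print(f"mean decode rate: {average:.2f} tok/s")
print(f"spread across context lengths: {spread:.1%}")
```

The mean of the rounded table values lands within rounding distance of the headline figure, and throughput stays roughly flat across the measured context lengths.
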
pip install mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("sahilchachra/supra-1.5-50m-instruct-exp-mxfp8-mlx")
response = generate(model, tokenizer, prompt="Your prompt here", max_tokens=256, verbose=True)
| Model | Variant |
|---|---|
| sahilchachra/supra-1.5-50m-instruct-exp-mxfp4-mlx | Block float MX FP4 |
| sahilchachra/supra-1.5-50m-instruct-exp-mxfp8-mlx | Block float MX FP8 ← this model |
See SupraLabs/Supra-1.5-50M-Instruct-exp for full model details and intended use.
Quantization: 8-bit
Base model: SupraLabs/Supra-1.5-50M-Base-exp