supra-1.5-50m-instruct-exp-mxfp8-mlx

MLX quantization of SupraLabs/Supra-1.5-50M-Instruct-exp for Apple Silicon.

Variant: Block float MX FP8
Disk size: 53 MB
Quantized by: sahilchachra

Benchmark results

Evaluated on Apple M4 Pro with MLX. Model loaded once; performance and quality measured in a single pass.

Performance

	This model	FP16 baseline
Decode tok/s (avg, long traces)	670.29	1025.59
Peak memory (GB)	0.152	0.223
Disk size (MB)	53	101

Quality

Benchmark	This model	FP16 baseline	n
IFEval (instruction following)	22.7%	15.9%	44
Alpaca-cleaned (instruct F1 vs reference)	41.0	40.9	50

Context scaling (decode tok/s)

Context length	Decode tok/s
~128 tokens	659.6
~256 tokens	644.8
~512 tokens	683.7
~1024 tokens	693.0

Usage

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("sahilchachra/supra-1.5-50m-instruct-exp-mxfp8-mlx")
response = generate(model, tokenizer, prompt="Your prompt here", max_tokens=256, verbose=True)

All variants in this collection

Model	Variant
sahilchachra/supra-1.5-50m-instruct-exp-mxfp4-mlx	Block float MX FP4
sahilchachra/supra-1.5-50m-instruct-exp-mxfp8-mlx	Block float MX FP8 ← this model

Notes

Requires Apple Silicon (M1 or later) with MLX
Benchmarks run on Apple M4 Pro, 24 GB unified memory
License: see SupraLabs/Supra-1.5-50M-Instruct-exp for the original model's license

Original model

See SupraLabs/Supra-1.5-50M-Instruct-exp for full model details and intended use.

Downloads last month: 24

Safetensors

Model size

14.6M params

Tensor type

U32

BF16

MLX

Hardware compatibility

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for sahilchachra/supra-1.5-50m-instruct-exp-mxfp8-mlx

Base model

SupraLabs/Supra-1.5-50M-Base-exp

Finetuned

SupraLabs/Supra-1.5-50M-Instruct-exp

Quantized

(10)

this model

Collection including sahilchachra/supra-1.5-50m-instruct-exp-mxfp8-mlx

Supra 1.5 50M Instruct

Collection

3 items • Updated 11 days ago