HY-MT1.5-1.8B-oQ8-fp16

This model was quantized using oQ (oMLX v0.3.9.dev2) mixed-precision quantization.

Quantization details

  • Model type: hunyuan_v1_dense
  • Bits: 8
  • Group size: 64
  • Format: MLX safetensors
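
To make the Bits and Group size settings concrete, here is a minimal sketch of generic group-wise affine quantization in plain Python. It is an illustration only, not the oQ implementation: the oQ mixed-precision scheme may choose bits and scales differently, and the helper names (`quantize_group`, `dequantize_group`, `quantize`, `bits_per_weight`) are hypothetical. The storage estimate assumes one fp16 scale and one fp16 bias per group, which is also an assumption.

```python
import math

def quantize_group(values, bits=8):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # Guard against a constant group, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo  # lo acts as the per-group bias

def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from the integer codes."""
    return [c * scale + bias for c in codes]

def quantize(weights, group_size=64, bits=8):
    """Quantize a flat weight list in consecutive groups of group_size."""
    if len(weights) % group_size != 0:
        raise ValueError("weight count must be a multiple of group_size")
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]

def bits_per_weight(bits=8, group_size=64, overhead_bits=32):
    """Effective storage cost, assuming fp16 scale + fp16 bias per group."""
    return bits + overhead_bits / group_size

# Toy weights: two groups of 64 values each.
weights = [math.sin(i) for i in range(128)]
groups = quantize(weights, group_size=64, bits=8)
restored = [x for g in groups for x in dequantize_group(*g)]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"groups: {len(groups)}, max abs error: {max_err:.5f}")
print(f"effective bits per weight: {bits_per_weight()}")
```

Under these assumptions, the per-group overhead adds half a bit per weight on top of the 8-bit codes, and the rounding error within a group is bounded by half of that group's scale.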

Tested on an M1 Max (32-core GPU, 64 GB RAM) running macOS 26.5.

Note: fp16 gives ~20% faster prefill on M1/M2 Apple Silicon (native fp16). bfloat16 is safer on M3/M4 and for numerical stability.
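
The numerical-stability trade-off in the note can be illustrated with a small standard-library sketch. It rounds a Python float to fp16 using the `struct` module's `e` format, and emulates bfloat16 by keeping the upper 16 bits of the float32 encoding with round-to-nearest-even. The helper names `to_fp16` and `to_bf16` are hypothetical, and the sketch says nothing about how this particular model behaves in either dtype.

```python
import math
import struct

def to_fp16(x):
    """Round x to IEEE half precision; values beyond the fp16 range become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def to_bf16(x):
    """Emulate bfloat16: keep the top 16 bits of float32, round to nearest even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding increment
    bits &= 0xFFFF0000                   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

for value in (1.001, 70000.0):
    print(f"{value}: fp16={to_fp16(value)}, bf16={to_bf16(value)}")
```

fp16 keeps more mantissa bits, so 1.001 survives with less rounding, but it overflows above 65504. bfloat16 covers the float32 exponent range, so 70000.0 stays finite at coarser precision.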

Model                            Context   PP (tok/s)   TG (tok/s)
HY-MT1.5-1.8B · 8bit             1k        1,096        116.0
HY-MT1.5-1.8B · 8bit             4k        1,229        97.6
HY-MT1.5-1.8B · 8bit             8k        1,074        80.3
HY-MT1.5-1.8B · 8bit             16k       875.0        59.4
HY-MT1.5-1.8B-oQ8-fp16 · 8bit    1k        1,614        121.0
HY-MT1.5-1.8B-oQ8-fp16 · 8bit    4k        1,879        104.9
HY-MT1.5-1.8B-oQ8-fp16 · 8bit    8k        1,501        91.1
HY-MT1.5-1.8B-oQ8-fp16 · 8bit    16k       1,221        69.8

PP = prompt processing (prefill); TG = token generation.
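
The relative gains in the benchmark rows can be worked out with a short script. This is a sketch that copies the figures from the table and treats the "HY-MT1.5-1.8B · 8bit" rows as the baseline; the dictionary names and the `speedup_percent` helper are hypothetical.

```python
# (PP tok/s, TG tok/s) per context length, copied from the benchmark table.
BASELINE_8BIT = {
    "1k": (1096.0, 116.0),
    "4k": (1229.0, 97.6),
    "8k": (1074.0, 80.3),
    "16k": (875.0, 59.4),
}
OQ8_FP16 = {
    "1k": (1614.0, 121.0),
    "4k": (1879.0, 104.9),
    "8k": (1501.0, 91.1),
    "16k": (1221.0, 69.8),
}

def speedup_percent(new, old):
    """Relative throughput gain of new over old, in percent."""
    return (new / old - 1.0) * 100.0

for context, (base_pp, base_tg) in BASELINE_8BIT.items():
    new_pp, new_tg = OQ8_FP16[context]
    pp_gain = speedup_percent(new_pp, base_pp)
    tg_gain = speedup_percent(new_tg, base_tg)
    print(f"{context:>3}: PP +{pp_gain:.1f}%  TG +{tg_gain:.1f}%")
```

The comparison only covers the contexts listed in the table and a single test machine, so the percentages should be read as indicative rather than general.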