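How to use OOOrdis/HY-MT1.5-1.8B-oQ8-fp16 with MLX:

```shell
# Download the model from the Hub
# (quote the extra so zsh, the default macOS shell, does not treat the brackets as a glob)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir HY-MT1.5-1.8B-oQ8-fp16 OOOrdis/HY-MT1.5-1.8B-oQ8-fp16
```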
HY-MT1.5-1.8B-oQ8-fp16
This model was quantized using oQ (oMLX v0.3.9.dev2) mixed-precision quantization.
Quantization details
- Model type: hunyuan_v1_dense
- Bits: 8
- Group size: 64
- Format: MLX safetensors
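To make the 8-bit / group-size-64 settings concrete, here is a minimal pure-Python sketch of group-wise affine quantization: each run of 64 weights gets its own scale and bias, and each weight is stored as an integer in [0, 255]. This follows the general affine scheme MLX uses (per-group scales and biases), but it is not oMLX's oQ code; the helper names are hypothetical, and the choice of the group minimum as the bias is an assumption (MLX's exact rounding may differ). It also computes the effective storage cost, assuming one fp16 scale and one fp16 bias per group.

```python
import random

BITS = 8
GROUP_SIZE = 64
LEVELS = 2**BITS - 1  # 255 distinct steps above zero for 8-bit


def quantize_group(weights):
    """Affine-quantize one group so that w ~= q * scale + bias, q in [0, LEVELS].

    Assumption: bias is the group minimum (MLX's exact scheme may differ).
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / LEVELS if hi > lo else 1.0
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo


def dequantize_group(q, scale, bias):
    """Reconstruct approximate weights from integer codes."""
    return [v * scale + bias for v in q]


def quantize(weights):
    """Split a flat weight list into groups of GROUP_SIZE and quantize each."""
    if len(weights) % GROUP_SIZE != 0:
        raise ValueError("length must be a multiple of GROUP_SIZE")
    return [quantize_group(weights[i:i + GROUP_SIZE])
            for i in range(0, len(weights), GROUP_SIZE)]


def bits_per_weight(bits=BITS, group_size=GROUP_SIZE, meta_bits=16 + 16):
    """Payload bits plus one fp16 scale and one fp16 bias shared by each group."""
    return bits + meta_bits / group_size


random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(4 * GROUP_SIZE)]
groups = quantize(w)
restored = [x for q, s, b in groups for x in dequantize_group(q, s, b)]
max_err = max(abs(a - b) for a, b in zip(w, restored))
print(f"groups: {len(groups)}, max abs error: {max_err:.2e}")
print(f"effective bits per weight: {bits_per_weight()}")  # 8 + 32/64 = 8.5
```

The reconstruction error per weight is bounded by half of its group's scale, which is why smaller groups trade extra scale/bias storage for accuracy.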
Benchmarked on an Apple M1 Max (32-core GPU, 64 GB unified memory) running macOS 26.5.
Note: fp16 gives roughly 20% faster prefill on M1/M2 Apple Silicon, which runs fp16 natively. bfloat16 is the safer choice on M3/M4 and wherever numerical stability matters.
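The stability side of this trade-off comes from how the two formats split their 16 bits: fp16 has 5 exponent and 10 mantissa bits (largest finite value 65504), while bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits. The standard-library sketch below shows both effects; it says nothing about speed. It uses `struct`'s `'e'` (half-precision) format for fp16 and a hypothetical `to_bf16` helper that rounds a float32 bit pattern to its top 16 bits (round-to-nearest-even, NaN handling omitted).

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip through IEEE 754 half precision; raises OverflowError if out of range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Round to bfloat16 by keeping the top 16 bits of the float32 encoding.

    Rounds to nearest even on the 16 dropped bits; NaN handling is omitted.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                      # tie-break toward an even result
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000   # round, then truncate low half
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Precision: 1 + 2**-10 needs 10 mantissa bits, which fp16 has and bf16 lacks.
x = 1 + 2**-10
print("fp16:", to_fp16(x), "bf16:", to_bf16(x))  # fp16: 1.0009765625 bf16: 1.0

# Range: 70000 exceeds fp16's maximum of 65504 but fits easily in bf16.
big = 70000.0
try:
    to_fp16(big)
except OverflowError:
    print("fp16: overflow")
print("bf16:", to_bf16(big))  # bf16: 70144.0 (coarse, but finite)
```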
| Model | Context (tokens) | Prefill, PP (tok/s) | Generation, TG (tok/s) |
|---|---|---|---|
| HY-MT1.5-1.8B · 8bit | 1k | 1,096 | 116.0 |
| HY-MT1.5-1.8B · 8bit | 4k | 1,229 | 97.6 |
| HY-MT1.5-1.8B · 8bit | 8k | 1,074 | 80.3 |
| HY-MT1.5-1.8B · 8bit | 16k | 875.0 | 59.4 |
| HY-MT1.5-1.8B-oQ8-fp16 · 8bit | 1k | 1,614 | 121.0 |
| HY-MT1.5-1.8B-oQ8-fp16 · 8bit | 4k | 1,879 | 104.9 |
| HY-MT1.5-1.8B-oQ8-fp16 · 8bit | 8k | 1,501 | 91.1 |
| HY-MT1.5-1.8B-oQ8-fp16 · 8bit | 16k | 1,221 | 69.8 |
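The relative gains can be recomputed directly from the table rows. The short script below divides each oQ8-fp16 throughput by the baseline 8-bit figure at the same context length; all numbers are copied from the table and nothing new is measured.

```python
# (PP tok/s, TG tok/s) per context length, copied from the benchmark table above
baseline = {"1k": (1096, 116.0), "4k": (1229, 97.6), "8k": (1074, 80.3), "16k": (875.0, 59.4)}
oq8_fp16 = {"1k": (1614, 121.0), "4k": (1879, 104.9), "8k": (1501, 91.1), "16k": (1221, 69.8)}


def speedups(base, new):
    """Return {context: (PP change %, TG change %)} relative to the baseline."""
    return {
        ctx: tuple(100.0 * (n / b - 1.0) for b, n in zip(base[ctx], new[ctx]))
        for ctx in base
    }


for ctx, (pp, tg) in speedups(baseline, oq8_fp16).items():
    print(f"{ctx:>4}: prefill {pp:+.1f}%, generation {tg:+.1f}%")
```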
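- Model size: 0.5B params as counted by the Hub. The 8-bit weights are packed into U32 words, so this figure understates the model's 1.8B parameters.
- Tensor type: F16, U32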
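The U32 tensor type reflects how quantized weights are stored: several low-bit integers share one 32-bit word. A minimal sketch, assuming four 8-bit values per word with the first value in the lowest byte (the byte order is an assumption about MLX's layout, and the helper names are hypothetical):

```python
def pack_u8_to_u32(values):
    """Pack 8-bit unsigned ints into 32-bit words, four per word, lowest byte first."""
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    words = []
    for i in range(0, len(values), 4):
        word = 0
        for j, v in enumerate(values[i:i + 4]):
            if not 0 <= v <= 0xFF:
                raise ValueError(f"value {v} does not fit in 8 bits")
            word |= v << (8 * j)  # value j occupies bits 8j .. 8j+7
        words.append(word)
    return words


def unpack_u32_to_u8(words):
    """Inverse of pack_u8_to_u32."""
    return [(w >> (8 * j)) & 0xFF for w in words for j in range(4)]


q = [1, 2, 3, 255, 0, 128, 64, 7]
packed = pack_u8_to_u32(q)
print([hex(w) for w in packed])  # ['0xff030201', '0x7408000']
assert unpack_u32_to_u8(packed) == q
```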