---
pipeline_tag: text-generation
base_model:
- LiquidAI/LFM2-8B-A1B
---
These are **MXFP4** quantizations of the model [LiquidAI / LFM2-8B-A1B](https://huggingface.co/LiquidAI/LFM2-8B-A1B).

## Quick Start
1. Download the latest release of **llama.cpp**.
2. Download your preferred model variant from below.

## Which version should I choose?
All FP4 variants use **MXFP4** for the MoE (Mixture of Experts) weights to keep the model efficient; they differ in how the remaining tensors are handled.  
I've also included a new type, Q8_XL_MOE, which uses Q8 for the MoE tensors and BF16 for everything else.

| Variant | Quality | Performance | Size | Recommendation |
| :--- | :--- | :--- | ---: | :--- |
| **Q8_XL_MOE** | ⭐⭐⭐⭐⭐ | Variable* | 8.77GiB | Maximum quality, uses Q8 instead of FP4 for the MoE weights. |
| **BF16** | ⭐⭐⭐ | Variable* | 4.54GiB | Best accuracy among the FP4 variants; the non-MoE tensors keep their original, unquantized BF16 weights. |
| **F16** | ⭐⭐ | Fast | 4.94GiB | Great alternative if BF16 is slow on your hardware. |
| **Q8** | ⭐ | Fastest | 4.94GiB | Balanced performance and memory usage. |

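**Note (\*):** On some older architectures, BF16 may be slower than F16, which is why performance is listed as variable for the variants that contain BF16 tensors.  
Check whether your GPU supports BF16 natively.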
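
As a rough way to read the table above, the small Python sketch below picks the highest-rated variant whose file fits into a given memory budget. The sizes and star ratings are copied from the table; the `pick_variant` helper, the `headroom_gib` allowance for context and runtime buffers, and the choice to skip BF16-based variants when BF16 is not native are illustrative assumptions, not measured guidance.

```python
# Sizes and star ratings are copied from the table above.
# Everything else (names, headroom, selection rule) is a hypothetical sketch.
VARIANTS = [
    # (name, quality stars, file size in GiB, contains BF16 tensors)
    ("Q8_XL_MOE", 5, 8.77, True),
    ("BF16", 3, 4.54, True),
    ("F16", 2, 4.94, False),
    ("Q8", 1, 4.94, False),
]


def pick_variant(memory_gib, native_bf16=True, headroom_gib=1.0):
    """Return the highest-rated variant that fits in memory, or None.

    headroom_gib is an assumed allowance on top of the file size;
    real memory use depends on context length and backend.
    """
    candidates = [
        (stars, name)
        for name, stars, size_gib, uses_bf16 in VARIANTS
        if size_gib + headroom_gib <= memory_gib
        and (native_bf16 or not uses_bf16)
    ]
    # max() compares the star rating first, so the best-rated fit wins.
    return max(candidates)[1] if candidates else None
```

For example, `pick_variant(8)` selects `BF16`, while `pick_variant(8, native_bf16=False)` falls back to `F16`.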

Recommended parameters from LiquidAI:
- temperature 0.3
- min_p 0.15
- repetition_penalty 1.05
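
llama.cpp names these sampling options slightly differently (`--temp`, `--min-p`, `--repeat-penalty`). The sketch below builds a `llama-cli` command line with the recommended values; the `build_command` helper and the `model.gguf` path are placeholders for illustration, so substitute the file you actually downloaded.

```python
import shlex

# Sampling values recommended by LiquidAI, as listed above.
RECOMMENDED = {"temperature": 0.3, "min_p": 0.15, "repetition_penalty": 1.05}

# Mapping from the parameter names above to llama.cpp command-line flags.
LLAMA_CPP_FLAGS = {
    "temperature": "--temp",
    "min_p": "--min-p",
    "repetition_penalty": "--repeat-penalty",
}


def build_command(model_path, prompt, binary="llama-cli"):
    """Return the argument list for a llama-cli run with the recommended sampling."""
    args = [binary, "-m", model_path, "-p", prompt]
    for name, value in RECOMMENDED.items():
        args += [LLAMA_CPP_FLAGS[name], str(value)]
    return args


if __name__ == "__main__":
    # "model.gguf" is a placeholder path, not a file name from this repository.
    print(shlex.join(build_command("model.gguf", "Hello")))
```

The resulting list can be passed directly to `subprocess.run`, or printed and pasted into a shell.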