---
pipeline_tag: text-generation
base_model:
- LiquidAI/LFM2-8B-A1B
---

These are **MXFP4** quantizations of the model [LiquidAI/LFM2-8B-A1B](https://huggingface.co/LiquidAI/LFM2-8B-A1B).

## Quick Start

1. Download the latest release of **llama.cpp**.
2. Download your preferred model variant from the table below.

## Which version should I choose?

All FP4 variants use **MXFP4** for the MoE (Mixture of Experts) weights to keep the model efficient. They differ in how the remaining tensors are stored (BF16, F16, or Q8). I've also included a new type, **Q8_XL_MOE**, which uses Q8 for the MoE tensors and BF16 for everything else.

| Variant | Quality | Performance | Size | Recommendation |
| :--- | :--- | :--- | ---: | :--- |
| **Q8_XL_MOE** | ⭐⭐⭐⭐⭐ | Variable* | 8.77 GiB | Maximum quality; uses Q8 instead of MXFP4 for the MoE weights. |
| **BF16** | ⭐⭐⭐ | Variable* | 4.54 GiB | Best accuracy among the MXFP4 variants; non-MoE tensors keep their original unquantized BF16 weights. |
| **F16** | ⭐⭐ | Fast | 4.94 GiB | Great alternative if BF16 is slow on your hardware. |
| **Q8** | ⭐ | Fastest | 4.94 GiB | Balanced performance and memory usage. |

**\*Note:** On some older architectures, BF16 may be slower than F16. Check that your GPU supports native BF16.

Recommended parameters from LiquidAI:

- temperature: 0.3
- min_p: 0.15
- repetition_penalty: 1.05
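To illustrate how these three settings interact, here is a minimal, illustrative Python sketch. It is not llama.cpp's implementation. It assumes a CTRL-style repetition penalty: positive logits of already-generated tokens are divided by the penalty and negative ones are multiplied by it. Next, it assumes min-p filtering on the untempered probabilities, keeping only tokens whose probability is at least `min_p` times the top token's probability. Finally, it applies temperature scaling over the surviving tokens. Real samplers may apply these steps in a different order. The function names `final_distribution` and `sample_token` are hypothetical.

```python
import math
import random
from typing import Sequence


def apply_repetition_penalty(logits: Sequence[float], previous_tokens: Sequence[int],
                             penalty: float) -> list[float]:
    """CTRL-style penalty (assumed): make already-seen tokens less likely."""
    out = list(logits)
    for tok in set(previous_tokens):
        if out[tok] > 0:
            out[tok] /= penalty   # shrink positive logits toward zero
        else:
            out[tok] *= penalty   # push negative logits further down
    return out


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax; -inf logits map to probability 0.0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def final_distribution(logits: Sequence[float], previous_tokens: Sequence[int],
                       temperature: float = 0.3, min_p: float = 0.15,
                       repetition_penalty: float = 1.05) -> list[float]:
    """Hypothetical helper: penalty -> min-p filter -> temperature (assumed order)."""
    penalized = apply_repetition_penalty(logits, previous_tokens, repetition_penalty)
    probs = softmax(penalized)
    # min-p: drop tokens below min_p * (probability of the most likely token)
    threshold = min_p * max(probs)
    keep = [p >= threshold for p in probs]
    # Temperature < 1 sharpens the distribution over the surviving tokens
    scaled = [x / temperature if k else -math.inf for x, k in zip(penalized, keep)]
    return softmax(scaled)


def sample_token(logits: Sequence[float], previous_tokens: Sequence[int],
                 rng: random.Random, **kwargs) -> int:
    """Hypothetical helper: draw one token id from the final distribution."""
    probs = final_distribution(logits, previous_tokens, **kwargs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.5, 0.5, -1.0]
    # Token 0 was already generated, so it receives the repetition penalty.
    probs = final_distribution(logits, previous_tokens=[0])
    # Token 3 falls below the min-p threshold and gets probability 0.0.
    print([round(p, 3) for p in probs])
    print(sample_token(logits, [0], random.Random(0)))
```

With `temperature` at 0.3, most of the probability mass ends up on the top one or two tokens. `min_p` at 0.15 removes the long tail regardless of vocabulary size, and a `repetition_penalty` of 1.05 only slightly discourages repeats.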