These are MXFP4 quantizations of the model LiquidAI/LFM2-8B-A1B.

Quick Start

  1. Download the latest release of llama.cpp.
  2. Download your preferred model variant from below.

Which version should I choose?

All FP4 variants use MXFP4 for the MoE (Mixture of Experts) weights to keep the model efficient. They differ only in how the remaining tensors are handled.
I've also included a new type, Q8_XL_MOE, which uses Q8 for the MoE tensors and BF16 for everything else.

| Variant   | Quality    | Performance | Size    | Recommendation                                               |
|-----------|------------|-------------|---------|--------------------------------------------------------------|
| Q8_XL_MOE | ⭐⭐⭐⭐⭐ | Variable*   | 8.77GiB | Maximum quality, uses Q8 instead of FP4 for the MoE weights. |
| BF16      | ⭐⭐⭐     | Variable*   | 4.54GiB | Best for maximum accuracy; original unquantized weights.     |
| F16       | ⭐⭐       | Fast        | 4.94GiB | Great alternative if BF16 is slow on your hardware.          |
| Q8        |            | Fastest     | 4.94GiB | Balanced performance and memory usage.                       |

* Note: On some older architectures, BF16 may be slower than F16.
Check that your GPU supports native BF16 before choosing a BF16-based variant.
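
As a rough illustration of the BF16 check above, here is a minimal Python sketch. It assumes an NVIDIA GPU and relies on the fact that native BF16 arrived with compute capability 8.0 (Ampere); other vendors and CPU backends need a different check. The function names are hypothetical and not part of llama.cpp.

```python
from typing import Tuple


def has_native_bf16(compute_capability: Tuple[int, int]) -> bool:
    """Return True if an NVIDIA GPU with this compute capability has native BF16.

    Assumption: NVIDIA only. Native BF16 is available from compute
    capability 8.0 (Ampere) onward.
    """
    major, _minor = compute_capability
    return major >= 8


def pick_variant(compute_capability: Tuple[int, int]) -> str:
    """Suggest BF16 when it is natively supported, otherwise fall back to F16."""
    return "BF16" if has_native_bf16(compute_capability) else "F16"


if __name__ == "__main__":
    # Example capabilities: (8, 6) is an Ampere-class card, (7, 5) is Turing-class.
    for cc in [(8, 6), (7, 5)]:
        print(cc, "->", pick_variant(cc))
```

This only narrows the choice between the BF16 and F16 variants; measuring actual speed on your own hardware is still the more reliable test.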

Recommended parameters from LiquidAI:

  • temperature 0.3
  • min_p 0.15
  • repetition_penalty 1.05
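
One way to apply these parameters is to pass them to llama.cpp's `llama-cli` as command-line flags. The sketch below only builds the command. It assumes the mapping temperature → `--temp`, min_p → `--min-p`, repetition_penalty → `--repeat-penalty`. The model filename and the helper name are placeholders, so substitute the file you actually downloaded.

```python
import shlex

# Sampling settings recommended by LiquidAI, keyed by the llama-cli flag
# assumed to correspond to each parameter.
LIQUIDAI_SAMPLING = {
    "--temp": 0.3,
    "--min-p": 0.15,
    "--repeat-penalty": 1.05,
}


def build_llama_cli_command(model_path, prompt):
    """Build an argument list for llama-cli using the recommended sampling settings."""
    cmd = ["llama-cli", "-m", model_path, "-p", prompt]
    for flag, value in LIQUIDAI_SAMPLING.items():
        cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Placeholder filename: replace with the variant you downloaded.
    cmd = build_llama_cli_command("LFM2-8B-A1B-MXFP4_MOE.gguf", "Hello!")
    # Print a shell-safe command line that can be copied into a terminal.
    print(shlex.join(cmd))
```

The returned list can also be passed directly to `subprocess.run` if you prefer to launch llama.cpp from Python.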

Model details

  • Repository: noctrex/LFM2-8B-A1B-MXFP4_MOE-GGUF
  • Format: GGUF
  • Model size: 8B params
  • Architecture: lfm2moe