noctrex committed on
Commit
c4e935d
·
verified ·
1 Parent(s): 7553ad7

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -34,3 +34,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
  LFM2-8B-A1B-MXFP4_MOE.gguf filter=lfs diff=lfs merge=lfs -text
37
+ LFM2-8B-A1B-MXFP4_MOE_F16.gguf filter=lfs diff=lfs merge=lfs -text
38
+ LFM2-8B-A1B-MXFP4_MOE_BF16.gguf filter=lfs diff=lfs merge=lfs -text
39
+ LFM2-8B-A1B-Q8_XL_MOE.gguf filter=lfs diff=lfs merge=lfs -text
LFM2-8B-A1B-MXFP4_MOE.gguf CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:6c433c7853f73bf948206a8902103e212044710c8a3c5199b11ddf022abffb12
3
- size 4750671040
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:976dba5cd53212f713a5147899814ef5ec14216b65d4cf6009973111559e0cc8
3
+ size 4876500032
LFM2-8B-A1B-MXFP4_MOE_BF16.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e7906153f0dc2480ddbb0f3a76235713d748656148ba17f54208391459821d8e
3
+ size 5301173312
LFM2-8B-A1B-MXFP4_MOE_F16.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3c73f2907d0471f91214771cc80e9c62abfaf53288416ed94785c62877dc9314
3
+ size 5301173312
LFM2-8B-A1B-Q8_XL_MOE.gguf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cf2f9ddad8112216d6dff9f806f0382acdc01c0f1521f4ed9b3c55ee398f9e3f
3
+ size 9418931264
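The pointer files above record each GGUF file's size in bytes, while the README table further down reports sizes in GiB. The following is a minimal sketch of how one might parse such a pointer and convert the size; the helper names `parse_lfs_pointer` and `bytes_to_gib` are hypothetical and not part of this repository or of Git LFS itself.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The size field is a byte count; store it as an integer.
    fields["size"] = int(fields["size"])
    return fields


def bytes_to_gib(n: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return n / 2**30


# Pointer contents copied from the Q8_XL_MOE entry in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:cf2f9ddad8112216d6dff9f806f0382acdc01c0f1521f4ed9b3c55ee398f9e3f
size 9418931264
"""

info = parse_lfs_pointer(POINTER)
size_gib = round(bytes_to_gib(info["size"]), 2)
print(f"{size_gib} GiB")
```

Applying the same conversion to the other pointers gives roughly 4.54 GiB for `LFM2-8B-A1B-MXFP4_MOE.gguf` and 4.94 GiB for both the `_F16` and `_BF16` files.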
README.md CHANGED
@@ -3,6 +3,28 @@ pipeline_tag: text-generation
3
  base_model:
4
  - LiquidAI/LFM2-8B-A1B
5
  ---
6
- This is a MXFP4_MOE quantization of the model LFM2-8B-A1B
7
 
8
- Original model: https://huggingface.co/unsloth/LFM2-8B-A1B
6
+ These are **MXFP4** quantizations of the model [LiquidAI / LFM2-8B-A1B](https://huggingface.co/LiquidAI/LFM2-8B-A1B)
7
 
8
+ ## Quick Start
9
+ 1. Download the latest release of **llama.cpp**.
10
+ 2. Download your preferred model variant from below.
11
+
12
+ ## Which version should I choose?
13
+ All FP4 variants use **MXFP4** for the MoE (Mixture of Experts) weights to keep the model efficient.
14
+ I've also included a new type, Q8_XL_MOE, which uses Q8 for the MoE tensors and BF16 for everything else.
15
+ The FP4 variants differ in how the remaining (non-MoE) tensors are handled:
16
+
17
+ | Variant | Quality | Performance | Size | Recommendation |
18
+ | :--- | :--- | :--- | ---: | :--- |
19
+ | **Q8_XL_MOE** | ⭐⭐⭐⭐⭐ | Variable* | 8.77GiB | Maximum quality, uses Q8 instead of FP4 for the MoE weights. |
20
+ | **BF16** | ⭐⭐⭐ | Variable* | 4.94GiB | Best for maximum accuracy; original unquantized weights. |
21
+ | **F16** | ⭐⭐ | Fast | 4.94GiB | Great alternative if BF16 is slow on your hardware. |
22
+ | **Q8** | ⭐ | Fastest | 4.54GiB | Balanced performance and memory usage. |
23
+
24
+ **Note:** On some older architectures, BF16 may be slower than F16.
25
+ Check that your GPU supports native BF16.
26
+
27
+ Recommended parameters from LiquidAI:
28
+ - temperature 0.3
29
+ - min_p 0.15
30
+ - repetition_penalty 1.05
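As an illustration of how the recommended parameters listed above might be passed to llama.cpp, here is a minimal sketch that assembles a command line. It assumes the `llama-cli` binary from llama.cpp is on your `PATH` and that `--temp`, `--min-p`, and `--repeat-penalty` are the matching flags (mapping `repetition_penalty` to `--repeat-penalty` is an assumption; check `llama-cli --help` for your build). The helper `build_command` is hypothetical.

```python
import shlex

# Sampling parameters recommended by LiquidAI, as listed in the README.
RECOMMENDED = {"temperature": 0.3, "min_p": 0.15, "repetition_penalty": 1.05}

# Assumed mapping from parameter names to llama-cli flags.
FLAG_FOR = {
    "temperature": "--temp",
    "min_p": "--min-p",
    "repetition_penalty": "--repeat-penalty",
}


def build_command(model_path: str, params: dict = RECOMMENDED) -> list:
    """Return an argument list for llama-cli with the given sampling settings."""
    cmd = ["llama-cli", "-m", model_path]
    for name, value in params.items():
        cmd += [FLAG_FOR[name], str(value)]
    return cmd


command = build_command("LFM2-8B-A1B-MXFP4_MOE.gguf")
print(shlex.join(command))
```

The argument list can be handed to `subprocess.run(command)` once the model file has been downloaded; substitute the filename of whichever variant you chose.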