---
license: apache-2.0
base_model:
- JetBrains/Mellum2-12B-A2.5B-Thinking
language:
- en
tags:
- chain-of-thought
- gguf
- llama.cpp
- mellum
- mellum2
- moe
- quantized
- reasoning
pipeline_tag: text-generation
---
# Mellum2-12B-A2.5B-Thinking - GGUF
Quantized GGUF version of [JetBrains/Mellum2-12B-A2.5B-Thinking](https://huggingface.co/JetBrains/Mellum2-12B-A2.5B-Thinking).
These were generated using the default settings with `llama-quantize` (b9482).
## Quantizations provided
| File | Quantization | Size |
|------|--------------|------|
| `Mellum2-12B-A2.5B-Thinking-Q4_0.gguf` | Q4_01 | 6.91 GB |
| `Mellum2-12B-A2.5B-Thinking-Q4_K_S.gguf` | Q4_K_S | 7.4 GB |
| `Mellum2-12B-A2.5B-Thinking-Q4_K_M.gguf` | Q4_K_M | 8.07 GB |
| `Mellum2-12B-A2.5B-Thinking-Q5_K_M.gguf` | Q5_K_M | 9.21 GB |
| `Mellum2-12B-A2.5B-Thinking-Q6_K.gguf` | Q6_K | 10.9 GB |
| `Mellum2-12B-A2.5B-Thinking-Q8_0.gguf` | Q8_0 | 12.9 GB |
1: Q4_0 is not recommended. Perplexity doubled which suggests degredated quality, and I encountered endlessly repeating tokens with my test prompt.
## Perplexity test
I tested perplexity using `llama-perplexity` and Salesforce's [wikitext-2-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1).
| File | Quantization | Ctx | PPL |
|------|--------------|-----|-----|
| `Mellum2-12B-A2.5B-Thinking-Q4_0.gguf` | Q4_0 | 512 | 18.6689 +/- 0.17737 |
| `Mellum2-12B-A2.5B-Thinking-Q4_K_S.gguf` | Q4_K_S | 512 | 9.8269 +/- 0.07248 |
| `Mellum2-12B-A2.5B-Thinking-Q4_K_M.gguf` | Q4_K_M | 512 | 9.7410 +/- 0.07128 |
| `Mellum2-12B-A2.5B-Thinking-Q5_K_M.gguf` | Q5_K_M | 512 | 9.4490 +/- 0.06807 |
| `Mellum2-12B-A2.5B-Thinking-Q6_K.gguf` | Q6_K | 512 | 9.7329 +/- 0.07207 |
| `Mellum2-12B-A2.5B-Thinking-Q8_0.gguf` | Q8_0 | 512 | 9.3657 +/- 0.06734 |
| `Mellum2-12B-A2.5B-Thinking-BF16.gguf` | BF16 | 512 | 9.4037 +/- 0.06784 |
## Serving with llama.cpp
llama.cpp added support for Mellum2 in release b9482.
It has a max context size of 131,072.
This can be served using:
```bash
llama-server \
-hf CodeFault/Mellum2-12B-A2.5B-Thinking-GGUF:Q5_K_M \
--temp 0.6 \
--top-p 0.95 \
--top-k 20
```