--- license: apache-2.0 base_model: - JetBrains/Mellum2-12B-A2.5B-Thinking language: - en tags: - chain-of-thought - gguf - llama.cpp - mellum - mellum2 - moe - quantized - reasoning pipeline_tag: text-generation --- # Mellum2-12B-A2.5B-Thinking - GGUF Quantized GGUF version of [JetBrains/Mellum2-12B-A2.5B-Thinking](https://huggingface.co/JetBrains/Mellum2-12B-A2.5B-Thinking). These were generated using the default settings with `llama-quantize` (b9482). ## Quantizations provided | File | Quantization | Size | |------|--------------|------| | `Mellum2-12B-A2.5B-Thinking-Q4_0.gguf` | Q4_01 | 6.91 GB | | `Mellum2-12B-A2.5B-Thinking-Q4_K_S.gguf` | Q4_K_S | 7.4 GB | | `Mellum2-12B-A2.5B-Thinking-Q4_K_M.gguf` | Q4_K_M | 8.07 GB | | `Mellum2-12B-A2.5B-Thinking-Q5_K_M.gguf` | Q5_K_M | 9.21 GB | | `Mellum2-12B-A2.5B-Thinking-Q6_K.gguf` | Q6_K | 10.9 GB | | `Mellum2-12B-A2.5B-Thinking-Q8_0.gguf` | Q8_0 | 12.9 GB | 1: Q4_0 is not recommended. Perplexity doubled which suggests degredated quality, and I encountered endlessly repeating tokens with my test prompt. ## Perplexity test I tested perplexity using `llama-perplexity` and Salesforce's [wikitext-2-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext/tree/main/wikitext-2-raw-v1). | File | Quantization | Ctx | PPL | |------|--------------|-----|-----| | `Mellum2-12B-A2.5B-Thinking-Q4_0.gguf` | Q4_0 | 512 | 18.6689 +/- 0.17737 | | `Mellum2-12B-A2.5B-Thinking-Q4_K_S.gguf` | Q4_K_S | 512 | 9.8269 +/- 0.07248 | | `Mellum2-12B-A2.5B-Thinking-Q4_K_M.gguf` | Q4_K_M | 512 | 9.7410 +/- 0.07128 | | `Mellum2-12B-A2.5B-Thinking-Q5_K_M.gguf` | Q5_K_M | 512 | 9.4490 +/- 0.06807 | | `Mellum2-12B-A2.5B-Thinking-Q6_K.gguf` | Q6_K | 512 | 9.7329 +/- 0.07207 | | `Mellum2-12B-A2.5B-Thinking-Q8_0.gguf` | Q8_0 | 512 | 9.3657 +/- 0.06734 | | `Mellum2-12B-A2.5B-Thinking-BF16.gguf` | BF16 | 512 | 9.4037 +/- 0.06784 | ## Serving with llama.cpp llama.cpp added support for Mellum2 in release b9482. It has a max context size of 131,072. This can be served using: ```bash llama-server \ -hf CodeFault/Mellum2-12B-A2.5B-Thinking-GGUF:Q5_K_M \ --temp 0.6 \ --top-p 0.95 \ --top-k 20 ```