---
license: apache-2.0
language:
- en
- zh
pipeline_tag: text-generation
tags:
- gguf
- minicpm
- minicpm5
- llama
- text-generation
- tool-calling
- on-device
- edge-ai
- llama.cpp
base_model: openbmb/MiniCPM5-1B
---

# MiniCPM5-1B (GGUF Quantizations)

This repository contains custom `GGUF` quantizations of the [openbmb/MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) model.

MiniCPM5-1B is a capable 1-billion-parameter Transformer built for on-device, local, and resource-constrained deployment. It uses the standard `LlamaForCausalLM` architecture, features hybrid reasoning (built-in `` tokens), and supports a context window of up to 131k tokens.

## 📦 Available Files and Quantizations

These files were quantized for efficient CPU and edge inference with the `llama.cpp` framework.

| Filename | Format | Size | Description |
| :--- | :--- | :--- | :--- |
| `minicpm5-1b-Q4_K_M.gguf` | Q4_K_M | 657 MB | Best balance of quality and size. **(Recommended for 4 GB RAM / mobile)** |
| `minicpm5-1b-Q5_K_M.gguf` | Q5_K_M | 751 MB | Higher accuracy for a small increase in size. |
| `minicpm5-1b-Q6_K.gguf` | Q6_K | 851 MB | Near-perfect fidelity to the base model. |
| `minicpm5-1b-Q8_0.gguf` | Q8_0 | 1.1 GB | Highest quantized quality; fast loading. |
| `minicpm5-1b-f16.gguf` | F16 | 2.1 GB | Unquantized 16-bit weights (master copy). |

## 🚀 Quick Start with llama.cpp

MiniCPM5-1B uses the standard Llama architecture, so `llama.cpp` supports it out of the box. No custom forks or kernels are required.

### 1. Interactive CLI

To run the model directly in your terminal on the CPU (here with 4 threads):

```bash
# -m: model file, -p: prompt, -n: max tokens to generate, -t: CPU threads
./llama-cli -m minicpm5-1b-Q4_K_M.gguf -p "Artificial intelligence and local model deployment are transforming technology because" -n 256 -t 4
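
# Optional sanity check to run before the command above. This is a minimal sketch,
# not part of llama.cpp. Per the GGUF format, every .gguf file begins with the
# 4-byte ASCII magic "GGUF", so a truncated or mislabeled download is caught early.
# Only the header is checked, so this will not detect every corrupted file.
# `is_gguf` is a hypothetical helper name. `head -c` works in both GNU and BSD/macOS head.
is_gguf() {
  # Succeed only if the path is a regular file whose first 4 bytes are "GGUF".
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Usage: print a warning if the model file fails the header check.
is_gguf minicpm5-1b-Q4_K_M.gguf || echo "minicpm5-1b-Q4_K_M.gguf is missing or not a GGUF file" >&2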