---
license: apache-2.0
language:
  - en
  - zh
pipeline_tag: text-generation
tags:
  - gguf
  - minicpm
  - minicpm5
  - llama
  - text-generation
  - tool-calling
  - on-device
  - edge-ai
  - llama.cpp
base_model: openbmb/MiniCPM5-1B
---

# MiniCPM5-1B (GGUF Quantizations)

This repository contains custom `GGUF` quantizations of the [openbmb/MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) model.

MiniCPM5-1B is a 1-billion-parameter Transformer built for on-device deployment and resource-constrained environments. It uses the standard `LlamaForCausalLM` architecture, supports hybrid reasoning (built-in `<think>` tokens), and offers a 131k-token context window.

## 📦 Available Files and Quantizations

These files were quantized for efficient CPU and edge inference with `llama.cpp`.

| Filename | Format | Size | Description |
| :--- | :--- | :--- | :--- |
| `minicpm5-1b-Q4_K_M.gguf` | Q4_K_M | 657 MB | Excellent balance of performance and size. **(Recommended for 4GB RAM/Mobile)** |
| `minicpm5-1b-Q5_K_M.gguf` | Q5_K_M | 751 MB | Higher accuracy, slight increase in size. |
| `minicpm5-1b-Q6_K.gguf` | Q6_K | 851 MB | Near-perfect fidelity to the base model. |
| `minicpm5-1b-Q8_0.gguf` | Q8_0 | 1.1 GB | Maximum quantized quality; fast loading. |
| `minicpm5-1b-f16.gguf` | F16 | 2.1 GB | Unquantized half-precision (FP16) weights. |
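
After downloading one of these files, a quick sanity check can catch a truncated download or a saved HTML error page before you try to load it. GGUF files begin with the four ASCII bytes `GGUF`, so a minimal sketch of such a check (the helper name `is_gguf` is hypothetical, and this only inspects the header, not the integrity of the weights) could look like this:

```bash
# Hypothetical helper: succeed if the file starts with the 4-byte GGUF magic.
is_gguf() {
  [ -f "$1" ] || return 1
  # Read the first 4 bytes; drop NUL bytes so the shell does not warn
  # when the file is some other binary format.
  [ "$(head -c 4 "$1" | tr -d '\0')" = "GGUF" ]
}

# Example: check a file from the table above before loading it.
f="minicpm5-1b-Q4_K_M.gguf"
if is_gguf "$f"; then
  echo "$f looks like a GGUF file"
else
  echo "$f is missing or is not a GGUF file"
fi
```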

## 🚀 Quick Start with llama.cpp

Because MiniCPM5-1B uses the standard Llama architecture, `llama.cpp` supports it out of the box. No custom forks or kernels are required.

### 1. Interactive CLI
To run the model directly in your terminal on four CPU threads:

```bash
./llama-cli -m minicpm5-1b-Q4_K_M.gguf -p "Artificial intelligence and local model deployment are transforming technology because" -n 256 -t 4