---
license: apache-2.0
base_model: WeiboAI/VibeThinker-3B
language:
  - en
  - zh
tags:
  - qwen
  - gguf
  - llama.cpp
  - thinking
---

# VibeThinker-3B GGUF

GGUF quantizations of [WeiboAI/VibeThinker-3B](https://huggingface.co/WeiboAI/VibeThinker-3B), a Qwen2-based, 3B-parameter thinking model with a 131,072-token context window.

Converted with the `convert_hf_to_gguf.py` script from [llama.cpp](https://github.com/ggerganov/llama.cpp).

## Available Quantizations

| File | Size | BPW | Description |
|------|------|-----|-------------|
| `VibeThinker-3B-F16.gguf` | 5.8 GB | 16.00 | Full FP16 (reference) |
| `VibeThinker-3B-Q8_0.gguf` | 3.1 GB | 8.50 | Near-lossless 8-bit |
| `VibeThinker-3B-Q5_K_M.gguf` | 2.1 GB | 5.75 | High quality 5-bit |
| `VibeThinker-3B-Q4_K_M.gguf` | 1.8 GB | 4.99 | Great size/quality tradeoff |
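
The sizes in the table follow roughly from `parameters × BPW / 8`. The sketch below reproduces them under two assumptions that are not stated in this card: a parameter count of about 2.9 billion (inferred from the F16 row) and GB meaning 10^9 bytes. The helper name `est_size_gb` is made up for illustration.

```bash
# Rough file-size estimate: parameters * bits-per-weight / 8 bytes.
# Assumptions: ~2.9e9 parameters (inferred from the F16 row), 1 GB = 1e9 bytes.
est_size_gb() {
  # $1 = parameter count, $2 = bits per weight (BPW)
  awk -v n="$1" -v bpw="$2" 'BEGIN { printf "%.1f\n", n * bpw / 8 / 1e9 }'
}

for bpw in 16.00 8.50 5.75 4.99; do
  echo "$bpw BPW -> $(est_size_gb 2900000000 "$bpw") GB"
done
```

The same helper gives a quick estimate for any other quantization type once its BPW is known.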

## Usage

### llama.cpp

```bash
./llama-cli -m VibeThinker-3B-Q4_K_M.gguf -p "Hello!" -n 128
```

### Chat Format

This model uses the Qwen2 (ChatML) chat format. The assistant turn begins with the model's reasoning inside `<think>` tags, followed by the final response:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
<think>...reasoning...</think>
...response...
<|im_end|>
```
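
If you need to assemble this prompt by hand, a minimal sketch could look like the following. The function name `build_prompt` is hypothetical. The prompt stops after the `<|im_start|>assistant` line so that the model generates the `<think>` block and the response itself.

```bash
# Build a raw prompt in the chat format shown above.
# build_prompt is an illustrative helper, not part of llama.cpp.
build_prompt() {
  # $1 = system message, $2 = user message
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' "$1" "$2"
}

build_prompt "You are a helpful assistant." "Hello!"
```

The result can be passed to `llama-cli` through `-p "$(build_prompt ...)"`. Depending on your llama.cpp version, conversation mode may already apply the chat template stored in the GGUF file, in which case a hand-built prompt is unnecessary.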

## Model Details

- **Architecture:** Qwen2ForCausalLM
- **Parameters:** ~3B
- **Layers:** 36
- **Hidden size:** 2048
- **Attention heads:** 16 (2 KV heads, i.e. grouped-query attention)
- **Context:** 131,072 tokens
- **Vocab:** 151,936
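
These figures allow a rough estimate of KV cache memory at full context. The sketch below assumes a head dimension of `hidden size / heads` (2048 / 16 = 128) and an F16 cache (2 bytes per element). Neither is stated in this card, and actual usage depends on your llama.cpp settings. The helper name `kv_cache_bytes` is made up for illustration.

```bash
# KV cache size = 2 (K and V) * layers * kv_heads * head_dim * context * bytes per element.
# Assumptions: head_dim = hidden_size / heads, F16 cache (2 bytes per element).
kv_cache_bytes() {
  # $1 = layers, $2 = KV heads, $3 = head dim, $4 = context tokens, $5 = bytes per element
  echo $(( 2 * $1 * $2 * $3 * $4 * $5 ))
}

head_dim=$(( 2048 / 16 ))                 # 128
kv_cache_bytes 36 2 "$head_dim" 131072 2  # prints 4831838208 (4.5 GiB)
kv_cache_bytes 36 2 "$head_dim" 8192 2    # prints 301989888 (288 MiB)
```

Under these assumptions the full 131,072-token context needs about 4.5 GiB for the KV cache on top of the model weights, so a smaller context size is worth considering on memory-constrained machines.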