---
license: apache-2.0
base_model: WeiboAI/VibeThinker-3B
language:
- en
- zh
tags:
- qwen
- gguf
- llama.cpp
- thinking
---
# VibeThinker-3B GGUF
GGUF quantizations of [WeiboAI/VibeThinker-3B](https://huggingface.co/WeiboAI/VibeThinker-3B), a Qwen2-based, 3B-parameter thinking model with a 131K-token context window.
Converted with the `convert_hf_to_gguf.py` script from [llama.cpp](https://github.com/ggerganov/llama.cpp).
## Available Quantizations
| File | Size | BPW | Description |
|------|------|-----|-------------|
| `VibeThinker-3B-F16.gguf` | 5.8 GB | 16.00 | Full FP16 (reference) |
| `VibeThinker-3B-Q8_0.gguf` | 3.1 GB | 8.50 | Near-lossless 8-bit |
| `VibeThinker-3B-Q5_K_M.gguf` | 2.1 GB | 5.75 | High quality 5-bit |
| `VibeThinker-3B-Q4_K_M.gguf` | 1.8 GB | 4.99 | Great size/quality tradeoff |
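
The Size and BPW columns are related by roughly `size ≈ parameters × BPW / 8`. As a sanity check, the sketch below inverts that relation to recover the parameter count implied by each row. It assumes "GB" means 10^9 bytes and ignores GGUF metadata overhead; the `implied_params` helper is illustrative and not part of any tool.

```python
# Rough sanity check: file size (bytes) ≈ parameter count × bits-per-weight / 8.
# Assumption: "GB" in the table means 10**9 bytes; metadata overhead is ignored.

# (size in GB, bits per weight), copied from the table above.
QUANTS = {
    "F16": (5.8, 16.00),
    "Q8_0": (3.1, 8.50),
    "Q5_K_M": (2.1, 5.75),
    "Q4_K_M": (1.8, 4.99),
}


def implied_params(size_gb: float, bpw: float) -> float:
    """Return the parameter count implied by a file size and a BPW figure."""
    return size_gb * 1e9 * 8 / bpw


for name, (size_gb, bpw) in QUANTS.items():
    print(f"{name:8s} ~{implied_params(size_gb, bpw) / 1e9:.2f}B parameters")
```

Every row works out to roughly 2.9B parameters, which is consistent with the ~3B figure under Model Details.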
## Usage
### llama.cpp
```bash
./llama-cli -m VibeThinker-3B-Q4_K_M.gguf -p "Hello!" -n 128
```
### Chat Format
The model uses the Qwen2 (ChatML) chat format and wraps its reasoning in `<think>` tags before the final response:
```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
<think>...reasoning...</think>
...response...
<|im_end|>
```
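
When calling the model through a raw completion interface, the prompt has to be rendered in this format by hand and the reasoning separated from the answer afterwards. The following is a minimal sketch of both steps, assuming a single-turn conversation. `build_prompt` and `split_thinking` are hypothetical helper names, not part of llama.cpp, which normally applies the chat template stored in the GGUF file itself.

```python
import re
from typing import Tuple

DEFAULT_SYSTEM = "You are a helpful assistant."


def build_prompt(user_message: str, system_message: str = DEFAULT_SYSTEM) -> str:
    """Render a single-turn prompt in the chat format shown above.

    The prompt ends with the opening assistant tag so the model continues
    with its <think> block and then its response.
    """
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(completion: str) -> Tuple[str, str]:
    """Split a raw completion into (reasoning, response).

    If no complete <think>...</think> block is present, the reasoning
    part is returned as an empty string.
    """
    text = completion.replace("<|im_end|>", "")
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()


prompt = build_prompt("Hello!")
reasoning, response = split_thinking("<think>Greet the user.</think>\nHi there!\n<|im_end|>")
```

Keeping the reasoning separate makes it easy to log or hide it while showing only the response to the user.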
## Model Details
- **Architecture:** Qwen2ForCausalLM
- **Parameters:** ~3B
- **Layers:** 36
- **Hidden size:** 2048
- **Heads:** 16 (2 KV heads)
- **Context:** 131,072 tokens
- **Vocab:** 151,936
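
These figures also give a rough estimate of KV-cache memory at a given context length. The sketch below assumes the head dimension equals hidden size divided by the number of attention heads (128 here) and an unquantized 16-bit cache. Both are assumptions, not values stated in this card, and actual runtime usage depends on the cache type and other settings.

```python
# Values taken from the Model Details list above.
LAYERS = 36
HIDDEN_SIZE = 2048
HEADS = 16
KV_HEADS = 2

# Assumption: head dimension = hidden size / attention heads.
HEAD_DIM = HIDDEN_SIZE // HEADS  # 128


def kv_cache_bytes(n_tokens: int, bytes_per_element: int = 2) -> int:
    """Estimate KV-cache size in bytes.

    The leading factor of 2 covers keys and values, which are stored for
    every layer and every KV head. bytes_per_element=2 assumes a 16-bit cache.
    """
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * n_tokens * bytes_per_element


for ctx in (4096, 32768, 131072):
    print(f"{ctx:>7} tokens: {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

Under these assumptions the cache costs 36 KiB per token, or about 4.5 GiB at the full 131,072-token context, on top of the model weights. With only 2 KV heads against 16 attention heads, this is one eighth of what full multi-head attention would need.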