---
license: apache-2.0
base_model: WeiboAI/VibeThinker-3B
language:
- en
- zh
tags:
- qwen
- gguf
- llama.cpp
- thinking
---

# VibeThinker-3B GGUF

GGUF quantizations of [WeiboAI/VibeThinker-3B](https://huggingface.co/WeiboAI/VibeThinker-3B), a Qwen2-based 3B parameter thinking model with 131K context.

Converted with [llama.cpp](https://github.com/ggerganov/llama.cpp) `convert_hf_to_gguf.py`.

## Available Quantizations

| File | Size | BPW | Description |
|------|------|-----|-------------|
| `VibeThinker-3B-F16.gguf` | 5.8 GB | 16.00 | Full FP16 (reference) |
| `VibeThinker-3B-Q8_0.gguf` | 3.1 GB | 8.50 | Near-lossless 8-bit |
| `VibeThinker-3B-Q5_K_M.gguf` | 2.1 GB | 5.75 | High quality 5-bit |
| `VibeThinker-3B-Q4_K_M.gguf` | 1.8 GB | 4.99 | Great size/quality tradeoff |

## Usage

### llama.cpp

```bash
./llama-cli -m VibeThinker-3B-Q4_K_M.gguf -p "Hello!" -n 128
```

### Chat Format

This model uses the Qwen2 chat format with thinking tags:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
...reasoning...
...response...
<|im_end|>
```

## Model Details

- **Architecture:** Qwen2ForCausalLM
- **Parameters:** ~3B
- **Layers:** 36
- **Hidden size:** 2048
- **Heads:** 16 (2 KV heads)
- **Context:** 131,072 tokens
- **Vocab:** 151,936
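
The figures above are enough for a rough estimate of how much memory the KV cache needs at a given context length, which matters when deciding how much of the 131K window to use. The sketch below is only an approximation under two assumptions that the model card does not state: that the head dimension equals hidden size divided by the number of heads (2048 / 16 = 128), and that keys and values are cached as 16-bit floats. The `kv_cache_bytes` helper is hypothetical and is not part of llama.cpp.

```bash
#!/usr/bin/env bash
# Rough KV-cache size estimate from the Model Details above.
# Assumptions (not stated in the model card):
#   - head_dim = hidden_size / heads
#   - keys and values are cached as 16-bit floats (2 bytes per value)

layers=36
hidden_size=2048
heads=16
kv_heads=2
bytes_per_value=2   # assumed f16 cache

# Hypothetical helper: prints the estimated cache size in bytes.
kv_cache_bytes() {
  local tokens=$1   # number of context tokens
  local head_dim=$(( hidden_size / heads ))
  # 2x for keys and values, per layer, per KV head, per head dimension
  echo $(( 2 * layers * kv_heads * head_dim * bytes_per_value * tokens ))
}

for ctx in 4096 32768 131072; do
  bytes=$(kv_cache_bytes "$ctx")
  echo "ctx=$ctx  kv_cache=$(( bytes / 1024 / 1024 )) MiB"
done
```

Under these assumptions the cache costs 36,864 bytes per token, or about 4.5 GiB at the full 131,072-token context, on top of the model weights. Running with a smaller context (for example via llama.cpp's `-c` option) reduces this proportionally.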