metadata
license: apache-2.0
base_model: WeiboAI/VibeThinker-3B
language:
  - en
  - zh
tags:
  - qwen
  - gguf
  - llama.cpp
  - thinking

VibeThinker-3B GGUF

GGUF quantizations of WeiboAI/VibeThinker-3B, a Qwen2-based, 3B-parameter reasoning ("thinking") model with a 131K-token (131,072) context window.

Converted with the convert_hf_to_gguf.py script from llama.cpp.

Available Quantizations

| File | Size | BPW (bits per weight) | Description |
|------|------|-----------------------|-------------|
| VibeThinker-3B-F16.gguf | 5.8 GB | 16.00 | Full FP16 (reference) |
| VibeThinker-3B-Q8_0.gguf | 3.1 GB | 8.50 | Near-lossless 8-bit |
| VibeThinker-3B-Q5_K_M.gguf | 2.1 GB | 5.75 | High quality 5-bit |
| VibeThinker-3B-Q4_K_M.gguf | 1.8 GB | 4.99 | Great size/quality tradeoff |
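
The BPW column relates to file size roughly as parameters × bits per weight / 8. The sketch below applies that rule of thumb to the BPW values listed above. The parameter count of exactly 3.0e9 is an assumption (this README only says "~3B"), and real files also carry metadata and tokenizer data, so expect the estimates to differ somewhat from the listed sizes.

```python
def estimate_gguf_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in bytes: parameters * bits / 8."""
    return n_params * bits_per_weight / 8


# Assumption: exactly 3.0e9 parameters; the README only says "~3B".
N_PARAMS = 3.0e9

# BPW values copied from the quantization list in this README.
BPW = {"F16": 16.00, "Q8_0": 8.50, "Q5_K_M": 5.75, "Q4_K_M": 4.99}

for name, bpw in BPW.items():
    size_gib = estimate_gguf_bytes(N_PARAMS, bpw) / 2**30
    print(f"{name:8s} ~{size_gib:.2f} GiB")
```

The same function can be used to check whether a given quantization is likely to fit in available RAM or VRAM before downloading it.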

Usage

llama.cpp

./llama-cli -m VibeThinker-3B-Q4_K_M.gguf -p "Hello!" -n 128
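
Because the model emits its reasoning before the final answer, a 128-token budget may be used up inside the thinking section. The helper below is a minimal sketch that assembles the same llama-cli invocation with a configurable token budget (`-n`) and context size (`-c`). The function name `build_llama_cli_args` and the values 2048 and 8192 are illustrative assumptions, not recommendations from the model authors.

```python
import shlex


def build_llama_cli_args(model_path, prompt, n_predict=128, ctx_size=None):
    """Return the argument list for a llama-cli run (hypothetical helper)."""
    args = ["./llama-cli", "-m", model_path, "-p", prompt, "-n", str(n_predict)]
    if ctx_size is not None:
        # -c sets the context window in tokens.
        args += ["-c", str(ctx_size)]
    return args


# Assumed values: a larger generation budget and an 8K context window.
args = build_llama_cli_args(
    "VibeThinker-3B-Q4_K_M.gguf", "Hello!", n_predict=2048, ctx_size=8192
)

# Print a shell-safe command line; pass `args` to subprocess.run to execute it.
print(shlex.join(args))
```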

Chat Format

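The model uses the Qwen2 (ChatML) chat format. The assistant turn begins with the model's reasoning inside <think>...</think> tags, followed by the final response: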

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
<think>...reasoning...</think>
...response...
<|im_end|>
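
The template above can be assembled and parsed by hand, as in the following sketch. It assumes the layout is exactly as shown in this README; the helper names `build_prompt` and `split_thinking` are hypothetical. In practice most runtimes apply the chat template stored in the GGUF file, so manual formatting is mainly useful for raw completion endpoints or for debugging.

```python
import re


def build_prompt(messages):
    """Format a list of {'role', 'content'} dicts and open the assistant turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    # Leave the assistant turn open so the model generates its reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(completion):
    """Split a completion into (reasoning, answer); reasoning may be empty."""
    text = completion.replace("<|im_end|>", "").strip()
    match = _THINK_RE.search(text)
    if match is None:
        return "", text
    return match.group(1).strip(), text[match.end():].strip()


prompt = build_prompt(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ]
)
print(prompt)
```

Calling `split_thinking` on the generated text lets an application show only the final answer while keeping the reasoning available for logging.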

Model Details

  • Architecture: Qwen2ForCausalLM
  • Parameters: ~3B
  • Layers: 36
  • Hidden size: 2048
  • Attention heads: 16 (2 KV heads, i.e. grouped-query attention)
  • Context: 131,072 tokens
  • Vocab: 151,936
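
The figures above are enough to estimate how much memory the KV cache needs at a given context length. The sketch below assumes a head dimension of hidden size / heads (128) and an FP16 cache (2 bytes per element). Both are assumptions: runtimes such as llama.cpp can use other cache types, and they allocate additional buffers on top of this.

```python
# Values taken from the model details listed in this README.
LAYERS = 36
HIDDEN_SIZE = 2048
HEADS = 16
KV_HEADS = 2

# Assumption: head dimension equals hidden size divided by attention heads.
HEAD_DIM = HIDDEN_SIZE // HEADS


def kv_cache_bytes(n_tokens, bytes_per_element=2):
    """Estimate KV cache size: keys and values for every layer and KV head."""
    per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_element
    return per_token * n_tokens


for n_ctx in (8192, 32768, 131072):
    print(f"{n_ctx:>7} tokens: {kv_cache_bytes(n_ctx) / 2**30:.2f} GiB")
```

Under these assumptions the cache costs 36,864 bytes per token, which comes to 4.5 GiB at the full 131,072-token context, so a smaller context setting is worth considering on memory-constrained machines.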