VibeThinker-3B: hipfire quantized (MQ4 + MQ6)

FWHT-rotated quantizations of WeiboAI/VibeThinker-3B for hipfire, a Rust-native inference engine for AMD RDNA GPUs.

Files

File	Format	Size	Quality
`vibethinker-3b.mq4.hfq`	MQ4G256 (4-bit FWHT)	1.8 GB	good
`vibethinker-3b.mq6.hfq`	MQ6G256 (6-bit FWHT)	2.4 GB	better

MQ4 is the default pick: fastest decode and smallest file.
MQ6 is worth it if you have spare VRAM and want lower quantization error.

Usage

# pull and run (hipfire auto-selects the MQ4 file)
hipfire run xfivetide/VibeThinker-3B-hipfire "Your prompt here"

# or pick a specific quant explicitly
hipfire run xfivetide/VibeThinker-3B-hipfire/vibethinker-3b.mq6.hfq "Your prompt here"

Performance (gfx1151 / Strix Halo, AMD RDNA 3.5)

Format	Size	Decode speed
BF16 safetensors (original)	5.8 GB	~8 tok/s
MQ4 (this repo)	1.8 GB	~36–90 tok/s
MQ6 (this repo)	2.4 GB	~25–60 tok/s

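Decode speed depends on context length. The higher figure in each range applies to short contexts, and throughput falls as the context grows because attention cost scales with the size of the KV cache.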
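Read as ratios, the two tables give a rough idea of what each quant stores and what it buys over BF16. The sketch below derives average bits per weight (16 × quantized size / BF16 size) and the decode speedup range. It assumes the BF16 file holds every parameter at 16 bits and that all sizes use the same rounded GB unit, so the results are approximate.

```python
# Back-of-envelope figures derived only from the Files and Performance tables.
# Assumption: the BF16 file stores every parameter at 16 bits and all sizes
# use the same (rounded) GB unit, so the outputs are approximate.

BF16_SIZE_GB = 5.8
BF16_TOK_S = 8.0

QUANTS = {
    # name: (file size in GB, (low, high) decode tok/s on gfx1151)
    "MQ4": (1.8, (36.0, 90.0)),
    "MQ6": (2.4, (25.0, 60.0)),
}


def avg_bits_per_weight(size_gb: float) -> float:
    """Average stored bits per parameter, scaled from the 16-bit BF16 file."""
    return 16.0 * size_gb / BF16_SIZE_GB


def speedup_range(tok_s: tuple[float, float]) -> tuple[float, float]:
    """Decode speedup over BF16 at the low and high end of the measured range."""
    low, high = tok_s
    return low / BF16_TOK_S, high / BF16_TOK_S


for name, (size_gb, tok_s) in QUANTS.items():
    lo, hi = speedup_range(tok_s)
    print(f"{name}: ~{avg_bits_per_weight(size_gb):.2f} bits/weight, "
          f"{lo:.1f}x to {hi:.1f}x faster than BF16")
```

MQ4 comes out near 5 bits per weight rather than 4, which is consistent with per-group scales plus the norms, biases, and embeddings kept at F16/Q8F16 (see Format details).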

Quantization commands

hipfire-quantize \
  --input WeiboAI/VibeThinker-3B \
  --output vibethinker-3b.mq4.hfq \
  --format mq4 \
  --arch-id 7

hipfire-quantize \
  --input WeiboAI/VibeThinker-3B \
  --output vibethinker-3b.mq6.hfq \
  --format mq6 \
  --arch-id 7

Note: --arch-id 7 is required. Without it, the quantizer auto-detects qwen2 as id=1 (llama), and the llama path silently drops the Q/K/V attention biases at load time.
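Why the dropped biases matter: Qwen2 computes q = W_q·x + b_q (and likewise for K and V), while a llama-style path computes only W_q·x. The toy sketch below uses made-up 2-dimensional values (hypothetical, not taken from the model) to show the attention score changing even though every weight matrix is loaded correctly, which is why the failure is silent rather than a load error.

```python
def project(w, x, b=None):
    """Linear projection y = W x, plus bias b when the loader keeps it."""
    y = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    return y if b is None else [yi + bi for yi, bi in zip(y, b)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


# Hypothetical toy values; real Qwen2 projections are far larger.
x = [1.0, 0.5]
w_q = [[0.2, 0.1], [0.0, 0.3]]
w_k = [[0.1, 0.0], [0.4, 0.2]]
b_q = [0.5, -0.2]
b_k = [0.3, 0.1]

# Unscaled q.k attention logit with biases (qwen2 path) and without (llama path).
score_qwen2 = dot(project(w_q, x, b_q), project(w_k, x, b_k))
score_llama = dot(project(w_q, x), project(w_k, x))
print(f"with biases: {score_qwen2:.3f}  biases dropped: {score_llama:.3f}")
```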

Format details

Both formats use MagnumQuant (MQ) with an FWHT (fast Walsh-Hadamard transform) rotation applied over groups of 256 weights. The rotation spreads each group's magnitude evenly across its 256 elements, flattening outliers before quantization, which reduces per-group error compared with plain INT4/INT6.
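A minimal sketch of why the rotation helps, assuming a generic scheme: an orthonormal FWHT over one 256-element group, then symmetric absmax quantization with a single scale per group. MagnumQuant's actual scale format, rounding, and storage layout are not described here, so `quant_dequant` and the synthetic weights are illustrative assumptions rather than hipfire's implementation.

```python
import math
import random


def fwht(x: list[float]) -> list[float]:
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two.

    With the 1/sqrt(n) scaling the transform is its own inverse.
    """
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def quant_dequant(group: list[float], bits: int) -> list[float]:
    """Assumed scheme: symmetric absmax quantization, one scale per group."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 31 for 6-bit
    scale = max(abs(v) for v in group) / qmax
    if scale == 0.0:
        return [0.0] * len(group)
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in group]


def mse(a: list[float], b: list[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def compare(group: list[float], bits: int = 4) -> tuple[float, float]:
    plain = quant_dequant(group, bits)
    # Rotate, quantize, then rotate back (the orthonormal FWHT is self-inverse).
    rotated = fwht(quant_dequant(fwht(group), bits))
    return mse(group, plain), mse(group, rotated)


rng = random.Random(0)
group = [rng.gauss(0.0, 0.02) for _ in range(256)]  # synthetic weights
group[17] = 0.5  # one large outlier
plain_err, fwht_err = compare(group)
print(f"plain INT4 MSE: {plain_err:.2e}  FWHT + INT4 MSE: {fwht_err:.2e}")
```

With the outlier present, plain INT4 picks a scale so coarse that most small weights round to zero. After the rotation the outlier's energy is spread over all 256 coefficients, the scale shrinks, and the reconstruction error drops. Because the transform is orthonormal, error measured in the rotated domain equals error in the original weights.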

Norms, biases, and embeddings are stored as F16/Q8F16 (not quantized to 4/6-bit)
arch_id=7 routes to hipfire-arch-qwen2, which correctly loads Qwen2's Q/K/V attention biases

About the model

VibeThinker-3B is a 3B-parameter reasoning model fine-tuned on math, competitive programming, and STEM tasks. It generates <think>…</think> reasoning traces before answering. hipfire's CLI strips them by default; raw JSONL clients see the full stream.
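For clients reading the raw JSONL stream, the reasoning trace has to be removed on the client side. A minimal sketch, assuming the generated text has already been concatenated from the stream (the JSONL field names are not documented here, so that step is left out); `strip_think` is a hypothetical helper, not part of hipfire.

```python
import re

# Non-greedy and DOTALL so each block, including newlines inside it, is removed
# separately; trailing whitespace after the closing tag is dropped too.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning traces and return the remaining answer."""
    text = THINK_BLOCK.sub("", text)
    # An unterminated block (generation stopped mid-reasoning) has no answer yet.
    start = text.find("<think>")
    if start != -1:
        text = text[:start]
    return text.strip()
```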

Best for: competitive math, LeetCode-style coding, STEM reasoning
Not recommended for: tool-calling, agentic coding, open-domain knowledge
Recommended sampling: temperature 0.7–1.0, top_p 0.95
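The recommended settings combine two standard steps: temperature scaling of the logits, then nucleus (top-p) sampling over the smallest set of tokens whose probability mass reaches top_p. A minimal sketch over a plain list of logits; `sample_top_p` is a hypothetical helper for illustration, and hipfire's own sampler may differ in details.

```python
import math
import random


def sample_top_p(logits, temperature=0.7, top_p=0.95, rng=random):
    """Temperature scaling followed by nucleus (top-p) sampling; returns a token index."""
    # Softmax of logits / temperature (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Draw from the kept tokens, renormalised by their total mass.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]  # guard against floating-point leftovers
```

Lower temperatures sharpen the distribution before the top-p cut, so fewer tokens survive it; the 0.7 to 1.0 range keeps some diversity in the reasoning traces.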

See WeiboAI/VibeThinker-3B for benchmarks and full details (MIT license).


Model tree for xfivetide/VibeThinker-3B-hipfire

Base model: Qwen/Qwen2.5-3B
Finetuned: Qwen/Qwen2.5-Coder-3B
Finetuned: WeiboAI/VibeThinker-3B
Finetuned (21): this model