VibeThinker-3B — hipfire quantized (MQ4 + MQ6)

FWHT-rotated quantizations of WeiboAI/VibeThinker-3B for hipfire, a Rust-native inference engine for AMD RDNA GPUs.

Files

| File | Format | Size | Quality |
|---|---|---|---|
| vibethinker-3b.mq4.hfq | MQ4G256 (4-bit FWHT) | 1.8 GB | good |
| vibethinker-3b.mq6.hfq | MQ6G256 (6-bit FWHT) | 2.4 GB | better |

MQ4 is the default pick: fastest decode and smallest file.
MQ6 is worth it if you have spare VRAM and want lower quantization error, i.e. output closer to the original BF16 weights.
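The size gap between the two files comes from the bits stored per weight. The sketch below works through that arithmetic. It assumes a generic group format with one f16 scale per 256-weight group; the real MQ4G256/MQ6G256 layouts may store different metadata. `bits_per_weight` and `quantized_bytes` are illustrative names, not hipfire APIs.

```rust
/// Effective storage cost per weight for a group-quantized format.
/// ASSUMPTION: one scale of `scale_bits` bits per group of `group` weights;
/// the actual MQ layouts may carry more or different metadata.
fn bits_per_weight(bits: u32, group: u32, scale_bits: u32) -> f64 {
    bits as f64 + scale_bits as f64 / group as f64
}

/// Approximate bytes needed to store `n` quantized weights.
fn quantized_bytes(n: u64, bits: u32, group: u32, scale_bits: u32) -> f64 {
    n as f64 * bits_per_weight(bits, group, scale_bits) / 8.0
}

fn main() {
    // 4-bit and 6-bit codes, f16 (16-bit) scale per 256 weights (assumed).
    let four = bits_per_weight(4, 256, 16);
    let six = bits_per_weight(6, 256, 16);
    println!("4-bit: {four} bits/weight, 6-bit: {six} bits/weight");

    // A hypothetical slice of one billion quantized weights.
    let n = 1_000_000_000u64;
    println!(
        "1e9 weights: {:.3} GB at 4-bit vs {:.3} GB at 6-bit",
        quantized_bytes(n, 4, 256, 16) / 1e9,
        quantized_bytes(n, 6, 256, 16) / 1e9
    );
}
```

Under these assumptions, 6-bit weights cost about 1.49× as much as 4-bit ones. The two files differ by less than that (2.4 GB vs 1.8 GB). This is consistent with norms, biases, and embeddings being stored the same way in both files.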

Usage

# pull and run (hipfire auto-selects the MQ4 file)
hipfire run xfivetide/VibeThinker-3B-hipfire "Your prompt here"

# or pick a specific quant explicitly
hipfire run xfivetide/VibeThinker-3B-hipfire/vibethinker-3b.mq6.hfq "Your prompt here"

Performance (gfx1151 / Strix Halo, AMD RDNA 3.5)

| Format | Size | Decode speed |
|---|---|---|
| BF16 safetensors (original) | 5.8 GB | ~8 tok/s |
| MQ4 (this repo) | 1.8 GB | ~36–90 tok/s |
| MQ6 (this repo) | 2.4 GB | ~25–60 tok/s |

Speed varies with context length: every decoded token has to read the whole KV cache, so attention cost grows with the context and throughput drops.
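Single-token decode is usually memory-bandwidth bound. Each new token reads every weight once, plus the whole KV cache. That is why smaller files decode faster and why speed falls as the context grows. The sketch below computes that rough ceiling. Several inputs are assumptions:

- a bandwidth of 256 GB/s, the nominal Strix Halo figure;
- layer and KV-head counts taken from the Qwen2.5-3B base config;
- an f16 KV cache.

hipfire's actual KV format and kernels may differ, and the result is an upper bound, not a prediction.

```rust
/// Rough upper bound on decode speed for a bandwidth-bound model:
/// each generated token streams all weights plus the KV cache once.
fn max_tokens_per_sec(bandwidth_gb_s: f64, weight_gb: f64, kv_gb: f64) -> f64 {
    bandwidth_gb_s / (weight_gb + kv_gb)
}

/// KV-cache size in GB after `ctx` tokens:
/// 2 (K and V) x layers x kv_heads x head_dim x bytes_per_elem per token.
fn kv_cache_gb(ctx: u64, layers: u64, kv_heads: u64, head_dim: u64, bytes_per_elem: u64) -> f64 {
    (2 * layers * kv_heads * head_dim * bytes_per_elem * ctx) as f64 / 1e9
}

fn main() {
    // ASSUMPTIONS, verify before relying on them:
    let bandwidth_gb_s = 256.0; // nominal Strix Halo memory bandwidth
    let (layers, kv_heads, head_dim) = (36, 2, 128); // Qwen2.5-3B config.json
    let kv_bytes = 2; // f16 KV cache

    // File sizes from the table above.
    for (name, weights_gb) in [("MQ4", 1.8), ("MQ6", 2.4), ("BF16", 5.8)] {
        for ctx in [0u64, 8_192, 32_768] {
            let kv = kv_cache_gb(ctx, layers, kv_heads, head_dim, kv_bytes);
            let bound = max_tokens_per_sec(bandwidth_gb_s, weights_gb, kv);
            println!("{name} ctx={ctx:>6}: at most ~{bound:.0} tok/s");
        }
    }
}
```

Measured speeds sit below such a ceiling because of compute, kernel efficiency, and other overheads. The ceiling still shows the same trend as the table: fewer bytes per token means faster decode.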

Quantization commands

hipfire-quantize \
  --input WeiboAI/VibeThinker-3B \
  --output vibethinker-3b.mq4.hfq \
  --format mq4 \
  --arch-id 7

hipfire-quantize \
  --input WeiboAI/VibeThinker-3B \
  --output vibethinker-3b.mq6.hfq \
  --format mq6 \
  --arch-id 7

Note: --arch-id 7 is required. Without it, the quantizer auto-detects qwen2 as arch id 1 (llama), and the llama path silently drops the Q/K/V attention biases at load time.
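If you quantize other Qwen2-family checkpoints, a quick pre-flight check can catch this pitfall before the biases are lost. The sketch below is hypothetical and not part of hipfire. It takes two inputs: the tensor names from a checkpoint (for example, read from its safetensors headers) and the `model_type` from its config.json. It reports whether Q/K/V biases are present and which `--arch-id` the note above implies. Only the ids 7 (qwen2) and 1 (llama) come from this card; every other case returns `None`.

```rust
/// Suffixes of the Q/K/V projection bias tensors in Qwen2-style checkpoints.
const QKV_BIAS_SUFFIXES: [&str; 3] = ["q_proj.bias", "k_proj.bias", "v_proj.bias"];

/// Hypothetical helper: true if all three Q/K/V bias tensors appear.
fn has_qkv_biases<'a, I: IntoIterator<Item = &'a str>>(names: I) -> bool {
    let mut found = [false; 3];
    for name in names {
        for (i, suffix) in QKV_BIAS_SUFFIXES.iter().enumerate() {
            if name.ends_with(*suffix) {
                found[i] = true;
            }
        }
    }
    found.iter().all(|&f| f)
}

/// Hypothetical mapping based only on the ids stated in this card.
fn suggested_arch_id(model_type: &str, qkv_biases: bool) -> Option<u32> {
    match model_type {
        "qwen2" => Some(7),                // hipfire-arch-qwen2 keeps the biases
        "llama" if !qkv_biases => Some(1), // llama path would drop any biases
        _ => None,                         // unknown here: check hipfire's docs
    }
}

fn main() {
    // In practice these names would come from the checkpoint's headers.
    let names = [
        "model.layers.0.self_attn.q_proj.weight",
        "model.layers.0.self_attn.q_proj.bias",
        "model.layers.0.self_attn.k_proj.bias",
        "model.layers.0.self_attn.v_proj.bias",
    ];
    let biases = has_qkv_biases(names);
    println!("{:?}", suggested_arch_id("qwen2", biases)); // prints "Some(7)"
}
```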

Format details

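Both formats use MagnumQuant (MQ) with an FWHT (fast Walsh-Hadamard transform) rotation applied in groups of 256 weights. The rotation spreads outlier magnitudes evenly across the weights of each group before quantization. A few large weights then no longer dictate the group's scale, which reduces per-group error compared with plain INT4/INT6.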

  • Norms, biases, and embeddings are stored as F16/Q8F16 (not quantized to 4/6-bit)
  • arch_id=7 routes to hipfire-arch-qwen2, which correctly loads Qwen2's Q/K/V attention biases
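The sketch below shows why the rotation helps, using one 256-weight group. It assumes a plain symmetric absmax quantizer with a single scale per group. This is the general technique, not MagnumQuant's actual codec, and `fwht`, `quant_dequant`, and `mse` are illustrative names. The normalized Hadamard transform is orthogonal and its own inverse, so weights can be rotated, quantized, and rotated back.

```rust
/// In-place fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so it is
/// orthogonal and its own inverse. `x.len()` must be a power of two.
fn fwht(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two(), "FWHT length must be a power of two");
    let mut h = 1;
    while h < n {
        // Butterfly: combine pairs (j, j + h) within each block of size 2h.
        for i in (0..n).step_by(2 * h) {
            for j in i..i + h {
                let (a, b) = (x[j], x[j + h]);
                x[j] = a + b;
                x[j + h] = a - b;
            }
        }
        h *= 2;
    }
    let s = 1.0 / (n as f32).sqrt();
    for v in x.iter_mut() {
        *v *= s;
    }
}

/// Generic symmetric absmax quantize-then-dequantize of one group
/// (an assumed scheme, not MQ's). Returns the reconstructed values.
fn quant_dequant(x: &[f32], bits: u32) -> Vec<f32> {
    assert!((2..=8).contains(&bits));
    let qmax = ((1i32 << (bits - 1)) - 1) as f32; // 7 for 4-bit, 31 for 6-bit
    let amax = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if amax > 0.0 { amax / qmax } else { 1.0 };
    x.iter().map(|v| (v / scale).round().clamp(-qmax, qmax) * scale).collect()
}

/// Mean squared error between two equal-length slices.
fn mse(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>() / a.len() as f32
}

fn main() {
    // One 256-weight group: small deterministic values plus one outlier.
    let mut w: Vec<f32> = (0..256).map(|i| ((i * 37 % 101) as f32 - 50.0) / 500.0).collect();
    w[17] = 4.0;

    // Plain 4-bit: the outlier sets the scale.
    let plain = quant_dequant(&w, 4);

    // Rotated 4-bit: rotate, quantize, then rotate back.
    let mut r = w.clone();
    fwht(&mut r);
    let mut rotated = quant_dequant(&r, 4);
    fwht(&mut rotated);

    println!("plain MSE:   {:.6}", mse(&w, &plain));
    println!("rotated MSE: {:.6}", mse(&w, &rotated));
}
```

In the plain case, the outlier pushes the scale so high that every small weight rounds to zero. After rotation, the outlier's energy is spread over all 256 coefficients, so the scale shrinks and the reconstruction error drops.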

About the model

VibeThinker-3B is a 3B-parameter reasoning model fine-tuned for math, competitive programming, and STEM tasks. It writes a <think>…</think> reasoning trace before each answer. hipfire's CLI hides these traces by default; clients consuming the raw JSONL stream receive them in full.
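Clients reading the raw JSONL stream may want to hide the reasoning themselves. The helper below is a hypothetical sketch, not hipfire's implementation. It works on the completed response text; streaming output would need buffering across chunks.

```rust
/// Hypothetical helper: removes every `<think>...</think>` span from a
/// completed response. An unterminated `<think>` drops everything after it,
/// since the reasoning never finished.
fn strip_think(text: &str) -> String {
    const OPEN: &str = "<think>";
    const CLOSE: &str = "</think>";
    let mut out = String::new();
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        // Keep whatever precedes the opening tag.
        out.push_str(&rest[..start]);
        match rest[start..].find(CLOSE) {
            // `end` is relative to `start`; skip past the closing tag.
            Some(end) => rest = &rest[start + end + CLOSE.len()..],
            None => return out.trim().to_string(),
        }
    }
    out.push_str(rest);
    out.trim().to_string()
}

fn main() {
    let raw = "<think>2 + 2: add the units digits.</think>\nThe answer is 4.";
    println!("{}", strip_think(raw)); // prints "The answer is 4."
}
```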

  • Best for: competitive math, LeetCode-style coding, STEM reasoning
  • Not recommended for: tool-calling, agentic coding, open-domain knowledge
  • Recommended sampling: temperature 0.7–1.0, top_p 0.95
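Temperature and top_p work together in three steps:

1. Divide the logits by the temperature.
2. Convert them to probabilities.
3. Sample only from the smallest set of most likely tokens whose cumulative probability reaches top_p.

The sketch below is a minimal, dependency-free version of that standard nucleus-sampling procedure. It is not hipfire's sampler, and `sample_top_p` is an illustrative name. The random draw `u` is passed in, which keeps results reproducible.

```rust
/// Temperature + nucleus (top-p) sampling over raw logits.
/// `u` is a caller-supplied uniform random number in [0, 1).
fn sample_top_p(logits: &[f32], temperature: f32, top_p: f32, u: f32) -> usize {
    assert!(!logits.is_empty() && temperature > 0.0);

    // Softmax with temperature (subtract the max for numerical stability).
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut probs: Vec<(usize, f32)> = logits
        .iter()
        .enumerate()
        .map(|(i, &l)| (i, ((l - max) / temperature).exp()))
        .collect();
    let total: f32 = probs.iter().map(|&(_, p)| p).sum();
    for p in probs.iter_mut() {
        p.1 /= total;
    }

    // Keep the smallest set of most likely tokens whose mass reaches top_p.
    probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let mut cum = 0.0;
    let mut keep = 0;
    for &(_, p) in &probs {
        cum += p;
        keep += 1;
        if cum >= top_p {
            break;
        }
    }
    let kept = &probs[..keep];

    // Sample within the nucleus, renormalized by its total mass `cum`.
    let target = u * cum;
    let mut acc = 0.0;
    for &(i, p) in kept {
        acc += p;
        if target < acc {
            return i;
        }
    }
    kept[keep - 1].0 // guard against rounding at the upper edge
}

fn main() {
    let logits = [2.0f32, 1.0, 0.0, -1.0];
    // Settings within the card's recommended range.
    let token = sample_top_p(&logits, 0.8, 0.95, 0.42);
    println!("sampled token id: {token}");
}
```

Lower temperatures sharpen the distribution before the top_p cutoff is applied, so fewer tokens survive into the nucleus.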

See WeiboAI/VibeThinker-3B for benchmarks and full details (MIT license).


Model tree for xfivetide/VibeThinker-3B-hipfire

Base model: Qwen/Qwen2.5-3B → Finetuned (21) → this model