Commit 0e97db8 by RefinedNeuro (verified) · 1 parent: 4aadda4

Update card: keep Q6_K/Q8_0/f16 only

Files changed (1)
  1. README.md +10 -7
README.md CHANGED
@@ -31,17 +31,20 @@ benchmarks.
 
 | File | Quant | Size |
 |---|---|---|
-| `VibeThinker-3B-Hermes-Q3_K_M.gguf` | Q3_K_M | 1.5 GB |
-| `VibeThinker-3B-Hermes-Q4_K_M.gguf` | Q4_K_M (recommended) | 1.8 GB |
-| `VibeThinker-3B-Hermes-Q5_K_M.gguf` | Q5_K_M | 2.1 GB |
-| `VibeThinker-3B-Hermes-Q6_K.gguf` | Q6_K | 2.4 GB |
-| `VibeThinker-3B-Hermes-Q8_0.gguf` | Q8_0 | 3.1 GB |
-| `VibeThinker-3B-Hermes-f16.gguf` | F16 | 5.8 GB |
+| `-Q6_K.gguf` | **Q6_K** | 2.4 GB | min recommended |
+| `-Q8_0.gguf` | Q8_0 | 3.1 GB |
+| `-f16.gguf` | F16 | 5.8 GB | best |
+
+> **Why only Q6_K and up?** We measured tool-call fidelity on a multi-step agentic task:
+> **Q6_K, Q8_0 and F16 pass; Q3/Q4/Q5 fail** (they emit malformed/incomplete tool calls and
+> can loop). Since this is a tool-calling model, the lower quants were removed to avoid
+> shipping a broken experience. Use **Q6_K** for the best size/quality balance.
 
 ## Usage (llama.cpp)
 
 ```bash
-llama-cli -m VibeThinker-3B-Hermes-Q4_K_M.gguf \
+# use Q6_K+ for tool-calling
+llama-cli -m VibeThinker-3B-Hermes-Q6_K.gguf \
 --temp 0.6 --top-p 0.95 --repeat-penalty 1.1 \
 -p "<your Hermes-formatted prompt>"
 ```
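The updated usage block hard-codes a single file name. As a minimal sketch (not part of this commit), a shell wrapper can enforce the "Q6_K and up" rule before calling `llama-cli` with the same sampling flags. The function names `quant_supported` and `run_vibethinker` are hypothetical, and the file-name patterns assume the `VibeThinker-3B-Hermes-<quant>.gguf` naming used in the diff.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not from the model card): refuse quants the card no
# longer ships (below Q6_K), then run llama-cli with the card's sampling flags.

# Succeeds only for the Q6_K, Q8_0 and f16 files kept by this commit.
quant_supported() {
  case "$1" in
    *-Q6_K.gguf|*-Q8_0.gguf|*-f16.gguf) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: run_vibethinker MODEL.gguf PROMPT
run_vibethinker() {
  local model="$1" prompt="$2"
  if ! quant_supported "$model"; then
    echo "refusing $model: use Q6_K, Q8_0 or f16 for tool calling" >&2
    return 1
  fi
  # Same flags as the card's llama.cpp usage example.
  llama-cli -m "$model" \
    --temp 0.6 --top-p 0.95 --repeat-penalty 1.1 \
    -p "$prompt"
}
```

For example, `run_vibethinker VibeThinker-3B-Hermes-Q6_K.gguf "<your Hermes-formatted prompt>"`. A `case` glob keeps the check in plain shell with no external tools.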
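The commit's rationale says the lower quants "emit malformed/incomplete tool calls and can loop", but the measurement harness itself is not included. Purely as an illustration, here is a rough shell check for those two symptoms in a saved transcript. It assumes Hermes-style `<tool_call>...</tool_call>` tags around each call; `check_tool_calls` and its default repeat threshold of 3 are hypothetical. It checks tag balance and repetition only and does not validate the JSON inside the tags.

```bash
#!/usr/bin/env bash
# Hypothetical checker (not the authors' harness). Reads a model transcript on
# stdin and flags two symptoms, assuming Hermes-style tags:
#   <tool_call>{"name": ..., "arguments": {...}}</tool_call>
# 1. unbalanced tags (an incomplete call), 2. one identical call repeated
# more than MAX_REPEATS times (a likely loop). JSON is NOT validated.

# Usage: check_tool_calls [MAX_REPEATS] < transcript.txt
check_tool_calls() {
  local max_repeats="${1:-3}"
  local text opens closes top
  text="$(cat)"

  # Count every opening and closing tag occurrence, not just matching lines.
  opens=$(grep -o '<tool_call>' <<<"$text" | wc -l)
  closes=$(grep -o '</tool_call>' <<<"$text" | wc -l)
  if (( opens != closes )); then
    echo "malformed: $((opens)) opening vs $((closes)) closing tags"
    return 1
  fi

  # Flatten newlines so multi-line calls compare as single strings, then find
  # how often the most frequent identical call occurs.
  top=$(tr '\n' ' ' <<<"$text" \
        | grep -o '<tool_call>[^<]*</tool_call>' \
        | sort | uniq -c | sort -rn | awk 'NR==1 {print $1}')
  if (( ${top:-0} > max_repeats )); then
    echo "looping: identical tool call repeated $top times"
    return 1
  fi

  echo "ok: $((opens)) tool call(s)"
}
```

Tag counting catches truncated calls cheaply; a stricter check would also parse each call body as JSON.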