File size: 4,728 Bytes
7a71962
 
 
 
 
9935c1b
7a71962
 
 
9935c1b
7a71962
 
 
 
 
 
 
 
 
 
9935c1b
7a71962
 
9935c1b
7a71962
9935c1b
 
5fb6cee
9935c1b
7a71962
9935c1b
7a71962
9935c1b
 
 
9a33cd5
9935c1b
 
 
7a71962
9935c1b
 
 
7a71962
9935c1b
9a33cd5
9935c1b
9a33cd5
9935c1b
9a33cd5
9935c1b
 
 
 
 
 
9a33cd5
9935c1b
9a33cd5
 
 
9935c1b
 
 
 
 
 
 
 
9a33cd5
9935c1b
 
9a33cd5
 
9935c1b
9a33cd5
9935c1b
 
 
 
 
9a33cd5
 
9935c1b
9a33cd5
9935c1b
9a33cd5
9935c1b
 
 
 
 
9a33cd5
 
9935c1b
 
 
 
 
9a33cd5
 
9935c1b
9a33cd5
9935c1b
 
 
 
 
 
9a33cd5
9935c1b
 
 
7a71962
9935c1b
9a33cd5
9935c1b
 
 
 
 
 
 
 
7a71962
9935c1b
 
 
 
 
 
 
 
9a33cd5
 
9935c1b
9a33cd5
9935c1b
 
 
 
 
 
7a71962
 
9935c1b
 
 
 
 
7a71962
9935c1b
7a71962
9935c1b
 
 
 
 
 
 
7a71962
 
 
9935c1b
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
---
license: gemma
library_name: gguf
base_model: google/gemma-4-26B-A4B-it
base_model_relation: quantized
model_name: Gemma-4-26B-A4B-it-Cerebellum-v6.1-templatefix-GGUF
model_creator: google
model_type: gemma4
quantized_by: deucebucket
pipeline_tag: text-generation
tags:
  - GGUF
  - gemma4
  - gemma
  - google
  - quantized
  - cerebellum
  - imatrix
  - moe
  - 3-bit
  - templatefix
---

# Gemma 4 26B-A4B-it Cerebellum GGUF

This repository contains GGUF builds derived from
`google/gemma-4-26B-A4B-it`.

## 2026-05-22 Update

Added:

```text
gemma-4-26B-A4B-it-cerebellum-v6.1-templatefix.gguf
sha256: d24229facdef8360a7ffa8b37a50e1de636b9139a5eba0efe899828e45ae7989

gemma-4-26b-a4b-it.mmproj.gguf
sha256: b762c43119ebdc3e3c36d929d958e827fac35b03278dda9203f87131aee1f185
```

The v6.1 file keeps the v6 tensor allocation and updates GGUF/runtime-facing
metadata for Gemma 4 chat-template use. The update was tested with
`llama-server --jinja --reasoning auto` and request-level no-thinking controls.

Older files in this repository are retained for reproducibility.

## Tested Runtime

Runtime used for the 2026-05-22 templatefix checks:

```text
llama.cpp fork: https://github.com/deucebucket/llama.cpp
branch: cerebellum/gemma4-runtime-fixes
fork commit: ded491334 fix: harden Gemma 4 server budgets
base build: b8930-59fa0b455
```

Server shape used locally:

```bash
llama-server \
  --model gemma-4-26B-A4B-it-cerebellum-v6.1-templatefix.gguf \
  --mmproj gemma-4-26b-a4b-it.mmproj.gguf \
  --n-gpu-layers 99 \
  --ctx-size 65536 \
  --parallel 1 \
  --flash-attn on \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --jinja \
  --reasoning auto \
  --media-path /tmp/
```

Normal no-thinking requests used:

```json
{
  "chat_template_kwargs": {"enable_thinking": false},
  "thinking_budget_tokens": 0
}
```

Bounded-thinking smoke requests used `thinking_budget_tokens: 128`.

## 2026-05-22 Templatefix Test Artifacts

Creative-writing smoke files:

```text
creative_eval_20260522/regular_v6_1_templatefix_creative_summary.json
creative_eval_20260522/regular_v6_1_templatefix_creative_rerun_longcaps_summary.json
```

Non-coding tool-use files:

```text
agentic_eval_20260522/README.md
agentic_eval_20260522/regular_v6_1_noncoding_agentic_tools_strict_summary.json
```

Observed 2026-05-22 results from those artifacts:

| Area | Harness | Observed result |
|---|---|---|
| No-thinking output channel | six creative prompts | `reasoning_len=0` in recorded outputs |
| Template leakage markers | six creative prompts | no `<think>` marker or template marker recorded by checker |
| Creative long-cap rerun | four prompts rerun after initial length caps | four stop finishes in rerun summary |
| Non-coding tool workflow | three strict OpenAI-style tool tasks | `schedule_strict`, `release_notes_strict`, `creative_brief_strict` listed in `pass_cases` |

The non-coding tool harness used mock tools named `list_calendar`,
`create_calendar_hold`, `search_notes`, `save_note`, and `add_task`. It did not
test code editing.

## Historical Same-Repo Benchmark Artifacts

The following benchmark artifacts are from the earlier v6 line and the local
Q3_K_M baseline. They are included as historical same-project measurements, not
as new v6.1 measurements.

| Artifact set | ARC-Challenge | HellaSwag | MMLU-Redux | HumanEval note |
|---|---:|---:|---:|---|
| `q3km_baseline_*` | 95.2218 | 86.5664 | 73.6667 | `q3km_baseline_humaneval_results.json`: 62.2 pass@1 |
| `cerebellum_v6_*` | 95.5631 | 84.55 | 71.3333 | v6 HumanEval artifacts are retained but marked for audit in local notes |

For Gemma 4 HumanEval/EvalPlus, the local protocol now uses chat completions,
not raw completions:

```text
llama-server --jinja --reasoning auto
chat_template_kwargs: {"enable_thinking": false}
thinking_budget_tokens: 0
BENCH_WORKERS=1
```

## Files and Provenance

Main v6.1 GGUF:

```text
source base: google/gemma-4-26B-A4B-it
quantization family: mixed-precision GGUF
recipe lineage: Cerebellum v6 tensor allocation
```

Matching mmproj:

```text
gemma-4-26b-a4b-it.mmproj.gguf
```

## Notes

- The 2026-05-22 tests were run on local `llama-server`.
- The opencode coding-agent test is not used as a model-card result. In one
  internal White and Black project run, the model connected through the harness
  and ran a Godot test, then produced malformed edit-tool calls.
- The creative-writing checks are smoke tests plus mechanical checks, not a
  human preference benchmark.
- The non-coding tool checks use mocked tools and fixed task definitions.

## Credits

- Base model: Google Gemma Team, `google/gemma-4-26B-A4B-it`
- GGUF/runtime: llama.cpp
- Quantization and local test artifacts: deucebucket Cerebellum workflow