Instructions to use google/gemma-4-12B-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-4-12B-it with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForMultimodalLM processor = AutoProcessor.from_pretrained("google/gemma-4-12B-it") model = AutoModelForMultimodalLM.from_pretrained("google/gemma-4-12B-it") - Notebooks
- Google Colab
- Kaggle
fix: emit empty thought-channel primer on historical assistant turns for APC
Browse filesWhen enable_thinking=false, the generation prompt inserts an empty
<|channel>thought\n<channel|> block after <|turn>model\n to suppress
thinking. But historical assistant turns in multi-turn replay did not
emit this same block. This caused the KV cache from turn N to diverge
from the prompt for turn N+1, breaking vLLM's automatic prefix caching
(APC) — every multi-turn continuation was a cache miss.
Fix: insert the empty thought-channel primer on historical model turns
when enable_thinking is false and the message has no reasoning content.
Guarded by the existing continue_same_model_turn check to avoid
duplicates in assistant->tool->assistant continuations.
- chat_template.jinja +3 -0
chat_template.jinja
CHANGED
|
@@ -226,6 +226,9 @@
|
|
| 226 |
{%- set continue_same_model_turn = (role == 'model' and ns.prev_non_tool_role == 'assistant') -%}
|
| 227 |
{%- if not continue_same_model_turn -%}
|
| 228 |
{{- '<|turn>' + role + '\n' }}
|
|
|
|
|
|
|
|
|
|
| 229 |
{%- endif -%}
|
| 230 |
|
| 231 |
{#- Render reasoning/reasoning_content as thinking channel (tool-call turns only) -#}
|
|
|
|
| 226 |
{%- set continue_same_model_turn = (role == 'model' and ns.prev_non_tool_role == 'assistant') -%}
|
| 227 |
{%- if not continue_same_model_turn -%}
|
| 228 |
{{- '<|turn>' + role + '\n' }}
|
| 229 |
+
{%- if role == 'model' and not enable_thinking and not (message.get('reasoning') or message.get('reasoning_content')) -%}
|
| 230 |
+
{{- '<|channel>thought\n<channel|>' -}}
|
| 231 |
+
{%- endif -%}
|
| 232 |
{%- endif -%}
|
| 233 |
|
| 234 |
{#- Render reasoning/reasoning_content as thinking channel (tool-call turns only) -#}
|