fix: revert add_generation_prompt regression + preserve_thinking default
Two fixes based on reviewer feedback from vllm-project/vllm#45553:
1. Restore original add_generation_prompt guard: suppress <|turn>model
when prev_message_type is 'tool_response' or 'tool_call'. The model
continues the same turn after tool responses — a new <|turn>model
breaks multi-step tool chains (assistant->tool->assistant->tool).
Keep the <|channel>thought\n cue for thinking-enabled after tool
responses.
2. Change preserve_thinking default from true to false. Per Gemma4 docs
(https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4#managing-thought-context):
'You must remove (strip) the model's generated thoughts from the
previous turn.' Thoughts within a single tool call chain are still
preserved (they're after last_user_idx).
Ref: https://github.com/vllm-project/vllm/pull/45553
Ref: https://github.com/vllm-project/vllm/pull/42776
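The effect of the new preserve_thinking default can be sketched in plain Python. This is an illustration of the rule stated in the commit message, not the template's actual code: the function name `strip_old_thoughts`, the `reasoning_content` field, and the message layout are all assumptions made for the example. Only the idea (drop thoughts before the last user message, keep those after it) comes from the text above.

```python
# Hypothetical field holding the model's thoughts; the real template may use another name.
THOUGHT_KEY = "reasoning_content"


def strip_old_thoughts(messages, preserve_thinking=False):
    """Return a copy of `messages` with thoughts from earlier turns removed.

    Thoughts on assistant messages that come after the last user message
    (an in-progress tool call chain) are kept.
    """
    if preserve_thinking:
        return [dict(m) for m in messages]

    # Index of the last user message, or -1 if there is none.
    last_user_idx = max(
        (i for i, m in enumerate(messages) if m.get("role") == "user"),
        default=-1,
    )

    cleaned = []
    for i, message in enumerate(messages):
        message = dict(message)  # shallow copy so the caller's history is untouched
        if message.get("role") == "assistant" and i < last_user_idx:
            message.pop(THOUGHT_KEY, None)
        cleaned.append(message)
    return cleaned
```

With a history of user, assistant (with thoughts), user, assistant (with thoughts), tool, the first assistant message loses its thoughts and the second keeps them, because it sits after the last user message.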
chat_template.jinja +5 -10
@@ -178,7 +178,7 @@
 {%- set ns = namespace(prev_message_type=None, prev_non_tool_role=None) -%}
 {%- set loop_messages = messages -%}
 {%- set enable_thinking = enable_thinking | default(false) -%}
-{%- set preserve_thinking = preserve_thinking | default(
+{%- set preserve_thinking = preserve_thinking | default(false) -%}
 {{- bos_token -}}
 {#- Handle System/Tool Definitions Block -#}
 {%- if enable_thinking or tools or (messages and messages[0]['role'] in ['system', 'developer']) -%}
@@ -376,17 +376,12 @@
 {%- endfor -%}

 {%- if add_generation_prompt -%}
-{%- if ns.prev_message_type != 'tool_call' -%}
+{%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%}
 {{- '<|turn>model\n' -}}
-{%- if enable_thinking
-{{- '<|channel>thought\n' -}}
-{%- endif -%}
-{%- endif -%}
-
-{%- if not enable_thinking -%}
-{#- Suppress thinking — but not when awaiting tool responses -#}
-{%- if ns.prev_message_type != 'tool_call' -%}
+{%- if not enable_thinking -%}
 {{- '<|channel>thought\n<channel|>' -}}
 {%- endif -%}
+{%- elif ns.prev_message_type == 'tool_response' and enable_thinking -%}
+{{- '<|channel>thought\n' -}}
 {%- endif -%}
 {%- endif -%}
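To make the new generation-prompt branch easier to check case by case, here is a minimal Python port of the added template lines. The function name `generation_prompt` and its signature are made up for this sketch. The string literals and conditions are copied from the new side of the diff.

```python
def generation_prompt(prev_message_type, enable_thinking=False, add_generation_prompt=True):
    """Mirror of the patched add_generation_prompt block in chat_template.jinja."""
    if not add_generation_prompt:
        return ""

    out = ""
    if prev_message_type not in ("tool_response", "tool_call"):
        # Normal case: open a fresh model turn.
        out += "<|turn>model\n"
        if not enable_thinking:
            # Emit an empty thought channel so the model skips thinking.
            out += "<|channel>thought\n<channel|>"
    elif prev_message_type == "tool_response" and enable_thinking:
        # Same turn continues after a tool response; only cue the thought channel.
        out += "<|channel>thought\n"
    return out


# The message type names below follow the template's prev_message_type values;
# "text" stands in for any ordinary (non-tool) message.
for prev in ("text", "tool_call", "tool_response"):
    for thinking in (False, True):
        print(prev, thinking, repr(generation_prompt(prev, thinking)))
```

After a tool call or tool response no `<|turn>model` marker is emitted, which is the behavior item 1 of the commit message restores.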