fix: restore model turn + thinking cue after tool responses
The generation prompt suppresses <|turn>model after tool responses, which prevents the model from re-entering its thinking state in multi-turn tool-calling flows.
Fix:
- Remove 'tool_response' from the suppression condition so the model gets its turn marker after tool responses.
- Inject <|channel>thought when enable_thinking=true AND the previous message was a tool_response, nudging the model back into its reasoning state machine.
Normal (non-tool) turns are unchanged — the model self-generates the thinking channel cue as usual.
Ref: https://github.com/vllm-project/vllm/issues/45039
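The branch logic described above can be mirrored in plain Python to check which prefix each case produces. This is only a sketch under stated assumptions, not the template itself: the function name `generation_prefix`, the `patched` flag, and the `"user"` message-type label are hypothetical, and the pre-patch behaviour (both tool_call and tool_response suppress the turn marker) is inferred from the commit message because the old condition is truncated in the diff. Anything the template emits after `{%- if not enable_thinking -%}` is not modelled.

```python
def generation_prefix(prev_message_type, enable_thinking, patched=True):
    """Mirror the visible part of the add_generation_prompt branch."""
    # Assumption: before the patch, tool_response was suppressed as well.
    if patched:
        suppressed = {"tool_call"}
    else:
        suppressed = {"tool_call", "tool_response"}

    out = ""
    if prev_message_type not in suppressed:
        out += "<|turn>model\n"
        # Added by the patch: cue the thinking channel, but only right
        # after a tool response and only when thinking is enabled.
        if patched and enable_thinking and prev_message_type == "tool_response":
            out += "<|channel>thought\n"
    return out


if __name__ == "__main__":
    # "user" is a hypothetical label standing in for any non-tool turn.
    for prev in ("user", "tool_call", "tool_response"):
        for thinking in (False, True):
            before = generation_prefix(prev, thinking, patched=False)
            after = generation_prefix(prev, thinking, patched=True)
            print(f"{prev:14} thinking={thinking!s:5} {before!r} -> {after!r}")
```

Under these assumptions, only the tool_response rows differ between the two versions, which matches the claim that normal turns are unchanged.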
chat_template.jinja CHANGED (+4 -1)

@@ -374,8 +374,11 @@
 {%- endfor -%}
 
 {%- if add_generation_prompt -%}
-{%- if ns.prev_message_type != '
+{%- if ns.prev_message_type != 'tool_call' -%}
 {{- '<|turn>model\n' -}}
+{%- if enable_thinking and ns.prev_message_type == 'tool_response' -%}
+{{- '<|channel>thought\n' -}}
+{%- endif -%}
 {%- endif -%}
 
 {%- if not enable_thinking -%}