fix: revert add_generation_prompt regression + preserve_thinking default

Two fixes based on reviewer feedback from vllm-project/vllm#45553:

1. Restore original add_generation_prompt guard: suppress <|turn>model
when prev_message_type is 'tool_response' or 'tool_call'. The model
continues the same turn after tool responses — a new <|turn>model
breaks multi-step tool chains (assistant->tool->assistant->tool).
Keep the <|channel>thought\n cue for thinking-enabled after tool
responses.

2. Change preserve_thinking default from true to false. Per Gemma4 docs
(https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4
#managing-thought-context): 'You must remove (strip) the model's
generated thoughts from the previous turn.' Thoughts within a
single tool call chain are still preserved (they're after
last_user_idx).

Ref: https://github.com/vllm-project/vllm/pull/45553
Ref: https://github.com/vllm-project/vllm/pull/42776

Files changed (1) hide show

chat_template.jinja +5 -10

chat_template.jinja CHANGED Viewed

@@ -178,7 +178,7 @@
 {%- set ns = namespace(prev_message_type=None, prev_non_tool_role=None) -%}
 {%- set loop_messages = messages -%}
 {%- set enable_thinking = enable_thinking | default(false) -%}
-{%- set preserve_thinking = preserve_thinking | default(true) -%}
 {{- bos_token -}}
 {#- Handle System/Tool Definitions Block -#}
 {%- if enable_thinking or tools or (messages and messages[0]['role'] in ['system', 'developer']) -%}
@@ -376,17 +376,12 @@
 {%- endfor -%}
 {%- if add_generation_prompt -%}
-    {%- if ns.prev_message_type != 'tool_call' -%}
         {{- '<|turn>model\n' -}}
-        {%- if enable_thinking and ns.prev_message_type == 'tool_response' -%}
-            {{- '<|channel>thought\n' -}}
-        {%- endif -%}
-    {%- endif -%}
-    {%- if not enable_thinking -%}
-        {#- Suppress thinking — but not when awaiting tool responses -#}
-        {%- if ns.prev_message_type != 'tool_call' -%}
             {{- '<|channel>thought\n<channel|>' -}}
         {%- endif -%}
     {%- endif -%}
 {%- endif -%}

 {%- set ns = namespace(prev_message_type=None, prev_non_tool_role=None) -%}
 {%- set loop_messages = messages -%}
 {%- set enable_thinking = enable_thinking | default(false) -%}
+{%- set preserve_thinking = preserve_thinking | default(false) -%}
 {{- bos_token -}}
 {#- Handle System/Tool Definitions Block -#}
 {%- if enable_thinking or tools or (messages and messages[0]['role'] in ['system', 'developer']) -%}
 {%- endfor -%}
 {%- if add_generation_prompt -%}
+    {%- if ns.prev_message_type != 'tool_response' and ns.prev_message_type != 'tool_call' -%}
         {{- '<|turn>model\n' -}}
+        {%- if not enable_thinking -%}
             {{- '<|channel>thought\n<channel|>' -}}
         {%- endif -%}
+    {%- elif ns.prev_message_type == 'tool_response' and enable_thinking -%}
+        {{- '<|channel>thought\n' -}}
     {%- endif -%}
 {%- endif -%}