fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

#35

Summary

Improves Gemma4 chat template:

Bug fixes

  • None values now render as null instead of Python's None
  • String-typed tool_calls[].function.arguments now raises a clear error instead of silently producing malformed DSL
  • Prior-turn reasoning/thinking is preserved across multi-turn tool-call chains (preserve_thinking flag, default=true)
  • Consecutive assistant messages now produce balanced <|turn>model/<turn|> tags via forward-scan continuation detection

Improvements

  • enable_thinking normalized once with | default(false), eliminating repetitive is defined and checks
  • image_url and input_audio content types now map to <|image|> and <|audio|> (OpenAI compatibility)
  • Empty messages=[] handled gracefully instead of crashing
  • Unmatched tool_call_id in tool responses falls back to 'unknown' instead of crashing
  • Consistent .get() access prevents StrictUndefined errors for optional message keys
  • O(1) backward scan for model-turn continuation (was O(n) per message)

This template doesn't preserve thinking correctly yet.

> think of two 5-digit numbers and tell me the first one

[Start thinking]
*   Input: "think of two 5-digit numbers and tell me the first one"
    *   Task: Generate two 5-digit numbers mentally and output only the first one.
    *   Constraint: Numbers must be 5 digits long (10,000 to 99,999).

    *   Number 1: 47,392
    *   Number 2: 81,056

    *   The user asked for only the first one.

    *   Output: 47,392
[End thinking]

47,392

[ Prompt: 131.2 t/s | Generation: 50.5 t/s ]

> what's the second number?

[Start thinking]
*   The user previously asked me to "think of two 5-digit numbers and tell me the first one."
    *   I provided: 47,392.
    *   The user is now asking: "what's the second number?"

    *   I need to provide another 5-digit number that I "thought" of in the previous step.
    *   Since I don't have a persistent "memory" of a specific hidden number unless I generated it internally, I should pick a random 5-digit number now.
    *   However, to make it feel consistent, I should just provide a 5-digit number.

    *   Random 5-digit number: 82,105.

    *   "The second number is 82,105."
[End thinking]

The second number is 82,105.

[ Prompt: 640.9 t/s | Generation: 51.6 t/s ]

used:

~/AI/inference/llama-b9568/build/bin/llama-cli \
  -m ~/AI/models/unsloth/gemma-4-12b-it-UD-Q8_K_XL.gguf \
  --chat-template-file ~/AI/models/chat_template.jinja --chat-template-kwargs "{\"preserve_thinking\": true}"

Hi @uniment - it is expected. We don't store thinking traces/thoughts during text only conversations by design. They are manually stripped out of the conversation.

This PR is only bringing the thoughts to tools calls/answers to keep the context of the tools usage to the model and avoid regressions on long exchanges.

Good to know, thanks.

Lucianommartins

Thanks, mate

you fixed my zombie loop!

You deserve a medal.

Hi @uniment - it is expected. We don't store thinking traces/thoughts during text only conversations by design. They are manually stripped out of the conversation.

This PR is only bringing the thoughts to tools calls/answers to keep the context of the tools usage to the model and avoid regressions on long exchanges.

I'm curious why its not stored for text only conversations? everything i've read says this is the primary time you want it. or is this expected to be handled by the inference server/rag database? sorry if this seems like a dumb question, but i'd like to understand this, and, well, gemini says you're wrong lol

Ready to merge
This branch is ready to get merged automatically.

Sign up or log in to comment