Issue with tool calling + template

#2
by omshunyaom13 - opened

Hi... on linux...here is quick preview of my compose.yml

(command:
- --model
- /models/Gemma4-agentic-Q8_0/gemma4-agentic-Q8_0.gguf
- --host
- 0.0.0.0
- --port
- "8051"
- --n-gpu-layers
- "48"
- --ctx-size
- "262144"
- --parallel
- "2"
- --cache-type-k
- "q8_0"
- --cache-type-v
- "q8_0"
- --cont-batching
- --flash-attn
- "on"
- --reasoning
- "on"
- --chat-template
- "jinja"
- --override-kv
- "chat_template=string:jinja"
- --repeat-penalty
- "1.1"
- --temp
- ".7"
- --top-k
- "50"
- --top-p
- "0.85")

Logs:

0.04.765.138 I init: chat template, example_format: '<|turn>system
<|think|>
You are a helpful assistant<turn|>
<|turn>user
Hello<turn|>
<|turn>model
Hi there<turn|>
<|turn>user
How are you?<turn|>
<|turn>model
'
0.04.766.280 I srv init: init: chat template, thinking = 1
2.58.189.055 I srv operator(): Chat format: peg-gemma4

om@BlackBox-Labs:/mnt/docker-ssd/lm-endpoint$ docker inspect lm-gemma4 --format '{{.Path}} {{.Args}}'
docker inspect lm-gemma4 --format '{{.Path}} {{.Args}}'
llama-server [--model /models/Gemma4-agentic-Q8_0/gemma4-agentic-Q8_0.gguf --host 0.0.0.0 --port 8051 --n-gpu-layers 48 --ctx-size 262144 --parallel 2 --cache-type-k q8_0 --cache-type-v q8_0 --cont-batching --flash-attn on --reasoning on --jinja --repeat-penalty 1.1 --temp .7 --top-k 50 --top-p 0.85]

โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
โ— Simple test , only list files in current directory
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€

โ•ญโ”€ โš• Hermes โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
The command is ls -l with no arguments โ€” it lists all long format entries from /home/om and outputs to stdout. I will run this using python3 and execute the file listing as requested by the code execution tool,

python
import os
file_list = os.listdir(os.getcwd())
return (f"Files in current directory: {', '.join(file_lists)}") # typo fix from my previous thought note to correct 'file_lists'



[tool call]
omshunyaom13 changed discussion title from Issue with template to Issue with tool calling + template

Hey @omshunyaom13 โ€” thanks for the detailed setup info, that helps a lot.

What's happening: The model is generating Python-style tool calls instead of using Hermes' native tool format. When you asked it to list files, it wrote Python code (os.listdir) instead of using the bash tool directly. It also introduced a typo (file_lists vs file_list) which suggests the model is trying to "code" the solution rather than delegate to tools.

The issue is the chat template + tool format interaction. Gemma 4's peg-gemma4 template handles tool calls differently than what Hermes expects. Two fixes:

Fix 1 โ€” Use llama.cpp's --jinja flag with the correct template:

--jinja
--chat-template-file /path/to/hermes-tool-template.jinja

The default Gemma 4 template doesn't have native tool-calling format built in. You need to either use a custom Jinja template that maps to Hermes' tool format, or use --chat-template chatml which Hermes understands natively.

Fix 2 โ€” Try chatml template instead of jinja:

--chat-template chatml

ChatML is what Hermes was designed for. The Gemma 4 peg-gemma4 template doesn't emit <tool_call> tokens in the format Hermes expects.

Fix 3 โ€” Add a system prompt that explicitly tells the model how to format tool calls:

When using tools, output in this exact format:

{"name": "bash", "arguments": {"command": "ls -l"}}

Also โ€” I noticed you're running with --ctx-size 262144 (256K context). That's going to use a LOT of VRAM even with q8_0 KV cache. If you're hitting memory issues, try --ctx-size 8192 first to verify tool calling works, then scale up.

Let me know which fix works for your setup!

โ€” Gabriel Garcia / RavenX AI Labs LLC

Sign up or log in to comment