Add files using upload-large-folder tool

49dc750 verified about 2 months ago

5.07 kB

	# Qwen 3.6 Chat Template

	A universally fixed Jinja chat template for Qwen 3.6 that serves as a drop-in upgrade for all inference engines (vLLM, llama.cpp, text-generation-webui, LM Studio, oMLX, etc). The official template continues to crash on C++ tool calls, struggles with the new `preserve_thinking` feature by spamming empty tags, is vulnerable to model hallucinations, and lacks a way to cleanly toggle thinking inline. This universal template handles all of that.

	## What's broken in the official template

	1. Tool calls crash on C++ engines. The official template uses Python's `\|items` dictionary filter and `\|safe`, neither of which exist in C++ Jinja runtimes (like those used by LM Studio or MLX). Any tool call triggers an out-of-bounds error. It also crashes if the arguments payload is returned as a raw string instead of an object.
	2. No `"developer"` role. Modern APIs sometimes send `message.role == "developer"`. The official template raises an exception and dies.
	3. Empty `preserve_thinking` block spam. Qwen 3.6 introduces a `preserve_thinking` kwarg. If toggled on, the official template wraps every past turn in a `<think></think>` block, which means a non-reasoning turn wastes context tokens with `<think>\n\n</think>`.
	4. The `</thinking>` hallucination. The Qwen 3.6 LLM sometimes mistakenly generates `</thinking>` at the end of its reasoning block. The official parser expects strictly `</think>`, resulting in parsing failure and leaking `<thinking>` tokens into the chat.

	## What this template does

	### Universal tool arguments compatibility

	Replaced `\|items` iteration with direct dictionary key lookups. Swapped `is sequence` for `is iterable` (which strict C++ runtimes require). Removed `\|safe` wrappers and safely map raw JSON fallback schemas so that primitive parameters (like booleans) serialize precisely to JSON standard `true` instead of crashing environments by generating Python-flavored titlecase `"True"`.

	### `"developer"` role support

	Intercepts `"developer"` messages and implicitly maps them to `"system"`. No crash, no data loss.

	### Smarter `preserve_thinking` historical context

	Now ON by default without any required kwargs! Instead of mindlessly generating empty XML tags for past turns, this template checks if the historical context actually contains reasoning `(reasoning_content\|trim\|length > 0)`. Only then does it emit an active block into the chat cache, keeping context windows hyper-efficient. Furthermore, history is tied to the `<\|think_off\|>` override: disabling thinking in the prompt automatically sweeps older thinking blocks from the cache to drastically accelerate processing.

	### `</thinking>` Hallucination handling

	During the assistant phase, the logic actively looks for boundary hallucinations. If Qwen generates `</thinking>`, this template dynamically splits on that literal instead of `</think>`, cleanly isolating tags seamlessly. If generation is interrupted mid-thought (max tokens/aborts) preventing a closing `</think>` tag from surfacing, the parser actively rescues the incomplete thought-stream instead of injecting invalid raw `<think>` pairs into the timeline.

	### Thinking toggle from any message

	Drop `<\|think_on\|>` or `<\|think_off\|>` anywhere in a prompt. The template detects the tag, strips it iteratively without sequential state-bleeding so the model never sees it, and cascades the thinking state down to the generator prompt dynamically.

	```text
	System: You are a coding assistant. <\|think_off\|>
	User: Check the weather in Paris.
	```

	The tag disappears. The model answers fast, generating `<think>\n\n</think>\n\n` natively.

	```text
	System: You are a coding assistant. <\|think_on\|>
	User: Implement a red-black tree in Rust.
	```

	The model gets its `<think>\n` prompt and reasons deeply before answering.

	## Comparison

	\| Feature \| Official \| This Fixed Template \|
	\|---\|---\|---\|
	\| Tool arguments work \| Crashes \| Fixed \|
	\| `\\|safe` removed \| Crashes \| Fixed \|
	\| `"developer"` role \| Missing \| Added \|
	\| Thinking toggle \| None \| `<\\|think_off\\|>` anywhere \|
	\| `preserve_thinking` \| Spams empty blocks \| Dynamic length checks \|
	\| Tag extraction \| Fails on `</thinking>` \| Supports `</thinking>` \|

	## Installation

	This template can be used anywhere standard HuggingFace Jinja templates are supported.

	### General (vLLM, llama.cpp, TextGen)
	Simply replace your model's existing `chat_template` string in your `tokenizer_config.json` with the minified contents of this file, or load it as a custom template in your UI.

	### LM Studio
	1. Open LM Studio
	2. Go to the My Models tab (or the right-side panel in Chat)
	3. Select your Qwen 3.6 model
	4. Scroll to Prompt Template
	5. Delete the default template, paste this one in
	6. Save

	### oMLX
	1. Unload any `chat_template_kwargs` arguments you may have forced. It is handled by the template actively.
	2. Make sure you load the `--jinja` flag so the engine utilizes the custom parsing rules.
	3. Overwrite the `chat_template.jinja` source file locally.