Image-Text-to-Text
MLX
Safetensors
English
Chinese
multilingual
qwen3_5_moe
mlx-lm
mlx-vlm
qwen3.6
conversational
vision
multimodal
uncensored
abliterated
heretic
4-bit precision
Instructions to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit") config = load_config("froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- Pi
How to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit
Run Hermes
hermes
| # Qwen 3.6 Chat Template | |
| A universally fixed Jinja chat template for Qwen 3.6 that serves as a drop-in upgrade for **all inference engines** (vLLM, llama.cpp, text-generation-webui, LM Studio, oMLX, etc). The official template continues to crash on C++ tool calls, struggles with the new `preserve_thinking` feature by spamming empty tags, is vulnerable to model hallucinations, and lacks a way to cleanly toggle thinking inline. This universal template handles all of that. | |
| ## What's broken in the official template | |
| 1. **Tool calls crash on C++ engines.** The official template uses Python's `|items` dictionary filter and `|safe`, neither of which exist in C++ Jinja runtimes (like those used by LM Studio or MLX). Any tool call triggers an out-of-bounds error. It also crashes if the arguments payload is returned as a raw string instead of an object. | |
| 2. **No `"developer"` role.** Modern APIs sometimes send `message.role == "developer"`. The official template raises an exception and dies. | |
| 3. **Empty `preserve_thinking` block spam.** Qwen 3.6 introduces a `preserve_thinking` kwarg. If toggled on, the official template wraps *every past turn* in a `<think></think>` block, which means a non-reasoning turn wastes context tokens with `<think>\n\n</think>`. | |
| 4. **The `</thinking>` hallucination.** The Qwen 3.6 LLM sometimes mistakenly generates `</thinking>` at the end of its reasoning block. The official parser expects strictly `</think>`, resulting in parsing failure and leaking `<thinking>` tokens into the chat. | |
| ## What this template does | |
| ### Universal tool arguments compatibility | |
| Replaced `|items` iteration with direct dictionary key lookups. Swapped `is sequence` for `is iterable` (which strict C++ runtimes require). Removed `|safe` wrappers and safely map raw JSON fallback schemas so that primitive parameters (like booleans) serialize precisely to JSON standard `true` instead of crashing environments by generating Python-flavored titlecase `"True"`. | |
| ### `"developer"` role support | |
| Intercepts `"developer"` messages and implicitly maps them to `"system"`. No crash, no data loss. | |
| ### Smarter `preserve_thinking` historical context | |
| **Now ON by default without any required kwargs!** Instead of mindlessly generating empty XML tags for past turns, this template checks if the historical context actually contains reasoning `(reasoning_content|trim|length > 0)`. Only then does it emit an active block into the chat cache, keeping context windows hyper-efficient. Furthermore, history is tied to the `<|think_off|>` override: disabling thinking in the prompt automatically sweeps older thinking blocks from the cache to drastically accelerate processing. | |
| ### `</thinking>` Hallucination handling | |
| During the assistant phase, the logic actively looks for boundary hallucinations. If Qwen generates `</thinking>`, this template dynamically splits on that literal instead of `</think>`, cleanly isolating tags seamlessly. If generation is interrupted mid-thought (max tokens/aborts) preventing a closing `</think>` tag from surfacing, the parser actively rescues the incomplete thought-stream instead of injecting invalid raw `<think>` pairs into the timeline. | |
| ### Thinking toggle from any message | |
| Drop `<|think_on|>` or `<|think_off|>` anywhere in a prompt. The template detects the tag, strips it iteratively without sequential state-bleeding so the model never sees it, and cascades the thinking state down to the generator prompt dynamically. | |
| ```text | |
| System: You are a coding assistant. <|think_off|> | |
| User: Check the weather in Paris. | |
| ``` | |
| The tag disappears. The model answers fast, generating `<think>\n\n</think>\n\n` natively. | |
| ```text | |
| System: You are a coding assistant. <|think_on|> | |
| User: Implement a red-black tree in Rust. | |
| ``` | |
| The model gets its `<think>\n` prompt and reasons deeply before answering. | |
| ## Comparison | |
| | Feature | Official | **This Fixed Template** | | |
| |---|---|---| | |
| | Tool arguments work | Crashes | **Fixed** | | |
| | `\|safe` removed | Crashes | **Fixed** | | |
| | `"developer"` role | Missing | **Added** | | |
| | Thinking toggle | None | **`<\|think_off\|>` anywhere** | | |
| | `preserve_thinking` | Spams empty blocks | **Dynamic length checks** | | |
| | Tag extraction | Fails on `</thinking>` | **Supports `</thinking>`** | | |
| ## Installation | |
| This template can be used anywhere standard HuggingFace Jinja templates are supported. | |
| ### General (vLLM, llama.cpp, TextGen) | |
| Simply replace your model's existing `chat_template` string in your `tokenizer_config.json` with the minified contents of this file, or load it as a custom template in your UI. | |
| ### LM Studio | |
| 1. Open LM Studio | |
| 2. Go to the **My Models** tab (or the right-side panel in Chat) | |
| 3. Select your Qwen 3.6 model | |
| 4. Scroll to **Prompt Template** | |
| 5. Delete the default template, paste this one in | |
| 6. Save | |
| ### oMLX | |
| 1. Unload any `chat_template_kwargs` arguments you may have forced. It is handled by the template actively. | |
| 2. Make sure you load the `--jinja` flag so the engine utilizes the custom parsing rules. | |
| 3. Overwrite the `chat_template.jinja` source file locally. | |