Image-Text-to-Text
MLX
Safetensors
English
Chinese
multilingual
qwen3_5_moe
mlx-lm
mlx-vlm
qwen3.6
conversational
vision
multimodal
uncensored
abliterated
heretic
4-bit precision
Instructions to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit") config = load_config("froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- Pi
How to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default froggeric/Qwen3.6-35B-A3B-Uncensored-Heretic-MLX-4bit
Run Hermes
hermes
Upload README.md with huggingface_hub
Browse files
README.md
CHANGED
|
@@ -170,6 +170,26 @@ This approach was submitted as a pull request to Heretic but was not merged —
|
|
| 170 |
|
| 171 |
---
|
| 172 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 173 |
## Sampling
|
| 174 |
|
| 175 |
From the official Qwen authors. Reserve 128K+ context for thinking mode.
|
|
|
|
| 170 |
|
| 171 |
---
|
| 172 |
|
| 173 |
+
## How it compares
|
| 174 |
+
|
| 175 |
+
### Community results
|
| 176 |
+
|
| 177 |
+
r/LocalLLaMA users have been A/B-testing various uncensored Qwen 3.6 variants — [Heretic](https://github.com/p-e-w/heretic), HauhauCS Aggressive, abliterix, and simple orthogonal projection. The pattern is consistent: **Heretic produces the best balance of refusal removal and output quality**.
|
| 178 |
+
|
| 179 |
+
[Community discussion →](https://www.reddit.com/r/LocalLLaMA/comments/1sw5fb7/qwen36_35b_a3b_heretic_kld_00015_incredible_model/)
|
| 180 |
+
|
| 181 |
+
### Why
|
| 182 |
+
|
| 183 |
+
Most abliteration methods treat all layers identically. Qwen 3.6's hybrid attention (3:1 linear-to-softmax ratio) means a single parameter set either under-abliterate the DeltaNet blocks or over-abliterate the softmax blocks. Architecture-aware abliteration — separate parameters per attention type — is the key differentiator.
|
| 184 |
+
|
| 185 |
+
### A note on SSM conv1d "repair"
|
| 186 |
+
|
| 187 |
+
Some uncensored variants apply a pre-processing step that rescales SSM conv1d weights before abliteration, claiming to fix "outlier" tensors in the DeltaNet linear attention layers. This technique (originating as "Sig-ScaleSync") was benchmarked with **284 data points** across perplexity, needle-in-a-haystack, and repetition tests at multiple context lengths (4K–128K). Result: **perplexity degraded at every length with no improvement** in NIAH or repetition. The unrepaired original weights perform best.
|
| 188 |
+
|
| 189 |
+
Abliterating a degraded baseline can yield a lower measured KL divergence — but that measures distance from a worse starting point, not better preservation of the original model's capabilities.
|
| 190 |
+
|
| 191 |
+
---
|
| 192 |
+
|
| 193 |
## Sampling
|
| 194 |
|
| 195 |
From the official Qwen authors. Reserve 128K+ context for thinking mode.
|