Instructions to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx
Run Hermes
hermes
- MLX LM
How to use osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm
# Start the server (listens on port 8080 by default)
mlx_lm.server --model "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx"
# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
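The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server is reachable at `http://localhost:8080/v1` (the address the Pi and Hermes configurations on this page use); `build_chat_request` and `chat` are illustrative helper names, not part of mlx-lm.

```python
import json
import urllib.request

# Assumption: the MLX server is listening on localhost:8080.
# Adjust BASE_URL if you started it with a different --host or --port.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx"


def build_chat_request(user_text, max_tokens=256):
    """Build a urllib Request for the /chat/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text, max_tokens=256):
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(user_text, max_tokens)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Hello"))` should print the model's reply. Building the request separately from sending it makes the payload easy to inspect without a live server.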
Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx
⚠️ THIS IS A TEXT-ONLY MODEL — NO VISION
The upstream abliteration pass stripped the vision tower. For vision-capable Qwen 3.6 27B Opus-Distill MLX quants, see our parallel repos at huggingface.co/osmapi (look for repos without -abliterated in the name).
OptiQ uniform 4-bit MLX quantization of an abliterated Qwen 3.6 27B Claude-Opus reasoning distill, by the osmAPI team — "OpenRouter of India".
Uniform 4-bit assignment via mlx-optiq — same group size (64) as mlx-lm's default 4-bit but produced through OptiQ's calibration-aware path for better recovery on the boundary layers.
⚡ TL;DR
| Disk size | ~14 GB |
| Effective BPW | 4.0 |
| Scheme | OptiQ uniform 4-bit (group size 64, affine) |
| Recommended RAM | 16 GB Apple Silicon comfortably; 24 GB with long context |
| Vision | ❌ text-only (the upstream abliteration step stripped the ViT) |
| Made by | osmAPI — OpenRouter of India |
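As a rough cross-check of the disk-size figure, the arithmetic can be sketched as below. The parameter count is an assumption (a nominal 27e9, read off the "27B" in the model name), and the per-group overhead line assumes one 16-bit scale and one 16-bit bias per group of 64 weights. This card does not say whether the quoted 4.0 BPW includes that metadata.

```python
def quantized_size_gb(n_params, bits_per_weight):
    """Approximate storage in decimal gigabytes for n_params weights."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: a nominal 27e9 parameters, taken from the "27B" in the name.
N_PARAMS = 27e9

# 4-bit codes only.
nominal = quantized_size_gb(N_PARAMS, 4.0)

# Assumption: a 16-bit scale plus a 16-bit bias per group of 64 weights,
# which would add 32 / 64 = 0.5 bits per weight on top of the codes.
with_group_metadata = quantized_size_gb(N_PARAMS, 4.0 + 32 / 64)

print(f"4.0 bpw: {nominal:.1f} GB")              # 4.0 bpw: 13.5 GB
print(f"4.5 bpw: {with_group_metadata:.1f} GB")  # 4.5 bpw: 15.2 GB
```

The nominal figure is in line with the ~14 GB listed above. Real checkpoints can differ because some tensors may be stored at other precisions.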
🧬 Lineage
Qwen/Qwen3.6-27B (Qwen Team — base pretrain)
│
▼
TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2 (TeichAI — Claude-Opus reasoning distill)
│
▼
abliterated (refusal-ablated) via OBLITERATUS v0.1.2 (multi-direction SVD, BF16)
│
▼
this repo — OptiQ uniform 4-bit, MLX format (osmAPI team — quantization)
Direct upstream links:
- 🏛️ Foundation: Qwen/Qwen3.6-27B
- 🎓 Reasoning distill: TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2
- 🔓 Abliteration tool: OBLITERATUS (multi-direction SVD, 6 directions, 3 refinement passes, λ=0.08)
- 🧮 Quantization tool: mlx-lm + mlx-optiq for OptiQ variants
📦 Use it
mlx-lm (recommended)
pip install mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx")
prompt = "Explain the difference between SSM and softmax attention in three sentences."
out = generate(model, tokenizer, prompt=prompt, max_tokens=400)
print(out)
Chat template
messages = [
{"role": "system", "content": "You are a helpful, candid reasoning assistant."},
{"role": "user", "content": "Plan a 3-day Tokyo itinerary for a foodie."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(generate(model, tokenizer, prompt=prompt, max_tokens=600))
CLI
mlx_lm.generate --model osmapi/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2-abliterated-OptiQ-4bit-mlx --prompt "Hello" --max-tokens 256
🧪 Quantization details
- Source weights: BF16 abliterated checkpoint (28 shards, ~57 GB) derived from TeichAI/Qwen3.6-27B-Claude-Opus-Reasoning-Distill-v2 via OBLITERATUS multi-direction SVD ablation (preserves coherence; KL drift = 0.149 from base).
- Quantization scheme: OptiQ uniform 4-bit (group size 64, affine).
- Group size: 64.
- Calibration corpus: mlx-lm calibration_v5 (~427 KB English text, used for OptiQ sensitivity ranking; uniform/affine variants do not require calibration).
- Sanity check: forward perplexity on held-out calibration text within 1–3% of next-higher-precision sibling.
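To make "group size 64, affine" concrete, here is a small pure-Python sketch of group-wise min/max affine quantization. It illustrates the general technique only. It is not the MLX kernel or OptiQ's code, and the helper names are hypothetical.

```python
import random


def quantize_group(weights, bits=4):
    """Affine-quantize one group; returns (codes, scale, bias)."""
    levels = (1 << bits) - 1            # 15 steps for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for a flat group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights from integer codes."""
    return [c * scale + bias for c in codes]


def quantize(weights, group_size=64, bits=4):
    """Quantize a flat list of weights one group at a time."""
    groups = []
    for start in range(0, len(weights), group_size):
        groups.append(quantize_group(weights[start:start + group_size], bits))
    return groups


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]  # toy data, not real weights
groups = quantize(weights)
restored = [w for g in groups for w in dequantize_group(*g)]

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
worst_step = max(scale for _, scale, _ in groups)
print(f"groups={len(groups)} max_error={max_error:.6f} half_step={worst_step / 2:.6f}")
```

Each group stores its own scale and bias, so an outlier only widens the step size for the 64 weights it shares a group with.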
Architecture notes
The Qwen 3.6 27B family uses a hybrid attention stack: 3 GatedDeltaNet (linear-attention/SSM) layers followed by 1 full-softmax-attention layer, repeated 16× for 64 total layers, with a 5120 hidden size, 248K vocabulary, and 262K context. The SSM kernels lack a VJP path in MLX, so quantization methods that rely on a backward pass (DWQ, dynamic quant) cannot be applied here. OptiQ's forward-only sensitivity approach is the only calibration-aware option that works on this architecture, which is why the OptiQ variants exist.
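The stated totals (16 repeats, 64 layers) imply four layers per repeating block. The sketch below lays out that pattern, assuming three linear-attention layers followed by one full-attention layer per block. The layer-type labels are illustrative strings, not identifiers taken from the model's configuration.

```python
# Assumption: each block is 3 linear-attention layers then 1 full-attention
# layer, which is what 64 layers / 16 repeats = 4 layers per block implies.
LINEAR_PER_BLOCK = 3
FULL_PER_BLOCK = 1
NUM_BLOCKS = 16


def build_layer_types(num_blocks=NUM_BLOCKS):
    """Return the per-layer attention type for the whole stack."""
    block = ["linear_attention"] * LINEAR_PER_BLOCK + ["full_attention"] * FULL_PER_BLOCK
    return block * num_blocks


layer_types = build_layer_types()
full_attention_indices = [
    i for i, kind in enumerate(layer_types) if kind == "full_attention"
]

print(len(layer_types))             # 64
print(len(full_attention_indices))  # 16
print(full_attention_indices[:4])   # [3, 7, 11, 15]
```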
⚠️ Behavior caveats
- Text-only, no vision. The abliteration pipeline (OBLITERATUS) ran on the LM tower and stripped the ViT. For vision-capable quants of the same Opus-Distill v2 lineage, use our parallel non-abliterated repos at huggingface.co/osmapi (any repo without -abliterated in the name).
- This is an abliterated model: refusal directions were surgically removed from the parent. It will answer prompts the parent would refuse. Use responsibly and within applicable law.
- Quantization preserves abliteration: the refusal rate measured at BF16 (~35% from a 100% baseline) stays in that range across our quants.
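For readers unfamiliar with the term, the core linear-algebra idea behind removing a direction can be sketched as an orthogonal projection. This is a generic illustration only: it is not OBLITERATUS's implementation, it does not model the λ or refinement-pass settings listed above, and it assumes the directions are orthonormal (as singular vectors from an SVD would be). The vectors are toy values.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def ablate(vector, directions):
    """Remove the component of `vector` along each orthonormal direction."""
    out = list(vector)
    for d in directions:
        coeff = dot(out, d)
        out = [x - coeff * y for x, y in zip(out, d)]
    return out


# Toy example: two orthogonal directions in a 4-dimensional space.
directions = [normalize([1.0, 1.0, 0.0, 0.0]), normalize([0.0, 0.0, 1.0, 0.0])]
hidden = [2.0, 0.0, 3.0, 5.0]
cleaned = ablate(hidden, directions)

# After ablation the vector has no component left along either direction,
# while the untouched fourth coordinate is preserved.
residuals = [abs(dot(cleaned, d)) for d in directions]
print([round(x, 6) for x in cleaned])
print(residuals)
```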
🙏 Credits
| Quantization & release | osmAPI team — "OpenRouter of India" |
| Reasoning distill | TeichAI (Claude-Opus 4.5/4.6 high-reasoning datasets) |
| Foundation model | Qwen Team |
| Abliteration toolkit | OBLITERATUS by elder-plinius |
| Quant toolkit | mlx-lm, mlx-optiq |
📜 License
Apache-2.0, inherited from the foundation and distill upstream.
Need a hosted endpoint, custom quant, or larger-scale inference? osmAPI — multi-provider LLM routing for the Indian developer ecosystem.