Qwen-3.5-unsloth-mlx
Collection
AWQ-style pre-scaling using Unsloth's imatrix calibration data, then 3-6-bit affine quantization with the Unsloth mixed-precision recipe via MLX
How to use Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm
# Generate text with mlx-lm
from mlx_lm import load, generate
model, tokenizer = load("Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx")
prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)
text = generate(model, tokenizer, prompt=prompt, verbose=True)

How to use Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx with Pi:
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx"
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx"
        }
      ]
    }
  }
}
# Start Pi in your project directory:
pi

How to use Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx with Hermes Agent:
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx"
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx
hermes
How to use Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx with MLX LM:
# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx"
# Start the server
mlx_lm.server --model "Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'

3-bit base mixed-precision quantization of Qwen/Qwen3.5-27B for Apple Silicon, using the Unsloth Dynamic quantization strategy via mlx-node.
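As background on what "3-bit" means here: affine quantization stores each group of weights as small integer codes plus a per-group scale and bias. The sketch below is a simplified pure-Python illustration of that idea, not the mlx-node or MLX implementation. The helper names, the sample weights, and the single-group setup are all assumptions made for the example.

```python
def affine_quantize(group, bits):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(group), max(group)
    # Guard against a constant group, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in group]
    return codes, scale, lo  # the bias is the group minimum


def dequantize(codes, scale, bias):
    """Reconstruct approximate weights as scale * code + bias."""
    return [scale * q + bias for q in codes]


def max_error(group, bits):
    """Largest absolute reconstruction error for one group."""
    codes, scale, bias = affine_quantize(group, bits)
    restored = dequantize(codes, scale, bias)
    return max(abs(a - b) for a, b in zip(group, restored))


# Hypothetical weights for a single quantization group.
weights = [-0.42, -0.17, -0.05, 0.0, 0.03, 0.11, 0.26, 0.39]

for bits in (3, 4, 5, 6):
    print(f"{bits}-bit max error: {max_error(weights, bits):.4f}")
```

The rounding error per weight is bounded by half the scale, and the scale shrinks as the bit width grows, which is why more sensitive tensors are given more bits in a mixed-precision recipe.
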
| | Original (BF16) | This Model |
|---|---|---|
| Size | ~51 GB | 17 GB |
| Format | SafeTensors (sharded) | SafeTensors (single file) |
| Precision | BF16 uniform | Mixed 3/4/5/6/8-bit + BF16 |

| Repo | GGUF Equivalent | Size |
|---|---|---|
| Brooooooklyn/Qwen3.5-27B-UD-Q2_K_XL-mlx | UD-Q2_K_XL | 15 GB |
| Brooooooklyn/Qwen3.5-27B-UD-Q3_K_XL-mlx | UD-Q3_K_XL | 17 GB |
| Brooooooklyn/Qwen3.5-27B-UD-Q4_K_XL-mlx | UD-Q4_K_XL | 20 GB |
| Brooooooklyn/Qwen3.5-27B-UD-Q5_K_XL-mlx | UD-Q5_K_XL | 24 GB |
| Brooooooklyn/Qwen3.5-27B-UD-Q6_K_XL-mlx | UD-Q6_K_XL | 27 GB |
| Brooooooklyn/Qwen3.5-27B-UD-Q8_K_XL-mlx | UD-Q8_K_XL | 29 GB |
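
The sizes above can be turned into a rough average number of bits per weight. This is only a back-of-the-envelope estimate: it assumes about 27 billion parameters (taken from the model name), treats 1 GB as 10^9 bytes, and ignores the space used by scales, biases, and metadata. The function name is hypothetical.

```python
ASSUMED_PARAMS = 27e9  # assumption: parameter count implied by the "27B" name

# Repository sizes in GB, copied from the table above.
SIZES_GB = {
    "UD-Q2_K_XL": 15,
    "UD-Q3_K_XL": 17,
    "UD-Q4_K_XL": 20,
    "UD-Q5_K_XL": 24,
    "UD-Q6_K_XL": 27,
    "UD-Q8_K_XL": 29,
}


def bits_per_weight(size_gb, params=ASSUMED_PARAMS, bytes_per_gb=1e9):
    """Average bits stored per parameter for a file of the given size."""
    return size_gb * bytes_per_gb * 8 / params


for name, size in SIZES_GB.items():
    print(f"{name}: ~{bits_per_weight(size):.1f} bits/weight")
```

Under these assumptions the 17 GB variant averages about 5 bits per weight, noticeably above its 3-bit base, which is consistent with a recipe that keeps some tensors at higher precision.
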

| Weight | Bits | Rationale |
|---|---|---|
| embed_tokens | 5-bit | KLD ~0.15, very low sensitivity |
| lm_head | 6-bit | KLD ~0.05, safest tensor |
| self_attn.q/k/v_proj | 5-bit + AWQ | KLD ~1.5-2.9, AWQ via layernorm |
| linear_attn.in_proj_qkv/z | 5-bit + AWQ | KLD ~2.9, AWQ via layernorm |
| self_attn.o_proj | bf16 | NOT AWQ-correctable |
| linear_attn.out_proj | bf16 | KLD ~6.0, worst tensor |
| down_proj | 4-bit | "Slightly more sensitive" |
| gate_proj, up_proj | 3-bit | "Generally ok" at low bits |

Based on Unsloth Dynamic 2.0 per-tensor KLD analysis with imatrix AWQ pre-scaling.
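
To make the per-tensor assignments above concrete, here is a minimal sketch of a name-based lookup. It is an illustration only and not how mlx-node applies the recipe. The full tensor names in the usage lines are hypothetical, the expansion of `in_proj_qkv/z` into `in_proj_qkv` and `in_proj_z` is an assumption, and tensors not listed in the table return `None` rather than a guessed precision.

```python
# (substring pattern, precision) pairs transcribed from the table above.
RECIPE = [
    ("embed_tokens", "5-bit"),
    ("lm_head", "6-bit"),
    ("self_attn.q_proj", "5-bit + AWQ"),
    ("self_attn.k_proj", "5-bit + AWQ"),
    ("self_attn.v_proj", "5-bit + AWQ"),
    ("linear_attn.in_proj_qkv", "5-bit + AWQ"),
    ("linear_attn.in_proj_z", "5-bit + AWQ"),  # assumed expansion of "qkv/z"
    ("self_attn.o_proj", "bf16"),
    ("linear_attn.out_proj", "bf16"),
    ("down_proj", "4-bit"),
    ("gate_proj", "3-bit"),
    ("up_proj", "3-bit"),
]


def precision_for(tensor_name):
    """Return the table's precision for a tensor name, or None if unlisted."""
    for pattern, precision in RECIPE:
        if pattern in tensor_name:
            return precision
    return None


# Hypothetical tensor names, used only to exercise the lookup.
print(precision_for("model.layers.0.mlp.down_proj.weight"))
print(precision_for("model.layers.3.self_attn.o_proj.weight"))
print(precision_for("model.norm.weight"))
```
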
import { loadModel } from '@mlx-node/lm';
const model = await loadModel('./Qwen3.5-27B-UD-Q3_K_XL-mlx');
const result = await model.chat(
  [{ role: 'user', content: 'Hello!' }],
  { maxNewTokens: 2048, temperature: 0.6, enableThinking: false },
);
console.log(result.text);

mlx convert -i Qwen3.5-27B -o Qwen3.5-27B-UD-Q3_K_XL-mlx -q --q-bits 3 --q-recipe unsloth --imatrix-path imatrix_unsloth.gguf
Apache 2.0 (inherited from base model).