Huihui-Qwen3.6-35B-A3B-abliterated (6-bit MLX)

This is a quantized version of huihui-ai/Huihui-Qwen3.6-35B-A3B-abliterated for Apple Silicon using MLX.

Quantization Details

  • Format: MLX (Apple Silicon)
  • Architecture: Qwen3.6 MoE (Mixture of Experts) — text-only
  • Bits per weight: 6.642
  • Total size: ~27 GB
  • Group size: 64
  • Quantization mode: affine

Mixed Quantization Strategy

Layer type Bits
Embeddings (embed_tokens, lm_head) 8-bit
Attention (self_attn, linear_attn) 8-bit
MoE Router (mlp.gate) 8-bit
Shared Expert + Gate 8-bit
Routed Experts (switch_mlp) 6-bit
All other layers 8-bit

Usage

from mlx_lm import load, generate

model, tokenizer = load("leonsarmiento/Huihui-Qwen3.6-35B-A3B-abliterated-6bit-mlx")
response = generate(model, tokenizer, prompt="Hello!", verbose=True)

Model Source

Downloads last month
477
Safetensors
Model size
35B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

6-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for leonsarmiento/Huihui-Qwen3.6-35B-A3B-abliterated-6bit-mlx

Quantized
(17)
this model