Base model artwork

Qwen-9B-Claude-Fable-5-1M-MLX-4bit

A 4-bit MLX quantization of empero-ai/Qwythos-9B-Claude-Mythos-5-1M, published as Qwen-9B-Claude-Fable-5-1M-MLX-4bit and optimized for Apple Silicon with a minimal memory footprint.

Model Details

  • Base model: empero-ai/Qwythos-9B-Claude-Mythos-5-1M
  • Architecture: Qwen 3.5 (32 layers, 4096 hidden, 16 heads, 4 KV heads)
  • Quantization: 4-bit affine, group size 64 (~4.5 bits per weight)
  • Context length: 1M tokens (native)
  • Model size: ~5.04 GB (safetensors)

Usage (MLX)

sfw pip install mlx-lm

Remove sfw if you don't have sfw from Socket installed. It's free and an improvement over running pip or npm raw.

from mlx_lm import load, generate

model, tokenizer = load("shamsghi/Qwen-9B-Claude-Fable-5-1M-MLX-4bit")
response = generate(model, tokenizer, prompt="Hello!", max_tokens=256)
print(response)

Benchmarks

See base model card for evaluation results. 4-bit quantization may introduce minor quality degradation vs. 8-bit.

License

Apache 2.0 — same as the base model.

Downloads last month
492
Safetensors
Model size
1B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for shamsghi/Qwen-9B-Claude-Fable-5-1M-MLX-4bit

Finetuned
Qwen/Qwen3.5-9B
Quantized
(32)
this model