How to use from the MLX library
# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model, its processor, and its config from the same repository
model_path = "TheCluster/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking-MLX-mxfp8"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)

Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking

Quality: quantized (mxfp8, group size: 32, 8.341 bpw)
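
As a rough sanity check on the figures above, the sketch below works out the bits per weight of a block-scaled 8-bit format and a resulting size estimate. It assumes that mxfp8 stores 8-bit elements plus one shared 8-bit scale per group of 32 weights (as in the OCP Microscaling formats); the helper names are hypothetical and are not part of mlx-vlm. Under that assumption the ideal figure is 8.25 bpw, so the reported 8.341 bpw presumably reflects some tensors being kept at higher precision.

```python
def bits_per_weight(element_bits: int, scale_bits: int, group_size: int) -> float:
    """Effective bits per weight when each group shares one scale value."""
    return element_bits + scale_bits / group_size


def estimated_size_gb(num_params: float, bpw: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return num_params * bpw / 8 / 1e9


# Assumption: 8-bit elements, one 8-bit shared scale per group of 32 weights.
ideal_bpw = bits_per_weight(element_bits=8, scale_bits=8, group_size=32)

# Figures taken from this card: 8.341 bpw and a nominal 40 billion parameters.
reported_bpw = 8.341
size_gb = estimated_size_gb(40e9, reported_bpw)

print(f"ideal mxfp8 bits per weight: {ideal_bpw}")
print(f"estimated weight storage: {size_gb:.1f} GB")
```

The estimate covers weights only; the KV cache and activations need additional memory at inference time.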

40 billion parameters (dense, not MoE), expanded from Qwen3.5 27B and then trained on the Claude 4.6 Opus High Reasoning dataset via Unsloth on local hardware.

96 layers, 1275 tensors (50% more than the 27B base model).

Features variable-length reasoning: shorter for simpler prompts, longer for more complex ones.
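
To see how much reasoning the model produced for a given prompt, the reasoning can be separated from the final answer. The sketch below assumes the model delimits its reasoning with `<think>` and `</think>` tags, as Qwen-family thinking models commonly do; this card does not confirm that, and `split_reasoning` is a hypothetical helper, not part of mlx-vlm.

```python
def split_reasoning(text: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Split generated text into (reasoning, answer).

    Assumes reasoning ends at the first closing tag. The opening tag is
    optional because some chat templates emit it as part of the prompt.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


# Hand-written sample text, not real model output.
sample = "<think>2 + 2 is basic arithmetic.</think>The answer is 4."
reasoning, answer = split_reasoning(sample)
print("reasoning words:", len(reasoning.split()))
print("answer:", answer)
```

Passing the generated text from the usage example through such a helper gives a simple way to compare reasoning length across prompts of different difficulty.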

Model performance has increased dramatically.

256K context.

More information


Source

This model was converted to MLX format from DavidAU/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking using mlx-vlm version 0.4.

Format: Safetensors, MLX
Model size: 39B params
Tensor types: U8, U32, BF16