Benchmarked on Apple Silicon (M3 Ultra) β€” 39 tok/s, 100% tool calling via Rapid-MLX

#19
by Raullen - opened

Benchmark Results

Ran Qwopus 3.5-27B-v3 (4bit MLX) on Mac Studio M3 Ultra (256GB) using Rapid-MLX, an OpenAI-compatible inference server optimized for Apple Silicon.

Speed

Metric Result
Decode speed 39 tok/s
TTFT (cold) 0.44s
TTFT (cached) 0.28s
Peak RAM 14.8 GB

Capabilities

Test Result
Tool calling (10 scenarios) 10/10 (100%)
Tool call recovery (multi-round) 2/2 (100%)
Think-tag leak rate 0/5 (0%)
Vision Supported

Agentic Framework Integration

Tested with real client libraries β€” all connecting to Rapid-MLX serving Qwopus:

Framework Score
PydanticAI 6/6 (plain, stream, structured, tool, multi-turn, multi-tool)
LangChain 6/6 (plain, system, stream, tool, multi-tool, structured)
Anthropic SDK 4/5
smolagents 3/4
Claw Code 3/3 (prompt, code gen, tool calling)

How to run

pip install rapid-mlx
rapid-mlx serve Jackrong/MLX-Qwopus3.5-27B-v3-4bit

Then point any OpenAI-compatible app at http://localhost:8000/v1.

Rapid-MLX auto-detects Qwopus and configures the correct parsers (hermes tool parser + qwen3 reasoning parser) β€” no manual flags needed.

Great model! The Claude Opus reasoning distillation really shows in structured output and tool calling quality.

Sign up or log in to comment