How to use philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX with MLX:
```python
# Make sure mlx-lm is installed:
#   pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- Pi
How to use philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX with Pi:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX"
```
Configure the model in Pi
```shell
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```
Add to ~/.pi/agent/models.json:
```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX" }
      ]
    }
  }
}
```
Run Pi
```shell
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX with Hermes Agent:
Start the MLX server
```shell
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX"
```
Configure Hermes
```shell
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX
```
Run Hermes
```shell
hermes
```
- MLX LM
How to use philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX with MLX LM:
Generate or start a chat session
```shell
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX"
```
Run an OpenAI-compatible server
```shell
# Install MLX LM
uv tool install mlx-lm

# Start the server (listens on port 8080 by default)
mlx_lm.server --model "philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX"

# Call the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
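The same chat-completions call can be made from Python using only the standard library. This is a minimal sketch, assuming the server was started with its defaults and is reachable at `http://localhost:8080/v1` (the base URL used in the Pi and Hermes configs above). The helper names `build_chat_request` and `chat` and the `max_tokens` default are illustrative choices, not part of mlx-lm.

```python
import json
import urllib.request

MODEL_ID = "philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX"
# Assumption: mlx_lm.server is running locally on its default port.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(user_text, max_tokens=256):
    """Build the JSON body for POST /v1/chat/completions."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
    }


def chat(user_text, base_url=BASE_URL, timeout=600):
    """Send one user turn to the local server and return the reply text."""
    body = json.dumps(build_chat_request(user_text)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout: the model card reports roughly 10 tok/s.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Example (requires the server to be running):
# print(chat("Write a Python function that reverses a string."))
```

Keeping the payload builder separate from the network call makes it easy to inspect or log the request before sending it.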
GLM-5.2-Demolition · q4a4-soul (v3)
A demolition of zai-org/GLM-5.2 (743B total / 39B active MoE,
MIT) down to a ~98 GB 4-bit model that loads and runs fully on a single Apple M5 Max (128 GB). v3's
distinguishing move is soul-targeted expert pruning — the kept experts are chosen by saliency measured on
our facet data, not a generic corpus — plus a deliberately pure vanilla-code core with swappable
heritage "souls" mounted on demand.
Architecture: a PURE core + swappable souls
- CORE (always-on): just the vanilla languages, done excellently — Python · TypeScript · JavaScript · Rust · Go · HTML · CSS · SQL · Postgres. No frameworks, no baked specialties. Latest versions, vanilla stdlib.
- SOULS (mount per-request — the model factory): small LoRA adapters that name a field's masters to
activate latent eliteness:
- art (Basquiat/Haring/Banksy/Sol LeWitt/Casey Reas) · music (Bach/J Dilla/Eno/WALL-E) · design (Rams/Bauhaus) · perfumery (Beaux/Guerlain/Ellena) · science (Feynman/Darwin/Sagan) · legacy (K&R/Knuth/Dijkstra/Hopper) · security (Saltzer-Schroeder/Aleph-One, purple-team) · gamedev (Carmack/Handmade-Hero, vanilla from-scratch) · fullstack (htmx/Go-stdlib/Postgres) · math · dataviz · prose · architecture · research.
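One way to picture the core-plus-souls split is a loader that applies at most one LoRA adapter on top of the core weights. The sketch below is an assumption-heavy illustration: the `souls/` directory layout and the helper names are hypothetical, and the card does not say how soul adapters are distributed. The only library call used is `mlx_lm.load` with its `adapter_path` argument.

```python
from pathlib import Path

MODEL_ID = "philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX"

# Hypothetical layout: one LoRA adapter directory per soul under ./souls/.
# The soul names come from the list above; the directory names are assumptions.
SOULS_DIR = Path("souls")
KNOWN_SOULS = {
    "art", "music", "design", "perfumery", "science", "legacy", "security",
    "gamedev", "fullstack", "math", "dataviz", "prose", "architecture",
    "research",
}


def adapter_path_for(soul):
    """Return the adapter directory for a soul, or None for the pure core."""
    if soul is None:
        return None
    if soul not in KNOWN_SOULS:
        raise ValueError(f"unknown soul: {soul!r}")
    return str(SOULS_DIR / soul)


def load_with_soul(soul=None):
    """Load the core model, optionally with one soul's LoRA adapter applied."""
    # Imported lazily so the path helper works without mlx-lm installed.
    from mlx_lm import load

    return load(MODEL_ID, adapter_path=adapter_path_for(soul))
```

This version reloads the whole model for each soul, which is the simplest route; a true per-request mount would keep the core resident and swap only the adapter weights.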
The demolition lineage (honest)
| ver | prune | quant | size | result |
|---|---|---|---|---|
| v1 | keep 30% experts (generic calib) | 3-bit | 99 GB | broke — hallucinates, sentence-loops |
| v2 | keep 23% experts (code calib) | 4-bit | 98 GB | design coherent; trivia gone (by design) |
| v3 | keep 23% experts (soul calib) | 4-bit | ~98 GB | coherent FOCUS-9 vanilla code (healed) |
Why 4-bit, not 3: 3-bit was just below the quality cliff; 4-bit is just above it and is also MLX's best-optimized kernel (cleanest packing). 2-bit is worse still. No bit-width of a demolished 744B model beats a clean right-sized model: this artifact is the best-possible demolition, a research result, not a frontier claim.
Method
- Saliency (`23_stream_calibrate`) on our facet corpus → score each routed expert.
- Prune (`24_apply_prune --ratio 0.77`) → keep the top-saliency experts.
- Re-quantize (`24b_stream_requantize --bits 4`) → uniform 4-bit experts, 4-bit attn, 6-bit head.
- Heal (`06_heal_lora`): LoRA on vanilla FOCUS-9 gold; souls heal separately per facet.
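The saliency and prune steps can be sketched in a few lines of plain Python. This is a simplified illustration, not the pipeline's actual code: the card does not say which saliency score the scripts compute, so summed router gate weight is used here as a stand-in, and both function names are hypothetical. Only the prune ratio of 0.77 (keep roughly the top 23%) comes from the text.

```python
def accumulate_saliency(routing_events, n_experts):
    """Sum router gate weights per expert over a calibration corpus.

    routing_events: iterable of (expert_index, gate_weight) pairs.
    Assumption: summed gate weight stands in for the real saliency score.
    """
    scores = [0.0] * n_experts
    for expert, gate in routing_events:
        scores[expert] += gate
    return scores


def select_experts(saliency, prune_ratio=0.77):
    """Return the sorted indices of the experts to keep in one MoE layer.

    saliency: one score per routed expert (higher means more important).
    prune_ratio: fraction of experts to drop; 0.77 keeps about the top 23%.
    """
    n_experts = len(saliency)
    # Always keep at least one expert so the layer stays usable.
    n_keep = max(1, round(n_experts * (1.0 - prune_ratio)))
    ranked = sorted(range(n_experts), key=lambda i: saliency[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy calibration run: 4 experts, a handful of routed tokens.
events = [(0, 0.9), (1, 0.1), (2, 0.5), (0, 0.4), (3, 0.05)]
scores = accumulate_saliency(events, n_experts=4)
kept = select_experts(scores, prune_ratio=0.5)  # keeps experts 0 and 2
```

Because the scores are measured on the calibration corpus, changing that corpus (generic, code, or soul data, as in the lineage table) changes which experts survive even at the same ratio.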
Honest scope
- Speed: ~10 tok/s — memory-bandwidth-bound (inherent to a 98 GB model on M5; spec-decode nets only ~1.05× here, so it's not used).
- Strengths: the FOCUS-9 vanilla languages + whichever soul is mounted. Not general trivia — those experts were deliberately pruned. Best driven by a verifier-first agent (the compiler steers each line).
- Eval: HumanEval-164 pass@1 = 114/164 (69%): full set, single-shot, scored on hidden tests by real verifiers (the easy n=20 subset scored 95%). That is mid-tier-usable for a demolished 4-bit model on a laptop (≈ GPT-4-class on this metric; below dedicated frontier coders at ~90%), and a verifier-first agent loop lifts it further. Strong on writing vanilla FOCUS-9 functions from a spec; weaker on hard debugging, multi-step tasks, and off-distribution prompts. This is the best-possible demolition of a 744B model, a research artifact, not a frontier daily-driver. See the full measured write-up in the repo's MISSION_SUMMARY.md.
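For reference, single-shot pass@1 is just the fraction of problems whose first and only attempt passes the hidden tests. A minimal sketch of the arithmetic behind the reported figure, with `pass_at_1` as an illustrative helper name:

```python
def pass_at_1(n_passed, n_total):
    """Fraction of problems solved on the first (and only) attempt."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_passed / n_total


# Full HumanEval set, using the counts reported above.
full = pass_at_1(114, 164)
print(f"{full:.1%}")  # prints "69.5%"
```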
Built with the open pipeline at glm52-demolition. Public (MIT — GLM-5.2 is Z.ai Pure-Open).
Model tree for philipjohnbasile/GLM-5.2-Demolition-q4a4-soul-MLX
- Base model: zai-org/GLM-5.2
Evaluation results
- pass@1 (HumanEval-164, single-shot, verifier-scored) on HumanEval: 69.0 (self-reported)