Instructions to use pipenetwork/GLM-5.2-MLX-mixed-3_6bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use pipenetwork/GLM-5.2-MLX-mixed-3_6bit with MLX:
# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("pipenetwork/GLM-5.2-MLX-mixed-3_6bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- Pi
How to use pipenetwork/GLM-5.2-MLX-mixed-3_6bit with Pi:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "pipenetwork/GLM-5.2-MLX-mixed-3_6bit"
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "pipenetwork/GLM-5.2-MLX-mixed-3_6bit" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
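The models.json step can also be scripted so that any providers already in the file are kept. The following is a minimal sketch, not part of Pi itself: the function name `add_mlx_provider` is hypothetical, and the provider entry simply mirrors the JSON shown above.

```python
import json
from pathlib import Path

# Provider entry copied from the models.json snippet above.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "pipenetwork/GLM-5.2-MLX-mixed-3_6bit"}],
}


def add_mlx_provider(config_path: Path) -> dict:
    """Merge the "mlx-lm" provider into config_path (hypothetical helper).

    Other providers already present in the file are left untouched.
    """
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("providers", {})["mlx-lm"] = MLX_PROVIDER
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Usage:
# add_mlx_provider(Path.home() / ".pi" / "agent" / "models.json")
```

Back up an existing models.json before running this, since the file is rewritten in place.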
- Hermes Agent
How to use pipenetwork/GLM-5.2-MLX-mixed-3_6bit with Hermes Agent:
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "pipenetwork/GLM-5.2-MLX-mixed-3_6bit"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default pipenetwork/GLM-5.2-MLX-mixed-3_6bit
Run Hermes
hermes
- MLX LM
How to use pipenetwork/GLM-5.2-MLX-mixed-3_6bit with MLX LM:
Generate or start a chat session
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "pipenetwork/GLM-5.2-MLX-mixed-3_6bit"
Run an OpenAI-compatible server
# Install MLX LM
uv tool install mlx-lm

# Start the server
mlx_lm.server --model "pipenetwork/GLM-5.2-MLX-mixed-3_6bit"

# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "pipenetwork/GLM-5.2-MLX-mixed-3_6bit",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
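The same chat request can be sent from Python. This is a minimal sketch using only the standard library. It assumes the server listens on port 8080, as in the Pi and Hermes configurations above, and that the response follows the usual OpenAI chat-completions shape. The helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is reachable at this base URL.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "pipenetwork/GLM-5.2-MLX-mixed-3_6bit"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       model: str = MODEL_ID) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout: float = 600.0) -> str:
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style chat responses carry the text in choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# print(chat("Hello"))
```

The timeout is deliberately generous, since a model of this size can take a while to produce its first token.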
Model seems to have issues despite the smoke test.
I downloaded the current repo and tested with mlx-lm 0.31.3.
The glm_moe_dsa module exists in my install:
mlx-lm: 0.31.3
mlx_lm.models.glm_moe_dsa present
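A check along these lines can be reproduced with the standard library. This is a sketch of one way to do it, not necessarily how the output above was produced, and the helper names are hypothetical.

```python
from importlib import metadata
from importlib.util import find_spec
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_present(module_name: str) -> bool:
    """Return True if module_name can be located without importing it fully.

    For a dotted name, find_spec imports the parent packages first, so a
    missing or broken parent is reported as "not present".
    """
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


# Usage:
# print("mlx-lm:", installed_version("mlx-lm"))
# print("glm_moe_dsa present:", module_present("mlx_lm.models.glm_moe_dsa"))
```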
But loading fails with:
ValueError: Missing 285 parameters, all under self_attn.indexer.*
I force-downloaded the current model.safetensors.index.json from the repo and checked it directly. It has 3481 tensors and does not contain entries like:
model.layers.11.self_attn.indexer.k_norm.bias
model.layers.11.self_attn.indexer.k_norm.weight
model.layers.11.self_attn.indexer.weights_proj.weight
model.layers.11.self_attn.indexer.wk.weight
model.layers.11.self_attn.indexer.wq_b.weight
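The index check can be repeated with a few lines of Python. This is a minimal sketch that assumes the standard sharded-safetensors index layout, in which a top-level "weight_map" object maps each tensor name to its shard file. The function name `indexer_tensors` is hypothetical.

```python
import json


def indexer_tensors(index_path):
    """Return (total tensor count, sorted names under self_attn.indexer)."""
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    # "weight_map" maps tensor name -> shard file name.
    weight_map = index["weight_map"]
    names = sorted(n for n in weight_map if ".self_attn.indexer." in n)
    return len(weight_map), names


# Usage:
# total, names = indexer_tensors("model.safetensors.index.json")
# print(total, "tensors,", len(names), "indexer tensors")
```

An empty list of names alongside a non-zero total would match the situation described here: the index is readable but contains no indexer entries.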
Can you confirm which mlx-lm commit/version was used for the smoke test, and whether the uploaded MLX weights intentionally omit the DSA indexer tensors?
There appears to be an unmerged PR that fixes this.
Usage
NOTE: Run with https://github.com/ml-explore/mlx-lm/pull/1410 until the PR is merged.
# Start server at http://localhost:8080/v1/chat/completions
uvx --from mlx-lm mlx_lm.server \
--host 127.0.0.1 \
--port 8080 \
--model spicyneuron/GLM-5.2-MLX-4.5bit
This is from https://huggingface.co/spicyneuron/GLM-5.2-MLX-4.5bit/blob/main/README.md
I have downloaded the 332GB, 85-shard pipenetwork model, and it runs under this PR on an M3 Ultra with 512GB at about 17 tokens/s for a single prompt.
The GLM-5.2 model itself exceeds all expectations.
My standard tests are:
(a) To write a Python script that calculates Carmichael numbers up to a limit supplied by the user. GLM-5.2 one-shotted it; most open-source models [used to] get the prime logic wrong.
(b) To devise and implement a programme that helps me revise my Mandarin based on the HSK structure. Almost no models can do this adequately without a lot of intervention, but GLM-5.2 absolutely nailed it.
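For reference, test (a) can be solved in a few lines. The sketch below is an illustration only, not GLM-5.2's output (which is not reproduced in this thread). It uses Korselt's criterion: a composite n is a Carmichael number exactly when n is squarefree and p - 1 divides n - 1 for every prime p dividing n.

```python
def prime_factors(n: int) -> list:
    """Return the prime factorisation of n as (prime, exponent) pairs."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            exponent = 0
            while n % p == 0:
                n //= p
                exponent += 1
            factors.append((p, exponent))
        p += 1
    if n > 1:
        factors.append((n, 1))
    return factors


def is_carmichael(n: int) -> bool:
    """Korselt's criterion: composite, squarefree, (p - 1) | (n - 1)."""
    if n < 3 or n % 2 == 0:
        return False  # every Carmichael number is odd
    factors = prime_factors(n)
    if len(factors) < 2:
        return False  # primes and prime powers are excluded
    return all(e == 1 and (n - 1) % (p - 1) == 0 for p, e in factors)


def carmichael_numbers(limit: int) -> list:
    """Return all Carmichael numbers <= limit, in increasing order."""
    return [n for n in range(3, limit + 1, 2) if is_carmichael(n)]


# Usage:
# carmichael_numbers(2000)  # -> [561, 1105, 1729]
```

Trial division is slow for large limits, but it keeps the primality logic explicit, which is the part the post says models tend to get wrong.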
Very impressed: around 17 tokens/s on an M3 Ultra 512GB running under MLX with the PR patch mentioned above.