Model seems to have issues despite the smoke test.

#1
by webboty - opened

I downloaded the current repo and tested with mlx-lm 0.31.3.

The glm_moe_dsa module exists in my install:
mlx-lm: 0.31.3
mlx_lm.models.glm_moe_dsa present

But loading fails with:
ValueError: Missing 285 parameters, all under self_attn.indexer.*

I force-downloaded the current model.safetensors.index.json from the repo and checked it directly. It has 3481 tensors and does not contain entries like:

model.layers.11.self_attn.indexer.k_norm.bias
model.layers.11.self_attn.indexer.k_norm.weight
model.layers.11.self_attn.indexer.weights_proj.weight
model.layers.11.self_attn.indexer.wk.weight
model.layers.11.self_attn.indexer.wq_b.weight

Can you confirm which mlx-lm commit/version was used for the smoke test, and whether the uploaded MLX weights intentionally omit the DSA indexer tensors?

There appears to be an unmerged PR that fixes this.

Usage
NOTE: Run with https://github.com/ml-explore/mlx-lm/pull/1410 until the PR is merged.

# Start server at http://localhost:8080/v1/chat/completions
uvx --from mlx-lm mlx_lm.server \
  --host 127.0.0.1 \
  --port 8080 \
  --model spicyneuron/GLM-5.2-MLX-4.5bit

This is from https://huggingface.co/spicyneuron/GLM-5.2-MLX-4.5bit/blob/main/README.md
I have downloaded the 332GB pipenetwork 85-shard model and it runs under this PR on an M3 512 at about 17 t/s for a single prompt.

The GLM-5.2 model itself exceeds all expectations.
My standard tests are:
(a) to write a python script that calculates Carmichael Numbers up to a limit supplied by the user; it one-shotted it. Most open source models [used to] get the prime logic wrong.
(b) To devise and implement a programme that revises my knowledge of Mandarin based on the HSK structure. Almost no models can do this adequately without a lot of interventions, but GLM-5.2 absolutely nailed it.
Very impressed at around 17 token/s on M3 Ultra 512GB running under MLX with the PR patch mentioned.

Sign up or log in to comment