GLM-5.2-0.8B-A0.8B

This is a tiny version of zai-org/GLM-5.2 created for testing and development.

Model Details

  • Base Model: zai-org/GLM-5.2
  • Architecture: glm_moe_dsa (GLM MoE with DeepSeek Sparse Attention)
  • Total Parameters: 0.85B
  • Activated Parameters: ~0.77B
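The gap between total and activated parameters comes from routed experts that a given token does not use. Here is a back-of-the-envelope check. It assumes each routed expert is a bias-free SwiGLU MLP with three projections (gate, up, down) between hidden_size and moe_intermediate_size. It also ignores any shared experts, which would be always active and so count toward both totals.

```python
def inactive_expert_params(
    hidden_size: int,
    moe_intermediate_size: int,
    n_routed_experts: int,
    num_experts_per_tok: int,
    num_moe_layers: int,
) -> int:
    """Parameters in routed experts that a single token does not activate.

    Assumption: each expert is a bias-free SwiGLU MLP made of gate_proj and
    up_proj (hidden -> intermediate) plus down_proj (intermediate -> hidden).
    """
    per_expert = 3 * hidden_size * moe_intermediate_size
    unused_per_layer = (n_routed_experts - num_experts_per_tok) * per_expert
    return num_moe_layers * unused_per_layer


# Tiny config: 6 layers, the first 2 dense, so 4 MoE layers.
unused = inactive_expert_params(
    hidden_size=2048,
    moe_intermediate_size=512,
    n_routed_experts=8,
    num_experts_per_tok=2,
    num_moe_layers=6 - 2,
)
print(f"{unused:,} unused params per token (~{unused / 1e9:.3f}B)")
```

This prints 75,497,472 (~0.075B). That is roughly the difference between the 0.85B total and the ~0.77B activated figure above.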

Configuration Changes

The following parameters were reduced from the original model:

| Parameter | Original | Tiny |
|---|---|---|
| num_hidden_layers | 78 | 6 |
| hidden_size | 6144 | 2048 |
| intermediate_size | 12288 | 4096 |
| num_attention_heads | 64 | 16 |
| num_key_value_heads | 64 | 16 |
| n_routed_experts | 256 | 8 |
| num_experts_per_tok | 8 | 2 |
| moe_intermediate_size | 2048 | 512 |
| kv_lora_rank | 512 | 128 |
| q_lora_rank | 2048 | 512 |
| v_head_dim | 256 | 128 |
| index_n_heads | 32 | 8 |
| index_head_dim | 128 | 64 |
| first_k_dense_replace | 3 | 2 |
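The card does not include the script used to shrink the config. The following is a hypothetical sketch that treats the config as a plain dict restricted to the fields in the table. A real config.json has many more fields, such as vocab_size, and these are left untouched here. Under those assumptions, the reduction amounts to applying overrides and then running a few sanity checks. The helper name make_tiny_config is illustrative only.

```python
from typing import Any

# (parameter, original value, tiny value), copied from the table above.
TABLE = [
    ("num_hidden_layers", 78, 6),
    ("hidden_size", 6144, 2048),
    ("intermediate_size", 12288, 4096),
    ("num_attention_heads", 64, 16),
    ("num_key_value_heads", 64, 16),
    ("n_routed_experts", 256, 8),
    ("num_experts_per_tok", 8, 2),
    ("moe_intermediate_size", 2048, 512),
    ("kv_lora_rank", 512, 128),
    ("q_lora_rank", 2048, 512),
    ("v_head_dim", 256, 128),
    ("index_n_heads", 32, 8),
    ("index_head_dim", 128, 64),
    ("first_k_dense_replace", 3, 2),
]
ORIGINAL = {name: orig for name, orig, _ in TABLE}
TINY_OVERRIDES = {name: tiny for name, _, tiny in TABLE}


def make_tiny_config(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: copy `base`, apply `overrides`, sanity-check the result."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"override keys not in base config: {sorted(unknown)}")
    cfg = {**base, **overrides}
    if cfg["num_experts_per_tok"] > cfg["n_routed_experts"]:
        raise ValueError("router cannot pick more experts than exist")
    if cfg["first_k_dense_replace"] >= cfg["num_hidden_layers"]:
        raise ValueError("dense prefix leaves no MoE layers")
    if cfg["num_attention_heads"] % cfg["num_key_value_heads"] != 0:
        raise ValueError("attention heads must divide evenly into KV heads")
    return cfg


tiny = make_tiny_config(ORIGINAL, TINY_OVERRIDES)
print(tiny["num_hidden_layers"], tiny["n_routed_experts"])  # 6 8
```

Rejecting unknown keys guards against a typo silently adding a new field instead of shrinking an existing one.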

Checkpoint Structure

A single safetensors file containing 194 float32 tensors. Layers 0-1 use a dense MLP and layers 2-5 use an MoE MLP. Layers 0-2 carry full DSA indexer weights; layers 3-5 use a shared indexer.
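The per-layer layout follows from two cut-offs. The MLP rule mirrors how first_k_dense_replace is used in DeepSeek-style configs: layers with an index below it are dense, and any layer-frequency setting is ignored here. The indexer cut-off is exposed as a hypothetical parameter, n_full_indexer_layers, because the card does not say which config field controls it.

```python
def layer_layout(
    num_hidden_layers: int,
    first_k_dense_replace: int,
    n_full_indexer_layers: int,  # hypothetical: the real config field is not named in the card
) -> list[tuple[int, str, str]]:
    """Return (layer index, MLP type, indexer type) for every decoder layer."""
    layout = []
    for i in range(num_hidden_layers):
        mlp = "dense" if i < first_k_dense_replace else "moe"
        indexer = "full" if i < n_full_indexer_layers else "shared"
        layout.append((i, mlp, indexer))
    return layout


for idx, mlp, indexer in layer_layout(6, first_k_dense_replace=2, n_full_indexer_layers=3):
    print(f"layer {idx}: {mlp} MLP, {indexer} indexer")
```

With the tiny config, this prints dense MLPs for layers 0-1 and full indexers for layers 0-2, matching the description above.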

Usage

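from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" places the weights on available devices (requires the accelerate package)
model = AutoModelForCausalLM.from_pretrained("inference-optimization/GLM-5.2-0.8B-A0.8B", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("inference-optimization/GLM-5.2-0.8B-A0.8B")

# Tokenize the prompt and move it to the model's device
input_ids = tokenizer("According to all known laws", return_tensors="pt").input_ids.to(model.device)
# Generate up to 20 new tokens after the prompt
output = model.generate(input_ids, max_new_tokens=20)
# Decode the full sequence (prompt + continuation) back to text
print(tokenizer.decode(output[0]))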

Creation Process

  1. Inspected the original GLM-5.2 config (78 layers, 256 experts, hidden_size=6144)
  2. Reduced the dimensions listed above to target roughly 1B parameters while preserving the architecture
  3. Created the model in float32 for training stability
  4. Fine-tuned on a copypasta dataset to a perplexity of ~1.0
  5. Validated that the checkpoint structure matches the original model's naming conventions
  6. Validated that the model loads, runs inference, and generates text correctly
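Step 5 is described only at a high level. One way to check it is to collapse numeric path components in the tensor names and compare the resulting patterns. This sketch assumes that "naming conventions" means identical tensor names up to layer and expert indices. The function names and example tensor names are hypothetical.

```python
import re
from collections.abc import Iterable

# Matches a run of digits sitting between two dots, e.g. the "5" and "7" in
# "model.layers.5.mlp.experts.7.up_proj.weight" (layer and expert indices).
_INDEX = re.compile(r"(?<=\.)\d+(?=\.)")


def name_pattern(tensor_name: str) -> str:
    """Replace every numeric path component with 'N'."""
    return _INDEX.sub("N", tensor_name)


def unmatched_patterns(tiny_names: Iterable[str], original_names: Iterable[str]) -> set[str]:
    """Patterns found in the tiny checkpoint that never occur in the original."""
    original = {name_pattern(n) for n in original_names}
    return {name_pattern(n) for n in tiny_names} - original


# Hypothetical tensor names, for illustration only.
tiny_names = ["model.layers.0.mlp.gate_proj.weight", "model.layers.5.mlp.experts.7.up_proj.weight"]
original_names = ["model.layers.2.mlp.gate_proj.weight", "model.layers.40.mlp.experts.200.up_proj.weight"]
print(unmatched_patterns(tiny_names, original_names))  # set()
```

For a local file, `safetensors.safe_open(path, framework="pt")` exposes the tensor names through its `keys()` method. For a sharded original checkpoint, the keys of the `weight_map` in `model.safetensors.index.json` serve the same purpose.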

Validation Output

Success: 1.0000379085540771 <= 10.0
Generating sample text:
According to all known laws of aviation, there is no way a bee should be able to fly.
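The first validation line appears to compare a perplexity against a threshold of 10.0. The validation script is not included, so this reading is an assumption. Perplexity is the exponential of the mean per-token negative log-likelihood. A value of ~1.0 therefore means the model assigns near-certain probability to every token of the validation text, i.e. it has memorized it. A minimal sketch of such a check, with hypothetical function names:

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """exp of the mean negative log-likelihood (natural-log probabilities)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def validate(token_log_probs: list[float], threshold: float = 10.0) -> bool:
    """Hypothetical pass/fail check in the style of the log line above."""
    ppl = perplexity(token_log_probs)
    ok = ppl <= threshold
    print(f"{'Success' if ok else 'Failure'}: {ppl} {'<=' if ok else '>'} {threshold}")
    return ok


validate([-0.0001, -0.00005, 0.0])  # near-zero NLL, so perplexity is just above 1.0
```

With a transformers causal LM, `model(input_ids, labels=input_ids).loss` is the mean cross-entropy over the shifted tokens, so `torch.exp(loss)` gives the same quantity.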

Notes

  • The model is stored in float32 (the original uses bfloat16) to ensure stable initialization and training of the tiny model
  • The architecture preserves both MLP layer types (dense and sparse MoE), MLA (Multi-head Latent Attention) with compressed Q/KV, and the DSA indexer with its full/shared pattern
  • The model has been fine-tuned on a toy dataset and is intended for testing purposes only
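Storing the weights in float32 doubles their footprint relative to bfloat16. The following gives a quick estimate for the 0.85B total parameters. It covers weights only; activations, KV cache, and optimizer state are extra.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_gb(n_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_gb(0.85e9, dtype):.1f} GB")  # ~3.4 GB, then ~1.7 GB
```

You may care more about memory than bit-exact reproduction. In that case, passing `torch_dtype=torch.bfloat16` to `from_pretrained` loads a half-size copy, though its outputs may differ slightly from the float32 model.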