Qwen2.5-3B-GRPO-KL-math-reasoning / model-00002-of-00002.safetensors

Commit History

Upload final checkpoint (Qwen2.5-3B-GRPO-KL-math-reasoning)
fd710e9
verified

jaygala24 committed on