Qwen3-4B-GRPO-KL-math-reasoning / model-00001-of-00002.safetensors

Commit History

Upload final checkpoint (Qwen3-4B-GRPO-KL-math-reasoning)
ef42b80 (verified)

jaygala24 committed on