Wenboz/SACD-Qwen2.5-3B-ALFWorld-k1-tau0.75-beta1.0-plain-pipeline • Reinforcement Learning • 3B • Updated 5 days ago • 29 • 1
Kumeichi/qwen3-4b-agent-trajectory-lora-SFT-SQL-ALFWorld_rev.0.2 • Text Generation • 4B • Updated Feb 16