TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning • Paper • 2606.32017 • Published 2 days ago • 6
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training • Paper • 2605.12483 • Published May 12 • 10