Qwen-AgentWorld: Language World Models for General Agents Paper • 2606.24597 • Published 5 days ago • 135
Draft-OPD: On-Policy Distillation for Speculative Draft Models Paper • 2605.29343 • Published May 28 • 36
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published May 13 • 165
$π$-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published May 19 • 108
Post-Trained MoE Can Skip Half Experts via Self-Distillation Paper • 2605.18643 • Published May 18 • 30
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10, 2025 • 193
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28, 2025 • 132
Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models Paper • 2503.11224 • Published Mar 14, 2025 • 28
UltraIF: Advancing Instruction Following from the Wild Paper • 2502.04153 • Published Feb 6, 2025 • 24