Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published May 9 • 80
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation Paper • 2605.10813 • Published 29 days ago • 16
AutoMedBench: Towards Medical AutoResearch with Agentic AI Models Paper • 2606.01961 • Published 8 days ago • 25