Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents Paper • 2606.19704 • Published 2 days ago • 20
QueST: Persistent Queries as Semantic Monitors for Drift Suppression in Long-Horizon Tracking Paper • 2605.09513 • Published May 10 • 2