Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs Paper • 2605.30611 • Published 28 days ago • 247
DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation Paper • 2605.30350 • Published 28 days ago • 13
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation Paper • 2605.23271 • Published May 22 • 81
WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation Paper • 2605.25874 • Published about 1 month ago • 103
Lost in the Folds: When Cross-Validation Is Not a Deep Ensemble for Uncertainty Estimation Paper • 2605.18329 • Published May 18 • 2
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published May 12 • 196
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence Paper • 2605.12882 • Published May 13 • 274
sentence-transformers/all-mpnet-base-v2 Sentence Similarity • 0.1B • Updated Aug 19, 2025 • 34.5M • • 1.31k
A Benchmark for Interactive World Models with a Unified Action Generation Framework Paper • 2605.03941 • Published May 5 • 5