Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments Paper • 2601.07606 • Published Jan 12
Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments Article • shanchen • Jan 13
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows Paper • 2604.28139 • Published Apr 30
RNAGenScape: Property-Guided, Optimized Generation of mRNA Sequences with Manifold Langevin Dynamics Paper • 2510.24736 • Published Oct 14, 2025
Dispersion Loss Counteracts Embedding Condensation and Improves Generalization in Small Language Models Paper • 2602.00217 • Published Jan 30
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans? Paper • 2512.13281 • Published Dec 15, 2025