Current World Models Lack a Persistent State Core
Paper • 2606.20545
Official WRBench paper, public datasets, benchmark videos, human annotations, and interactive leaderboard.
Note WRBench paper page
Note Natural-25 prompts, variants, and first frames
Note Aggregate diagnostic scores for 26 models
Note Human pairwise annotation verdicts
Note 11,100 benchmark videos from 26 models, with per-video scores
Diagnostic D1-D6 leaderboard for world models
Note Interactive D1-D6 leaderboard