---
title: Robot Policy Evaluation Harness
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.29.0"
app_file: app.py
pinned: true
license: mit
short_description: Rigorous eval harness for robot policies
---

# Robot Policy Evaluation Harness

**Drop in your robot rollout data → get a statistically rigorous evaluation report.**

Implements best practices from [Kress-Gazit et al. (TRI/Cornell), arXiv:2409.09491](https://arxiv.org/abs/2409.09491):

- **① Bayesian Bernoulli**: honest uncertainty on success rates, not bare percentages
- **② SPARC smoothness**: motion quality from joint-space speed profiles
- **③ STL safety scoring**: automatic constraint checking, no human video review

Demo runs on real [ALOHA bimanual robot data](https://huggingface.co/datasets/lerobot/aloha_static_cups_open). Upload your own CSV to evaluate your policies.
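
To make the Bayesian Bernoulli idea concrete, here is a minimal sketch of how a success count can be turned into a posterior mean and a credible interval. It is an illustration, not the code in `app.py`: the uniform Beta(1, 1) prior, the equal-tailed 95% interval, the midpoint-rule integration, and the names `beta_pdf` and `posterior_summary` are all assumptions made for this example. It uses only the Python standard library.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x in (0, 1), computed in log space."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def posterior_summary(successes, trials, prior_a=1.0, prior_b=1.0,
                      level=0.95, grid=20000):
    """Summarize the Beta posterior over a policy's success rate.

    Assumes independent Bernoulli rollouts and a Beta(prior_a, prior_b)
    prior (uniform by default). The interval is equal-tailed and found by
    midpoint-rule integration of the posterior density, so its accuracy
    is limited by the grid resolution.
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")

    # Conjugate update: successes add to a, failures add to b.
    a = prior_a + successes
    b = prior_b + (trials - successes)

    dx = 1.0 / grid
    tail = (1.0 - level) / 2.0
    lower, upper = 0.0, 1.0
    lower_found = False
    cumulative = 0.0

    for i in range(grid):
        x = (i + 0.5) * dx  # cell midpoint, never exactly 0 or 1
        cumulative += beta_pdf(x, a, b) * dx
        if not lower_found and cumulative >= tail:
            lower = x
            lower_found = True
        if cumulative >= 1.0 - tail:
            upper = x
            break

    return {"mean": a / (a + b), "lower": lower, "upper": upper}


if __name__ == "__main__":
    # Hypothetical result: 8 successful rollouts out of 10.
    print(posterior_summary(successes=8, trials=10))
```

Under these assumptions, 8 successes in 10 rollouts give a posterior mean of 0.75 with a 95% interval of roughly 0.48 to 0.94, which is far wider than a bare "80%" suggests. That width is the reason to report the interval alongside the rate when the number of rollouts is small.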