---
title: Robot Policy Evaluation Harness
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.29.0"
app_file: app.py
pinned: true
license: mit
short_description: Rigorous eval harness for robot policies
---
# Robot Policy Evaluation Harness
**Drop in your robot rollout data → get a statistically rigorous evaluation report.**
Implements best practices from [Kress-Gazit et al. (TRI/Cornell), arXiv:2409.09491](https://arxiv.org/abs/2409.09491):
- **① Bayesian Bernoulli**: honest uncertainty on success rates instead of bare percentages
- **② SPARC smoothness**: motion quality measured as the spectral arc length (SPARC) of joint-space speed profiles
- **③ STL safety scoring**: automatic constraint checking against signal temporal logic (STL) specifications, with no manual video review
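A minimal sketch of the Bayesian Bernoulli item, assuming a uniform Beta(1, 1) prior (the harness's actual prior and interval type are not stated here): each rollout is treated as a Bernoulli trial, so s successes in n trials give a Beta(s + 1, n - s + 1) posterior, summarized by its mean and an equal-tailed credible interval. The function names are hypothetical. To stay within the standard library, the code relies on the identity that, for integer parameters, the Beta CDF equals a binomial upper tail, and inverts it by bisection.

```python
import math


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """Beta(a, b) CDF for integer a, b >= 1.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a), computing each
    binomial term in log space so large trial counts do not overflow.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    log_x, log_1mx = math.log(x), math.log1p(-x)
    log_n_fact = math.lgamma(n + 1)
    return sum(
        math.exp(
            log_n_fact - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * log_x + (n - j) * log_1mx
        )
        for j in range(a, n + 1)
    )


def beta_quantile_int(q: float, a: int, b: int, tol: float = 1e-10) -> float:
    """Invert the monotone Beta CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def success_rate_posterior(successes: int, trials: int, level: float = 0.95):
    """Posterior mean and equal-tailed credible interval for a success rate.

    Assumes a uniform Beta(1, 1) prior, so the posterior is
    Beta(successes + 1, failures + 1).
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    a, b = successes + 1, trials - successes + 1
    tail = (1.0 - level) / 2
    mean = a / (a + b)
    return mean, beta_quantile_int(tail, a, b), beta_quantile_int(1.0 - tail, a, b)
```

For example, `success_rate_posterior(8, 10)` returns a posterior mean of 0.75 together with a 95% interval around it. With only 10 trials that interval spans a wide range, which is exactly the uncertainty a bare "80%" hides.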
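The SPARC item can be illustrated with a standard-library sketch of the spectral arc length metric. The sketch takes the Fourier magnitude spectrum of a speed profile, normalizes it by its DC value, and trims it to the band that stays above an amplitude threshold and below a fixed cutoff `fc`. It then returns the negative arc length of that curve. Smooth movements have compact spectra and SPARC values near zero, while jerky ones add spectral bumps and push the value more negative. The defaults (`padlevel=4`, `fc=10.0` Hz, `amp_th=0.05`) are assumed from common SPARC usage rather than taken from this harness. How the harness reduces joint velocities to one speed series is also an assumption: here any non-negative series works, such as the norm of the joint-velocity vector.

```python
import cmath
import math


def sparc(speed, fs, padlevel=4, fc=10.0, amp_th=0.05):
    """Spectral arc length (SPARC) of a speed profile; values nearer 0 are smoother.

    speed: non-negative samples taken uniformly at fs Hz (e.g. the norm of the
    joint-velocity vector). Assumes fc <= fs / 2.
    """
    n = len(speed)
    if n < 2:
        raise ValueError("need at least two speed samples")
    if min(speed) < 0:
        raise ValueError("speed must be non-negative")
    # Zero-pad to the next power of two, times 2**padlevel, for a fine frequency grid.
    nfft = 2 ** (math.ceil(math.log2(n)) + padlevel)

    # DFT magnitudes for bins 0..fc. Padded zeros add nothing to the sum, so only
    # the n real samples are visited; the phasor advances by repeated multiplication.
    freqs, mags = [], []
    k = 0
    while k <= nfft // 2 and k * fs / nfft <= fc:
        step = cmath.exp(-2j * math.pi * k / nfft)
        phasor, acc = 1 + 0j, 0j
        for x in speed:
            acc += x * phasor
            phasor *= step
        freqs.append(k * fs / nfft)
        mags.append(abs(acc))
        k += 1

    # For non-negative input the DC bin is the peak of the whole spectrum.
    peak = mags[0]
    if peak == 0:
        raise ValueError("speed profile is identically zero")
    mags = [m / peak for m in mags]

    # Adaptive cutoff: keep bins from the first to the last one above amp_th.
    above = [i for i, m in enumerate(mags) if m >= amp_th]
    f_sel = freqs[above[0] : above[-1] + 1]
    m_sel = mags[above[0] : above[-1] + 1]
    if len(f_sel) < 2:
        raise ValueError("spectrum too narrow to measure; increase padlevel")

    # Negative arc length of the normalized spectrum, frequency axis scaled to [0, 1].
    f_span = f_sel[-1] - f_sel[0]
    return -sum(
        math.hypot((f1 - f0) / f_span, m1 - m0)
        for f0, f1, m0, m1 in zip(f_sel, f_sel[1:], m_sel, m_sel[1:])
    )
```

The DFT is evaluated directly, and only for bins below `fc`, which keeps the sketch dependency-free. For long trajectories, `numpy.fft.fft` with the same zero padding is the faster choice.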
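For the STL item, here is a minimal sketch of signal temporal logic's quantitative (robustness) semantics in discrete time. An atomic predicate `margin(x) >= 0` gets per-step robustness `margin(x)`. "Always" over a window takes the minimum, "eventually" takes the maximum, and conjunction takes the minimum across sub-formulas. A positive result means the rollout satisfied the specification with that much slack, so constraint checking needs no manual video review. The helper names, the joint-angle and speed limits, and the sample data are hypothetical, since the harness's actual specifications are not given here.

```python
from typing import Callable, List, Optional, Sequence


def predicate(signal: Sequence[float], margin: Callable[[float], float]) -> List[float]:
    """Per-step robustness of the atomic predicate `margin(x) >= 0`.

    Positive values: the constraint holds with that much slack.
    Negative values: how badly it is violated.
    """
    return [margin(x) for x in signal]


def _window(rho: Sequence[float], a: int, b: Optional[int]) -> Sequence[float]:
    # Discrete time steps a..b inclusive; b=None means "until the end of the rollout".
    w = rho[a:] if b is None else rho[a : b + 1]
    if len(w) == 0:
        raise ValueError("empty time window")
    return w


def always(rho: Sequence[float], a: int = 0, b: Optional[int] = None) -> float:
    """Robustness at t=0 of G_[a,b] phi: the worst case over the window."""
    return min(_window(rho, a, b))


def eventually(rho: Sequence[float], a: int = 0, b: Optional[int] = None) -> float:
    """Robustness at t=0 of F_[a,b] phi: the best case over the window."""
    return max(_window(rho, a, b))


def conjunction(*robustness: float) -> float:
    """phi1 AND phi2 AND ...: the weakest conjunct sets the overall margin."""
    return min(robustness)


# Hypothetical spec: always |joint angle| <= 1.0 rad AND always joint speed <= 1.5 rad/s.
joint = [0.1, 0.5, 0.9, 0.4]  # rad (made-up rollout)
speed = [0.0, 0.8, 1.2, 0.3]  # rad/s (made-up rollout)
score = conjunction(
    always(predicate(joint, lambda q: 1.0 - abs(q))),
    always(predicate(speed, lambda v: 1.5 - v)),
)
# score > 0 means both constraints held; here the joint limit is the binding one (~0.1 rad).
```

Because robustness is a signed margin rather than a pass/fail flag, rollouts that barely satisfy a constraint can be distinguished from those that satisfy it comfortably.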
Demo runs on real [ALOHA bimanual robot data](https://huggingface.co/datasets/lerobot/aloha_static_cups_open). Upload your own CSV to evaluate your policies.