metadata
title: Robot Policy Evaluation Harness
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: true
license: mit
short_description: Rigorous eval harness for robot policies
Robot Policy Evaluation Harness
Drop in your robot rollout data → get a statistically rigorous evaluation report.
Implements best practices from Kress-Gazit et al. (TRI/Cornell), arXiv:2409.09491:
- ① Bayesian Bernoulli — honest uncertainty on success rates, not bare percentages
- ② SPARC smoothness — motion quality from joint-space speed profiles
- ③ STL safety scoring — automatic constraint checking, no human video review
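As a rough illustration of items ① and ③, here is a minimal sketch that is not taken from the harness itself. The function names, the uniform Beta(1, 1) prior, the Monte Carlo interval, and the example limit are all assumptions chosen for illustration; the harness may use different priors, formulas, and thresholds.

```python
import random


def beta_posterior_summary(successes, trials, prior_a=1.0, prior_b=1.0,
                           level=0.95, draws=20000, seed=0):
    """Summarise the Beta posterior over a Bernoulli success rate.

    Assumes a Beta(prior_a, prior_b) prior (uniform by default), so the
    posterior is Beta(prior_a + successes, prior_b + failures).
    Returns (posterior mean, lower bound, upper bound) of an equal-tailed
    credible interval estimated from posterior samples.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)  # exact posterior mean

    # Approximate the interval by sampling, to stay within the standard library.
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lo = samples[int(tail * (draws - 1))]
    hi = samples[int((1.0 - tail) * (draws - 1))]
    return mean, lo, hi


def always_below_robustness(signal, limit):
    """Robustness of the STL formula G(signal < limit).

    The value is the smallest margin (limit - value) over the trace:
    positive means the constraint held at every step, negative means it
    was violated at some step.
    """
    if not signal:
        raise ValueError("signal must contain at least one sample")
    return min(limit - value for value in signal)


if __name__ == "__main__":
    mean, lo, hi = beta_posterior_summary(successes=17, trials=20)
    print(f"success rate {mean:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")

    # Hypothetical joint-speed trace checked against a hypothetical limit of 1.0.
    margin = always_below_robustness([0.2, 0.5, 0.9], limit=1.0)
    print(f"safety margin {margin:.2f}")
```

Reporting the interval alongside the mean is what separates "17 of 20 succeeded" from a bare "85%": with only 20 rollouts the interval stays wide, which a single percentage hides.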
The demo runs on real ALOHA bimanual robot data. To evaluate your own policies, upload your rollout data as a CSV.