title: Robot Policy Evaluation Harness
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: true
license: mit
short_description: Rigorous eval harness for robot policies

Robot Policy Evaluation Harness

Drop in your robot rollout data → get a statistically rigorous evaluation report.

Implements best practices from Kress-Gazit et al. (TRI/Cornell), arXiv:2409.09491:

① Bayesian Bernoulli — honest uncertainty on success rates, not bare percentages
② SPARC smoothness — motion quality from joint-space speed profiles
③ STL safety scoring — automatic constraint checking, no human video review
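
As a rough illustration of item ①, the sketch below treats each rollout as a Bernoulli trial and reports a Beta posterior over the success rate instead of a single percentage. This is not taken from `app.py`: the function names, the uniform `Beta(1, 1)` prior, and the grid-based interval are all assumptions chosen to keep the example dependency-free.

```python
import math


def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Return (a, b) of the Beta posterior after Bernoulli observations.

    The Beta(1, 1) default prior is an assumption, not the harness's setting.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    return prior_a + successes, prior_b + (trials - successes)


def posterior_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def credible_interval(a, b, mass=0.95, grid=20001):
    """Equal-tailed credible interval from a midpoint-rule approximation
    of the Beta CDF (hypothetical helper; stdlib only)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    step = 1.0 / grid
    lo_target = (1.0 - mass) / 2.0
    hi_target = 1.0 - lo_target
    cdf, lo, hi = 0.0, None, None
    for i in range(grid):
        p = (i + 0.5) * step  # midpoint of the i-th cell, never exactly 0 or 1
        log_pdf = log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p)
        cdf += math.exp(log_pdf) * step
        if lo is None and cdf >= lo_target:
            lo = p
        if cdf >= hi_target:
            hi = p
            break
    if lo is None:
        lo = 0.5 * step
    if hi is None:  # guard against accumulated rounding error
        hi = 1.0 - 0.5 * step
    return lo, hi


# Example: 18 successful rollouts out of 20 (made-up numbers).
a, b = beta_posterior(successes=18, trials=20)
low, high = credible_interval(a, b)
print(f"mean={posterior_mean(a, b):.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

With few rollouts the interval stays wide, which is the point of reporting it alongside the raw success percentage.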

The demo runs on real ALOHA bimanual robot data. Upload your own CSV to evaluate your own policies.
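
For the STL safety scoring listed above, a minimal sketch of the idea: the quantitative (robustness) semantics of an "always within bounds" constraint is the smallest margin to the bounds over the whole trace. The function name, the single-signal input, and the example limits are hypothetical; the constraints the harness actually checks are not specified here.

```python
def always_within(signal, lower, upper):
    """Robustness of the STL formula G(lower <= x <= upper) over a sampled trace.

    Returns the smallest distance to either bound across all samples.
    A positive value means the constraint held at every sample; a negative
    value means it was violated, by that amount at the worst sample.
    """
    if not signal:
        raise ValueError("signal must contain at least one sample")
    return min(min(x - lower, upper - x) for x in signal)


# Made-up joint positions (radians) checked against made-up limits.
safe_trace = [0.1, 0.5, 0.9]
unsafe_trace = [0.1, 1.2]
print(always_within(safe_trace, 0.0, 1.0))    # positive: constraint satisfied
print(always_within(unsafe_trace, 0.0, 1.0))  # negative: constraint violated
```

Because the score is a signed margin rather than a pass/fail flag, rollouts can be ranked by how close they came to a violation.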