robot-policy-eval / README.md
Shubham-Rasal
fix: upgrade to gradio>=5 for python 3.13 compatibility (audioop removed)
1dd20f4

metadata
title: Robot Policy Evaluation Harness
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: true
license: mit
short_description: Rigorous eval harness for robot policies

Robot Policy Evaluation Harness

Drop in your robot rollout data → get a statistically rigorous evaluation report.

Implements best practices from Kress-Gazit et al. (TRI/Cornell), arXiv:2409.09491:

  • ① Bayesian Bernoulli model: posterior uncertainty on success rates instead of bare percentages
  • ② SPARC (spectral arc length) smoothness: motion quality from joint-space speed profiles
  • ③ STL (signal temporal logic) safety scoring: automatic constraint checking, with no human video review
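
To make the first item concrete, here is a minimal sketch of a Beta-Bernoulli success-rate estimate. It assumes independent trials and a uniform Beta(1, 1) prior, and it approximates the credible interval on a grid so that it needs only the standard library. The function name `beta_posterior_summary` and its parameters are hypothetical and are not taken from `app.py`; the harness may use a different prior or interval method.

```python
import math


def beta_posterior_summary(successes, trials, prior_a=1.0, prior_b=1.0,
                           level=0.95, grid=20000):
    """Summarize the Beta posterior over a policy's success probability.

    Assumes independent Bernoulli trials and a Beta(prior_a, prior_b) prior
    (uniform by default). Returns (posterior_mean, lower, upper), where
    [lower, upper] is an equal-tailed credible interval at `level`.
    """
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    a = prior_a + successes             # posterior alpha
    b = prior_b + (trials - successes)  # posterior beta

    # Evaluate the unnormalized log-density at grid midpoints, which avoids
    # log(0) at the endpoints.
    xs = [(i + 0.5) / grid for i in range(grid)]
    logs = [(a - 1) * math.log(x) + (b - 1) * math.log(1 - x) for x in xs]
    peak = max(logs)  # subtract the peak so exp() cannot underflow everywhere
    dens = [math.exp(v - peak) for v in logs]
    total = sum(dens)

    def quantile(q):
        # Walk the cumulative mass until it reaches the target fraction.
        target, running = q * total, 0.0
        for x, d in zip(xs, dens):
            running += d
            if running >= target:
                return x
        return xs[-1]

    tail = (1.0 - level) / 2.0
    return a / (a + b), quantile(tail), quantile(1.0 - tail)


# Hypothetical example: 7 successful rollouts out of 10.
mean, lower, upper = beta_posterior_summary(successes=7, trials=10)
print(f"success rate: {mean:.2f} (95% credible interval {lower:.2f} to {upper:.2f})")
```

With only 10 rollouts the interval is wide, which is the point of reporting it next to the bare percentage.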

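The SPARC item can be sketched as follows. This is a standard-library approximation of the spectral arc length measure, not the harness's own code. The names `joint_speed` and `sparc` are hypothetical, and the defaults (`padlevel=4`, `fc=10.0` Hz, `amp_th=0.05`) are commonly used values assumed here. The spectrum is evaluated directly at the needed frequency bins instead of through an FFT library, and the bins are capped at the Nyquist frequency.

```python
import cmath
import math


def joint_speed(positions, fs):
    """Speed profile: Euclidean norm of finite-difference joint velocities."""
    return [math.dist(p, q) * fs for p, q in zip(positions, positions[1:])]


def sparc(speed, fs, padlevel=4, fc=10.0, amp_th=0.05):
    """Spectral arc length of a speed profile (closer to 0 means smoother).

    speed: non-negative speed samples; fs: sampling rate in Hz.
    padlevel, fc and amp_th are assumed defaults, not harness settings.
    """
    n = len(speed)
    if n < 2:
        raise ValueError("need at least two samples")
    # Frequency step of a zero-padded DFT of length nfft.
    nfft = 2 ** (math.ceil(math.log2(n)) + padlevel)
    df = fs / nfft
    # Bins from 0 Hz up to fc, capped at the Nyquist frequency.
    n_bins = min(int(fc / df) + 1, nfft // 2 + 1)

    # Magnitude spectrum, evaluated directly at the bins we need.
    mags = []
    for k in range(n_bins):
        w = -2j * math.pi * k / nfft
        mags.append(abs(sum(v * cmath.exp(w * i) for i, v in enumerate(speed))))
    top = max(mags)
    if top == 0:
        raise ValueError("speed profile is identically zero")
    mags = [m / top for m in mags]

    # Adaptive cutoff: keep bins up to the last one above the threshold.
    above = [k for k, m in enumerate(mags) if m >= amp_th]
    first, last = above[0], above[-1]
    if last == first:
        return 0.0
    span = (last - first) * df
    # Arc length of the normalized spectrum over the selected band.
    arc = sum(math.hypot(df / span, mags[k + 1] - mags[k])
              for k in range(first, last))
    return -arc


# Hypothetical profiles: a bell-shaped reach and the same reach with a 6 Hz tremor.
fs = 100.0
t = [i / fs for i in range(101)]
smooth = [30 * x**2 * (1 - x) ** 2 for x in t]
jerky = [v * (1 + 0.9 * math.sin(2 * math.pi * 6 * x)) for v, x in zip(smooth, t)]
print(sparc(smooth, fs), sparc(jerky, fs))
```

The score is always non-positive, and the oscillating profile scores lower (less smooth) than the bell-shaped one.
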
Demo runs on real ALOHA bimanual robot data. Upload your own CSV to evaluate your policies.
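
For the STL safety scoring item, a minimal sketch of quantitative (robustness) semantics is shown below. It covers only three constructs: an upper-bound predicate, "always", and "eventually". The function names, the speed limit, and the goal tolerance are all hypothetical; the constraints the harness actually checks are not listed in this README.

```python
def upper_bound_margins(signal, limit):
    """Per-sample robustness of the predicate `signal < limit`."""
    return [limit - x for x in signal]


def always(margins):
    """Robustness of G(phi): the worst-case margin over the trace."""
    return min(margins)


def eventually(margins):
    """Robustness of F(phi): the best-case margin over the trace."""
    return max(margins)


# Hypothetical rollout trace and thresholds (illustrative values only).
speed = [0.2, 0.6, 0.9, 0.4]         # joint speed, rad/s
goal_dist = [0.5, 0.3, 0.1, 0.02]    # distance to goal, m

safety = always(upper_bound_margins(speed, limit=1.0))          # G(speed < 1.0)
reach = eventually(upper_bound_margins(goal_dist, limit=0.05))  # F(dist < 0.05)
score = min(safety, reach)           # conjunction of the two requirements

print("satisfied" if score > 0 else "violated", score)
```

A positive score means the trace satisfies the specification with that much margin, and a negative score measures how badly it is violated, which is what lets the check run without a person reviewing video.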