---
title: Robot Policy Evaluation Harness
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.29.0"
app_file: app.py
pinned: true
license: mit
short_description: Rigorous eval harness for robot policies
---
# Robot Policy Evaluation Harness
**Drop in your robot rollout data → get a statistically rigorous evaluation report.**

Implements best practices from [Kress-Gazit et al. (TRI/Cornell), arXiv:2409.09491](https://arxiv.org/abs/2409.09491):
- **① Bayesian Bernoulli**: honest uncertainty on success rates, not bare percentages
- **② SPARC smoothness**: motion quality from joint-space speed profiles
- **③ STL safety scoring**: automatic constraint checking, no human video review

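To illustrate the idea behind the Bayesian Bernoulli item above, here is a minimal sketch of a success-rate estimate with a credible interval. It assumes a uniform Beta(1, 1) prior and an equal-tailed 95% interval computed on a grid; the function name `beta_posterior_summary` and these defaults are illustrative assumptions, not necessarily what `app.py` does.

```python
import math

def beta_posterior_summary(successes, trials, prior_a=1.0, prior_b=1.0,
                           level=0.95, grid=20001):
    """Posterior mean and equal-tailed credible interval for a success rate.

    Assumes a Beta(prior_a, prior_b) prior on the Bernoulli success
    probability (hypothetical defaults: uniform prior, 95% interval).
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)

    # Unnormalised log-density of Beta(a, b) at grid midpoints in (0, 1).
    xs = [(i + 0.5) / grid for i in range(grid)]
    logp = [(a - 1) * math.log(x) + (b - 1) * math.log1p(-x) for x in xs]
    peak = max(logp)
    weights = [math.exp(lp - peak) for lp in logp]
    total = sum(weights)

    # Walk the cumulative mass to find the lower and upper quantiles.
    tail = (1 - level) / 2
    lo = hi = None
    cum = 0.0
    for x, w in zip(xs, weights):
        cum += w / total
        if lo is None and cum >= tail:
            lo = x
        if hi is None and cum >= 1 - tail:
            hi = x
            break
    return mean, lo, hi

# 18 successes in 20 rollouts: report an interval, not just "90%".
mean, lo, hi = beta_posterior_summary(18, 20)
print(f"success rate {mean:.3f}, 95% credible interval [{lo:.3f}, {hi:.3f}]")
```

With only 20 rollouts the interval is wide, which is the point: two policies at 18/20 and 16/20 may not be distinguishable until more trials are run.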
Demo runs on real [ALOHA bimanual robot data](https://huggingface.co/datasets/lerobot/aloha_static_cups_open). Upload your own CSV to evaluate your policies.
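
As a rough illustration of the STL safety scoring listed above, the sketch below computes quantitative robustness for two simple temporal properties: "always stay below a limit" and "eventually reach a goal". The signal names, limits, and helper functions are hypothetical and chosen only for this example; the harness's actual constraints and CSV columns may differ.

```python
def always_below(signal, limit):
    """Robustness of G(signal <= limit): the worst-case margin over the trace."""
    return min(limit - x for x in signal)

def eventually_above(signal, goal):
    """Robustness of F(signal >= goal): the best-case margin over the trace."""
    return max(x - goal for x in signal)

# Hypothetical rollout: per-step joint speed (rad/s) and gripper opening (m).
joint_speed = [0.1, 0.4, 0.9, 0.7, 0.2]
gripper_opening = [0.00, 0.01, 0.03, 0.05, 0.05]

speed_margin = always_below(joint_speed, limit=1.0)
open_margin = eventually_above(gripper_opening, goal=0.04)

# A conjunction of properties is scored by the smaller of the two margins;
# a non-negative score means every constraint was satisfied.
satisfied = min(speed_margin, open_margin) >= 0
print(speed_margin, open_margin, satisfied)
```

A positive margin says how much slack the rollout had, and a negative one says how badly the constraint was violated, which is what allows rollouts to be ranked without watching video.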