---
title: Robot Policy Evaluation Harness
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.29.0"
app_file: app.py
pinned: true
license: mit
short_description: Rigorous eval harness for robot policies
---

# Robot Policy Evaluation Harness

Drop in your robot rollout data → get a statistically rigorous evaluation report.

Implements best practices from [Kress-Gazit et al. (TRI/Cornell), arXiv:2409.09491](https://arxiv.org/abs/2409.09491):

- ① Bayesian Bernoulli model: success rates reported with their posterior uncertainty, not as bare percentages
- ② SPARC (spectral arc length) smoothness: motion quality scored from joint-space speed profiles
- ③ STL (Signal Temporal Logic) safety scoring: automatic constraint checking, with no manual video review
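The three metrics above can be sketched in dependency-free Python. This is a minimal illustration, not the Space's `app.py`. Every function name is hypothetical, and the success-rate model assumes a uniform Beta(1, 1) prior. The SPARC defaults (`pad_level=4`, `fc=10.0` Hz, `amp_th=0.05`) are values commonly used in the SPARC literature, not settings confirmed for this harness. The STL part covers only two single-predicate formulas: "always below a limit" and "eventually above a target".

```python
import math

# Sketch only: function names and defaults are illustrative, not the Space's API.


# --- ① Bayesian Bernoulli: Beta posterior over a policy's success rate ---

def beta_cdf_int(x: float, a: int, b: int) -> float:
    """Regularized incomplete beta I_x(a, b) for integer a, b >= 1.

    Uses I_x(a, b) = P(Binomial(a + b - 1, x) >= a). math.comb overflows
    float conversion beyond roughly 1000 trials; use scipy.stats.beta.ppf there.
    """
    m = a + b - 1
    return sum(math.comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def success_posterior(successes: int, trials: int, level: float = 0.95):
    """Posterior mean and equal-tailed credible interval, uniform Beta(1, 1) prior."""
    a, b = successes + 1, trials - successes + 1

    def quantile(p: float) -> float:
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisection works because the CDF increases in x
            mid = (lo + hi) / 2
            if beta_cdf_int(mid, a, b) < p:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    tail = (1 - level) / 2
    return a / (a + b), (quantile(tail), quantile(1 - tail))


# --- ② SPARC: spectral arc length of a joint-space speed profile ---

def speed_profile(joint_positions, fs: float) -> list:
    """Speed = L2 norm of finite-difference joint velocities (fs in Hz)."""
    return [math.dist(q0, q1) * fs for q0, q1 in zip(joint_positions, joint_positions[1:])]


def sparc(speed, fs: float, pad_level: int = 4, fc: float = 10.0, amp_th: float = 0.05) -> float:
    """Negative arc length of the normalized magnitude spectrum; closer to 0 is smoother.

    Assumes fc < fs / 2 and a speed profile that is not all zeros.
    """
    nfft = 2 ** (math.ceil(math.log2(len(speed))) + pad_level)  # zero-padded length
    df = fs / nfft
    mags = []
    for k in range(int(fc / df) + 1):  # only bins with frequency <= fc
        # Direct DFT; the zero padding adds nothing to these sums.
        re = sum(v * math.cos(2 * math.pi * k * t / nfft) for t, v in enumerate(speed))
        im = sum(v * math.sin(2 * math.pi * k * t / nfft) for t, v in enumerate(speed))
        mags.append(math.hypot(re, im))
    peak = max(mags)  # a non-negative speed profile peaks at DC, which is kept
    mags = [m / peak for m in mags]
    above = [k for k, m in enumerate(mags) if m >= amp_th]
    band = mags[above[0] : above[-1] + 1]  # contiguous band between threshold crossings
    if len(band) < 2:
        return 0.0
    step = 1 / (len(band) - 1)  # frequency axis normalized to [0, 1]
    return -sum(math.hypot(step, m1 - m0) for m0, m1 in zip(band, band[1:]))


# --- ③ STL: quantitative robustness of simple temporal-logic formulas ---

def robustness_always_le(signal, limit: float) -> float:
    """G (x_t <= limit): min_t (limit - x_t). Negative means the constraint was violated."""
    return min(limit - x for x in signal)


def robustness_eventually_ge(signal, target: float) -> float:
    """F (x_t >= target): max_t (x_t - target). Positive means the goal was reached."""
    return max(x - target for x in signal)
```

Read the outputs as follows. A wide credible interval signals that too few trials were run to rank policies. SPARC values closer to 0 indicate smoother motion. A negative STL robustness flags a violated constraint, and its magnitude is the margin of violation.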

The demo runs on real [ALOHA bimanual robot data](https://huggingface.co/datasets/lerobot/aloha_static_cups_open). Upload your own CSV to evaluate your own policies.
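Before any metric applies, uploaded rollouts have to be reduced to per-policy counts. The sketch below assumes a hypothetical CSV layout: one row per episode, with `policy` and `success` columns. Those column names, the `success_counts` helper and the toy policy names are illustrative assumptions, not the app's documented upload schema.

```python
import csv
import io
from collections import defaultdict


def success_counts(csv_file) -> dict:
    """Tally (successes, trials) per policy from a rollout CSV.

    Hypothetical schema: one row per episode with a `policy` column and a
    `success` column holding 1/0 or true/false. Not the app's documented format.
    """
    counts = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(csv_file):
        succeeded = row["success"].strip().lower() in {"1", "true", "yes"}
        counts[row["policy"]][0] += int(succeeded)
        counts[row["policy"]][1] += 1
    return {policy: (s, n) for policy, (s, n) in counts.items()}


# Toy input standing in for an uploaded file.
rollouts = io.StringIO("policy,episode,success\npolicy_a,0,1\npolicy_a,1,0\npolicy_a,2,1\npolicy_b,0,1\n")
for policy, (s, n) in success_counts(rollouts).items():
    # Posterior mean under a uniform Beta(1, 1) prior (Laplace's rule of succession).
    print(f"{policy}: {s}/{n} successes, posterior mean {(s + 1) / (n + 2):.2f}")
```

For the toy input this prints `policy_a: 2/3 successes, posterior mean 0.60` and `policy_b: 1/1 successes, posterior mean 0.67`. The single successful `policy_b` episode shows why bare percentages mislead: 1/1 is "100%", but the posterior mean stays well below 1.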