docs/modal-inference-results-2026-06-15.md · sanjuhs/fireboy-minicpm-v-4-6-vla at main

	# Modal Inference Results: Fire Boy MiniCPM-V Router

	Date: 2026-06-15

	## Live Endpoint

	```text
	Modal app: fireboy-vla-router
	URL: https://sanjuhs123--fireboy-vla-router.modal.run
	GPU: L40S
	idle scaledown window: 60 seconds
	checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
	model: openbmb/MiniCPM-V-4.6
	policy_kind: minicpm_vla_frozen_encoder_skill_param_head_v1
	```

	This endpoint serves the promoted frozen MiniCPM-V skill/parameter router with a
	custom PyTorch action head. It is not served through vLLM because the router
	needs MiniCPM hidden states plus a custom continuous head, not token generation.
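The split between a frozen encoder and a small trained head can be illustrated with a minimal PyTorch sketch. It assumes the head mean-pools the encoder's hidden states and emits skill logits plus a continuous parameter vector. The class name `SkillParamHead`, the pooling choice, the hidden size of 64, and the four parameters are all hypothetical and are not read from the checkpoint. Only the skill names come from this document.

```python
import torch
from torch import nn

# Skill names taken from the verification matrix in this document.
SKILLS = ["walk_to", "run_around", "pick_up", "find_and_eat_berry"]


class SkillParamHead(nn.Module):
    """Hypothetical two-branch head over frozen encoder hidden states."""

    def __init__(self, hidden_size: int, num_params: int) -> None:
        super().__init__()
        self.skill_classifier = nn.Linear(hidden_size, len(SKILLS))
        self.param_regressor = nn.Linear(hidden_size, num_params)

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_size) from the frozen encoder.
        pooled = hidden_states.mean(dim=1)  # assumed mean pooling over tokens
        return self.skill_classifier(pooled), self.param_regressor(pooled)


head = SkillParamHead(hidden_size=64, num_params=4)

# Random stand-in for encoder output; a real run would use MiniCPM-V hidden states.
fake_hidden = torch.randn(2, 10, 64)

with torch.no_grad():
    logits, params = head(fake_hidden)

predicted = [SKILLS[i] for i in logits.argmax(dim=-1).tolist()]
```

Because the head consumes hidden states and returns tensors rather than tokens, a text-generation server has nothing to decode, which is consistent with the choice to serve it as plain PyTorch.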

	## Local Website Wiring

	```text
	TOYBOX_VLA_ROUTER_URL=https://sanjuhs123--fireboy-vla-router.modal.run
	TOYBOX_VLA_ROUTER_ACTION=1
	local app: http://127.0.0.1:65373
	policy gallery: http://127.0.0.1:65373/fireboy-policy-gallery
	```
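As an illustration of how these two variables might be consumed, here is a minimal Python sketch. It assumes that `TOYBOX_VLA_ROUTER_ACTION=1` means "route pet actions through the VLA router" and that any other value disables it. That interpretation, the `RouterConfig` type, and `load_router_config` are hypothetical; the website's actual loader is not shown in this document.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RouterConfig:
    url: str
    action_enabled: bool


def load_router_config(env: Optional[Mapping[str, str]] = None) -> RouterConfig:
    """Read the router settings from the environment (or a supplied mapping)."""
    env = os.environ if env is None else env
    # Drop a trailing slash so paths such as "/route" can be appended safely.
    url = env.get("TOYBOX_VLA_ROUTER_URL", "").rstrip("/")
    # Assumption: only the literal value "1" enables routing.
    flag = env.get("TOYBOX_VLA_ROUTER_ACTION", "0") == "1"
    # Routing cannot be enabled without a URL to call.
    return RouterConfig(url=url, action_enabled=flag and bool(url))
```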

	The Toy Room path is:

	```text
	browser command -> /api/pet-action
	-> Modal /route
	-> MiniCPM-V frozen encoder + skill/parameter head
	-> MuJoCo policy registry dispatch
	-> Toy Room animation/result JSON
	```
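The hop from `/api/pet-action` to Modal `/route` can be sketched with the Python standard library. The request body shape (`{"command": ...}`) and the helper names are assumptions, since the `/route` schema is not documented here. The `registry:<skill>` form matches the dispatch values listed in the verification matrix.

```python
import json
import urllib.request


def build_route_request(base_url: str, command: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the router's /route path."""
    # Hypothetical payload: the real endpoint may also expect a camera frame
    # and robot state.
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/route",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dispatch_key(route_response: dict) -> str:
    """Map the served skill to the registry key used for MuJoCo dispatch."""
    return f"registry:{route_response['skill']}"


req = build_route_request(
    "https://sanjuhs123--fireboy-vla-router.modal.run", "pick up the berry"
)
```

Sending the request would be a call such as `urllib.request.urlopen(req, timeout=60)`. A generous timeout may be worth considering because the app scales down after 60 idle seconds and the first call afterwards can hit a cold start.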

	## Verification Matrix

	All commands below were tested through the local website API on 2026-06-15.
	The VLA router ran on Modal with `device: cuda`.

	```text
	walk to the yellow marker
	served skill: walk_to
	dispatch: registry:walk_to
	/api/pet-action: success true
	animation: walk

	run around
	served skill: run_around
	dispatch: registry:run_around
	/api/pet-action: success true
	animation: run

	pick up the berry
	served skill: pick_up
	dispatch: registry:pick_up
	/api/pet-action: success true
	animation: hold

	go find berry and eat it
	served skill: find_and_eat_berry
	dispatch: registry:find_and_eat_berry
	/api/pet-action: success true
	animation: hold
	```
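The matrix above can also be expressed as data so the same four checks are repeatable. This is a sketch only: the expected values are copied from the matrix, but the response field names (`skill`, `dispatch`, `success`, `animation`) and the `check_result` helper are assumptions about the `/api/pet-action` JSON, not its documented schema.

```python
# command -> (expected served skill, expected animation), from the matrix above.
EXPECTED = {
    "walk to the yellow marker": ("walk_to", "walk"),
    "run around": ("run_around", "run"),
    "pick up the berry": ("pick_up", "hold"),
    "go find berry and eat it": ("find_and_eat_berry", "hold"),
}


def check_result(command: str, result: dict) -> list:
    """Return a list of mismatch descriptions; an empty list means the row passes."""
    skill, animation = EXPECTED[command]
    problems = []
    if result.get("skill") != skill:
        problems.append(f"skill: expected {skill}, got {result.get('skill')}")
    if result.get("dispatch") != f"registry:{skill}":
        problems.append(f"dispatch: got {result.get('dispatch')}")
    if result.get("success") is not True:
        problems.append("success was not true")
    if result.get("animation") != animation:
        problems.append(f"animation: expected {animation}, got {result.get('animation')}")
    return problems
```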

	## Important Runtime Guard

	With a blank or generated camera frame, the raw neural skill head can become
	overconfident toward `find_and_eat_berry`. The live endpoint therefore returns
	both the raw and the stabilized outputs:

	```text
	neural_skill: raw MiniCPM-V head prediction
	skill: command/scene-stabilized served skill
	raw_params: raw continuous head output
	params: scene-grounded served parameters
	```

	Serving the stabilized `skill` and `params` keeps the demo reliable, while
	returning `neural_skill` and `raw_params` alongside them keeps the raw model
	output visible. If the browser sends a real camera frame and full robot state,
	the same endpoint can be tested with `force_neural_skill: true` to inspect the
	pure neural decision.
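A small sketch of how a client might use these four fields to see when the guard intervened. The field names `neural_skill`, `skill`, `raw_params`, `params`, and `force_neural_skill` come from this document. The helper names are hypothetical, and the representation of parameters as lists of floats is an assumption.

```python
def guard_report(response: dict) -> dict:
    """Summarize whether the served output differs from the raw neural output."""
    return {
        "served": response["skill"],
        "neural": response["neural_skill"],
        "skill_overridden": response["skill"] != response["neural_skill"],
        "params_adjusted": response["params"] != response["raw_params"],
    }


def with_forced_neural(payload: dict) -> dict:
    """Return a copy of a request payload that asks for the pure neural decision."""
    forced = dict(payload)
    forced["force_neural_skill"] = True
    return forced


# Illustrative response only; the values are made up for the example.
example = {
    "neural_skill": "find_and_eat_berry",
    "skill": "walk_to",
    "raw_params": [0.9, 0.1],
    "params": [0.5, 0.0],
}
report = guard_report(example)
```

Logging a report like this per request would make it easy to count how often the served skill diverges from the raw head.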

	## Proof Screenshot

	```text
	Fireboy-training-policy-vla/proofs/modal-vla-router-policy-gallery.png
	```

	## Repeatable Final Smoke Gate

	Run this before submission:

	```bash
	PYTHONPATH=fireboy-vla-physics/src \
	fireboy-vla-physics/.venv/bin/python \
	fireboy-vla-physics/src/final_vla_demo_smoke.py \
	--out Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json
	```

	Latest result:

	```text
	ok: true
	route checks: walk_to, run_around, pick_up, find_and_eat_berry all passed on cuda
	pet-action checks: all four commands dispatched through Modal VLA + MuJoCo successfully
	registry validation: checked_paths 49, missing_count 0
	RunPod pods in proof: []
	```

	Proof JSON:

	```text
	Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json
	```
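To gate a submission on the proof file, a script could read it and require the top-level `ok` flag. This is a sketch under one assumption: that the JSON is an object with a boolean `ok` key, as the "Latest result" summary suggests. No other keys are assumed, and `smoke_gate_passed` is a hypothetical helper rather than part of `final_vla_demo_smoke.py`.

```python
import json
from pathlib import Path


def smoke_gate_passed(proof_path: str) -> bool:
    """Return True only if the proof file exists, parses, and has ok == true."""
    path = Path(proof_path)
    if not path.is_file():
        return False
    try:
        proof = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False
    # Require the literal boolean true, not merely a truthy value.
    return isinstance(proof, dict) and proof.get("ok") is True
```

For example, `smoke_gate_passed("Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json")` could be called right after the smoke command, with the submission blocked when it returns `False`.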