# Modal Inference Results: Fire Boy MiniCPM-V Router

Date: 2026-06-15

## Live Endpoint

```text
Modal app: fireboy-vla-router
URL: https://sanjuhs123--fireboy-vla-router.modal.run
GPU: L40S
idle scaledown window: 60 seconds
checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
model: openbmb/MiniCPM-V-4.6
policy_kind: minicpm_vla_frozen_encoder_skill_param_head_v1
```

This endpoint serves the promoted frozen MiniCPM-V skill/parameter router with a
custom PyTorch action head. It is not served through vLLM. The router reads
MiniCPM-V hidden states and feeds them to a custom continuous head, so it never
needs token generation.

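The following minimal pure-Python sketch makes the frozen-encoder design concrete by showing what a skill/parameter head computes on top of encoder hidden states. It assumes mean pooling over tokens and one linear layer per output, and it uses the four skill names from the verification matrix. The pooling, layer shapes, and label order of the real `minicpm_vla_skill_param_head.pt` head are not documented here and may differ. Plain lists stand in for PyTorch tensors, so the sketch runs without MiniCPM-V weights.

```python
import math
from typing import Dict, List, Sequence

# Skill names from the verification matrix; the real head's label order is an assumption.
SKILLS = ["walk_to", "run_around", "pick_up", "find_and_eat_berry"]


def mean_pool(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token hidden states into one vector (hypothetical pooling choice)."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[i] for tok in hidden_states) / n for i in range(dim)]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """Compute y = W x + b, with W stored as one row per output."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over skill logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def skill_param_head(hidden_states, skill_w, skill_b, param_w, param_b) -> Dict[str, object]:
    """Map frozen-encoder features to a discrete skill plus continuous parameters."""
    pooled = mean_pool(hidden_states)
    probs = softmax(linear(pooled, skill_w, skill_b))
    best = max(range(len(SKILLS)), key=lambda i: probs[i])
    return {
        "neural_skill": SKILLS[best],              # discrete skill head (argmax)
        "skill_probs": dict(zip(SKILLS, probs)),   # full distribution, useful for debugging
        "raw_params": linear(pooled, param_w, param_b),  # continuous head, no decoding loop
    }
```

Only the small head produces outputs, so the encoder can stay frozen and no token-by-token decoding is involved. This is why a text-generation server is not a natural fit for this router.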
## Local Website Wiring

```text
TOYBOX_VLA_ROUTER_URL=https://sanjuhs123--fireboy-vla-router.modal.run
TOYBOX_VLA_ROUTER_ACTION=1
local app: http://127.0.0.1:65373
policy gallery: http://127.0.0.1:65373/fireboy-policy-gallery
```

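The sketch below shows one way a local server might read these two variables. It assumes that `TOYBOX_VLA_ROUTER_ACTION=1` means "route pet actions through the VLA router" and that any other value means "off". The `RouterConfig` and `load_router_config` names are hypothetical and do not come from the app's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RouterConfig:
    url: Optional[str]    # base URL of the Modal router, or None when unset
    action_enabled: bool  # whether pet actions should be routed through the VLA router


def load_router_config(env: Mapping[str, str] = os.environ) -> RouterConfig:
    """Read the TOYBOX_VLA_ROUTER_* variables (flag semantics are an assumption)."""
    url = env.get("TOYBOX_VLA_ROUTER_URL", "").strip().rstrip("/") or None
    # Assumption: only the exact value "1" enables routing.
    flag = env.get("TOYBOX_VLA_ROUTER_ACTION", "0").strip() == "1"
    # Routing is only possible when a URL is configured.
    return RouterConfig(url=url, action_enabled=flag and url is not None)
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.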
The Toy Room request path is:

```text
browser command -> /api/pet-action
                -> Modal /route
                -> MiniCPM-V frozen encoder + skill/parameter head
                -> MuJoCo policy registry dispatch
                -> Toy Room animation/result JSON
```

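The request path breaks down into two steps. First, the server prepares a POST to the Modal `/route` endpoint. Second, it maps the returned skill to a registry handler. The `{"command": ...}` payload and the `REGISTRY` table in this sketch are assumptions for illustration, and the response field names `skill`/`params` follow the runtime-guard fields. The animation names are taken from the verification matrix. The sketch only builds the request and does not send it.

```python
import json
import urllib.request
from typing import Callable, Dict

# Hypothetical handler table standing in for the real MuJoCo policy registry.
REGISTRY: Dict[str, Callable[[dict], dict]] = {
    "walk_to": lambda params: {"animation": "walk", "params": params},
    "run_around": lambda params: {"animation": "run", "params": params},
    "pick_up": lambda params: {"animation": "hold", "params": params},
    "find_and_eat_berry": lambda params: {"animation": "hold", "params": params},
}


def build_route_request(base_url: str, command: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to <base_url>/route.

    The payload shape {"command": ...} is an assumption for illustration.
    """
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/route",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dispatch(route_response: dict) -> dict:
    """Look up the served skill in the registry and tag the dispatch source."""
    skill = route_response["skill"]
    result = REGISTRY[skill](route_response.get("params", {}))
    return {"dispatch": f"registry:{skill}", "success": True, **result}
```

In a real call, `urllib.request.urlopen(req)` would send the request and `json.load` could parse the response body before it is passed to `dispatch`.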
## Verification Matrix

All commands below were tested through the local website API on 2026-06-15.
The VLA router ran on Modal with `device: cuda`.

| Command | Served skill | Dispatch | `/api/pet-action` success | Animation |
| --- | --- | --- | --- | --- |
| walk to the yellow marker | `walk_to` | `registry:walk_to` | true | walk |
| run around | `run_around` | `registry:run_around` | true | run |
| pick up the berry | `pick_up` | `registry:pick_up` | true | hold |
| go find berry and eat it | `find_and_eat_berry` | `registry:find_and_eat_berry` | true | hold |

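The verification matrix can also be checked mechanically. This sketch compares `/api/pet-action` responses against the expected rows and returns a list of mismatches. It assumes each response exposes `skill`, `dispatch`, `success`, and `animation` fields. Those names are hypothetical and should be matched to the real JSON.

```python
from typing import Dict, List

# Expected values copied from the verification matrix.
EXPECTED: Dict[str, Dict[str, str]] = {
    "walk to the yellow marker": {"skill": "walk_to", "animation": "walk"},
    "run around": {"skill": "run_around", "animation": "run"},
    "pick up the berry": {"skill": "pick_up", "animation": "hold"},
    "go find berry and eat it": {"skill": "find_and_eat_berry", "animation": "hold"},
}


def check_matrix(results: Dict[str, dict]) -> List[str]:
    """Return human-readable failures; an empty list means every row matched.

    `results` maps each command to its /api/pet-action JSON (field names assumed).
    """
    failures: List[str] = []
    for command, want in EXPECTED.items():
        got = results.get(command)
        if got is None:
            failures.append(f"{command!r}: no result")
            continue
        if got.get("skill") != want["skill"]:
            failures.append(f"{command!r}: skill {got.get('skill')!r} != {want['skill']!r}")
        if got.get("dispatch") != f"registry:{want['skill']}":
            failures.append(f"{command!r}: dispatch {got.get('dispatch')!r}")
        if got.get("success") is not True:
            failures.append(f"{command!r}: success is not true")
        if got.get("animation") != want["animation"]:
            failures.append(f"{command!r}: animation {got.get('animation')!r} != {want['animation']!r}")
    return failures
```

Collecting every mismatch, instead of stopping at the first, makes a single failing run show the whole picture.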
## Important Runtime Guard

With a blank or generated camera frame, the raw neural skill head can become
overconfident toward `find_and_eat_berry`. The live endpoint therefore serves a
command/scene-stabilized skill and returns both the raw and the served outputs:

```text
neural_skill: raw MiniCPM-V head prediction
skill: command/scene-stabilized served skill
raw_params: raw continuous head output
params: scene-grounded served parameters
```

This keeps the demo reliable while keeping the raw neural output visible. If the
browser sends a real camera frame and full robot state, the same endpoint can be
called with `force_neural_skill: true` to inspect the pure neural decision.

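The following sketch illustrates the guard. The keyword rules are a hypothetical stand-in for the endpoint's command/scene stabilization, which this document does not specify. Scene-grounded parameters are passed in rather than computed. The sketch shows only the shape of the four response fields and how `force_neural_skill` bypasses the skill override.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical keyword rules standing in for the real command/scene stabilization.
KEYWORD_RULES: List[Tuple[Tuple[str, ...], str]] = [
    (("find", "eat"), "find_and_eat_berry"),
    (("pick", "up"), "pick_up"),
    (("run",), "run_around"),
    (("walk",), "walk_to"),
]


def command_skill(command: str) -> Optional[str]:
    """Return the first skill whose keywords all appear in the command, else None."""
    words = command.lower().split()
    for keywords, skill in KEYWORD_RULES:
        if all(k in words for k in keywords):
            return skill
    return None


def serve(
    command: str,
    neural_skill: str,
    raw_params: Sequence[float],
    scene_params: Optional[Sequence[float]],
    force_neural_skill: bool = False,
) -> Dict[str, object]:
    """Return the neural_skill / skill / raw_params / params response fields."""
    if force_neural_skill:
        # Pure neural decision: skip the command-based override.
        skill = neural_skill
    else:
        # Prefer the command-derived skill and fall back to the neural head.
        skill = command_skill(command) or neural_skill
    # Served params use scene-grounded values when available (assumed to be precomputed).
    params = list(scene_params) if scene_params is not None else list(raw_params)
    return {
        "neural_skill": neural_skill,
        "skill": skill,
        "raw_params": list(raw_params),
        "params": params,
    }
```

The response returns the raw fields alongside the served ones. As a result, a blank-frame misprediction such as `find_and_eat_berry` stays visible even when the served skill is corrected.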
## Proof Screenshot

```text
Fireboy-training-policy-vla/proofs/modal-vla-router-policy-gallery.png
```

## Repeatable Final Smoke Gate

Run this before submission:

```bash
PYTHONPATH=fireboy-vla-physics/src \
  fireboy-vla-physics/.venv/bin/python \
  fireboy-vla-physics/src/final_vla_demo_smoke.py \
  --out Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json
```

Latest result:

```text
ok: true
route checks: walk_to, run_around, pick_up, find_and_eat_berry all passed on cuda
pet-action checks: all four commands dispatched through Modal VLA + MuJoCo successfully
registry validation: checked_paths 49, missing_count 0
RunPod pods in proof: []
```

Proof JSON:

```text
Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json
```

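A small check like the following could run after the smoke script to use the proof JSON as a pass/fail gate, for example in CI. It assumes only that the JSON has a top-level boolean `ok`, mirroring the `ok: true` summary line. It does not assume particular key names for the other summary fields. The `smoke_gate_passed` and `gate_exit_code` names are hypothetical.

```python
import json
from pathlib import Path

DEFAULT_PROOF = Path("Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json")


def smoke_gate_passed(proof: dict) -> bool:
    """True only when the proof reports a literal JSON `true` for `ok`.

    Assumption: the key is named "ok", as the summary line suggests.
    """
    return proof.get("ok") is True


def gate_exit_code(path: Path = DEFAULT_PROOF) -> int:
    """Load the proof file and return a shell-style exit code (0 means pass)."""
    proof = json.loads(path.read_text(encoding="utf-8"))
    return 0 if smoke_gate_passed(proof) else 1
```

Calling `sys.exit(gate_exit_code())` at the end of a script turns the result into a process exit status. The `is True` check rejects a string such as `"true"`, so a malformed proof fails closed.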