# Modal Inference Results: Fire Boy MiniCPM-V Router

Date: 2026-06-15

## Live Endpoint

```text
Modal app: fireboy-vla-router
URL: https://sanjuhs123--fireboy-vla-router.modal.run
GPU: L40S
idle scaledown window: 60 seconds
checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
model: openbmb/MiniCPM-V-4.6
policy_kind: minicpm_vla_frozen_encoder_skill_param_head_v1
```

This endpoint serves the promoted frozen MiniCPM-V skill/parameter router with a
custom PyTorch action head. It is not served through vLLM because the router
reads MiniCPM hidden states and passes them to a custom continuous head instead
of generating tokens.

## Local Website Wiring

```text
TOYBOX_VLA_ROUTER_URL=https://sanjuhs123--fireboy-vla-router.modal.run
TOYBOX_VLA_ROUTER_ACTION=1
local app: http://127.0.0.1:65373
policy gallery: http://127.0.0.1:65373/fireboy-policy-gallery
```

The Toy Room path is:

```text
browser command -> /api/pet-action
  -> Modal /route
  -> MiniCPM-V frozen encoder + skill/parameter head
  -> MuJoCo policy registry dispatch
  -> Toy Room animation/result JSON
```
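
The hop from the local API to Modal `/route` can be sketched in Python. This is a minimal illustration, not the app's actual client: the `build_route_request` helper, the POST method, and the `command` JSON field are assumptions; only the base URL and the `/route` path come from this document.

```python
import json
import urllib.request

# Base URL taken from the Live Endpoint section of this document.
ROUTER_URL = "https://sanjuhs123--fireboy-vla-router.modal.run"


def build_route_request(command, base_url=ROUTER_URL):
    """Build (but do not send) a request for the router's /route path.

    Assumptions: the endpoint accepts POST with a JSON body, and the
    command text is carried in a field named "command".
    """
    payload = {"command": command}
    return urllib.request.Request(
        base_url.rstrip("/") + "/route",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_route_request("walk to the yellow marker")
print(request.get_method(), request.full_url)
```

Sending it would be a call to `urllib.request.urlopen(request, timeout=...)`. A generous timeout is probably sensible, since the first call after the 60-second idle scaledown window may hit a cold start.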

## Verification Matrix

All commands below were tested through the local website API on 2026-06-15.
The VLA router ran on Modal with `device: cuda`.

```text
walk to the yellow marker
  served skill: walk_to
  dispatch: registry:walk_to
  /api/pet-action: success true
  animation: walk

run around
  served skill: run_around
  dispatch: registry:run_around
  /api/pet-action: success true
  animation: run

pick up the berry
  served skill: pick_up
  dispatch: registry:pick_up
  /api/pet-action: success true
  animation: hold

go find berry and eat it
  served skill: find_and_eat_berry
  dispatch: registry:find_and_eat_berry
  /api/pet-action: success true
  animation: hold
```
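
The matrix above can also be held as data and compared against a response. The sketch below assumes a response dict with keys `skill`, `dispatch`, `success`, and `animation`; those key names and the `check_pet_action` helper are hypothetical, while the expected values are copied from the matrix.

```python
# Expected outcomes copied from the verification matrix above.
EXPECTED = {
    "walk to the yellow marker": {"skill": "walk_to", "animation": "walk"},
    "run around": {"skill": "run_around", "animation": "run"},
    "pick up the berry": {"skill": "pick_up", "animation": "hold"},
    "go find berry and eat it": {"skill": "find_and_eat_berry", "animation": "hold"},
}


def check_pet_action(command, response):
    """Return a list of mismatches between a response dict and the matrix.

    The response keys used here are assumed names, not a documented schema.
    """
    want = EXPECTED[command]
    expected = {
        "skill": want["skill"],
        "dispatch": "registry:" + want["skill"],
        "success": True,
        "animation": want["animation"],
    }
    # Collect one readable message per field that differs.
    return [
        f"{key}: expected {value!r}, got {response.get(key)!r}"
        for key, value in expected.items()
        if response.get(key) != value
    ]
```

An empty list means the response matches the recorded row for that command.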

## Important Runtime Guard

When the camera frame is blank or generated rather than real, the raw neural
skill head can become overconfident in `find_and_eat_berry`. The live endpoint
therefore returns both the raw and the served values:

```text
neural_skill: raw MiniCPM-V head prediction
skill: command/scene-stabilized served skill
raw_params: raw continuous head output
params: scene-grounded served parameters
```

Serving the stabilized values keeps the demo reliable, while returning the raw
values alongside them preserves transparency. If the browser sends a real camera
frame and full robot state, the same endpoint can be tested with
`force_neural_skill: true` to inspect the pure neural decision.
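
How `neural_skill` and `skill` can diverge is easiest to see in a toy sketch. The endpoint's real stabilization logic is not described in this document, so everything below is hypothetical: the keyword hints, the `frame_is_blank` flag, and the `stabilize_skill` helper are illustrative only.

```python
# Hypothetical keyword hints, checked in order; the real rules are not documented here.
COMMAND_HINTS = [
    ("eat", "find_and_eat_berry"),
    ("pick up", "pick_up"),
    ("run", "run_around"),
    ("walk", "walk_to"),
]


def stabilize_skill(command, neural_skill, frame_is_blank, force_neural_skill=False):
    """Return both the raw and the served skill, mirroring the response fields."""
    served = neural_skill
    # Only override the neural prediction when the frame carries no real
    # visual evidence and the caller has not asked for the raw decision.
    if frame_is_blank and not force_neural_skill:
        text = command.lower()
        for keyword, skill in COMMAND_HINTS:
            if keyword in text:
                served = skill
                break
    return {"neural_skill": neural_skill, "skill": served}


print(stabilize_skill("walk to the yellow marker", "find_and_eat_berry", True))
```

With `force_neural_skill=True` the two fields are always equal, which matches the inspection mode described above.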

## Proof Screenshot

```text
Fireboy-training-policy-vla/proofs/modal-vla-router-policy-gallery.png
```

## Repeatable Final Smoke Gate

Run this before submission:

```bash
PYTHONPATH=fireboy-vla-physics/src \
fireboy-vla-physics/.venv/bin/python \
fireboy-vla-physics/src/final_vla_demo_smoke.py \
  --out Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json
```

Latest result:

```text
ok: true
route checks: walk_to, run_around, pick_up, find_and_eat_berry all passed on cuda
pet-action checks: all four commands dispatched through Modal VLA + MuJoCo successfully
registry validation: checked_paths 49, missing_count 0
RunPod pods in proof: []
```

Proof JSON:

```text
Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json
```
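
A submission script could gate on this file. The sketch below is one way to do that under stated assumptions: the key names `ok` and `missing_count` echo the labels in the latest result, but the JSON layout, including the `registry_validation` nesting, is a guess, and `smoke_gate_passed` is a hypothetical helper.

```python
import json
from pathlib import Path

# Proof path as listed in this document.
PROOF_PATH = Path("Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json")


def smoke_gate_passed(proof):
    """Return True when the proof reports ok and no missing registry paths.

    Assumes a top-level "ok" key and a "registry_validation" object that
    holds "missing_count"; both are guesses at the file layout.
    """
    registry = proof.get("registry_validation", {})
    return proof.get("ok") is True and registry.get("missing_count", 0) == 0


# Only read the file when it exists, so the sketch is safe to run anywhere.
if PROOF_PATH.exists():
    proof = json.loads(PROOF_PATH.read_text(encoding="utf-8"))
    print("smoke gate passed:", smoke_gate_passed(proof))
```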