# Fireboy Training Policy VLA

This folder is the dedicated planning space for turning Fire Boy into a
MiniCPM-V-driven vision-language-action (VLA) pet.

## Current Verified Stack

The current source of truth for policy routing is:

```text
fireboy-vla-physics/policy_registry.json
```

## Next VLA Lane: Skill + Parameters

Direct low-level navigation with MiniCPM-V has not worked so far, so the next
VLA training lane predicts a skill and its parameters instead of raw actions:

```text
image + language + robot state -> skill_id + skill parameters
```
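A minimal sketch of what one such prediction could look like as a data structure. The skill names and parameter names come from the manifest summary and eval output in this README. The names `SkillCommand` and `decode_head_output`, and the assumption that the head emits one score per skill plus a six-value parameter vector, are illustrative only and may not match the trainer's real interface.

```python
from dataclasses import dataclass

# Skill ids as listed in the skill-param manifest summary.
SKILLS = ("walk_to", "run_around", "pick_up", "find_and_eat_berry")
# Parameter names as listed in the eval output.
PARAMS = ("target_x", "target_y", "target_z", "radius", "speed_hint", "object_is_berry")


@dataclass
class SkillCommand:
    skill_id: str
    params: dict


def decode_head_output(skill_logits, param_values):
    """Pick the highest-scoring skill and label the parameter vector (assumed layout)."""
    if len(skill_logits) != len(SKILLS) or len(param_values) != len(PARAMS):
        raise ValueError("unexpected head output size")
    best = max(range(len(SKILLS)), key=lambda i: skill_logits[i])
    return SkillCommand(SKILLS[best], dict(zip(PARAMS, map(float, param_values))))
```

Predicting a small, named command like this keeps the learned model out of the low-level control loop, which is left to the existing policies.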

The first generated skill-param manifest is:

```text
Fireboy-training-policy-vla/vla-rollouts/vla_skill_params/fireboy_vla_skill_params_allskill_3072.jsonl
Fireboy-training-policy-vla/vla-rollouts/vla_skill_params/fireboy_vla_skill_params_allskill_3072.summary.json
rows: 3072
skipped images: 0
skills:
  walk_to: 480
  run_around: 512
  pick_up: 1028
  find_and_eat_berry: 1052
```
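The per-skill counts above add up to the reported row total. A small sketch for recomputing such a summary from JSONL lines follows; the field name `skill_id` is an assumption about the manifest schema, not something confirmed here.

```python
import json
from collections import Counter


def summarize_manifest(lines, skill_key="skill_id"):
    """Count rows per skill in JSONL text lines; `skill_key` is an assumed field name."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            counts[json.loads(line)[skill_key]] += 1
    return {"rows": sum(counts.values()), "skills": dict(counts)}


# Consistency check on the counts reported above: they should add up to `rows`.
reported = {"walk_to": 480, "run_around": 512, "pick_up": 1028, "find_and_eat_berry": 1052}
assert sum(reported.values()) == 3072
```

To run it against the real file, pass an open file handle for the `.jsonl` manifest as `lines`.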

The RunPod training launcher is:

```bash
bash fireboy-vla-physics/scripts/train_minicpm_vla_skill_param_head_runpod.sh
```

Latest RunPod output:

```text
fireboy-vla-physics/build/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/
Fireboy-training-policy-vla/runpod-artifacts/fireboy-minicpm-skill-param-artifacts.tgz
GPU: NVIDIA A40
device: cuda
model: openbmb/MiniCPM-V-4.6
policy_kind: minicpm_vla_frozen_encoder_skill_param_head_v1
eval rows: 512
eval skill_accuracy: 1.0
eval param_mae: 0.017043352127075195
target_x MAE: 0.032305024564266205
target_y MAE: 0.0478343665599823
target_z MAE: 0.006528750993311405
radius MAE: 0.0038746832869946957
speed_hint MAE: 0.004880381282418966
object_is_berry MAE: 0.006836902815848589
```
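The reported `param_mae` appears to be the unweighted mean of the six per-parameter MAEs. That aggregation is inferred from the numbers rather than read from the trainer, so treat the check below as a sanity test, not a description of the training code.

```python
# Per-parameter MAE values copied from the eval output above.
per_param_mae = {
    "target_x": 0.032305024564266205,
    "target_y": 0.0478343665599823,
    "target_z": 0.006528750993311405,
    "radius": 0.0038746832869946957,
    "speed_hint": 0.004880381282418966,
    "object_is_berry": 0.006836902815848589,
}

mean_mae = sum(per_param_mae.values()) / len(per_param_mae)
reported_param_mae = 0.017043352127075195

# The two agree up to float32-level rounding.
assert abs(mean_mae - reported_param_mae) < 1e-8

# The parameter contributing the most error.
worst_param = max(per_param_mae, key=per_param_mae.get)
```

Most of the error comes from `target_x` and `target_y`; the remaining four parameters are each below 0.007.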

This lane is accepted as the current command router. It dispatches to the
existing registry policies, whose movement and manipulation behaviors are
already proven in MP4 recordings.
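The dispatch step can be sketched as a lookup from predicted skill to lane. The skill-to-lane mapping is taken from the "Verified command paths" list in this README; the `dispatch` function and the `runners` dictionary of callables are hypothetical stand-ins for however the registry actually invokes a policy.

```python
# Skill -> lane mapping as listed under "Verified command paths" in this README.
SKILL_LANES = {
    "walk_to": "mujoco_articulated_policy",
    "run_around": "mujoco_articulated_policy",
    "pick_up": "minicpm_vla_lora_manipulation",
    "find_and_eat_berry": "minicpm_vla_lora_manipulation",
}


def dispatch(skill_id, params, runners):
    """Route a predicted skill to a lane runner.

    `runners` maps lane name -> callable(skill_id, params); this is a
    hypothetical interface, not the registry's real one.
    """
    lane = SKILL_LANES.get(skill_id)
    if lane is None:
        raise KeyError(f"no registered lane for skill {skill_id!r}")
    return runners[lane](skill_id, params)
```

Failing loudly on an unknown skill keeps a bad router prediction from silently falling through to some default policy.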

## Modal Live Inference

The promoted frozen router is now deployed as a Modal GPU endpoint:

```text
Modal app: fireboy-vla-router
URL: https://sanjuhs123--fireboy-vla-router.modal.run
GPU: L40S
idle scaledown window: 60 seconds
checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
```

Local Toy Room wiring:

```bash
TOYBOX_VLA_ROUTER_URL='https://sanjuhs123--fireboy-vla-router.modal.run' \
TOYBOX_VLA_ROUTER_ACTION=1 \
PORT=65373 PID_FILE=.toybox-65373.pid LOG_FILE=.toybox-65373.log ./start.sh
```

Verified through `http://127.0.0.1:65373/api/pet-action`:

```text
walk to the yellow marker -> vla skill walk_to -> MuJoCo success true
run around -> vla skill run_around -> MuJoCo success true
pick up the berry -> vla skill pick_up -> MuJoCo success true
go find berry and eat it -> vla skill find_and_eat_berry -> MuJoCo success true
```

Proof note and artifacts:

```text
Fireboy-training-policy-vla/modal-inference-results-2026-06-15.md
Fireboy-training-policy-vla/proofs/modal-vla-router-policy-gallery.png
Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json
```

The live endpoint reports both `neural_skill` (the raw MiniCPM-V head
prediction) and `skill` (the skill actually served). For blank-camera requests,
explicit command/scene arbitration stabilizes the served skill and target
params while keeping the raw MiniCPM-V head output visible.
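A rough sketch of how such arbitration might work, under stated assumptions. The response keys `neural_skill` and `skill` come from the endpoint description above. The blank-frame test, the keyword rules, and the function names are all hypothetical and are not taken from the deployed router.

```python
def is_blank(pixels, tolerance=0):
    """Treat a frame as blank when it is empty or all pixel values are (nearly) identical."""
    values = list(pixels)
    return not values or max(values) - min(values) <= tolerance


# Hypothetical keyword rules, checked in order.
COMMAND_RULES = (
    ("eat", "find_and_eat_berry"),
    ("pick up", "pick_up"),
    ("run around", "run_around"),
    ("walk to", "walk_to"),
)


def arbitrate(command, pixels, neural_skill):
    """Return both the raw head prediction and the served skill."""
    served = neural_skill
    if is_blank(pixels):
        # No usable image: fall back to the explicit command text.
        text = command.lower()
        for phrase, skill in COMMAND_RULES:
            if phrase in text:
                served = skill
                break
    return {"neural_skill": neural_skill, "skill": served}
```

Returning both fields makes it easy to see how often arbitration overrides the head, which is useful evidence when deciding whether the override is still needed.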

Repeat the final website/VLA smoke gate with:

```bash
PYTHONPATH=fireboy-vla-physics/src \
fireboy-vla-physics/.venv/bin/python \
fireboy-vla-physics/src/final_vla_demo_smoke.py \
  --out Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json
```

Latest smoke result: `ok: true`.

## LoRA Router Lane

The first MiniCPM-V LoRA version of the skill-param router was also trained on
RunPod:

```text
GPU: NVIDIA A40
pod: xb6dv76ajw7tzq
status after artifact download: deleted
script: fireboy-vla-physics/scripts/train_minicpm_vla_lora_skill_param_head_runpod.sh
trainer: fireboy-vla-physics/src/train_minicpm_vla_lora_skill_param_head.py
seed: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
rows: 512
LoRA rank: 8
eval rows: 256
eval skill_accuracy: 1.0
eval param_mae: 0.06290113925933838
```

Artifacts:

```text
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/minicpm_vla_lora_skill_param_head.pt
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/lora_adapter/
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/eval_minicpm_vla_lora_skill_param_head.json
Fireboy-training-policy-vla/runpod-artifacts/fireboy-minicpm-lora-skill-param-artifacts.tgz
```

Decision:

```text
LoRA router training works and skill routing remains perfect.
Do not promote this checkpoint over the frozen router yet because its target
parameter MAE is worse: 0.0629 vs 0.0170.
```
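The decision above can be expressed as a simple promotion rule. This is a sketch of the stated reasoning, not an existing script; `should_promote` and the dictionary keys are illustrative names. It also assumes the two eval sets are comparable, although the frozen router was scored on 512 rows and the LoRA router on 256.

```python
def should_promote(candidate, incumbent):
    """Promote only if skill accuracy is no worse and parameter MAE is strictly lower."""
    return (
        candidate["skill_accuracy"] >= incumbent["skill_accuracy"]
        and candidate["param_mae"] < incumbent["param_mae"]
    )


# Eval numbers copied from the RunPod outputs in this README.
frozen_router = {"skill_accuracy": 1.0, "param_mae": 0.017043352127075195}
lora_router = {"skill_accuracy": 1.0, "param_mae": 0.06290113925933838}

promote_lora = should_promote(lora_router, frozen_router)
mae_ratio = lora_router["param_mae"] / frozen_router["param_mae"]
```

Here `promote_lora` is `False`: skill accuracy ties, and the LoRA router's parameter MAE is roughly 3.7 times that of the frozen router.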

Validate the policy registry with:

```bash
PYTHONPATH=fireboy-vla-physics/src fireboy-vla-physics/.venv/bin/python fireboy-vla-physics/src/validate_policy_registry.py
```

Latest validation:

```text
checked_paths: 31
checked_paths after router/LoRA-router registration: 49
missing_count: 0
ok: true
```
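For orientation, a path-existence check of this kind might look like the sketch below. It is not the contents of `validate_policy_registry.py`. It assumes that every string value in the registry containing a `/` is a repo-relative path, which would also catch URLs, and the registry schema used in the example is invented.

```python
from pathlib import Path


def iter_strings(node):
    """Yield every string value nested inside dicts and lists."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def check_registry(registry, root="."):
    """Report how many referenced paths exist under `root` (assumed path heuristic)."""
    paths = sorted({s for s in iter_strings(registry) if "/" in s})
    missing = [p for p in paths if not (Path(root) / p).exists()]
    return {"checked_paths": len(paths), "missing_count": len(missing), "ok": not missing}
```

To try it on the real registry, load `fireboy-vla-physics/policy_registry.json` with `json.load` and pass the result in, with `root` set to the repository root.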

Visual proof page:

```text
http://127.0.0.1:65373/fireboy-policy-gallery
```

Saved screenshots:

```text
fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-desktop.png
fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-mobile-viewport.png
fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-vla-router.png
```

Build a portable proof bundle with:

```bash
PYTHONPATH=fireboy-vla-physics/src fireboy-vla-physics/.venv/bin/python fireboy-vla-physics/src/build_policy_proof_bundle.py
```

Latest bundle:

```text
fireboy-vla-physics/build/fireboy-policy-proof-bundle/
fireboy-vla-physics/build/fireboy-policy-proof-bundle.tgz
copied proof/training files: 25
copied proof/training files after router/LoRA-router registration: 33
copied proof/training files after final smoke proof registration: 35
checkpoint/archive references after router/LoRA-router registration: 21
```

Verified command paths:

```text
walk_to / run_to:
  lane: mujoco_articulated_policy
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_articulated_go_to_point_clock/faithful_articulated_policy.npz
  eval: 20/20

walk_around / run_around:
  lane: mujoco_articulated_policy
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_articulated_run_around/faithful_articulated_policy.npz
  eval: 20/20

pick_up:
  lane: minicpm_vla_lora_manipulation
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_residual_512/minicpm_vla_lora_action_head.pt
  eval: 3/3

find_and_eat_berry:
  lane: minicpm_vla_lora_manipulation for GPU VLA proof
  local demo fallback: fireboy-vla-physics/checkpoints/berry_eat_wide/state_policy.npz
  eval: 3/3 MiniCPM LoRA proof, local fallback command test passes

MiniCPM-V skill-param router:
  lane: minicpm_vla_skill_param_router
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
  eval: 512/512 skill choices correct, param MAE 0.0170
  dispatches to: walk_to, run_around, pick_up, find_and_eat_berry

MiniCPM-V LoRA skill-param router:
  lane: minicpm_vla_lora_skill_param_router
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/minicpm_vla_lora_skill_param_head.pt
  adapter: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/lora_adapter/
  eval: 256/256 skill choices correct, param MAE 0.0629
  status: preserved, not promoted over frozen router
```

Toy V3 bridge verification:

```text
"walk to the yellow marker with mujoco policy" -> success true, animation walk
"run around with mujoco policy" -> success true, animation run
"pick up the berry with mujoco policy" -> success true, grasped true
"go find berry and eat it with mujoco policy" -> success true, local fallback eaten true
```

Generated local bridge MP4s:

```text
fireboy-vla-physics/build/toy-v3-policy/articulated/faithful_learned_go_to_point.mp4
fireboy-vla-physics/build/toy-v3-policy/articulated/faithful_learned_run_around.mp4
```

## Source Visual Rig

The current Toy Room v3 Fire Boy rig we should preserve and match is:

```text
fire-boy-rig/fire-boy-rigged-full.glb
```

This is the visual identity of Fire Boy. The physics body should be rebuilt to
match this rig first.

## Immediate Priority

Do this first:

```text
fix Fire Boy physics body first
```

That means:

```text
real Fire Boy GLB skeleton/proportions
-> matching MuJoCo/Newton articulated body
-> correct joints, link lengths, masses, collisions, contact sites
-> visual proof that physics Fire Boy resembles Toy Room v3 Fire Boy
```

We are intentionally leaving these for later:

```text
use pretrained motion priors
generate successful rollouts
fine-tune MiniCPM-style VLA action model
```

## Core Goal

The desired final model is:

```text
image + language + robot state -> action
```

More specifically:

```text
Toy Room camera image
+ user command
+ Fire Boy body state
-> Fire Boy action chunk
```

See:

```text
minicpm-v-to-vla.md
physics-body-first.md
physics-body-fix-results.md
```