Fireboy Training Policy VLA
This folder is the dedicated planning space for turning Fire Boy into a MiniCPM-V-driven vision-language-action pet.
Current Verified Stack
The current source of truth for policy routing is:
fireboy-vla-physics/policy_registry.json
Next VLA Lane: Skill + Parameters
Direct MiniCPM-V low-level navigation has failed so far, so the next, more robust VLA training lane predicts a skill and its parameters instead of low-level actions:
image + language + robot state -> skill_id + skill parameters
The first generated skill-param manifest is:
Fireboy-training-policy-vla/vla-rollouts/vla_skill_params/fireboy_vla_skill_params_allskill_3072.jsonl
Fireboy-training-policy-vla/vla-rollouts/vla_skill_params/fireboy_vla_skill_params_allskill_3072.summary.json
rows: 3072
skipped images: 0
skills:
walk_to: 480
run_around: 512
pick_up: 1028
find_and_eat_berry: 1052
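The per-skill counts above sum to 3072, matching the row count. Below is a sketch for re-checking the manifest against its summary. It assumes each JSONL row carries a skill_id key. That is a guess about the manifest schema, not something documented here.

```python
import json
from collections import Counter
from typing import Dict, Iterable

# Per-skill counts from the summary above.
EXPECTED_COUNTS: Dict[str, int] = {
    "walk_to": 480,
    "run_around": 512,
    "pick_up": 1028,
    "find_and_eat_berry": 1052,
}


def count_skills(lines: Iterable[str]) -> Counter:
    """Count manifest rows per skill, skipping blank lines.

    Assumes (hypothetically) that every JSONL row has a "skill_id" key.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)["skill_id"]] += 1
    return counts


def matches_summary(lines: Iterable[str], expected: Dict[str, int] = EXPECTED_COUNTS) -> bool:
    """True when the per-skill counts equal the summary exactly."""
    return dict(count_skills(lines)) == expected
```

An open file handle for the .jsonl manifest can be passed directly as lines.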
The RunPod training launcher is:
bash fireboy-vla-physics/scripts/train_minicpm_vla_skill_param_head_runpod.sh
Latest RunPod output:
fireboy-vla-physics/build/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/
Fireboy-training-policy-vla/runpod-artifacts/fireboy-minicpm-skill-param-artifacts.tgz
GPU: NVIDIA A40
device: cuda
model: openbmb/MiniCPM-V-4.6
policy_kind: minicpm_vla_frozen_encoder_skill_param_head_v1
eval rows: 512
eval skill_accuracy: 1.0
eval param_mae: 0.017043352127075195
target_x MAE: 0.032305024564266205
target_y MAE: 0.0478343665599823
target_z MAE: 0.006528750993311405
radius MAE: 0.0038746832869946957
speed_hint MAE: 0.004880381282418966
object_is_berry MAE: 0.006836902815848589
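The reported eval param_mae appears to be the unweighted mean of the six per-parameter MAEs. The two agree to within float32 rounding. This is an inference from the numbers above, not confirmed from the trainer code. The check also shows that target_y and target_x dominate the error.

```python
from statistics import fmean

# Per-parameter eval MAEs copied from the RunPod output above.
per_param_mae = {
    "target_x": 0.032305024564266205,
    "target_y": 0.0478343665599823,
    "target_z": 0.006528750993311405,
    "radius": 0.0038746832869946957,
    "speed_hint": 0.004880381282418966,
    "object_is_berry": 0.006836902815848589,
}
reported_param_mae = 0.017043352127075195

# Unweighted mean over the six parameters.
mean_mae = fmean(per_param_mae.values())
print(f"{mean_mae:.6f}")  # 0.017043
assert abs(mean_mae - reported_param_mae) < 1e-6
```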
This lane is accepted as the current command router. It dispatches each predicted skill into the existing registry policies, whose movement and manipulation behavior is already proven with MP4 rollouts.
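Below is a minimal sketch of the routing step. It assumes the router returns a skill_id plus a parameter dict. The lane names come from the Verified command paths list in this note. RouterOutput, SKILL_TO_LANE and dispatch are hypothetical names, not the registry's actual schema or code.

```python
from dataclasses import dataclass
from typing import Dict

# Lane names from "Verified command paths"; the hard-coded mapping itself is
# an illustrative assumption (a real dispatcher would read policy_registry.json).
SKILL_TO_LANE: Dict[str, str] = {
    "walk_to": "mujoco_articulated_policy",
    "run_around": "mujoco_articulated_policy",
    "pick_up": "minicpm_vla_lora_manipulation",
    "find_and_eat_berry": "minicpm_vla_lora_manipulation",
}


@dataclass
class RouterOutput:
    skill_id: str
    params: Dict[str, float]  # e.g. target_x, target_y, target_z, radius, ...


def dispatch(out: RouterOutput) -> str:
    """Return the registry lane that should execute the router's chosen skill."""
    try:
        return SKILL_TO_LANE[out.skill_id]
    except KeyError:
        raise ValueError(f"router produced unknown skill: {out.skill_id!r}") from None
```

The sketch ignores the local demo fallback for find_and_eat_berry (berry_eat_wide/state_policy.npz).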
Modal Live Inference
The promoted frozen router is now deployed as a Modal GPU endpoint:
Modal app: fireboy-vla-router
URL: https://sanjuhs123--fireboy-vla-router.modal.run
GPU: L40S
idle scaledown window: 60 seconds
checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
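Below is a sketch of a standard-library Python client for the Modal router. It builds the request but does not send it. Two things here are assumptions: the request body fields (command, image_b64) and posting to the root URL. The deployed app's actual route and schema are not documented in this note.

```python
import base64
import json
import urllib.request

ROUTER_URL = "https://sanjuhs123--fireboy-vla-router.modal.run"


def build_router_request(command: str, image_png: bytes, url: str = ROUTER_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for the router.

    The "command" and "image_b64" fields are hypothetical placeholders.
    """
    body = {
        "command": command,
        "image_b64": base64.b64encode(image_png).decode("ascii"),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can be sent with urllib.request.urlopen(req, timeout=...). With a 60-second idle scaledown window, a request after an idle period will likely pay a GPU cold start, so a generous timeout is safer.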
Local Toy Room wiring:
TOYBOX_VLA_ROUTER_URL='https://sanjuhs123--fireboy-vla-router.modal.run' \
TOYBOX_VLA_ROUTER_ACTION=1 \
PORT=65373 PID_FILE=.toybox-65373.pid LOG_FILE=.toybox-65373.log ./start.sh
Verified through http://127.0.0.1:65373/api/pet-action:
walk to the yellow marker -> vla skill walk_to -> MuJoCo success true
run around -> vla skill run_around -> MuJoCo success true
pick up the berry -> vla skill pick_up -> MuJoCo success true
go find berry and eat it -> vla skill find_and_eat_berry -> MuJoCo success true
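The four verified commands can be re-checked with a small harness. The sketch below keeps transport separate: fetch is any function that posts a command to /api/pet-action and returns the decoded JSON. The response keys skill and success are hypothetical and should be matched to the real payload.

```python
from typing import Callable, Dict, List

# Expected routing from the verified runs above.
EXPECTED_SKILL: Dict[str, str] = {
    "walk to the yellow marker": "walk_to",
    "run around": "run_around",
    "pick up the berry": "pick_up",
    "go find berry and eat it": "find_and_eat_berry",
}


def check_pet_action(command: str, response: dict) -> List[str]:
    """Return a list of problems for one command; empty means it passed."""
    problems: List[str] = []
    want = EXPECTED_SKILL[command]
    if response.get("skill") != want:
        problems.append(f"{command!r}: skill {response.get('skill')!r} != {want!r}")
    if response.get("success") is not True:
        problems.append(f"{command!r}: MuJoCo success was not true")
    return problems


def run_smoke(fetch: Callable[[str], dict]) -> List[str]:
    """Run every verified command through fetch and collect all problems."""
    problems: List[str] = []
    for command in EXPECTED_SKILL:
        problems.extend(check_pet_action(command, fetch(command)))
    return problems
```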
Proof note and artifacts:
Fireboy-training-policy-vla/modal-inference-results-2026-06-15.md
Fireboy-training-policy-vla/proofs/modal-vla-router-policy-gallery.png
Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json
The live endpoint reports both neural_skill (the raw MiniCPM-V head prediction)
and the served skill. For blank-camera requests, explicit command/scene
arbitration stabilizes the served skill and target params, while the raw
MiniCPM-V head output stays visible in the response.
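Below is one illustrative way the blank-camera arbitration could work. If the frame is effectively constant, a keyword match on the command overrides the neural skill. The raw neural_skill is always returned alongside the served skill. The keyword rules, the blank-image test and the served_skill key are all assumptions. The deployed arbitration is not described here, including how it stabilizes target params, and this sketch does not cover params.

```python
from statistics import pstdev
from typing import Optional, Sequence

# Hypothetical keyword rules, checked in order (deliberately crude:
# e.g. "seat" would match "eat").
KEYWORD_RULES = (
    ("eat", "find_and_eat_berry"),
    ("pick up", "pick_up"),
    ("run around", "run_around"),
    ("walk to", "walk_to"),
)


def image_is_blank(pixels: Sequence[float], tol: float = 1e-3) -> bool:
    """Treat an image as blank when its pixel values are (nearly) constant."""
    return len(pixels) == 0 or pstdev(pixels) < tol


def command_skill(command: str) -> Optional[str]:
    """Return the first skill whose keyword appears in the command, if any."""
    text = command.lower()
    for keyword, skill in KEYWORD_RULES:
        if keyword in text:
            return skill
    return None


def arbitrate(command: str, pixels: Sequence[float], neural_skill: str) -> dict:
    """Choose the served skill, keeping the raw head prediction visible."""
    served = neural_skill
    if image_is_blank(pixels):
        served = command_skill(command) or neural_skill
    return {"neural_skill": neural_skill, "served_skill": served}
```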
Repeat the final website/VLA smoke gate with:
PYTHONPATH=fireboy-vla-physics/src \
fireboy-vla-physics/.venv/bin/python \
fireboy-vla-physics/src/final_vla_demo_smoke.py \
--out Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json
Latest smoke result: ok: true.
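The smoke result can serve as a gate, for example in CI, by checking the top-level ok flag. This sketch assumes ok is a top-level key of final-vla-demo-smoke.json, as the ok: true result above suggests. It assumes nothing else about the report's schema.

```python
import json
from pathlib import Path
from typing import Any


def smoke_ok(report: Any) -> bool:
    """True only for a dict whose "ok" value is the JSON literal true."""
    return isinstance(report, dict) and report.get("ok") is True


def smoke_passed(path: Path) -> bool:
    """Read the smoke JSON; a missing or unparsable file counts as a failure."""
    try:
        report = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False
    return smoke_ok(report)
```

A gate script could then exit with sys.exit(0 if smoke_passed(path) else 1).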
LoRA Router Lane
The first MiniCPM-V LoRA version of the skill-param router was also trained on RunPod:
GPU: NVIDIA A40
pod: xb6dv76ajw7tzq
status after artifact download: deleted
script: fireboy-vla-physics/scripts/train_minicpm_vla_lora_skill_param_head_runpod.sh
trainer: fireboy-vla-physics/src/train_minicpm_vla_lora_skill_param_head.py
seed: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
rows: 512
LoRA rank: 8
eval rows: 256
eval skill_accuracy: 1.0
eval param_mae: 0.06290113925933838
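LoRA keeps a pretrained weight W frozen and learns a low-rank update W + (alpha/r) B A. Here B is d_out x r and A is r x d_in, so each adapted layer adds r(d_out + d_in) trainable weights. The arithmetic below gives a sense of scale. It uses a hypothetical 4096 x 4096 projection, because MiniCPM-V's real layer shapes and the trainer's target modules are not recorded in this note.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Extra trainable weights LoRA adds to one d_out x d_in linear layer."""
    # B is d_out x rank, A is rank x d_in.
    return rank * (d_out + d_in)


# Hypothetical layer shape; only the rank (8) comes from this note.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, rank=8)
print(lora, full, lora / full)  # 65536 16777216 0.00390625
```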
Artifacts:
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/minicpm_vla_lora_skill_param_head.pt
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/lora_adapter/
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/eval_minicpm_vla_lora_skill_param_head.json
Fireboy-training-policy-vla/runpod-artifacts/fireboy-minicpm-lora-skill-param-artifacts.tgz
Decision:
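LoRA router training works, and skill routing stays perfect (256/256 eval rows).
Do not promote this checkpoint over the frozen router yet because its target
parameter MAE is worse: 0.0629 vs 0.0170. The two evals use different row
counts (256 vs 512), so treat the comparison as indicative.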
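The promotion decision can be written as an explicit rule. Below is one possible gate, fed with the numbers from this note: a candidate must not regress on skill accuracy and must have strictly lower param MAE. RouterEval and should_promote are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterEval:
    name: str
    skill_accuracy: float
    param_mae: float
    eval_rows: int


def should_promote(candidate: RouterEval, incumbent: RouterEval) -> bool:
    """Promote only if skill routing does not regress and param MAE improves.

    Hypothetical rule; ideally both routers are scored on the same eval rows.
    """
    return (
        candidate.skill_accuracy >= incumbent.skill_accuracy
        and candidate.param_mae < incumbent.param_mae
    )


frozen = RouterEval("frozen", 1.0, 0.017043352127075195, 512)
lora = RouterEval("lora_r8", 1.0, 0.06290113925933838, 256)
print(should_promote(lora, frozen))  # False
```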
Validate the policy registry with:
PYTHONPATH=fireboy-vla-physics/src fireboy-vla-physics/.venv/bin/python fireboy-vla-physics/src/validate_policy_registry.py
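The validator's internals are not described here. As a rough sketch, the code below walks any JSON tree and treats strings containing "/" (but not "://") as repo-relative paths. It reports checked_paths, missing_count and ok. The path heuristic and output shape are assumptions, not validate_policy_registry.py's actual behavior.

```python
import json
from pathlib import Path
from typing import Any, Callable, Iterator


def iter_path_strings(node: Any) -> Iterator[str]:
    """Yield strings in a JSON tree that look like repo-relative paths (heuristic)."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_path_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_path_strings(value)
    elif isinstance(node, str) and "/" in node and "://" not in node:
        yield node


def find_missing(registry: Any, exists: Callable[[str], bool]) -> dict:
    """Check each distinct path once and summarize the result."""
    paths = sorted(set(iter_path_strings(registry)))
    missing = [p for p in paths if not exists(p)]
    return {
        "checked_paths": len(paths),
        "missing_count": len(missing),
        "missing": missing,
        "ok": not missing,
    }


def validate_registry_file(registry_file: Path, repo_root: Path) -> dict:
    """Load a registry JSON file and resolve its paths against repo_root."""
    registry = json.loads(registry_file.read_text(encoding="utf-8"))
    return find_missing(registry, lambda p: (repo_root / p).exists())
```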
Latest validation:
checked_paths: 49 (31 before router/LoRA-router registration)
missing_count: 0
ok: true
Visual proof page:
http://127.0.0.1:65373/fireboy-policy-gallery
Saved screenshots:
fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-desktop.png
fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-mobile-viewport.png
fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-vla-router.png
Build a portable proof bundle with:
PYTHONPATH=fireboy-vla-physics/src fireboy-vla-physics/.venv/bin/python fireboy-vla-physics/src/build_policy_proof_bundle.py
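Below is a minimal sketch of a portable bundle builder. The bundle paths listed below suggest a directory of copied proof files with a .tgz of that directory beside it, and the sketch assumes that layout. build_bundle is a hypothetical helper. The latest bundle also reports checkpoint/archive references separately, which suggests that large checkpoints are referenced rather than copied. This sketch only copies.

```python
import shutil
import tarfile
from pathlib import Path
from typing import Iterable


def build_bundle(files: Iterable[Path], repo_root: Path, out_dir: Path) -> Path:
    """Copy files into out_dir (keeping repo-relative layout), then tar.gz it.

    The archive is written next to out_dir, e.g. bundle/ -> bundle.tgz.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in files:
        dst = out_dir / src.relative_to(repo_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    archive = out_dir.with_suffix(".tgz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(out_dir, arcname=out_dir.name)
    return archive
```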
Latest bundle:
fireboy-vla-physics/build/fireboy-policy-proof-bundle/
fireboy-vla-physics/build/fireboy-policy-proof-bundle.tgz
copied proof/training files: 35 after final smoke proof registration (33 after router/LoRA-router registration, 25 before)
checkpoint/archive references after router/LoRA-router registration: 21
Verified command paths:
walk_to / run_to:
lane: mujoco_articulated_policy
checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_articulated_go_to_point_clock/faithful_articulated_policy.npz
eval: 20/20
walk_around / run_around:
lane: mujoco_articulated_policy
checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_articulated_run_around/faithful_articulated_policy.npz
eval: 20/20
pick_up:
lane: minicpm_vla_lora_manipulation
checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_residual_512/minicpm_vla_lora_action_head.pt
eval: 3/3
find_and_eat_berry:
lane: minicpm_vla_lora_manipulation for GPU VLA proof
local demo fallback: fireboy-vla-physics/checkpoints/berry_eat_wide/state_policy.npz
eval: 3/3 MiniCPM LoRA proof, local fallback command test passes
MiniCPM-V skill-param router:
lane: minicpm_vla_skill_param_router
checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
eval: 512/512 skill choices correct, param MAE 0.0170
dispatches to: walk_to, run_around, pick_up, find_and_eat_berry
MiniCPM-V LoRA skill-param router:
lane: minicpm_vla_lora_skill_param_router
checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/minicpm_vla_lora_skill_param_head.pt
adapter: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/lora_adapter/
eval: 256/256 skill choices correct, param MAE 0.0629
status: preserved, not promoted over frozen router
Toy V3 bridge verification:
"walk to the yellow marker with mujoco policy" -> success true, animation walk
"run around with mujoco policy" -> success true, animation run
"pick up the berry with mujoco policy" -> success true, grasped true
"go find berry and eat it with mujoco policy" -> success true, local fallback eaten true
Generated local bridge MP4s:
fireboy-vla-physics/build/toy-v3-policy/articulated/faithful_learned_go_to_point.mp4
fireboy-vla-physics/build/toy-v3-policy/articulated/faithful_learned_run_around.mp4
Source Visual Rig
The current Toy Room v3 Fire Boy rig we should preserve and match is:
fire-boy-rig/fire-boy-rigged-full.glb
This is the visual identity of Fire Boy. The physics body should be rebuilt to match this rig first.
Immediate Priority
Do this first:
fix the Fire Boy physics body
That means:
real Fire Boy GLB skeleton/proportions
-> matching MuJoCo/Newton articulated body
-> correct joints, link lengths, masses, collisions, contact sites
-> visual proof that physics Fire Boy resembles Toy Room v3 Fire Boy
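Reading the rig's skeleton is the first step of the chain above. A .glb file is binary glTF 2.0. It starts with a 12-byte header (magic, version, total length) followed by a JSON chunk, and skins[i].joints lists the node indices of the skeleton. The sketch below reads that JSON chunk with the standard library. It then estimates link lengths from joint local translations, assuming unit scale and translation/rotation/scale nodes (not matrix nodes). read_glb_json and joint_offsets are hypothetical helpers.

```python
import json
import math
import struct
from typing import Dict

GLB_MAGIC = 0x46546C67   # b"glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # b"JSON" read as a little-endian uint32


def read_glb_json(data: bytes) -> dict:
    """Parse the JSON chunk of a binary glTF 2.0 (.glb) file."""
    magic, version, _total_length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC or version != 2:
        raise ValueError("not a glTF 2.0 GLB file")
    chunk_len, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != CHUNK_JSON:
        raise ValueError("first GLB chunk is not JSON")
    # The JSON chunk may be padded with trailing spaces; json.loads accepts that.
    return json.loads(data[20:20 + chunk_len].decode("utf-8"))


def joint_offsets(gltf: dict, skin_index: int = 0) -> Dict[str, float]:
    """Map each skin joint name to the length of its local translation.

    For a joint whose parent is also a joint, this approximates the parent
    bone's link length (ignores rotation/scale and matrix-form nodes).
    """
    nodes = gltf["nodes"]
    out: Dict[str, float] = {}
    for j in gltf["skins"][skin_index]["joints"]:
        node = nodes[j]
        t = node.get("translation", [0.0, 0.0, 0.0])
        out[node.get("name", f"node_{j}")] = math.sqrt(sum(c * c for c in t))
    return out
```

For example, read_glb_json(Path("fire-boy-rig/fire-boy-rigged-full.glb").read_bytes()) would return the rig's glTF JSON. The resulting lengths are a starting point for MuJoCo/Newton link lengths and still need a visual check against the rendered rig.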
We are intentionally leaving these for later:
use pretrained motion priors
generate successful rollouts
fine-tune MiniCPM-style VLA action model
Core Goal
The desired final model is:
image + language + robot state -> action
More specifically:
Toy Room camera image
+ user command
+ Fire Boy body state
-> Fire Boy action chunk
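The target interface can be pinned down as types before any model exists. In the sketch below, an action chunk is a short sequence of future actions returned per inference call. Observation, ActionChunk, FireBoyPolicy and ZeroActionPolicy are hypothetical names. The image encoding, state layout, horizon and action dimension are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Observation:
    image_rgb: bytes         # Toy Room camera frame (encoding assumed)
    command: str             # user command, e.g. "run around"
    body_state: List[float]  # Fire Boy body state (layout assumed)


@dataclass
class ActionChunk:
    actions: List[List[float]]  # horizon steps, each an action_dim vector

    @property
    def horizon(self) -> int:
        return len(self.actions)


class FireBoyPolicy(Protocol):
    def act(self, obs: Observation) -> ActionChunk: ...


class ZeroActionPolicy:
    """Placeholder policy that emits an all-zero chunk of fixed shape."""

    def __init__(self, horizon: int, action_dim: int) -> None:
        self.horizon = horizon
        self.action_dim = action_dim

    def act(self, obs: Observation) -> ActionChunk:
        return ActionChunk([[0.0] * self.action_dim for _ in range(self.horizon)])
```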
See:
minicpm-v-to-vla.md
physics-body-first.md
physics-body-fix-results.md