
Fireboy Training Policy VLA

This folder is the dedicated planning space for turning Fire Boy into a MiniCPM-V-driven vision-language-action pet.

Current Verified Stack

The current source of truth for policy routing is:

fireboy-vla-physics/policy_registry.json

Next VLA Lane: Skill + Parameters

Direct MiniCPM-V low-level navigation has failed so far, so the next VLA training lane has the model predict a skill and its parameters instead of low-level actions:

image + language + robot state -> skill_id + skill parameters
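
A minimal sketch of what one prediction in this lane might look like as a data structure. The skill names and parameter names are the ones that appear in the manifest summary and eval report in this document; the class name `SkillParamPrediction` and the field layout are assumptions for illustration, not the trainer's actual schema.

```python
from dataclasses import dataclass

# Skill ids as listed in the skill-param manifest summary.
SKILLS = ("walk_to", "run_around", "pick_up", "find_and_eat_berry")


@dataclass(frozen=True)
class SkillParamPrediction:
    """Hypothetical container for one router output."""

    skill_id: str
    target_x: float
    target_y: float
    target_z: float
    radius: float
    speed_hint: float
    object_is_berry: float  # treated as a score here; the real encoding is unknown

    def __post_init__(self):
        # Reject skills the router was never trained on.
        if self.skill_id not in SKILLS:
            raise ValueError(f"unknown skill_id: {self.skill_id!r}")


# Example values are made up.
pred = SkillParamPrediction("walk_to", 0.4, -0.2, 0.0, 0.1, 0.5, 0.0)
```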

The first generated skill-param manifest is:

Fireboy-training-policy-vla/vla-rollouts/vla_skill_params/fireboy_vla_skill_params_allskill_3072.jsonl
Fireboy-training-policy-vla/vla-rollouts/vla_skill_params/fireboy_vla_skill_params_allskill_3072.summary.json
rows: 3072
skipped images: 0
skills:
  walk_to: 480
  run_around: 512
  pick_up: 1028
  find_and_eat_berry: 1052
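
The per-skill counts above can be re-derived from the JSONL manifest with a few lines of Python. This is a sketch only: the field name `skill_id` is an assumption about the row schema and may differ in the real file.

```python
import json
from collections import Counter


def count_skills(lines, skill_key="skill_id"):
    """Count rows per skill in an iterable of JSONL lines.

    `skill_key` is an assumed field name; check the manifest for the real one.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        counts[json.loads(line)[skill_key]] += 1
    return counts


# Sanity check on the summary reported above: the four skills add up to all rows.
reported = {"walk_to": 480, "run_around": 512, "pick_up": 1028, "find_and_eat_berry": 1052}
assert sum(reported.values()) == 3072
```

To run it on the manifest, pass an open file handle, for example `count_skills(open(path))`.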

The RunPod training launcher is:

bash fireboy-vla-physics/scripts/train_minicpm_vla_skill_param_head_runpod.sh

Latest RunPod output:

fireboy-vla-physics/build/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/
Fireboy-training-policy-vla/runpod-artifacts/fireboy-minicpm-skill-param-artifacts.tgz
GPU: NVIDIA A40
device: cuda
model: openbmb/MiniCPM-V-4.6
policy_kind: minicpm_vla_frozen_encoder_skill_param_head_v1
eval rows: 512
eval skill_accuracy: 1.0
eval param_mae: 0.017043352127075195
target_x MAE: 0.032305024564266205
target_y MAE: 0.0478343665599823
target_z MAE: 0.006528750993311405
radius MAE: 0.0038746832869946957
speed_hint MAE: 0.004880381282418966
object_is_berry MAE: 0.006836902815848589
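
The reported figures are consistent with eval param_mae being the unweighted mean of the six per-parameter MAEs. This is an observation from the numbers above, not a confirmed description of how the trainer computes it; the check below just reproduces the arithmetic.

```python
# Per-parameter MAEs copied from the RunPod eval output above.
per_param_mae = {
    "target_x": 0.032305024564266205,
    "target_y": 0.0478343665599823,
    "target_z": 0.006528750993311405,
    "radius": 0.0038746832869946957,
    "speed_hint": 0.004880381282418966,
    "object_is_berry": 0.006836902815848589,
}

reported_param_mae = 0.017043352127075195

# Unweighted mean over the six parameters.
mean_mae = sum(per_param_mae.values()) / len(per_param_mae)

# Agreement is up to float32 rounding in the reported values.
assert abs(mean_mae - reported_param_mae) < 1e-6
```

The two position axes target_x and target_y dominate the mean; the other four parameters are each below 0.007.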

This lane is accepted as the current command router. It dispatches into the existing registry policies for MP4-proven movement/manipulation.
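
A rough sketch of the dispatch step, assuming the router's skill_id is mapped to a registry lane and then handed to whatever runs that lane. The skill-to-lane mapping is taken from the "Verified command paths" list in this document; the `dispatch` function and the `runners` argument are hypothetical stand-ins, not the registry's real API.

```python
# skill_id -> registry lane, as listed under "Verified command paths".
SKILL_TO_LANE = {
    "walk_to": "mujoco_articulated_policy",
    "run_around": "mujoco_articulated_policy",
    "pick_up": "minicpm_vla_lora_manipulation",
    "find_and_eat_berry": "minicpm_vla_lora_manipulation",
}


def dispatch(skill_id, params, runners):
    """Look up the lane for a skill and call its runner.

    `runners` maps lane name -> callable(skill_id, params). It is a
    hypothetical placeholder for however a policy is actually loaded and run.
    """
    lane = SKILL_TO_LANE.get(skill_id)
    if lane is None:
        raise KeyError(f"no lane registered for skill {skill_id!r}")
    return runners[lane](skill_id, params)
```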

Modal Live Inference

The promoted frozen router is now deployed as a Modal GPU endpoint:

Modal app: fireboy-vla-router
URL: https://sanjuhs123--fireboy-vla-router.modal.run
GPU: L40S
idle scaledown window: 60 seconds
checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt

Local Toy Room wiring:

TOYBOX_VLA_ROUTER_URL='https://sanjuhs123--fireboy-vla-router.modal.run' \
TOYBOX_VLA_ROUTER_ACTION=1 \
PORT=65373 PID_FILE=.toybox-65373.pid LOG_FILE=.toybox-65373.log ./start.sh

Verified through http://127.0.0.1:65373/api/pet-action:

walk to the yellow marker -> vla skill walk_to -> MuJoCo success true
run around -> vla skill run_around -> MuJoCo success true
pick up the berry -> vla skill pick_up -> MuJoCo success true
go find berry and eat it -> vla skill find_and_eat_berry -> MuJoCo success true
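
To repeat these checks from a script, a request against the local endpoint could be built roughly as follows. The URL is the one given above; the JSON body shape `{"command": ...}` and the helper name `build_pet_action_request` are assumptions, since this document does not specify the request schema.

```python
import json
import urllib.request

PET_ACTION_URL = "http://127.0.0.1:65373/api/pet-action"


def build_pet_action_request(command, url=PET_ACTION_URL):
    """Build (but do not send) a POST request for one command.

    The body shape {"command": ...} is an assumption, not the documented API.
    """
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_pet_action_request("walk to the yellow marker")
# To send it against a running Toy Room server:
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(json.loads(resp.read()))
```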

Proof note:

Fireboy-training-policy-vla/modal-inference-results-2026-06-15.md
Fireboy-training-policy-vla/proofs/modal-vla-router-policy-gallery.png
Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json

The live endpoint reports both neural_skill and the served skill. For blank-camera requests, explicit command/scene arbitration stabilizes the served skill and target params, while the raw MiniCPM-V head output stays visible in the response.
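
One way such arbitration could work is sketched below. It is purely illustrative: the keyword rules, the `camera_is_blank` flag, and the `served_skill` key are assumptions, and the deployed endpoint may arbitrate quite differently.

```python
# Hypothetical keyword rules, checked in order; the first match wins.
COMMAND_RULES = (
    ("eat", "find_and_eat_berry"),
    ("pick up", "pick_up"),
    ("run around", "run_around"),
    ("walk to", "walk_to"),
)


def arbitrate(neural_skill, command, camera_is_blank):
    """Return both the raw head output and the skill that is actually served."""
    served = neural_skill
    if camera_is_blank:
        # Without a usable image, trust the explicit command text instead.
        text = command.lower()
        for keyword, skill in COMMAND_RULES:
            if keyword in text:
                served = skill
                break
    # The raw head output is always kept alongside the served skill.
    return {"neural_skill": neural_skill, "served_skill": served}
```

When the camera is not blank, or no rule matches, this sketch serves the head output unchanged.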

Repeat the final website/VLA smoke gate with:

PYTHONPATH=fireboy-vla-physics/src \
fireboy-vla-physics/.venv/bin/python \
fireboy-vla-physics/src/final_vla_demo_smoke.py \
  --out Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json

Latest smoke result: ok: true.

LoRA Router Lane

The first MiniCPM-V LoRA version of the skill-param router was also trained on RunPod:

GPU: NVIDIA A40
pod: xb6dv76ajw7tzq
status after artifact download: deleted
script: fireboy-vla-physics/scripts/train_minicpm_vla_lora_skill_param_head_runpod.sh
trainer: fireboy-vla-physics/src/train_minicpm_vla_lora_skill_param_head.py
seed: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
rows: 512
LoRA rank: 8
eval rows: 256
eval skill_accuracy: 1.0
eval param_mae: 0.06290113925933838

Artifacts:

Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/minicpm_vla_lora_skill_param_head.pt
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/lora_adapter/
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/eval_minicpm_vla_lora_skill_param_head.json
Fireboy-training-policy-vla/runpod-artifacts/fireboy-minicpm-lora-skill-param-artifacts.tgz

Decision:

LoRA router training works and skill routing remains perfect (256/256 eval rows).
Do not promote this checkpoint over the frozen router yet, because its target
parameter MAE is worse: 0.0629 (256 eval rows) vs 0.0170 (512 eval rows).
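
The decision above can be written as a simple promotion gate. This is a sketch of the rule as stated here (skill accuracy must not regress and param MAE must improve), with metric values copied from the two eval reports; the function name `should_promote` is hypothetical.

```python
def should_promote(candidate, incumbent):
    """Promote only if skill accuracy is no worse and param MAE is strictly lower."""
    return (
        candidate["skill_accuracy"] >= incumbent["skill_accuracy"]
        and candidate["param_mae"] < incumbent["param_mae"]
    )


frozen_router = {"skill_accuracy": 1.0, "param_mae": 0.017043352127075195}
lora_router = {"skill_accuracy": 1.0, "param_mae": 0.06290113925933838}

print(should_promote(lora_router, frozen_router))  # False
```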

Validate the policy registry with:

PYTHONPATH=fireboy-vla-physics/src fireboy-vla-physics/.venv/bin/python fireboy-vla-physics/src/validate_policy_registry.py

Latest validation:

checked_paths: 31
checked_paths after router/LoRA-router registration: 49
missing_count: 0
ok: true
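
For orientation, a path check that yields a report with these three fields might look like the sketch below. It is not the contents of validate_policy_registry.py: the registry schema is unknown here, so the sketch simply treats every string containing a `/` as a path, which is an assumption.

```python
import os


def iter_strings(node):
    """Yield every string value found anywhere in a nested JSON-like structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def validate_paths(registry, root="."):
    """Check that every path-like string in the registry exists under `root`."""
    # Assumption: a string with a slash is a file or directory reference.
    paths = sorted({s for s in iter_strings(registry) if "/" in s})
    missing = [p for p in paths if not os.path.exists(os.path.join(root, p))]
    return {
        "checked_paths": len(paths),
        "missing_count": len(missing),
        "ok": not missing,
        "missing": missing,
    }
```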

Visual proof page:

http://127.0.0.1:65373/fireboy-policy-gallery

Saved screenshots:

fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-desktop.png
fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-mobile-viewport.png
fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-vla-router.png

Build a portable proof bundle with:

PYTHONPATH=fireboy-vla-physics/src fireboy-vla-physics/.venv/bin/python fireboy-vla-physics/src/build_policy_proof_bundle.py

Latest bundle:

fireboy-vla-physics/build/fireboy-policy-proof-bundle/
fireboy-vla-physics/build/fireboy-policy-proof-bundle.tgz
copied proof/training files: 25
copied proof/training files after router/LoRA-router registration: 33
copied proof/training files after final smoke proof registration: 35
checkpoint/archive references after router/LoRA-router registration: 21

Verified command paths:

walk_to / run_to:
  lane: mujoco_articulated_policy
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_articulated_go_to_point_clock/faithful_articulated_policy.npz
  eval: 20/20

walk_around / run_around:
  lane: mujoco_articulated_policy
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_articulated_run_around/faithful_articulated_policy.npz
  eval: 20/20

pick_up:
  lane: minicpm_vla_lora_manipulation
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_residual_512/minicpm_vla_lora_action_head.pt
  eval: 3/3

find_and_eat_berry:
  lane: minicpm_vla_lora_manipulation for GPU VLA proof
  local demo fallback: fireboy-vla-physics/checkpoints/berry_eat_wide/state_policy.npz
  eval: 3/3 MiniCPM LoRA proof, local fallback command test passes

MiniCPM-V skill-param router:
  lane: minicpm_vla_skill_param_router
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
  eval: 512/512 skill choices correct, param MAE 0.0170
  dispatches to: walk_to, run_around, pick_up, find_and_eat_berry

MiniCPM-V LoRA skill-param router:
  lane: minicpm_vla_lora_skill_param_router
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/minicpm_vla_lora_skill_param_head.pt
  adapter: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/lora_adapter/
  eval: 256/256 skill choices correct, param MAE 0.0629
  status: preserved, not promoted over frozen router

Toy V3 bridge verification:

"walk to the yellow marker with mujoco policy" -> success true, animation walk
"run around with mujoco policy" -> success true, animation run
"pick up the berry with mujoco policy" -> success true, grasped true
"go find berry and eat it with mujoco policy" -> success true, local fallback eaten true

Generated local bridge MP4s:

fireboy-vla-physics/build/toy-v3-policy/articulated/faithful_learned_go_to_point.mp4
fireboy-vla-physics/build/toy-v3-policy/articulated/faithful_learned_run_around.mp4

Source Visual Rig

The current Toy Room v3 Fire Boy rig, which we should preserve and match, is:

fire-boy-rig/fire-boy-rigged-full.glb

This is the visual identity of Fire Boy. The physics body should be rebuilt to match this rig first.

Immediate Priority

Do this first:

fix Fire Boy physics body first

That means:

real Fire Boy GLB skeleton/proportions
-> matching MuJoCo/Newton articulated body
-> correct joints, link lengths, masses, collisions, contact sites
-> visual proof that physics Fire Boy resembles Toy Room v3 Fire Boy
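
As an illustration of the "link lengths" step, link lengths can be derived from joint rest positions once those are read out of the GLB skeleton. The joint names, positions, and parent map below are invented placeholders; real values would come from fire-boy-rig/fire-boy-rigged-full.glb, and GLB parsing itself is not shown.

```python
import math

# Hypothetical joint rest positions (metres) and parent links.
JOINT_POS = {
    "hip": (0.0, 0.0, 0.30),
    "knee": (0.0, 0.0, 0.15),
    "ankle": (0.0, 0.03, 0.02),
}
PARENT = {"knee": "hip", "ankle": "knee"}


def link_lengths(joint_pos, parent):
    """Return {child_joint: Euclidean distance to its parent joint}."""
    return {
        child: math.dist(joint_pos[child], joint_pos[par])
        for child, par in parent.items()
    }


lengths = link_lengths(JOINT_POS, PARENT)
```

The resulting lengths would then size the corresponding bodies in the MuJoCo/Newton model so the physics proportions match the visual rig.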

We are intentionally leaving these for later:

use pretrained motion priors
generate successful rollouts
fine-tune MiniCPM-style VLA action model

Core Goal

The desired final model is:

image + language + robot state -> action

More specifically:

Toy Room camera image
+ user command
+ Fire Boy body state
-> Fire Boy action chunk

See:

minicpm-v-to-vla.md
physics-body-first.md
physics-body-fix-results.md