Fireboy Training Policy VLA

This folder is the dedicated planning space for turning Fire Boy into a MiniCPM-V-driven vision-language-action pet.

Current Verified Stack

The current source of truth for policy routing is:

fireboy-vla-physics/policy_registry.json

Next VLA Lane: Skill + Parameters

Direct low-level navigation with MiniCPM-V has not worked so far, so the next, more robust VLA training lane predicts a skill and its parameters instead of raw actions:

image + language + robot state -> skill_id + skill parameters
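
As a rough illustration of this output contract, the sketch below models one router prediction as a typed record. The skill names and parameter names are the ones that appear in the manifest summary and eval report in this README; the class name `SkillCommand`, the field types, and the validation rule are assumptions made for illustration, not the project's actual schema.

```python
from dataclasses import dataclass

# Skill names as listed in the skill-param manifest summary.
SKILLS = ("walk_to", "run_around", "pick_up", "find_and_eat_berry")


@dataclass(frozen=True)
class SkillCommand:
    """Hypothetical container for one router prediction."""

    skill_id: str
    target_x: float
    target_y: float
    target_z: float
    radius: float
    speed_hint: float
    object_is_berry: float  # assumed to be a 0..1 score; only its MAE is reported

    def __post_init__(self) -> None:
        # Reject skills the downstream registry could not dispatch.
        if self.skill_id not in SKILLS:
            raise ValueError(f"unknown skill_id: {self.skill_id!r}")


# Example values are made up; they are not taken from the dataset.
cmd = SkillCommand("walk_to", 0.5, -0.2, 0.0, 0.1, 0.3, 0.0)
print(cmd.skill_id)
```

Keeping the skill set closed at construction time means an unexpected head output fails loudly before anything is dispatched to a physics policy.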

The first generated skill-param manifest is:

Fireboy-training-policy-vla/vla-rollouts/vla_skill_params/fireboy_vla_skill_params_allskill_3072.jsonl
Fireboy-training-policy-vla/vla-rollouts/vla_skill_params/fireboy_vla_skill_params_allskill_3072.summary.json
rows: 3072
skipped images: 0
skills:
  walk_to: 480
  run_around: 512
  pick_up: 1028
  find_and_eat_berry: 1052
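
The summary counts can be sanity-checked with a small tally. This is a minimal sketch: the key name `skill_id` is an assumption about the JSONL row layout (the manifest schema is not shown here), and the sample rows are an in-memory stand-in for the real file.

```python
import json
from collections import Counter
from typing import Iterable

# Counts reported in the manifest summary above.
EXPECTED = {"walk_to": 480, "run_around": 512, "pick_up": 1028, "find_and_eat_berry": 1052}
EXPECTED_ROWS = 3072


def tally_skills(lines: Iterable[str], field: str = "skill_id") -> Counter:
    """Count rows per skill in JSONL text; `field` is an assumed key name."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            counts[json.loads(line)[field]] += 1
    return counts


# The reported per-skill counts add up to the reported row total.
assert sum(EXPECTED.values()) == EXPECTED_ROWS

# Tiny in-memory stand-in for the real manifest file.
sample = ['{"skill_id": "walk_to"}', '{"skill_id": "pick_up"}', '{"skill_id": "walk_to"}']
sample_counts = tally_skills(sample)
print(dict(sample_counts))
```

To run it against the real manifest, pass an open file handle for the `.jsonl` path as `lines` and compare the result with `EXPECTED`.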

The RunPod training launcher is:

bash fireboy-vla-physics/scripts/train_minicpm_vla_skill_param_head_runpod.sh

Latest RunPod output:

fireboy-vla-physics/build/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/
Fireboy-training-policy-vla/runpod-artifacts/fireboy-minicpm-skill-param-artifacts.tgz
GPU: NVIDIA A40
device: cuda
model: openbmb/MiniCPM-V-4.6
policy_kind: minicpm_vla_frozen_encoder_skill_param_head_v1
eval rows: 512
eval skill_accuracy: 1.0
eval param_mae: 0.017043352127075195
target_x MAE: 0.032305024564266205
target_y MAE: 0.0478343665599823
target_z MAE: 0.006528750993311405
radius MAE: 0.0038746832869946957
speed_hint MAE: 0.004880381282418966
object_is_berry MAE: 0.006836902815848589
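
The aggregate `param_mae` appears to be the unweighted mean of the six per-parameter MAEs. That is an inference from the numbers rather than something the report states, and the short check below reproduces it.

```python
# Per-parameter eval MAEs copied from the RunPod report above.
per_param_mae = {
    "target_x": 0.032305024564266205,
    "target_y": 0.0478343665599823,
    "target_z": 0.006528750993311405,
    "radius": 0.0038746832869946957,
    "speed_hint": 0.004880381282418966,
    "object_is_berry": 0.006836902815848589,
}
reported_param_mae = 0.017043352127075195

# Unweighted mean across the six parameters.
mean_mae = sum(per_param_mae.values()) / len(per_param_mae)
print(f"{mean_mae:.6f}")  # 0.017043

# Agreement to within float32 rounding supports the "simple mean" reading.
assert abs(mean_mae - reported_param_mae) < 1e-6

# The largest single contributor to the aggregate error.
worst = max(per_param_mae, key=per_param_mae.get)
print(worst)  # target_y
```

Most of the aggregate error comes from `target_x` and `target_y`; the other four parameters are each below 0.007.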

This lane is accepted as the current command router. It dispatches to the existing registry policies, whose movement and manipulation behavior is already proven in MP4 recordings.

Modal Live Inference

The promoted frozen router is now deployed as a Modal GPU endpoint:

Modal app: fireboy-vla-router
URL: https://sanjuhs123--fireboy-vla-router.modal.run
GPU: L40S
idle scaledown window: 60 seconds
checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt

Local Toy Room wiring:

TOYBOX_VLA_ROUTER_URL='https://sanjuhs123--fireboy-vla-router.modal.run' \
TOYBOX_VLA_ROUTER_ACTION=1 \
PORT=65373 PID_FILE=.toybox-65373.pid LOG_FILE=.toybox-65373.log ./start.sh

Verified through http://127.0.0.1:65373/api/pet-action:

walk to the yellow marker -> vla skill walk_to -> MuJoCo success true
run around -> vla skill run_around -> MuJoCo success true
pick up the berry -> vla skill pick_up -> MuJoCo success true
go find berry and eat it -> vla skill find_and_eat_berry -> MuJoCo success true

Proof note:

Fireboy-training-policy-vla/modal-inference-results-2026-06-15.md
Fireboy-training-policy-vla/proofs/modal-vla-router-policy-gallery.png
Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json

The live endpoint reports both the raw neural_skill and the served skill. For blank-camera requests, explicit command/scene arbitration stabilizes the served skill and target parameters while keeping the raw MiniCPM-V head output visible.
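
One way to picture the arbitration is the sketch below. It is hypothetical throughout: the keyword rules, the `camera_is_blank` flag, and the `served_skill` key are assumptions, since the real arbitration logic and response schema are not shown in this README, and it covers only the skill choice, not the target parameters.

```python
from typing import Optional

# Hypothetical keyword rules, checked in order.
COMMAND_RULES = (
    ("eat", "find_and_eat_berry"),
    ("pick up", "pick_up"),
    ("run around", "run_around"),
    ("walk to", "walk_to"),
)


def skill_from_command(command: str) -> Optional[str]:
    """Map a user command to a skill via simple phrase matching."""
    text = command.lower()
    for phrase, skill in COMMAND_RULES:
        if phrase in text:
            return skill
    return None


def arbitrate(neural_skill: str, command: str, camera_is_blank: bool) -> dict:
    """Return both the raw head output and the skill that is actually served."""
    served = neural_skill
    if camera_is_blank:
        # Without usable pixels, trust the explicit command when it matches a rule.
        served = skill_from_command(command) or neural_skill
    return {"neural_skill": neural_skill, "served_skill": served}


result = arbitrate("walk_to", "go find berry and eat it", camera_is_blank=True)
print(result)
```

Returning both values keeps a disagreement between the head and the served skill inspectable instead of silently overwriting the model's output.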

Repeat the final website/VLA smoke gate with:

PYTHONPATH=fireboy-vla-physics/src \
fireboy-vla-physics/.venv/bin/python \
fireboy-vla-physics/src/final_vla_demo_smoke.py \
  --out Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json

Latest smoke result: ok: true.

LoRA Router Lane

The first MiniCPM-V LoRA version of the skill-param router was also trained on RunPod:

GPU: NVIDIA A40
pod: xb6dv76ajw7tzq
status after artifact download: deleted
script: fireboy-vla-physics/scripts/train_minicpm_vla_lora_skill_param_head_runpod.sh
trainer: fireboy-vla-physics/src/train_minicpm_vla_lora_skill_param_head.py
seed: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
rows: 512
LoRA rank: 8
eval rows: 256
eval skill_accuracy: 1.0
eval param_mae: 0.06290113925933838

Artifacts:

Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/minicpm_vla_lora_skill_param_head.pt
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/lora_adapter/
Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/eval_minicpm_vla_lora_skill_param_head.json
Fireboy-training-policy-vla/runpod-artifacts/fireboy-minicpm-lora-skill-param-artifacts.tgz

Decision:

LoRA router training works and skill routing remains perfect (256/256).
Do not promote this checkpoint over the frozen router yet because its target
parameter MAE is worse: 0.0629 (LoRA) vs 0.0170 (frozen).
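
The decision rule can be written down as a small gate. This is a sketch of the reasoning stated above, not code from the repository; `EvalResult` and `should_promote` are illustrative names, and the metrics are copied from the two eval reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    name: str
    skill_accuracy: float
    param_mae: float


def should_promote(candidate: EvalResult, incumbent: EvalResult) -> bool:
    """Promote only if the candidate is at least as accurate and has lower param MAE."""
    return (
        candidate.skill_accuracy >= incumbent.skill_accuracy
        and candidate.param_mae < incumbent.param_mae
    )


frozen = EvalResult("frozen_router", 1.0, 0.017043352127075195)
lora = EvalResult("lora_router", 1.0, 0.06290113925933838)

print(should_promote(lora, frozen))  # False
```

The two evals used different row counts (512 for the frozen router, 256 for the LoRA router), so a stricter gate might first re-evaluate both checkpoints on the same held-out rows.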

Validate the policy registry with:

PYTHONPATH=fireboy-vla-physics/src fireboy-vla-physics/.venv/bin/python fireboy-vla-physics/src/validate_policy_registry.py

Latest validation:

checked_paths: 31
checked_paths after router/LoRA-router registration: 49
missing_count: 0
ok: true
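
For readers without the repository at hand, the kind of check this report implies can be sketched as follows. The report keys (`checked_paths`, `missing_count`, `ok`) come from the output above; the registry layout, the suffix heuristic, and the function names are assumptions, and `validate_policy_registry.py` may work quite differently.

```python
import tempfile
from pathlib import Path
from typing import Any, Iterator

# Assumed: string values with these suffixes (or a trailing slash) are file references.
PATH_SUFFIXES = (".pt", ".npz", ".json", ".jsonl", ".tgz", ".mp4", ".png", ".md")


def iter_path_strings(node: Any) -> Iterator[str]:
    """Yield every string in a nested JSON-like value that looks like a path."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_path_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_path_strings(value)
    elif isinstance(node, str) and (node.endswith(PATH_SUFFIXES) or node.endswith("/")):
        yield node


def validate_registry(registry: Any, root: Path) -> dict:
    """Check that each referenced path exists under `root`."""
    paths = sorted(set(iter_path_strings(registry)))
    missing = [p for p in paths if not (root / p).exists()]
    return {"checked_paths": len(paths), "missing_count": len(missing), "ok": not missing}


# Self-contained demo against a throwaway directory with a made-up registry.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "checkpoints").mkdir()
    (root / "checkpoints" / "policy.npz").write_bytes(b"")
    demo_registry = {"walk_to": {"checkpoint": "checkpoints/policy.npz", "eval": "20/20"}}
    report = validate_registry(demo_registry, root)
    broken = validate_registry({"x": {"checkpoint": "checkpoints/missing.pt"}}, root)

print(report)
```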

Visual proof page:

http://127.0.0.1:65373/fireboy-policy-gallery

Saved screenshots:

fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-desktop.png
fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-mobile-viewport.png
fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-vla-router.png

Build a portable proof bundle with:

PYTHONPATH=fireboy-vla-physics/src fireboy-vla-physics/.venv/bin/python fireboy-vla-physics/src/build_policy_proof_bundle.py

Latest bundle:

fireboy-vla-physics/build/fireboy-policy-proof-bundle/
fireboy-vla-physics/build/fireboy-policy-proof-bundle.tgz
copied proof/training files: 25
copied proof/training files after router/LoRA-router registration: 33
copied proof/training files after final smoke proof registration: 35
checkpoint/archive references after router/LoRA-router registration: 21

Verified command paths:

walk_to / run_to:
  lane: mujoco_articulated_policy
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_articulated_go_to_point_clock/faithful_articulated_policy.npz
  eval: 20/20

walk_around / run_around:
  lane: mujoco_articulated_policy
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_articulated_run_around/faithful_articulated_policy.npz
  eval: 20/20

pick_up:
  lane: minicpm_vla_lora_manipulation
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_residual_512/minicpm_vla_lora_action_head.pt
  eval: 3/3

find_and_eat_berry:
  lane: minicpm_vla_lora_manipulation for GPU VLA proof
  local demo fallback: fireboy-vla-physics/checkpoints/berry_eat_wide/state_policy.npz
  eval: 3/3 MiniCPM LoRA proof, local fallback command test passes

MiniCPM-V skill-param router:
  lane: minicpm_vla_skill_param_router
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt
  eval: 512/512 skill choices correct, param MAE 0.0170
  dispatches to: walk_to, run_around, pick_up, find_and_eat_berry

MiniCPM-V LoRA skill-param router:
  lane: minicpm_vla_lora_skill_param_router
  checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/minicpm_vla_lora_skill_param_head.pt
  adapter: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/lora_adapter/
  eval: 256/256 skill choices correct, param MAE 0.0629
  status: preserved, not promoted over frozen router
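
The command paths above amount to a dispatch table from skill name to lane and checkpoint. The sketch below encodes the four skills that way. The lanes and paths are copied from this list; the dictionary layout, the `resolve` helper, and the treatment of `run_to` and `walk_around` as aliases are assumptions, and the real `policy_registry.json` schema may differ.

```python
CKPT_ROOT = "Fireboy-training-policy-vla/runpod-artifacts/checkpoints"

DISPATCH = {
    "walk_to": {
        "lane": "mujoco_articulated_policy",
        "checkpoint": f"{CKPT_ROOT}/fireboy_articulated_go_to_point_clock/faithful_articulated_policy.npz",
    },
    "run_around": {
        "lane": "mujoco_articulated_policy",
        "checkpoint": f"{CKPT_ROOT}/fireboy_articulated_run_around/faithful_articulated_policy.npz",
    },
    "pick_up": {
        "lane": "minicpm_vla_lora_manipulation",
        "checkpoint": f"{CKPT_ROOT}/fireboy_minicpm_vla_lora_residual_512/minicpm_vla_lora_action_head.pt",
    },
    "find_and_eat_berry": {
        "lane": "minicpm_vla_lora_manipulation",
        "checkpoint": None,  # the GPU-lane checkpoint is not listed in this README
        "local_fallback": "fireboy-vla-physics/checkpoints/berry_eat_wide/state_policy.npz",
    },
}

# Assumed aliases: these names share an entry with another skill in the list above.
ALIASES = {"run_to": "walk_to", "walk_around": "run_around"}


def resolve(skill: str) -> dict:
    """Look up the lane and checkpoint for a skill name or alias."""
    canonical = ALIASES.get(skill, skill)
    if canonical not in DISPATCH:
        raise KeyError(f"no verified command path for skill: {skill!r}")
    return {"skill": canonical, **DISPATCH[canonical]}


print(resolve("run_to")["lane"])
```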

Toy V3 bridge verification:

"walk to the yellow marker with mujoco policy" -> success true, animation walk
"run around with mujoco policy" -> success true, animation run
"pick up the berry with mujoco policy" -> success true, grasped true
"go find berry and eat it with mujoco policy" -> success true, local fallback eaten true

Generated local bridge MP4s:

fireboy-vla-physics/build/toy-v3-policy/articulated/faithful_learned_go_to_point.mp4
fireboy-vla-physics/build/toy-v3-policy/articulated/faithful_learned_run_around.mp4

Source Visual Rig

The Toy Room v3 Fire Boy rig to preserve and match is:

fire-boy-rig/fire-boy-rigged-full.glb

This is the visual identity of Fire Boy. The physics body should be rebuilt to match this rig first.

Immediate Priority

Do this first:

fix Fire Boy physics body first

That means:

real Fire Boy GLB skeleton/proportions
-> matching MuJoCo/Newton articulated body
-> correct joints, link lengths, masses, collisions, contact sites
-> visual proof that physics Fire Boy resembles Toy Room v3 Fire Boy

We are intentionally leaving these for later:

use pretrained motion priors
generate successful rollouts
fine-tune MiniCPM-style VLA action model

Core Goal

The desired final model is:

image + language + robot state -> action

More specifically:

Toy Room camera image
+ user command
+ Fire Boy body state
-> Fire Boy action chunk

See:

minicpm-v-to-vla.md
physics-body-first.md
physics-body-fix-results.md