| # Fireboy Training Policy VLA |
|
|
| This folder is the dedicated planning space for turning Fire Boy into a |
| MiniCPM-V-driven vision-language-action pet. |
|
|
| ## Current Verified Stack |
|
|
| The current source of truth for policy routing is: |
|
|
| ```text |
| fireboy-vla-physics/policy_registry.json |
| ``` |
|
|
| ## Next VLA Lane: Skill + Parameters |
|
|
| Direct low-level navigation with MiniCPM-V has failed so far, so the next, more |
| robust VLA training lane predicts a skill and its parameters instead: |
|
|
| ```text |
| image + language + robot state -> skill_id + skill parameters |
| ``` |
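
As a rough illustration of this output contract, the sketch below models one router prediction as a Python dataclass. The skill names and parameter names come from the manifest summary and eval output elsewhere in this README; the class name `SkillPrediction` and the helper `is_known_skill` are hypothetical and are not taken from the repo.

```python
from dataclasses import dataclass

# Skill names as listed in the skill-param manifest summary.
KNOWN_SKILLS = ("walk_to", "run_around", "pick_up", "find_and_eat_berry")


@dataclass(frozen=True)
class SkillPrediction:
    """Hypothetical container for one router output (not the repo's type)."""

    skill_id: str
    target_x: float
    target_y: float
    target_z: float
    radius: float
    speed_hint: float
    object_is_berry: float


def is_known_skill(pred: SkillPrediction) -> bool:
    """Return True when the predicted skill is one of the four listed skills."""
    return pred.skill_id in KNOWN_SKILLS
```

A caller could reject any prediction for which `is_known_skill` returns `False` before dispatching to a registry policy.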
|
|
| The first generated skill-param manifest is: |
|
|
| ```text |
| Fireboy-training-policy-vla/vla-rollouts/vla_skill_params/fireboy_vla_skill_params_allskill_3072.jsonl |
| Fireboy-training-policy-vla/vla-rollouts/vla_skill_params/fireboy_vla_skill_params_allskill_3072.summary.json |
| rows: 3072 |
| skipped images: 0 |
| skills: |
| walk_to: 480 |
| run_around: 512 |
| pick_up: 1028 |
| find_and_eat_berry: 1052 |
| ``` |
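
The per-skill counts above can be rechecked against the manifest with a short script. This is a minimal sketch that assumes each JSONL row stores its skill under a `skill_id` key; that field name is an assumption, so adjust `skill_key` to match the real manifest schema.

```python
import json
from collections import Counter
from typing import Iterable


def count_skills(lines: Iterable[str], skill_key: str = "skill_id") -> Counter:
    """Count rows per skill in JSONL text; `skill_key` is an assumed field name."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the manifest
        counts[json.loads(line)[skill_key]] += 1
    return counts


# Counts reported in the summary above; they add up to the row total.
reported = {
    "walk_to": 480,
    "run_around": 512,
    "pick_up": 1028,
    "find_and_eat_berry": 1052,
}
assert sum(reported.values()) == 3072
```

Passing an open file handle for the `.jsonl` manifest as `lines` should give counts that can be compared with `reported`.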
|
|
| The RunPod training launcher is: |
|
|
| ```bash |
| bash fireboy-vla-physics/scripts/train_minicpm_vla_skill_param_head_runpod.sh |
| ``` |
|
|
| Latest RunPod output: |
|
|
| ```text |
| fireboy-vla-physics/build/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt |
| Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/ |
| Fireboy-training-policy-vla/runpod-artifacts/fireboy-minicpm-skill-param-artifacts.tgz |
| GPU: NVIDIA A40 |
| device: cuda |
| model: openbmb/MiniCPM-V-4.6 |
| policy_kind: minicpm_vla_frozen_encoder_skill_param_head_v1 |
| eval rows: 512 |
| eval skill_accuracy: 1.0 |
| eval param_mae: 0.017043352127075195 |
| target_x MAE: 0.032305024564266205 |
| target_y MAE: 0.0478343665599823 |
| target_z MAE: 0.006528750993311405 |
| radius MAE: 0.0038746832869946957 |
| speed_hint MAE: 0.004880381282418966 |
| object_is_berry MAE: 0.006836902815848589 |
| ``` |
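
The reported `param_mae` is numerically consistent with an unweighted mean of the six per-parameter MAEs. The check below only confirms that arithmetic. It does not confirm how the trainer actually aggregates the metric, which is an assumption here.

```python
from statistics import fmean

# Per-parameter MAE values copied from the RunPod eval output above.
per_param_mae = {
    "target_x": 0.032305024564266205,
    "target_y": 0.0478343665599823,
    "target_z": 0.006528750993311405,
    "radius": 0.0038746832869946957,
    "speed_hint": 0.004880381282418966,
    "object_is_berry": 0.006836902815848589,
}

reported_param_mae = 0.017043352127075195
mean_mae = fmean(per_param_mae.values())

# The two agree to within float32-level rounding.
assert abs(mean_mae - reported_param_mae) < 1e-8
```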
|
|
| This lane is accepted as the current command router. It dispatches to the |
| existing registry policies, whose movement and manipulation are MP4-proven. |
|
|
| ## Modal Live Inference |
|
|
| The promoted frozen router is now deployed as a Modal GPU endpoint: |
|
|
| ```text |
| Modal app: fireboy-vla-router |
| URL: https://sanjuhs123--fireboy-vla-router.modal.run |
| GPU: L40S |
| idle scaledown window: 60 seconds |
| checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt |
| ``` |
|
|
| Local Toy Room wiring: |
|
|
| ```bash |
| TOYBOX_VLA_ROUTER_URL='https://sanjuhs123--fireboy-vla-router.modal.run' \ |
| TOYBOX_VLA_ROUTER_ACTION=1 \ |
| PORT=65373 PID_FILE=.toybox-65373.pid LOG_FILE=.toybox-65373.log ./start.sh |
| ``` |
|
|
| Verified through `http://127.0.0.1:65373/api/pet-action`: |
|
|
| ```text |
| walk to the yellow marker -> vla skill walk_to -> MuJoCo success true |
| run around -> vla skill run_around -> MuJoCo success true |
| pick up the berry -> vla skill pick_up -> MuJoCo success true |
| go find berry and eat it -> vla skill find_and_eat_berry -> MuJoCo success true |
| ``` |
|
|
| Proof note and artifacts: |
|
|
| ```text |
| Fireboy-training-policy-vla/modal-inference-results-2026-06-15.md |
| Fireboy-training-policy-vla/proofs/modal-vla-router-policy-gallery.png |
| Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json |
| ``` |
|
|
| The live endpoint reports both the raw `neural_skill` and the served `skill`. |
| For blank-camera requests, explicit command/scene arbitration stabilizes the |
| served skill and target params while keeping the raw MiniCPM-V head output visible. |
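
A client can use the two fields to detect when arbitration changed the outcome. The sketch below assumes the response is a JSON object with top-level `neural_skill` and `skill` keys; the exact response shape is not documented here, and `arbitration_overrode` is a hypothetical helper name.

```python
def arbitration_overrode(response: dict) -> bool:
    """Return True when the served skill differs from the raw head output.

    Assumes top-level `neural_skill` and `skill` keys in the response.
    """
    return response.get("neural_skill") != response.get("skill")


# Illustrative payloads only, not captured endpoint output.
agreeing = {"neural_skill": "walk_to", "skill": "walk_to"}
overridden = {"neural_skill": "run_around", "skill": "pick_up"}
```

Logging the cases where this returns `True` is one way to see how often the served skill departs from the neural prediction.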
|
|
| Repeat the final website/VLA smoke gate with: |
|
|
| ```bash |
| PYTHONPATH=fireboy-vla-physics/src \ |
| fireboy-vla-physics/.venv/bin/python \ |
| fireboy-vla-physics/src/final_vla_demo_smoke.py \ |
| --out Fireboy-training-policy-vla/proofs/final-vla-demo-smoke.json |
| ``` |
|
|
| Latest smoke result: `ok: true`. |
|
|
| ## LoRA Router Lane |
|
|
| The first MiniCPM-V LoRA version of the skill-param router was also trained on |
| RunPod: |
|
|
| ```text |
| GPU: NVIDIA A40 |
| pod: xb6dv76ajw7tzq |
| status after artifact download: deleted |
| script: fireboy-vla-physics/scripts/train_minicpm_vla_lora_skill_param_head_runpod.sh |
| trainer: fireboy-vla-physics/src/train_minicpm_vla_lora_skill_param_head.py |
| seed: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt |
| rows: 512 |
| LoRA rank: 8 |
| eval rows: 256 |
| eval skill_accuracy: 1.0 |
| eval param_mae: 0.06290113925933838 |
| ``` |
|
|
| Artifacts: |
|
|
| ```text |
| Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/minicpm_vla_lora_skill_param_head.pt |
| Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/lora_adapter/ |
| Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/eval_minicpm_vla_lora_skill_param_head.json |
| Fireboy-training-policy-vla/runpod-artifacts/fireboy-minicpm-lora-skill-param-artifacts.tgz |
| ``` |
|
|
| Decision: |
|
|
| ```text |
| LoRA router training works and skill routing remains perfect. |
| Do not promote this checkpoint over the frozen router yet because its target |
| parameter MAE is worse: 0.0629 vs 0.0170. |
| ``` |
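
The decision above amounts to a simple promotion gate: keep the incumbent unless the candidate is at least as accurate on skill choice and strictly better on parameter error. The sketch below is one way to express that rule; `should_promote` is a hypothetical helper, not a function from the repo. The two evals used different row counts (512 vs 256), so the comparison is indicative rather than exact.

```python
def should_promote(candidate: dict, incumbent: dict) -> bool:
    """Promote only if skill accuracy is not worse and param MAE is lower."""
    return (
        candidate["skill_accuracy"] >= incumbent["skill_accuracy"]
        and candidate["param_mae"] < incumbent["param_mae"]
    )


# Eval metrics copied from the two RunPod runs documented in this README.
frozen_router = {"skill_accuracy": 1.0, "param_mae": 0.017043352127075195}
lora_router = {"skill_accuracy": 1.0, "param_mae": 0.06290113925933838}

promote_lora = should_promote(lora_router, frozen_router)
```

With these numbers `promote_lora` is `False`, which matches the decision to keep the frozen router.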
|
|
| Validate the policy registry, including the router and LoRA-router entries, with: |
|
|
| ```bash |
| PYTHONPATH=fireboy-vla-physics/src fireboy-vla-physics/.venv/bin/python fireboy-vla-physics/src/validate_policy_registry.py |
| ``` |
|
|
| Latest validation: |
|
|
| ```text |
| checked_paths: 31 |
| checked_paths after router/LoRA-router registration: 49 |
| missing_count: 0 |
| ok: true |
| ``` |
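
For orientation, a report in this shape can be produced by a path-existence check like the one below. This is only a sketch of the idea, not the contents of `validate_policy_registry.py`; the function name `check_paths` and the `root` parameter are hypothetical, and the report keys are chosen to mirror the output above.

```python
from pathlib import Path
from typing import Iterable


def check_paths(paths: Iterable[str], root: str = ".") -> dict:
    """Check that each path exists under `root` and summarize the result."""
    base = Path(root)
    checked = [base / p for p in paths]
    missing = [str(p) for p in checked if not p.exists()]
    return {
        "checked_paths": len(checked),
        "missing_count": len(missing),
        "missing": missing,
        "ok": not missing,
    }
```

Calling it with the checkpoint and proof paths referenced by the registry would flag any file that was moved or never downloaded.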
|
|
| Visual proof page: |
|
|
| ```text |
| http://127.0.0.1:65373/fireboy-policy-gallery |
| ``` |
|
|
| Saved screenshots: |
|
|
| ```text |
| fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-desktop.png |
| fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-mobile-viewport.png |
| fireboy-vla-physics/build/proof-gallery-screenshots/fireboy-policy-gallery-vla-router.png |
| ``` |
|
|
| Build a portable proof bundle with: |
|
|
| ```bash |
| PYTHONPATH=fireboy-vla-physics/src fireboy-vla-physics/.venv/bin/python fireboy-vla-physics/src/build_policy_proof_bundle.py |
| ``` |
|
|
| Latest bundle: |
|
|
| ```text |
| fireboy-vla-physics/build/fireboy-policy-proof-bundle/ |
| fireboy-vla-physics/build/fireboy-policy-proof-bundle.tgz |
| copied proof/training files: 25 |
| copied proof/training files after router/LoRA-router registration: 33 |
| copied proof/training files after final smoke proof registration: 35 |
| checkpoint/archive references after router/LoRA-router registration: 21 |
| ``` |
|
|
| Verified command paths: |
|
|
| ```text |
| walk_to / run_to: |
| lane: mujoco_articulated_policy |
| checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_articulated_go_to_point_clock/faithful_articulated_policy.npz |
| eval: 20/20 |
| |
| walk_around / run_around: |
| lane: mujoco_articulated_policy |
| checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_articulated_run_around/faithful_articulated_policy.npz |
| eval: 20/20 |
| |
| pick_up: |
| lane: minicpm_vla_lora_manipulation |
| checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_residual_512/minicpm_vla_lora_action_head.pt |
| eval: 3/3 |
| |
| find_and_eat_berry: |
| lane: minicpm_vla_lora_manipulation for GPU VLA proof |
| local demo fallback: fireboy-vla-physics/checkpoints/berry_eat_wide/state_policy.npz |
| eval: 3/3 MiniCPM LoRA proof, local fallback command test passes |
| |
| MiniCPM-V skill-param router: |
| lane: minicpm_vla_skill_param_router |
| checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_skill_param_head/minicpm_vla_skill_param_head.pt |
| eval: 512/512 skill choices correct, param MAE 0.0170 |
| dispatches to: walk_to, run_around, pick_up, find_and_eat_berry |
| |
| MiniCPM-V LoRA skill-param router: |
| lane: minicpm_vla_lora_skill_param_router |
| checkpoint: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/minicpm_vla_lora_skill_param_head.pt |
| adapter: Fireboy-training-policy-vla/runpod-artifacts/checkpoints/fireboy_minicpm_vla_lora_skill_param_head/lora_adapter/ |
| eval: 256/256 skill choices correct, param MAE 0.0629 |
| status: preserved, not promoted over frozen router |
| ``` |
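
The skill-to-lane mapping listed above can be written as a small dispatch table. The lane names are copied from this README; the names `SKILL_LANES` and `lane_for` are hypothetical, and the real routing lives in `policy_registry.json`.

```python
# Lane per routed skill, as listed under "Verified command paths".
SKILL_LANES = {
    "walk_to": "mujoco_articulated_policy",
    "run_around": "mujoco_articulated_policy",
    "pick_up": "minicpm_vla_lora_manipulation",
    # The GPU VLA proof uses this lane; the local demo uses a fallback checkpoint.
    "find_and_eat_berry": "minicpm_vla_lora_manipulation",
}


def lane_for(skill_id: str) -> str:
    """Return the verified lane for a skill, or raise for unknown skills."""
    try:
        return SKILL_LANES[skill_id]
    except KeyError:
        raise ValueError(f"no verified lane for skill {skill_id!r}") from None
```

Raising on unknown skills keeps an unexpected router output from silently falling through to a default policy.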
|
|
| Toy V3 bridge verification: |
|
|
| ```text |
| "walk to the yellow marker with mujoco policy" -> success true, animation walk |
| "run around with mujoco policy" -> success true, animation run |
| "pick up the berry with mujoco policy" -> success true, grasped true |
| "go find berry and eat it with mujoco policy" -> success true, local fallback eaten true |
| ``` |
|
|
| Generated local bridge MP4s: |
|
|
| ```text |
| fireboy-vla-physics/build/toy-v3-policy/articulated/faithful_learned_go_to_point.mp4 |
| fireboy-vla-physics/build/toy-v3-policy/articulated/faithful_learned_run_around.mp4 |
| ``` |
|
|
| ## Source Visual Rig |
|
|
| The Toy Room v3 Fire Boy rig to preserve and match is: |
|
|
| ```text |
| fire-boy-rig/fire-boy-rigged-full.glb |
| ``` |
|
|
| This is the visual identity of Fire Boy. The physics body should be rebuilt to |
| match this rig first. |
|
|
| ## Immediate Priority |
|
|
| Do this first: |
|
|
| ```text |
| fix Fire Boy physics body first |
| ``` |
|
|
| That means: |
|
|
| ```text |
| real Fire Boy GLB skeleton/proportions |
| -> matching MuJoCo/Newton articulated body |
| -> correct joints, link lengths, masses, collisions, contact sites |
| -> visual proof that physics Fire Boy resembles Toy Room v3 Fire Boy |
| ``` |
|
|
| We are intentionally leaving these for later: |
|
|
| ```text |
| use pretrained motion priors |
| generate successful rollouts |
| fine-tune MiniCPM-style VLA action model |
| ``` |
|
|
| ## Core Goal |
|
|
| The desired final model is: |
|
|
| ```text |
| image + language + robot state -> action |
| ``` |
|
|
| More specifically: |
|
|
| ```text |
| Toy Room camera image |
| + user command |
| + Fire Boy body state |
| -> Fire Boy action chunk |
| ``` |
|
|
| See: |
|
|
| ```text |
| minicpm-v-to-vla.md |
| physics-body-first.md |
| physics-body-fix-results.md |
| ``` |
|
|