IsaacLab SO101 Phase2 DCT SmolVLA - pick_place 80ep 10fps

This repository contains the call-aligned, DCT-tuned SmolVLA checkpoint prepared for Method3 Phase2 acquisition on the 10fps, future-action-aligned Phase1 pick_place dataset.

Source Dataset

Dataset: CoRL2026-CSI/IsaacLab-SO101-Phase1-pick_place-80episode-10fps
Local training root: /data/vpraise-corl/workspace/SCRAPE-IsaacLab/results/derived_datasets/IsaacLab-SO101-Phase1-pick_place-80episode-10fps/dataset
Episodes: 80
Frames: 25,204
FPS: 10
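Downsample alignment: for each 10fps frame k, observations and timestamps come from source frame 3k, and actions come from source frame min(3k + 2, episode_end). Away from the episode end, the 30fps to 10fps mapping is therefore s0 -> a2 -> s3: the action paired with observation s0 is taken from source frame 2, the last source frame before the next kept observation s3.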
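The index mapping above can be sketched as a small function. This is a minimal illustration, not the conversion script used to build the dataset; it assumes episode_end is the last valid source index (episode_len - 1) and that every source frame 3k inside the episode becomes one 10fps frame. The name downsample_indices is hypothetical.

```python
def downsample_indices(episode_len: int, stride: int = 3, action_offset: int = 2):
    """Return (obs_src, action_src): source-frame indices for each 10fps frame.

    Assumption: episode_end = episode_len - 1, and each source frame 3k in the
    episode yields one 10fps frame.
    """
    episode_end = episode_len - 1
    obs_src = list(range(0, episode_len, stride))  # s0, s3, s6, ...
    # a2, a5, a8, ... clamped to the last source frame of the episode
    action_src = [min(i + action_offset, episode_end) for i in obs_src]
    return obs_src, action_src


print(downsample_indices(10))  # ([0, 3, 6, 9], [2, 5, 8, 9])
```

Only the final 10fps frame of an episode can hit the clamp, which is why its action index can coincide with its observation index.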

Checkpoint Layout

pretrained_model/: LeRobot SmolVLA checkpoint directory.
uvla_id_stats.json: U_VLA ID-distribution sidecar. Keep this file next to pretrained_model/; ADC resolves it as Path(vla_checkpoint).parent / "uvla_id_stats.json".
phase2_prepare/: simulator-free Method3 Phase2 prepare artifacts for Q1 local transport.
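The sidecar lookup rule stated for uvla_id_stats.json can be mirrored in a few lines, which is handy for checking a downloaded snapshot before handing it to ADC. This is a sketch under assumptions: the helper names are hypothetical, and the card does not document the JSON schema, so the loader simply returns whatever object the file contains.

```python
from __future__ import annotations

import json
from pathlib import Path


def resolve_uvla_sidecar(vla_checkpoint: str | Path) -> Path:
    # Same rule the card gives for ADC: the sidecar sits next to pretrained_model/.
    return Path(vla_checkpoint).parent / "uvla_id_stats.json"


def load_uvla_stats(vla_checkpoint: str | Path) -> dict:
    """Check the expected layout, then load the U_VLA sidecar JSON (schema assumed unknown)."""
    ckpt = Path(vla_checkpoint)
    if not ckpt.is_dir():
        raise FileNotFoundError(f"checkpoint directory not found: {ckpt}")
    sidecar = resolve_uvla_sidecar(ckpt)
    if not sidecar.is_file():
        raise FileNotFoundError(f"uvla_id_stats.json must sit next to {ckpt.name}/: {sidecar}")
    with sidecar.open() as f:
        return json.load(f)
```

Because Path drops a trailing slash, passing either /path/to/repo_snapshot/pretrained_model or /path/to/repo_snapshot/pretrained_model/ resolves to the same sidecar.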

For ADC/Phase2, pass the checkpoint as:

--method3-phase2-vla-checkpoint /path/to/repo_snapshot/pretrained_model

or for prepare validation:

--vla-checkpoint /path/to/repo_snapshot/pretrained_model
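A minimal sketch for obtaining a local snapshot and building either checkpoint flag. It uses huggingface_hub.snapshot_download, which returns the local snapshot directory; the helper names fetch_snapshot and checkpoint_args are hypothetical, and the ADC and prepare-validation scripts themselves are not shown here.

```python
from __future__ import annotations

from pathlib import Path

# Hub repository ID of this model card.
REPO_ID = "CoRL2026-CSI/IsaacLab-SO101-Phase2-DCT-SmolVLA-pick_place-80episode-10fps"


def fetch_snapshot(repo_id: str = REPO_ID) -> Path:
    """Download (or reuse the cached) repo snapshot; needs network and huggingface_hub."""
    from huggingface_hub import snapshot_download  # imported lazily

    return Path(snapshot_download(repo_id=repo_id))


def checkpoint_args(snapshot_dir, flag: str = "--method3-phase2-vla-checkpoint") -> list[str]:
    """Build the flag pair pointing at pretrained_model/ inside the snapshot."""
    ckpt = Path(snapshot_dir) / "pretrained_model"
    if not ckpt.is_dir():
        raise FileNotFoundError(f"expected checkpoint directory: {ckpt}")
    return [flag, str(ckpt)]
```

Typical use would be checkpoint_args(fetch_snapshot()) for ADC/Phase2 and checkpoint_args(fetch_snapshot(), "--vla-checkpoint") for prepare validation. Pointing at the snapshot root keeps uvla_id_stats.json in the parent directory where ADC expects it.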

Training Summary

Policy: SmolVLA
Base policy: lerobot/smolvla_base
Job name: smolvla_dct_pick_place_80ep_10fps_call_aligned_future_action
Steps: 2850
Batch size: 16
Seed: 1000
DCT action horizon: 50
DCT skill segments: 901
Segment distribution: 59 episodes with 11 skills, 21 episodes with 12 skills
Skill counts: skill_0..skill_10 each have 80 entries; skill_11 has 21 entries
DCT parquet: phase2_prepare/dct/CoRL2026-CSI__IsaacLab-SO101-Phase1-pick_place-80episode-10fps.skill_dct.parquet
Vector DB: phase2_prepare/dct/skill_wise_vector_db.npz with 901 entries
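The segment and skill counts above are mutually consistent, and the step count translates into a rough number of passes over the data. The check below is plain arithmetic on the listed values; the epoch estimate assumes one training sample per 10fps frame, which the card does not state explicitly.

```python
# Skills per episode -> number of episodes (from "Segment distribution").
episodes_by_skill_count = {11: 59, 12: 21}
# Entries per skill index (from "Skill counts").
skill_counts = {f"skill_{i}": 80 for i in range(11)}
skill_counts["skill_11"] = 21

segments_from_episodes = sum(k * n for k, n in episodes_by_skill_count.items())
segments_from_skills = sum(skill_counts.values())

assert sum(episodes_by_skill_count.values()) == 80
assert segments_from_episodes == segments_from_skills == 901  # 59*11 + 21*12 = 11*80 + 21

# Approximate passes over the data, assuming one training sample per 10fps frame.
steps, batch_size, frames = 2850, 16, 25_204
epochs = steps * batch_size / frames
print(f"{segments_from_episodes} segments, ~{epochs:.2f} epochs")  # 901 segments, ~1.81 epochs
```

The 21 entries for skill_11 match the 21 episodes that contain a twelfth skill, and 901 also matches the vector DB entry count.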

Training used the local LeRobot entrypoint:

/opt/isaaclab-env/bin/python -m lerobot.scripts.lerobot_train \
  --policy.path=lerobot/smolvla_base \
  --policy.push_to_hub=false \
  --policy.device=cuda \
  --dataset.repo_id=CoRL2026-CSI/IsaacLab-SO101-Phase1-pick_place-80episode-10fps \
  --dataset.root=/data/vpraise-corl/workspace/SCRAPE-IsaacLab/results/derived_datasets/IsaacLab-SO101-Phase1-pick_place-80episode-10fps/dataset \
  --dataset.revision=v3.0 \
  --dataset.video_backend=pyav \
  --dataset.skill_dct_parquet=/data/vpraise-corl/workspace/SCRAPE-IsaacLab/results/method3_phase2_prepare/pick_place_80ep_10fps/source_session/dct/CoRL2026-CSI__IsaacLab-SO101-Phase1-pick_place-80episode-10fps.skill_dct.parquet \
  --batch_size=16 \
  --steps=2850 \
  --seed=1000 \
  --wandb.enable=false \
  --rename_map='{"observation.images.left_wrist": "observation.images.camera1", "observation.images.top": "observation.images.camera2"}' \
  --dataset.image_transforms.enable=true
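The --rename_map argument is a JSON object mapping dataset feature keys to the names the policy sees, presumably so the two SO101 cameras line up with the camera1/camera2 image inputs of lerobot/smolvla_base (an assumption, not confirmed here). The sketch below only illustrates the intended effect on one frame's keys; it is not LeRobot's internal implementation, and apply_rename_map is a hypothetical helper.

```python
import json

# The exact JSON string passed to --rename_map above.
RENAME_MAP_ARG = (
    '{"observation.images.left_wrist": "observation.images.camera1", '
    '"observation.images.top": "observation.images.camera2"}'
)


def apply_rename_map(frame: dict, rename_map: dict) -> dict:
    """Rename matching keys; keys not in the map pass through unchanged."""
    return {rename_map.get(key, key): value for key, value in frame.items()}


rename_map = json.loads(RENAME_MAP_ARG)
frame = {  # placeholder values standing in for image tensors and state
    "observation.images.left_wrist": "wrist_img",
    "observation.images.top": "top_img",
    "observation.state": "state",
}
renamed = apply_rename_map(frame, rename_map)
print(sorted(renamed))
# ['observation.images.camera1', 'observation.images.camera2', 'observation.state']
```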

Call-Aligned DCT Segmentation

Method3 Phase2 replay is indexed by the ordinal of each set_skill_info() call, so the DCT parquet in this package is segmented by skill call stamps rather than by skill.natural_language run-length alone. When goal-pose columns are present, the boundary key includes:

skill.natural_language
skill.type
skill.goal_position.robot_xyzrpy
skill.goal_position.joint
skill.goal_position.gripper

This prevents consecutive skill calls with the same natural-language label from being merged. The included prepare validation requires exact agreement between the subgoal buffer, DCT parquet, and vector DB.
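The difference between label-only and call-aligned segmentation can be seen on toy data. This sketch approximates call stamps with a run-length split over the boundary key listed above; the row format, helper names, and values are hypothetical, and it assumes consecutive calls differ in at least one key column (two back-to-back calls identical in every column would still merge here).

```python
from __future__ import annotations

from itertools import groupby

# Boundary columns listed in this card; goal-pose columns are used only when present.
BOUNDARY_COLUMNS = (
    "skill.natural_language",
    "skill.type",
    "skill.goal_position.robot_xyzrpy",
    "skill.goal_position.joint",
    "skill.goal_position.gripper",
)


def present_boundary_columns(rows: list[dict]) -> list[str]:
    """Keep only the boundary columns that exist in the data."""
    if not rows:
        return []
    return [c for c in BOUNDARY_COLUMNS if c in rows[0]]


def boundary_key(row: dict, columns) -> tuple:
    # Pose vectors are lists here; convert to tuples so keys compare by value.
    return tuple(tuple(row[c]) if isinstance(row[c], list) else row[c] for c in columns)


def segment(rows: list[dict], columns) -> list[tuple[int, int]]:
    """Return [start, end) frame ranges, one per run of identical boundary keys."""
    segments, start = [], 0
    for _, run in groupby(rows, key=lambda row: boundary_key(row, columns)):
        length = sum(1 for _ in run)
        segments.append((start, start + length))
        start += length
    return segments


# Toy data (hypothetical values): two consecutive calls share a label but not a joint goal.
call_a = {"skill.natural_language": "move above cube", "skill.type": "move",
          "skill.goal_position.joint": [0.1, 0.2], "skill.goal_position.gripper": 1.0}
call_b = {**call_a, "skill.goal_position.joint": [0.3, 0.4]}
rows = [call_a, call_a, call_b, call_b]

print(segment(rows, ["skill.natural_language"]))      # [(0, 4)]  label run-length only
print(segment(rows, present_boundary_columns(rows)))  # [(0, 2), (2, 4)]  call-aligned
```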

U_VLA Stats

uvla_id_stats.json was computed without the simulator, from the 10fps DCT skill dataset.

R: 4
Aggregation: mean
Sigma: 0.5
Samples: 100
Mean: 255.6530
Std: 176.9175
P90: 535.7004
P95: 559.8982
P99: 619.3958
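The card does not describe how each per-sample U_VLA score is produced or how R, Aggregation, and Sigma enter that computation, so those are not reproduced here. The sketch below only shows how a summary of the same shape (sample count, mean, std, P90/P95/P99) can be computed from a list of scores; population std (ddof=0) and linearly interpolated percentiles are assumptions about the original script.

```python
from __future__ import annotations

import math


def percentile(sorted_vals: list[float], q: float) -> float:
    """Linear interpolation between closest ranks (NumPy's default 'linear' method)."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(scores: list[float]) -> dict:
    """Summarize per-sample scores in the same shape as the statistics listed above."""
    if not scores:
        raise ValueError("need at least one score")
    vals = sorted(scores)
    n = len(vals)
    mean = sum(vals) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)  # population std (ddof=0)
    return {
        "samples": n,
        "mean": mean,
        "std": std,
        "p90": percentile(vals, 90),
        "p95": percentile(vals, 95),
        "p99": percentile(vals, 99),
    }


print(summarize([0.0, 1.0, 2.0, 3.0, 4.0]))
```

With 100 samples, as listed above, each percentile interpolates between two adjacent sorted scores, so P99 sits just below the largest observed score.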

Notes

The IsaacLab simulator is not required to load this checkpoint or to validate the Phase2 prepare artifacts.
The simulator is needed only when actually running Phase2 collection.
The optimizer and scheduler files under training_state/ are intentionally excluded because Phase2 inference does not need them.

Repository: CoRL2026-CSI/IsaacLab-SO101-Phase2-DCT-SmolVLA-pick_place-80episode-10fps