IsaacLab SO101 Phase2 DCT SmolVLA - pick_place 80ep 10fps
This repository contains the call-aligned DCT-tuned SmolVLA checkpoint prepared
for Method3 Phase2 acquisition on the 10fps future-action aligned Phase1
pick_place dataset.
Source Dataset
- Dataset: CoRL2026-CSI/IsaacLab-SO101-Phase1-pick_place-80episode-10fps
- Local training root: /data/vpraise-corl/workspace/SCRAPE-IsaacLab/results/derived_datasets/IsaacLab-SO101-Phase1-pick_place-80episode-10fps/dataset
- Episodes: 80
- Frames: 25,204
- FPS: 10
- Downsample alignment: observations/timestamps use source frame 3k; actions use source frame min(3k + 2, episode_end), so the normal 30fps to 10fps mapping is s0 -> a2 -> s3.
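The downsample alignment rule above can be illustrated with a minimal sketch. The function name, the `stride` and `action_offset` parameters, and the reading of `episode_end` as the last valid source frame index are assumptions made for illustration; they are not taken from the dataset tooling.

```python
def downsample_indices(episode_len, stride=3, action_offset=2):
    """Return (observation_frame, action_frame) source indices for one episode.

    Assumption: episode_end is the index of the last source frame, and every
    source frame 3k that exists in the episode yields one 10fps sample.
    """
    episode_end = episode_len - 1
    pairs = []
    num_samples = (episode_len + stride - 1) // stride  # ceil(episode_len / stride)
    for k in range(num_samples):
        obs_frame = stride * k
        # The action is taken two source frames ahead, clamped at the episode end.
        act_frame = min(obs_frame + action_offset, episode_end)
        pairs.append((obs_frame, act_frame))
    return pairs


# A 10-frame source episode at 30fps: s0 -> a2 -> s3 -> a5 -> s6 -> a8 -> s9 -> a9
pairs = downsample_indices(10)
print(pairs)
```

Under these assumptions only the final sample of an episode is affected by the clamp; every other sample pairs the state at frame 3k with the action at frame 3k + 2.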
Checkpoint Layout
- pretrained_model/: LeRobot SmolVLA checkpoint directory.
- uvla_id_stats.json: U_VLA ID-distribution sidecar. Keep this file next to pretrained_model/; ADC resolves it as Path(vla_checkpoint).parent / "uvla_id_stats.json".
- phase2_prepare/: simulator-free Method3 Phase2 prepare artifacts for Q1 local transport.
For ADC/Phase2, pass the checkpoint as:
--method3-phase2-vla-checkpoint /path/to/repo_snapshot/pretrained_model
or for prepare validation:
--vla-checkpoint /path/to/repo_snapshot/pretrained_model
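The sidecar lookup described in the checkpoint layout can be reproduced with a short sketch, which may help when checking a local snapshot before launching ADC. The helper name `resolve_uvla_stats` is hypothetical; only the `Path(vla_checkpoint).parent / "uvla_id_stats.json"` expression comes from this card.

```python
from pathlib import Path


def resolve_uvla_stats(vla_checkpoint):
    """Return the sidecar path expected next to the pretrained_model/ directory."""
    # Same expression as quoted in the checkpoint layout section.
    return Path(vla_checkpoint).parent / "uvla_id_stats.json"


# Placeholder path from this card; replace it with your local snapshot.
stats_path = resolve_uvla_stats("/path/to/repo_snapshot/pretrained_model")
print(stats_path)
print("sidecar present:", stats_path.is_file())
```

Because the path is resolved from the parent of the checkpoint argument, passing the repository root instead of pretrained_model/ would make the lookup land one directory too high.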
Training Summary
- Policy: SmolVLA
- Base policy: lerobot/smolvla_base
- Job name: smolvla_dct_pick_place_80ep_10fps_call_aligned_future_action
- Steps: 2850
- Batch size: 16
- Seed: 1000
- DCT action horizon: 50
- DCT skill segments: 901
- Segment distribution: 59 episodes with 11 skills, 21 episodes with 12 skills
- Skill counts: skill_0..skill_10 each have 80 entries; skill_11 has 21 entries
- DCT parquet: phase2_prepare/dct/CoRL2026-CSI__IsaacLab-SO101-Phase1-pick_place-80episode-10fps.skill_dct.parquet
- Vector DB: phase2_prepare/dct/skill_wise_vector_db.npz with 901 entries
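The segment counts in the summary are mutually consistent, which the following sketch checks. It assumes that skill_i denotes the i-th skill call within an episode, so an episode with n skills contributes to skill_0 through skill_(n-1); the variable names are illustrative only.

```python
# Episodes grouped by how many skill segments they contain (from the summary).
episodes_by_skill_count = {11: 59, 12: 21}

total_episodes = sum(episodes_by_skill_count.values())
total_segments = sum(n * eps for n, eps in episodes_by_skill_count.items())

# Assumption: skill_i is present in every episode that has more than i skills.
skill_counts = {
    f"skill_{i}": sum(eps for n, eps in episodes_by_skill_count.items() if n > i)
    for i in range(max(episodes_by_skill_count))
}

print(total_episodes)   # 59 + 21
print(total_segments)   # 59 * 11 + 21 * 12
print(skill_counts["skill_0"], skill_counts["skill_11"])
```

Both totals come out to 80 episodes and 901 segments, matching the entry count reported for the vector DB.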
Training used the local LeRobot entrypoint:
/opt/isaaclab-env/bin/python -m lerobot.scripts.lerobot_train \
--policy.path=lerobot/smolvla_base \
--policy.push_to_hub=false \
--policy.device=cuda \
--dataset.repo_id=CoRL2026-CSI/IsaacLab-SO101-Phase1-pick_place-80episode-10fps \
--dataset.root=/data/vpraise-corl/workspace/SCRAPE-IsaacLab/results/derived_datasets/IsaacLab-SO101-Phase1-pick_place-80episode-10fps/dataset \
--dataset.revision=v3.0 \
--dataset.video_backend=pyav \
--dataset.skill_dct_parquet=/data/vpraise-corl/workspace/SCRAPE-IsaacLab/results/method3_phase2_prepare/pick_place_80ep_10fps/source_session/dct/CoRL2026-CSI__IsaacLab-SO101-Phase1-pick_place-80episode-10fps.skill_dct.parquet \
--batch_size=16 \
--steps=2850 \
--seed=1000 \
--wandb.enable=false \
--rename_map='{"observation.images.left_wrist": "observation.images.camera1", "observation.images.top": "observation.images.camera2"}' \
--dataset.image_transforms.enable=true
Call-Aligned DCT Segmentation
Method3 Phase2 replay is indexed by set_skill_info() call ordinal. The DCT
parquet in this package is therefore segmented by skill call stamps instead of
only by skill.natural_language run-length. When goal pose columns are present,
the boundary key includes:
- skill.natural_language
- skill.type
- skill.goal_position.robot_xyzrpy
- skill.goal_position.joint
- skill.goal_position.gripper
This prevents consecutive skill calls with the same natural-language label from being merged. The included prepare validation requires exact agreement between the subgoal buffer, DCT parquet, and vector DB.
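The effect of the wider boundary key can be shown with a minimal sketch that splits a frame sequence wherever the key changes. This is not the packaged segmentation code: the function names, the dict-per-frame representation, and the example values are assumptions, and the real pipeline may rely on explicit call stamps rather than key changes.

```python
from itertools import groupby

BOUNDARY_COLUMNS = (
    "skill.natural_language",
    "skill.type",
    "skill.goal_position.robot_xyzrpy",
    "skill.goal_position.joint",
    "skill.goal_position.gripper",
)


def _hashable(value):
    """Convert nested lists to tuples so they can be compared as group keys."""
    if isinstance(value, (list, tuple)):
        return tuple(_hashable(v) for v in value)
    return value


def boundary_key(frame, columns=BOUNDARY_COLUMNS):
    """Build the key from whichever boundary columns are present in the frame."""
    return tuple(_hashable(frame[c]) for c in columns if c in frame)


def segment_frames(frames, columns=BOUNDARY_COLUMNS):
    """Return inclusive (start, end) frame ranges of constant boundary key."""
    segments, start = [], 0
    for _, run in groupby(frames, key=lambda f: boundary_key(f, columns)):
        length = sum(1 for _ in run)
        segments.append((start, start + length - 1))
        start += length
    return segments


# Hypothetical frames: two consecutive "reach" calls with different joint goals.
frames = [
    {"skill.natural_language": "reach", "skill.goal_position.joint": [0.1, 0.2]},
    {"skill.natural_language": "reach", "skill.goal_position.joint": [0.1, 0.2]},
    {"skill.natural_language": "reach", "skill.goal_position.joint": [0.3, 0.4]},
    {"skill.natural_language": "reach", "skill.goal_position.joint": [0.3, 0.4]},
    {"skill.natural_language": "grasp", "skill.goal_position.joint": [0.3, 0.4]},
]

label_only = segment_frames(frames, columns=("skill.natural_language",))
call_aligned = segment_frames(frames)
print(label_only)
print(call_aligned)
```

In this toy input the label-only key merges the two "reach" calls into one segment, while the full key keeps them apart. Two calls that share every key field would still merge under this sketch, which is one reason the validated artifacts should be preferred over re-deriving segments.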
U_VLA Stats
uvla_id_stats.json was computed simulator-free from the 10fps DCT skill dataset.
- R: 4
- Aggregation: mean
- Sigma: 0.5
- Samples: 100
- Mean: 255.6530
- Std: 176.9175
- P90: 535.7004
- P95: 559.8982
- P99: 619.3958
Notes
- IsaacLab simulator is not required to load this checkpoint or validate Phase2 prepare artifacts.
- The simulator is only needed when actually running Phase2 collection.
- Optimizer and scheduler training_state/ files are intentionally excluded because Phase2 inference does not need them.