---
library_name: lerobot
base_model: lerobot/smolvla_base
datasets:
- CoRL2026-CSI/UR7e-CaP_arrange_block_100epi
tags:
- lerobot
- smolvla
- robotics
- ur7e
- code-as-policies
- imitation-learning
- CoRL2026
---

# SmolVLA UR7e Arrange Block 100epi (10 epochs)

This repository contains a SmolVLA policy checkpoint fine-tuned with LeRobot. The model card is intentionally detailed so the training run can be reproduced or debugged from the uploaded artifact.

## Model Details

- **Policy:** SmolVLA
- **Base checkpoint:** [`lerobot/smolvla_base`](https://huggingface.co/lerobot/smolvla_base)
- **Training dataset:** [`CoRL2026-CSI/UR7e-CaP_arrange_block_100epi`](https://huggingface.co/datasets/CoRL2026-CSI/UR7e-CaP_arrange_block_100epi)
- **Training script:** `lerobot/scripts/train_smolvla_ur7e.sh`
- **Checkpoint:** step `5520`, approximately `10.00` epochs
- **Reported training loss at checkpoint:** `0.009`
- **Resolved config:** [`train_config.json`](train_config.json)

Related checkpoints from the same run:

- [5ep checkpoint](https://huggingface.co/CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_5ep)
- [10ep checkpoint](https://huggingface.co/CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep)

## Dataset

| Key | Value |
|---|---|
| `Robot` | UR7e |
| `Episodes` | 100 |
| `Frames` | 141,253 |
| `Tasks` | 1 |
| `FPS` | 30 |
| `Camera streams` | `observation.images.realsense_wrist`, `observation.images.realsense_topview` |
| `Dataset state/action shape` | [7] / [7] |

## Reproduction

The uploaded [`train_config.json`](train_config.json) is the authoritative serialized LeRobot config for this checkpoint. The table below mirrors the key values for quick inspection.

| Key | Value |
|---|---|
| `script` | lerobot/scripts/train_smolvla_ur7e.sh |
| `job_name` | smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552 |
| `output_dir` | /home/work/hscho/corl_2026/AutoDataCollector/lerobot/outputs/train/smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552 |
| `seed` | 1000 |
| `launch` | single-process CUDA training via `python -m lerobot.scripts.lerobot_train` |
| `checkpoint_step` | 5520 |
| `checkpoint_epoch` | 10.00 |
| `checkpoint_train_loss` | 0.009 |
| `checkpoint_grad_norm` | 0.095 |
| `checkpoint_lr` | 2.5e-06 |
| `effective_batch` | 64 x 1 x 4 = 256 |

Approximate script invocation:

```bash
cd /home/work/hscho/corl_2026/AutoDataCollector/lerobot
CONDA_ENV="lerobot" \
POLICY_TYPE="smolvla" \
POLICY_PATH="lerobot/smolvla_base" \
DATASET_REPO_ID="CoRL2026-CSI/UR7e-CaP_arrange_block_100epi" \
BATCH_SIZE="64" \
GRADIENT_ACCUMULATION_STEPS="4" \
STEPS="5520" \
NUM_WORKERS="4" \
DATALOADER_PREFETCH_FACTOR="1" \
CUDA_VISIBLE_DEVICES="0" \
NUM_GPUS="1" \
MIXED_PRECISION="bf16" \
SAVE_FREQ="2760" \
LOG_FREQ="10" \
EVAL_FREQ="0" \
WANDB_PROJECT="lerobot-smolvla-ur7e" \
OMP_NUM_THREADS="4" \
MKL_NUM_THREADS="4" \
PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" \
bash train_smolvla_ur7e.sh
```

## Detailed Hyperparameters

### Script Defaults and Environment

| Key | Value |
|---|---|
| `CONDA_ENV` | lerobot |
| `POLICY_TYPE` | smolvla |
| `POLICY_PATH` | lerobot/smolvla_base |
| `DATASET_REPO_ID` | CoRL2026-CSI/UR7e-CaP_arrange_block_100epi |
| `BATCH_SIZE` | 64 |
| `GRADIENT_ACCUMULATION_STEPS` | 4 |
| `STEPS` | 5520 |
| `NUM_WORKERS` | 4 |
| `DATALOADER_PREFETCH_FACTOR` | 1 |
| `CUDA_VISIBLE_DEVICES` | 0 |
| `NUM_GPUS` | 1 |
| `MIXED_PRECISION` | bf16 |
| `SAVE_FREQ` | 2760 |
| `LOG_FREQ` | 10 |
| `EVAL_FREQ` | 0 |
| `WANDB_PROJECT` | lerobot-smolvla-ur7e |
| `OMP_NUM_THREADS` | 4 |
| `MKL_NUM_THREADS` | 4 |
| `PYTORCH_CUDA_ALLOC_CONF` | expandable_segments:True |

### Training Loop and Dataloader

| Key | Value |
|---|---|
| `steps` | 5520 |
| `batch_size` | 64 |
| `gradient_accumulation_steps` | 4 |
| `num_workers` | 4 |
| `dataloader_prefetch_factor` | 1 |
| `dataloader_persistent_workers` | False |
| `dataloader_pin_memory` | True |
| `save_freq` | 2760 |
| `log_freq` | 10 |
| `eval_freq` | 0 |
| `cudnn_deterministic` | False |
| `use_policy_training_preset` | True |
| `ddp_find_unused_parameters` | True |
| `profile_timing` | False |

### Dataset Pipeline

| Key | Value |
|---|---|
| `dataset.repo_id` | CoRL2026-CSI/UR7e-CaP_arrange_block_100epi |
| `dataset.root` | `null` |
| `dataset.episodes` | `null` |
| `dataset.revision` | `null` |
| `dataset.use_imagenet_stats` | True |
| `dataset.video_backend` | torchcodec |
| `dataset.streaming` | False |

Image augmentation settings:

```json
{
  "enable": true,
  "max_num_transforms": 2,
  "random_order": true,
  "tfs": {
    "brightness": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": { "brightness": [0.8, 1.2] }
    },
    "contrast": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": { "contrast": [0.8, 1.2] }
    },
    "saturation": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": { "saturation": [0.5, 1.5] }
    },
    "hue": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": { "hue": [-0.05, 0.05] }
    },
    "sharpness": {
      "weight": 1.0,
      "type": "SharpnessJitter",
      "kwargs": { "sharpness": [0.5, 1.5] }
    },
    "affine": {
      "weight": 1.0,
      "type": "RandomAffine",
      "kwargs": { "degrees": [-5.0, 5.0], "translate": [0.05, 0.05] }
    }
  }
}
```

Camera rename map:

```json
{
  "observation.images.realsense_wrist": "observation.images.camera1",
  "observation.images.realsense_topview": "observation.images.camera2"
}
```

### Policy Configuration

```json
{
  "type": "smolvla",
  "pretrained_path": "lerobot/smolvla_base",
  "vlm_model_name": "HuggingFaceTB/SmolVLM2-500M-Video-Instruct",
  "load_vlm_weights": true,
  "num_vlm_layers": 16,
  "freeze_vision_encoder": true,
  "train_expert_only": true,
  "train_state_proj": true,
  "use_peft": false,
  "use_amp": false,
  "chunk_size": 50,
  "n_action_steps": 50,
  "num_steps": 10,
  "max_state_dim": 32,
  "max_action_dim": 32,
  "resize_imgs_with_padding": [512, 512],
  "tokenizer_max_length": 48,
  "attention_mode": "cross_attn",
  "pad_language_to": "max_length",
  "use_cache": true,
  "num_expert_layers": 0,
  "expert_width_multiplier": 0.75,
  "self_attn_every_n_layers": 2,
  "min_period": 0.004,
  "max_period": 4.0,
  "compile_model": false,
  "compile_mode": "max-autotune",
  "normalization_mapping": {
    "VISUAL": "IDENTITY",
    "STATE": "MEAN_STD",
    "ACTION": "MEAN_STD"
  },
  "input_features": {
    "observation.state": { "type": "STATE", "shape": [6] },
    "observation.images.camera1": { "type": "VISUAL", "shape": [3, 256, 256] },
    "observation.images.camera2": { "type": "VISUAL", "shape": [3, 256, 256] },
    "observation.images.camera3": { "type": "VISUAL", "shape": [3, 256, 256] }
  },
  "output_features": {
    "action": { "type": "ACTION", "shape": [7] }
  }
}
```

### Optimizer

```json
{
  "type": "adamw",
  "lr": 0.0001,
  "weight_decay": 1e-10,
  "grad_clip_norm": 10.0,
  "betas": [0.9, 0.95],
  "eps": 1e-08
}
```

### Scheduler

```json
{
  "type": "cosine_decay_with_warmup",
  "num_warmup_steps": 1000,
  "num_decay_steps": 30000,
  "peak_lr": 0.0001,
  "decay_lr": 2.5e-06
}
```

### Logging

```json
{
  "enable": true,
  "disable_artifact": false,
  "project": "lerobot-smolvla-ur7e",
  "entity": null,
  "notes": null,
  "run_id": "e1h98rll",
  "mode": null
}
```

## Usage

Use this model as a LeRobot policy checkpoint:

```bash
python -m lerobot.scripts.lerobot_eval \
  --policy.path=CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep
```

For Python loading inside LeRobot code, use the SmolVLA policy loader with this repository id as the pretrained path.

## Evaluation and Limitations

This model card reports training checkpoint information only. No rollout success rate or task-level evaluation metric is included in this repository. The checkpoint assumes a compatible observation/action schema and the camera remapping shown above.
The optimizer/RNG `training_state` files are not included; only the loadable `pretrained_model` artifact is uploaded.

## Provenance

- VLM backbone: [`HuggingFaceTB/SmolVLM2-500M-Video-Instruct`](https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct)
- Fine-tuning run: `smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552`
- Source training script: `lerobot/scripts/train_smolvla_ur7e.sh`
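
As a cross-check on the step and epoch figures reported in this card, the epoch count can be recomputed from the effective batch size and the dataset frame count. The sketch below is illustrative and not part of the training script: the helper names `effective_batch` and `steps_to_epochs` are hypothetical, and it assumes each optimizer step consumes `batch_size x num_gpus x gradient_accumulation_steps` frames, as the `effective_batch` row implies.

```python
# Values copied from this model card.
FRAMES = 141_253          # total frames in the training dataset
BATCH_SIZE = 64           # per-process batch size
NUM_GPUS = 1              # single-process training
GRAD_ACCUM_STEPS = 4      # gradient accumulation steps


def effective_batch(batch_size: int, num_gpus: int, grad_accum_steps: int) -> int:
    """Frames consumed per optimizer step (assumed definition)."""
    return batch_size * num_gpus * grad_accum_steps


def steps_to_epochs(steps: int, frames: int, eff_batch: int) -> float:
    """Approximate number of passes over the dataset after `steps` optimizer steps."""
    return steps * eff_batch / frames


eff = effective_batch(BATCH_SIZE, NUM_GPUS, GRAD_ACCUM_STEPS)
print(f"effective batch: {eff}")

# 2760 is the save interval (SAVE_FREQ); 5520 is the final step (STEPS).
for step in (2760, 5520):
    print(f"step {step}: {steps_to_epochs(step, FRAMES, eff):.2f} epochs")
```

Under that assumption, steps 2760 and 5520 come out at roughly 5 and 10 epochs, which is consistent with the `SAVE_FREQ` interval and the two related checkpoints listed in Model Details.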