SmolVLA UR7e Arrange Block 100epi (10 epochs)
This repository contains a SmolVLA policy checkpoint fine-tuned with LeRobot. The model card is intentionally detailed so the training run can be reproduced or debugged from the uploaded artifact.
Model Details
- Policy: SmolVLA
- Base checkpoint: lerobot/smolvla_base
- Training dataset: CoRL2026-CSI/UR7e-CaP_arrange_block_100epi
- Training script: lerobot/scripts/train_smolvla_ur7e.sh
- Checkpoint: step 5520, approximately 10.00 epochs
- Reported training loss at checkpoint: 0.009
- Resolved config: train_config.json
Related checkpoints from the same run:
Dataset
| Key | Value |
|---|---|
| Robot | UR7e |
| Episodes | 100 |
| Frames | 141,253 |
| Tasks | 1 |
| FPS | 30 |
| Camera streams | observation.images.realsense_wrist, observation.images.realsense_topview |
| Dataset state/action shape | [7] / [7] |
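For a sense of scale, the frame count and frame rate in the table can be converted into recording time. The snippet below is plain arithmetic on those values; the variable names are illustrative, and the per-episode figure is a mean, not a statement about any individual episode.

```python
# Back-of-the-envelope dataset statistics derived from the Dataset table.
frames = 141_253
episodes = 100
fps = 30

total_seconds = frames / fps
mean_frames_per_episode = frames / episodes
mean_episode_seconds = mean_frames_per_episode / fps

print(f"total recording time: {total_seconds / 60:.1f} min")
print(f"mean episode length: {mean_frames_per_episode:.1f} frames "
      f"({mean_episode_seconds:.1f} s)")
```

This works out to roughly 78 minutes of demonstrations, or about 47 seconds per episode on average.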
Reproduction
The uploaded train_config.json is the authoritative serialized LeRobot config for this checkpoint. The table below mirrors the key values for quick inspection.
| Key | Value |
|---|---|
| script | lerobot/scripts/train_smolvla_ur7e.sh |
| job_name | smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552 |
| output_dir | /home/work/hscho/corl_2026/AutoDataCollector/lerobot/outputs/train/smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552 |
| seed | 1000 |
| launch | single-process CUDA training via python -m lerobot.scripts.lerobot_train |
| checkpoint_step | 5520 |
| checkpoint_epoch | 10.00 |
| checkpoint_train_loss | 0.009 |
| checkpoint_grad_norm | 0.095 |
| checkpoint_lr | 2.5e-06 |
| effective_batch | 64 x 1 x 4 = 256 |
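The relationship between checkpoint_step, effective_batch, and checkpoint_epoch can be checked with a short calculation. This sketch assumes that each step is one optimizer update over the full effective batch and that every dataset frame counts as one training sample; both are assumptions of the example, not statements taken from train_config.json.

```python
# Relating optimizer steps to dataset epochs, using values from this card.
batch_size = 64
num_gpus = 1
grad_accum_steps = 4
dataset_frames = 141_253

effective_batch = batch_size * num_gpus * grad_accum_steps  # 64 x 1 x 4


def epochs_at(step: int) -> float:
    """Epochs completed after `step` updates, under the assumptions above."""
    return step * effective_batch / dataset_frames


print(effective_batch)              # effective batch size
print(f"{epochs_at(5520):.2f}")     # final checkpoint step
print(f"{epochs_at(2760):.2f}")     # save_freq interval
```

Under these assumptions the final step lands at about 10 epochs, and the save interval of 2760 steps corresponds to about 5 epochs.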
Approximate script invocation:
cd /home/work/hscho/corl_2026/AutoDataCollector/lerobot
CONDA_ENV="lerobot" \
POLICY_TYPE="smolvla" \
POLICY_PATH="lerobot/smolvla_base" \
DATASET_REPO_ID="CoRL2026-CSI/UR7e-CaP_arrange_block_100epi" \
BATCH_SIZE="64" \
GRADIENT_ACCUMULATION_STEPS="4" \
STEPS="5520" \
NUM_WORKERS="4" \
DATALOADER_PREFETCH_FACTOR="1" \
CUDA_VISIBLE_DEVICES="0" \
NUM_GPUS="1" \
MIXED_PRECISION="bf16" \
SAVE_FREQ="2760" \
LOG_FREQ="10" \
EVAL_FREQ="0" \
WANDB_PROJECT="lerobot-smolvla-ur7e" \
OMP_NUM_THREADS="4" \
MKL_NUM_THREADS="4" \
PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" \
bash train_smolvla_ur7e.sh
Detailed Hyperparameters
Script Defaults and Environment
| Key | Value |
|---|---|
| CONDA_ENV | lerobot |
| POLICY_TYPE | smolvla |
| POLICY_PATH | lerobot/smolvla_base |
| DATASET_REPO_ID | CoRL2026-CSI/UR7e-CaP_arrange_block_100epi |
| BATCH_SIZE | 64 |
| GRADIENT_ACCUMULATION_STEPS | 4 |
| STEPS | 5520 |
| NUM_WORKERS | 4 |
| DATALOADER_PREFETCH_FACTOR | 1 |
| CUDA_VISIBLE_DEVICES | 0 |
| NUM_GPUS | 1 |
| MIXED_PRECISION | bf16 |
| SAVE_FREQ | 2760 |
| LOG_FREQ | 10 |
| EVAL_FREQ | 0 |
| WANDB_PROJECT | lerobot-smolvla-ur7e |
| OMP_NUM_THREADS | 4 |
| MKL_NUM_THREADS | 4 |
| PYTORCH_CUDA_ALLOC_CONF | expandable_segments:True |
Training Loop and Dataloader
| Key | Value |
|---|---|
| steps | 5520 |
| batch_size | 64 |
| gradient_accumulation_steps | 4 |
| num_workers | 4 |
| dataloader_prefetch_factor | 1 |
| dataloader_persistent_workers | False |
| dataloader_pin_memory | True |
| save_freq | 2760 |
| log_freq | 10 |
| eval_freq | 0 |
| cudnn_deterministic | False |
| use_policy_training_preset | True |
| ddp_find_unused_parameters | True |
| profile_timing | False |
Dataset Pipeline
| Key | Value |
|---|---|
| dataset.repo_id | CoRL2026-CSI/UR7e-CaP_arrange_block_100epi |
| dataset.root | null |
| dataset.episodes | null |
| dataset.revision | null |
| dataset.use_imagenet_stats | True |
| dataset.video_backend | torchcodec |
| dataset.streaming | False |
Image augmentation settings:
{
"enable": true,
"max_num_transforms": 2,
"random_order": true,
"tfs": {
"brightness": {
"weight": 1.0,
"type": "ColorJitter",
"kwargs": {
"brightness": [
0.8,
1.2
]
}
},
"contrast": {
"weight": 1.0,
"type": "ColorJitter",
"kwargs": {
"contrast": [
0.8,
1.2
]
}
},
"saturation": {
"weight": 1.0,
"type": "ColorJitter",
"kwargs": {
"saturation": [
0.5,
1.5
]
}
},
"hue": {
"weight": 1.0,
"type": "ColorJitter",
"kwargs": {
"hue": [
-0.05,
0.05
]
}
},
"sharpness": {
"weight": 1.0,
"type": "SharpnessJitter",
"kwargs": {
"sharpness": [
0.5,
1.5
]
}
},
"affine": {
"weight": 1.0,
"type": "RandomAffine",
"kwargs": {
"degrees": [
-5.0,
5.0
],
"translate": [
0.05,
0.05
]
}
}
}
}
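To make the roles of max_num_transforms, random_order, and the per-transform weight concrete, here is a minimal sketch of the selection step only. It assumes these settings mean that a fixed number of distinct transforms is drawn per image with probability proportional to weight; that reading is an assumption of this example. The function sample_transforms is hypothetical, applies no image operation, and is not LeRobot's implementation.

```python
import random

# Names and weights copied from the "tfs" entries in the settings above.
TRANSFORM_WEIGHTS = {
    "brightness": 1.0, "contrast": 1.0, "saturation": 1.0,
    "hue": 1.0, "sharpness": 1.0, "affine": 1.0,
}


def sample_transforms(weights, max_num_transforms, random_order, rng):
    """Draw distinct transform names with probability proportional to weight."""
    remaining = dict(weights)
    chosen = []
    for _ in range(min(max_num_transforms, len(remaining))):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    if not random_order:
        # Fall back to the order in which transforms are declared.
        declared = list(weights)
        chosen.sort(key=declared.index)
    return chosen


rng = random.Random(0)  # fixed seed so the demo is repeatable
for _ in range(3):
    print(sample_transforms(TRANSFORM_WEIGHTS, 2, True, rng))
```

With equal weights, as in this configuration, every pair of transforms is equally likely.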
Camera rename map:
{
"observation.images.realsense_wrist": "observation.images.camera1",
"observation.images.realsense_topview": "observation.images.camera2"
}
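When feeding observations to the policy outside the training pipeline, the same renaming has to be applied to the observation keys. The sketch below shows one way to do that on a plain dictionary; rename_observation_keys is a hypothetical helper, and the string values are placeholders for real image tensors.

```python
RENAME_MAP = {
    "observation.images.realsense_wrist": "observation.images.camera1",
    "observation.images.realsense_topview": "observation.images.camera2",
}


def rename_observation_keys(observation, rename_map):
    """Return a copy of `observation` with keys renamed; other keys pass through."""
    return {rename_map.get(key, key): value for key, value in observation.items()}


# Placeholder values stand in for image tensors and the state vector.
raw = {
    "observation.images.realsense_wrist": "wrist_frame",
    "observation.images.realsense_topview": "topview_frame",
    "observation.state": [0.0] * 7,
}
renamed = rename_observation_keys(raw, RENAME_MAP)
print(sorted(renamed))
```

Keys that are not in the map, such as observation.state, are left unchanged.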
Policy Configuration
{
"type": "smolvla",
"pretrained_path": "lerobot/smolvla_base",
"vlm_model_name": "HuggingFaceTB/SmolVLM2-500M-Video-Instruct",
"load_vlm_weights": true,
"num_vlm_layers": 16,
"freeze_vision_encoder": true,
"train_expert_only": true,
"train_state_proj": true,
"use_peft": false,
"use_amp": false,
"chunk_size": 50,
"n_action_steps": 50,
"num_steps": 10,
"max_state_dim": 32,
"max_action_dim": 32,
"resize_imgs_with_padding": [
512,
512
],
"tokenizer_max_length": 48,
"attention_mode": "cross_attn",
"pad_language_to": "max_length",
"use_cache": true,
"num_expert_layers": 0,
"expert_width_multiplier": 0.75,
"self_attn_every_n_layers": 2,
"min_period": 0.004,
"max_period": 4.0,
"compile_model": false,
"compile_mode": "max-autotune",
"normalization_mapping": {
"VISUAL": "IDENTITY",
"STATE": "MEAN_STD",
"ACTION": "MEAN_STD"
},
"input_features": {
"observation.state": {
"type": "STATE",
"shape": [
6
]
},
"observation.images.camera1": {
"type": "VISUAL",
"shape": [
3,
256,
256
]
},
"observation.images.camera2": {
"type": "VISUAL",
"shape": [
3,
256,
256
]
},
"observation.images.camera3": {
"type": "VISUAL",
"shape": [
3,
256,
256
]
}
},
"output_features": {
"action": {
"type": "ACTION",
"shape": [
7
]
}
}
}
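Two values in this configuration are easy to misread: max_action_dim (32) is larger than the 7-dimensional action, and chunk_size (50) is a number of action steps, not seconds. The sketch below illustrates both under two assumptions that are not stated in this card: that shorter vectors are right-padded with zeros up to the maximum dimension, and that actions are executed at the dataset rate of 30 FPS. The helper pad_vector and the action values are made up for illustration.

```python
def pad_vector(values, max_dim):
    """Right-pad a vector with zeros up to max_dim (assumed padding scheme)."""
    if len(values) > max_dim:
        raise ValueError(f"vector of length {len(values)} exceeds max_dim={max_dim}")
    return list(values) + [0.0] * (max_dim - len(values))


MAX_ACTION_DIM = 32   # "max_action_dim" in the config above
ACTION_DIM = 7        # "output_features.action.shape"
CHUNK_SIZE = 50       # "chunk_size"
FPS = 30              # frame rate from the Dataset table

action = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 1.0]  # made-up values
padded = pad_vector(action, MAX_ACTION_DIM)
recovered = padded[:ACTION_DIM]  # drop the padding again

chunk_seconds = CHUNK_SIZE / FPS
print(len(padded), recovered == action)
print(f"one chunk spans {chunk_seconds:.2f} s at {FPS} FPS")
```

Under these assumptions, one predicted chunk covers a little under two seconds of robot motion.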
Optimizer
{
"type": "adamw",
"lr": 0.0001,
"weight_decay": 1e-10,
"grad_clip_norm": 10.0,
"betas": [
0.9,
0.95
],
"eps": 1e-08
}
Scheduler
{
"type": "cosine_decay_with_warmup",
"num_warmup_steps": 1000,
"num_decay_steps": 30000,
"peak_lr": 0.0001,
"decay_lr": 2.5e-06
}
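The Reproduction table reports checkpoint_lr as 2.5e-06, which equals decay_lr even though num_decay_steps (30000) is far larger than the 5520 training steps. One reading that reproduces this value is that the warmup and decay horizons are rescaled to the training length when training is shorter than num_decay_steps. That rescaling is an assumption of this example, not a confirmed description of the scheduler used in the run, and the function below is a generic linear-warmup plus cosine-decay sketch, not LeRobot's code.

```python
import math


def cosine_decay_with_warmup(step, peak_lr, decay_lr, warmup_steps, decay_steps):
    """Generic linear warmup followed by cosine decay from peak_lr to decay_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min(step, decay_steps) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return decay_lr + (peak_lr - decay_lr) * cosine


PEAK_LR, DECAY_LR = 1e-4, 2.5e-6
WARMUP, DECAY, TRAIN_STEPS = 1000, 30000, 5520

# Horizons exactly as written in the scheduler config.
lr_as_configured = cosine_decay_with_warmup(TRAIN_STEPS, PEAK_LR, DECAY_LR, WARMUP, DECAY)

# Assumed rescaling: shrink both horizons so the decay ends at the last step.
scale = TRAIN_STEPS / DECAY
lr_rescaled = cosine_decay_with_warmup(
    TRAIN_STEPS, PEAK_LR, DECAY_LR, int(WARMUP * scale), TRAIN_STEPS
)

print(f"as configured: {lr_as_configured:.2e}")
print(f"rescaled:      {lr_rescaled:.2e}")
```

Only the rescaled variant ends at decay_lr by step 5520; with the horizons taken literally, the learning rate at that step would still be close to peak_lr.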
Logging
{
"enable": true,
"disable_artifact": false,
"project": "lerobot-smolvla-ur7e",
"entity": null,
"notes": null,
"run_id": "e1h98rll",
"mode": null
}
Usage
Use this model as a LeRobot policy checkpoint:
python -m lerobot.scripts.lerobot_eval \
--policy.path=CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep
For Python loading inside LeRobot code, use the SmolVLA policy loader with this repository id as the pretrained path.
Evaluation and Limitations
This model card reports training checkpoint information only. No rollout success rate or task-level evaluation metric is included in this repository.
The checkpoint assumes a compatible observation/action schema and the camera remapping shown above. The optimizer/RNG training_state files are not included; only the loadable pretrained_model artifact is uploaded.
Provenance
- VLM backbone: HuggingFaceTB/SmolVLM2-500M-Video-Instruct
- Fine-tuning run: smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552
- Source training script: lerobot/scripts/train_smolvla_ur7e.sh