SmolVLA UR7e Arrange Block 100epi (10 epochs)

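This repository contains a SmolVLA policy checkpoint fine-tuned with LeRobot from lerobot/smolvla_base on the CoRL2026-CSI/UR7e-CaP_arrange_block_100epi dataset (UR7e, 100 episodes, 10 epochs). The model card is intentionally detailed so that the training run can be reproduced or debugged from the uploaded artifacts.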

Model Details

Related checkpoints from the same run:

Dataset

| Key | Value |
| --- | --- |
| Robot | UR7e |
| Episodes | 100 |
| Frames | 141,253 |
| Tasks | 1 |
| FPS | 30 |
| Camera streams | observation.images.realsense_wrist, observation.images.realsense_topview |
| Dataset state/action shape | [7] / [7] |

Reproduction

The uploaded train_config.json is the authoritative serialized LeRobot config for this checkpoint. The table below mirrors the key values for quick inspection.

| Key | Value |
| --- | --- |
| script | lerobot/scripts/train_smolvla_ur7e.sh |
| job_name | smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552 |
| output_dir | /home/work/hscho/corl_2026/AutoDataCollector/lerobot/outputs/train/smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552 |
| seed | 1000 |
| launch | single-process CUDA training via python -m lerobot.scripts.lerobot_train |
| checkpoint_step | 5520 |
| checkpoint_epoch | 10.00 |
| checkpoint_train_loss | 0.009 |
| checkpoint_grad_norm | 0.095 |
| checkpoint_lr | 2.5e-06 |
| effective_batch | 64 x 1 x 4 = 256 |
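
Because train_config.json is described above as the authoritative config, it can be useful to read it directly rather than rely on the mirrored table. The following is a minimal sketch, assuming the file sits at the repository root and that its top-level keys are named as in the tables on this card (steps, batch_size, gradient_accumulation_steps, seed, and a nested policy.type). The helper names are hypothetical.

```python
import json
from pathlib import Path

REPO_ID = "CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep"
CONFIG_FILENAME = "train_config.json"  # assumption: stored at the repository root


def summarize_train_config(cfg: dict) -> dict:
    """Extract a few fields mirrored on this card; missing keys become None."""
    policy = cfg.get("policy") or {}
    return {
        "steps": cfg.get("steps"),
        "batch_size": cfg.get("batch_size"),
        "gradient_accumulation_steps": cfg.get("gradient_accumulation_steps"),
        "seed": cfg.get("seed"),
        "policy_type": policy.get("type"),
    }


def load_train_config(repo_id: str = REPO_ID, filename: str = CONFIG_FILENAME) -> dict:
    """Download the config from the Hub (needs network access) and parse it."""
    # Imported lazily so the summary helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return json.loads(Path(local_path).read_text())
```

Calling summarize_train_config(load_train_config()) should then give a compact view to compare against the table; if the key layout assumed here differs from the actual file, the affected fields come back as None instead of raising.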

Approximate script invocation:

cd /home/work/hscho/corl_2026/AutoDataCollector/lerobot
CONDA_ENV="lerobot" \
  POLICY_TYPE="smolvla" \
  POLICY_PATH="lerobot/smolvla_base" \
  DATASET_REPO_ID="CoRL2026-CSI/UR7e-CaP_arrange_block_100epi" \
  BATCH_SIZE="64" \
  GRADIENT_ACCUMULATION_STEPS="4" \
  STEPS="5520" \
  NUM_WORKERS="4" \
  DATALOADER_PREFETCH_FACTOR="1" \
  CUDA_VISIBLE_DEVICES="0" \
  NUM_GPUS="1" \
  MIXED_PRECISION="bf16" \
  SAVE_FREQ="2760" \
  LOG_FREQ="10" \
  EVAL_FREQ="0" \
  WANDB_PROJECT="lerobot-smolvla-ur7e" \
  OMP_NUM_THREADS="4" \
  MKL_NUM_THREADS="4" \
  PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" \
  bash train_smolvla_ur7e.sh

Detailed Hyperparameters

Script Defaults and Environment

| Key | Value |
| --- | --- |
| CONDA_ENV | lerobot |
| POLICY_TYPE | smolvla |
| POLICY_PATH | lerobot/smolvla_base |
| DATASET_REPO_ID | CoRL2026-CSI/UR7e-CaP_arrange_block_100epi |
| BATCH_SIZE | 64 |
| GRADIENT_ACCUMULATION_STEPS | 4 |
| STEPS | 5520 |
| NUM_WORKERS | 4 |
| DATALOADER_PREFETCH_FACTOR | 1 |
| CUDA_VISIBLE_DEVICES | 0 |
| NUM_GPUS | 1 |
| MIXED_PRECISION | bf16 |
| SAVE_FREQ | 2760 |
| LOG_FREQ | 10 |
| EVAL_FREQ | 0 |
| WANDB_PROJECT | lerobot-smolvla-ur7e |
| OMP_NUM_THREADS | 4 |
| MKL_NUM_THREADS | 4 |
| PYTORCH_CUDA_ALLOC_CONF | expandable_segments:True |

Training Loop and Dataloader

| Key | Value |
| --- | --- |
| steps | 5520 |
| batch_size | 64 |
| gradient_accumulation_steps | 4 |
| num_workers | 4 |
| dataloader_prefetch_factor | 1 |
| dataloader_persistent_workers | False |
| dataloader_pin_memory | True |
| save_freq | 2760 |
| log_freq | 10 |
| eval_freq | 0 |
| cudnn_deterministic | False |
| use_policy_training_preset | True |
| ddp_find_unused_parameters | True |
| profile_timing | False |
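
The step count above is consistent with 10 passes over the dataset at the effective batch size. The short calculation below shows one way to arrive at it, assuming each step consumes one effective batch and that steps per epoch are rounded up; the rounding rule is an assumption, not something stated in the config.

```python
import math

frames = 141_253      # dataset frames reported on this card
batch_size = 64       # per-process batch size
num_processes = 1     # single GPU
grad_accum = 4        # gradient accumulation steps
epochs = 10           # from the card title

effective_batch = batch_size * num_processes * grad_accum
steps_per_epoch = math.ceil(frames / effective_batch)  # assumed rounding rule
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 256 552 5520
```

Under the same reading, save_freq 2760 is half of the total steps, so it would fall at the midpoint of the run.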

Dataset Pipeline

| Key | Value |
| --- | --- |
| dataset.repo_id | CoRL2026-CSI/UR7e-CaP_arrange_block_100epi |
| dataset.root | null |
| dataset.episodes | null |
| dataset.revision | null |
| dataset.use_imagenet_stats | True |
| dataset.video_backend | torchcodec |
| dataset.streaming | False |

Image augmentation settings:

{
  "enable": true,
  "max_num_transforms": 2,
  "random_order": true,
  "tfs": {
    "brightness": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "brightness": [
          0.8,
          1.2
        ]
      }
    },
    "contrast": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "contrast": [
          0.8,
          1.2
        ]
      }
    },
    "saturation": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "saturation": [
          0.5,
          1.5
        ]
      }
    },
    "hue": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "hue": [
          -0.05,
          0.05
        ]
      }
    },
    "sharpness": {
      "weight": 1.0,
      "type": "SharpnessJitter",
      "kwargs": {
        "sharpness": [
          0.5,
          1.5
        ]
      }
    },
    "affine": {
      "weight": 1.0,
      "type": "RandomAffine",
      "kwargs": {
        "degrees": [
          -5.0,
          5.0
        ],
        "translate": [
          0.05,
          0.05
        ]
      }
    }
  }
}
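
To illustrate how max_num_transforms, the per-transform weight, and random_order might interact, here is a small sketch that only selects which transforms would be applied to one frame. It assumes the sampler draws up to max_num_transforms distinct transforms with probability proportional to weight; this is a reading of the settings, not a copy of LeRobot's implementation, and the function name is hypothetical.

```python
import random

# Weights copied from the augmentation settings above (all equal).
WEIGHTS = {
    "brightness": 1.0,
    "contrast": 1.0,
    "saturation": 1.0,
    "hue": 1.0,
    "sharpness": 1.0,
    "affine": 1.0,
}
MAX_NUM_TRANSFORMS = 2
RANDOM_ORDER = True


def sample_transform_names(rng, weights=WEIGHTS, max_num=MAX_NUM_TRANSFORMS,
                           random_order=RANDOM_ORDER):
    """Pick up to max_num distinct transform names, weighted, without replacement."""
    declared_order = list(weights)
    pool = dict(weights)
    chosen = []
    for _ in range(min(max_num, len(declared_order))):
        candidates = list(pool)
        pick = rng.choices(candidates, weights=[pool[c] for c in candidates], k=1)[0]
        chosen.append(pick)
        del pool[pick]  # without replacement: a transform is applied at most once
    if not random_order:
        # Keep the order in which the transforms are declared in the config.
        chosen.sort(key=declared_order.index)
    return chosen


rng = random.Random(0)
print(sample_transform_names(rng))
```

With equal weights, each frame would receive two of the six transforms, chosen uniformly and applied in the sampled order.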

Camera rename map:

{
  "observation.images.realsense_wrist": "observation.images.camera1",
  "observation.images.realsense_topview": "observation.images.camera2"
}
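
At inference time, observations recorded under the dataset's camera names need the same renaming before they reach the policy. A minimal sketch of applying the map to an observation dictionary follows; the function name is hypothetical and the values are placeholders for image tensors.

```python
RENAME_MAP = {
    "observation.images.realsense_wrist": "observation.images.camera1",
    "observation.images.realsense_topview": "observation.images.camera2",
}


def rename_observation_keys(observation: dict, rename_map: dict = RENAME_MAP) -> dict:
    """Return a new dict with camera keys renamed; other keys pass through unchanged."""
    return {rename_map.get(key, key): value for key, value in observation.items()}


raw = {
    "observation.state": [0.0] * 7,
    "observation.images.realsense_wrist": "<wrist image>",
    "observation.images.realsense_topview": "<topview image>",
}
print(sorted(rename_observation_keys(raw)))
```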

Policy Configuration

{
  "type": "smolvla",
  "pretrained_path": "lerobot/smolvla_base",
  "vlm_model_name": "HuggingFaceTB/SmolVLM2-500M-Video-Instruct",
  "load_vlm_weights": true,
  "num_vlm_layers": 16,
  "freeze_vision_encoder": true,
  "train_expert_only": true,
  "train_state_proj": true,
  "use_peft": false,
  "use_amp": false,
  "chunk_size": 50,
  "n_action_steps": 50,
  "num_steps": 10,
  "max_state_dim": 32,
  "max_action_dim": 32,
  "resize_imgs_with_padding": [
    512,
    512
  ],
  "tokenizer_max_length": 48,
  "attention_mode": "cross_attn",
  "pad_language_to": "max_length",
  "use_cache": true,
  "num_expert_layers": 0,
  "expert_width_multiplier": 0.75,
  "self_attn_every_n_layers": 2,
  "min_period": 0.004,
  "max_period": 4.0,
  "compile_model": false,
  "compile_mode": "max-autotune",
  "normalization_mapping": {
    "VISUAL": "IDENTITY",
    "STATE": "MEAN_STD",
    "ACTION": "MEAN_STD"
  },
  "input_features": {
    "observation.state": {
      "type": "STATE",
      "shape": [
        6
      ]
    },
    "observation.images.camera1": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    },
    "observation.images.camera2": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    },
    "observation.images.camera3": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    }
  },
  "output_features": {
    "action": {
      "type": "ACTION",
      "shape": [
        7
      ]
    }
  }
}

Optimizer

{
  "type": "adamw",
  "lr": 0.0001,
  "weight_decay": 1e-10,
  "grad_clip_norm": 10.0,
  "betas": [
    0.9,
    0.95
  ],
  "eps": 1e-08
}

Scheduler

{
  "type": "cosine_decay_with_warmup",
  "num_warmup_steps": 1000,
  "num_decay_steps": 30000,
  "peak_lr": 0.0001,
  "decay_lr": 2.5e-06
}
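
The reported checkpoint_lr of 2.5e-06 at step 5520 equals decay_lr, even though num_decay_steps is 30000. One explanation consistent with these numbers is that the schedule is compressed to the actual run length when the run is shorter than num_decay_steps. The sketch below works through that assumption with a generic warmup-plus-cosine formula; it is not LeRobot's exact code, and the function names are hypothetical.

```python
import math

PEAK_LR = 1e-4
DECAY_LR = 2.5e-6
NUM_WARMUP_STEPS = 1000
NUM_DECAY_STEPS = 30_000
TRAIN_STEPS = 5520


def scaled_schedule_lengths(train_steps: int = TRAIN_STEPS):
    """Assumption: shrink warmup and decay proportionally when the run is short."""
    if train_steps < NUM_DECAY_STEPS:
        warmup = NUM_WARMUP_STEPS * train_steps // NUM_DECAY_STEPS
        return warmup, train_steps
    return NUM_WARMUP_STEPS, NUM_DECAY_STEPS


def lr_at(step: int, warmup_steps: int, decay_steps: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward DECAY_LR."""
    if step < warmup_steps:
        return PEAK_LR * (step + 1) / (warmup_steps + 1)
    progress = min(step, decay_steps) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return DECAY_LR + (PEAK_LR - DECAY_LR) * cosine


warmup, decay = scaled_schedule_lengths()
print(warmup, decay)
print(lr_at(TRAIN_STEPS, warmup, decay))                      # scaled schedule
print(lr_at(TRAIN_STEPS, NUM_WARMUP_STEPS, NUM_DECAY_STEPS))  # unscaled schedule
```

With the scaled lengths, the learning rate reaches decay_lr exactly at the final step, matching the table. With the unscaled 30000-step decay it would still be roughly 9.2e-05 at step 5520, which does not match.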

Logging

{
  "enable": true,
  "disable_artifact": false,
  "project": "lerobot-smolvla-ur7e",
  "entity": null,
  "notes": null,
  "run_id": "e1h98rll",
  "mode": null
}

Usage

Use this model as a LeRobot policy checkpoint:

python -m lerobot.scripts.lerobot_eval \
  --policy.path=CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep

For Python loading inside LeRobot code, use the SmolVLA policy loader with this repository id as the pretrained path.
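
A minimal loading sketch follows. The import path is an assumption that matches recent LeRobot releases (older releases placed policies under lerobot.common.policies), and the load_policy wrapper is a hypothetical helper. Running it requires LeRobot, PyTorch, and network access.

```python
REPO_ID = "CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep"


def load_policy(repo_id: str = REPO_ID, device: str = "cuda"):
    """Load the fine-tuned SmolVLA checkpoint and put it in eval mode."""
    # Assumed import path; adjust to the installed LeRobot version if it differs.
    from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

    policy = SmolVLAPolicy.from_pretrained(repo_id)
    policy.to(device)
    policy.eval()
    return policy
```

The returned policy expects observations keyed by the policy-side names (camera1, camera2), so the camera rename map from the Dataset Pipeline section applies before inference.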

Evaluation and Limitations

This model card reports training checkpoint information only. No rollout success rate or task-level evaluation metric is included in this repository.

The checkpoint assumes an observation/action schema compatible with the policy configuration above, including the camera rename map (realsense_wrist to camera1, realsense_topview to camera2). The optimizer and RNG training_state files are not included; only the loadable pretrained_model artifact is uploaded.
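
As a quick way to check schema compatibility before deployment, the sketch below compares the dataset summary from this card against the policy's input_features and output_features after applying the rename map. All values are copied from the card; the function and constant names are hypothetical.

```python
RENAME_MAP = {
    "observation.images.realsense_wrist": "observation.images.camera1",
    "observation.images.realsense_topview": "observation.images.camera2",
}
# From the Dataset table.
DATASET_CAMERAS = list(RENAME_MAP)
DATASET_VECTOR_SHAPES = {"observation.state": (7,), "action": (7,)}
# From the Policy Configuration JSON.
POLICY_CAMERAS = [
    "observation.images.camera1",
    "observation.images.camera2",
    "observation.images.camera3",
]
POLICY_VECTOR_SHAPES = {"observation.state": (6,), "action": (7,)}


def schema_report() -> list:
    """List differences between the renamed dataset schema and the policy schema."""
    issues = []
    renamed = {RENAME_MAP.get(cam, cam) for cam in DATASET_CAMERAS}
    for cam in POLICY_CAMERAS:
        if cam not in renamed:
            issues.append(f"policy lists {cam}, but no dataset stream maps to it")
    for cam in sorted(renamed - set(POLICY_CAMERAS)):
        issues.append(f"dataset stream {cam} is not a policy input")
    for key, policy_shape in POLICY_VECTOR_SHAPES.items():
        dataset_shape = DATASET_VECTOR_SHAPES.get(key)
        if dataset_shape != policy_shape:
            issues.append(f"{key}: dataset {dataset_shape} vs policy {policy_shape}")
    return issues


for line in schema_report():
    print(line)
```

With the values shown on this card, the check reports a third camera input (camera3) with no matching dataset stream and a state shape of [7] in the dataset against [6] in the policy config. Whether either difference matters at runtime depends on how the policy pads or fills missing inputs, which this card does not state, so it may be worth confirming against train_config.json.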

Provenance
