---
library_name: lerobot
base_model: lerobot/smolvla_base
datasets:
- CoRL2026-CSI/UR7e-CaP_arrange_block_100epi
tags:
- lerobot
- smolvla
- robotics
- ur7e
- code-as-policies
- imitation-learning
- CoRL2026
---

# SmolVLA UR7e Arrange Block 100epi (10 epochs)

This repository contains a SmolVLA policy checkpoint fine-tuned with LeRobot on 100 episodes of a UR7e block-arrangement task. The model card is intentionally detailed so that the training run can be reproduced or debugged from the uploaded artifacts.

## Model Details

- **Policy:** SmolVLA
- **Base checkpoint:** [`lerobot/smolvla_base`](https://huggingface.co/lerobot/smolvla_base)
- **Training dataset:** [`CoRL2026-CSI/UR7e-CaP_arrange_block_100epi`](https://huggingface.co/datasets/CoRL2026-CSI/UR7e-CaP_arrange_block_100epi)
- **Training script:** `lerobot/scripts/train_smolvla_ur7e.sh`
- **Checkpoint:** step `5520`, approximately `10.00` epochs
- **Reported training loss at checkpoint:** `0.009`
- **Resolved config:** [`train_config.json`](train_config.json)

Related checkpoints from the same run:

- [5ep checkpoint](https://huggingface.co/CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_5ep)
- [10ep checkpoint](https://huggingface.co/CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep) (this repository)

## Dataset

| Property | Value |
|---|---|
| Robot | UR7e |
| Episodes | 100 |
| Frames | 141,253 |
| Tasks | 1 |
| FPS | 30 |
| Camera streams | `observation.images.realsense_wrist`, `observation.images.realsense_topview` |
| Dataset state/action shape | [7] / [7] |

## Reproduction

The uploaded [`train_config.json`](train_config.json) is the authoritative serialized LeRobot config for this checkpoint. The table below mirrors the key values for quick inspection.

| Key | Value |
|---|---|
| `script` | lerobot/scripts/train_smolvla_ur7e.sh |
| `job_name` | smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552 |
| `output_dir` | /home/work/hscho/corl_2026/AutoDataCollector/lerobot/outputs/train/smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552 |
| `seed` | 1000 |
| `launch` | single-process CUDA training via `python -m lerobot.scripts.lerobot_train` |
| `checkpoint_step` | 5520 |
| `checkpoint_epoch` | 10.00 |
| `checkpoint_train_loss` | 0.009 |
| `checkpoint_grad_norm` | 0.095 |
| `checkpoint_lr` | 2.5e-06 |
| `effective_batch` | 64 (batch size) x 1 (GPU) x 4 (gradient accumulation steps) = 256 |

Approximate script invocation:

```bash
cd /home/work/hscho/corl_2026/AutoDataCollector/lerobot
CONDA_ENV="lerobot" \
POLICY_TYPE="smolvla" \
POLICY_PATH="lerobot/smolvla_base" \
DATASET_REPO_ID="CoRL2026-CSI/UR7e-CaP_arrange_block_100epi" \
BATCH_SIZE="64" \
GRADIENT_ACCUMULATION_STEPS="4" \
STEPS="5520" \
NUM_WORKERS="4" \
DATALOADER_PREFETCH_FACTOR="1" \
CUDA_VISIBLE_DEVICES="0" \
NUM_GPUS="1" \
MIXED_PRECISION="bf16" \
SAVE_FREQ="2760" \
LOG_FREQ="10" \
EVAL_FREQ="0" \
WANDB_PROJECT="lerobot-smolvla-ur7e" \
OMP_NUM_THREADS="4" \
MKL_NUM_THREADS="4" \
PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" \
bash train_smolvla_ur7e.sh
```
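
The epoch figures quoted for this checkpoint can be rechecked from the numbers in this card. The sketch below assumes that `STEPS` counts optimizer steps and that each step consumes one effective batch of 256 frames, which is what the `effective_batch` row implies; the variable names are illustrative and not part of the training script.

```python
# Values copied from the tables in this model card.
batch_size = 64
num_gpus = 1
grad_accum_steps = 4
steps = 5520
save_freq = 2760
dataset_frames = 141_253

# Assumption: one optimizer step consumes batch_size * num_gpus * grad_accum_steps frames.
effective_batch = batch_size * num_gpus * grad_accum_steps
samples_seen = steps * effective_batch

# Epochs are counted as passes over the dataset frames.
epochs = samples_seen / dataset_frames
epochs_per_save = save_freq * effective_batch / dataset_frames

print(f"effective batch: {effective_batch}")
print(f"epochs at step {steps}: {epochs:.2f}")
print(f"epochs between saved checkpoints: {epochs_per_save:.2f}")
```

Under these assumptions, `SAVE_FREQ="2760"` lands on the 5-epoch mark and `STEPS="5520"` on the 10-epoch mark, matching the two related checkpoints.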

## Detailed Hyperparameters

### Script Defaults and Environment

| Key | Value |
|---|---|
| `CONDA_ENV` | lerobot |
| `POLICY_TYPE` | smolvla |
| `POLICY_PATH` | lerobot/smolvla_base |
| `DATASET_REPO_ID` | CoRL2026-CSI/UR7e-CaP_arrange_block_100epi |
| `BATCH_SIZE` | 64 |
| `GRADIENT_ACCUMULATION_STEPS` | 4 |
| `STEPS` | 5520 |
| `NUM_WORKERS` | 4 |
| `DATALOADER_PREFETCH_FACTOR` | 1 |
| `CUDA_VISIBLE_DEVICES` | 0 |
| `NUM_GPUS` | 1 |
| `MIXED_PRECISION` | bf16 |
| `SAVE_FREQ` | 2760 |
| `LOG_FREQ` | 10 |
| `EVAL_FREQ` | 0 |
| `WANDB_PROJECT` | lerobot-smolvla-ur7e |
| `OMP_NUM_THREADS` | 4 |
| `MKL_NUM_THREADS` | 4 |
| `PYTORCH_CUDA_ALLOC_CONF` | expandable_segments:True |

### Training Loop and Dataloader

| Key | Value |
|---|---|
| `steps` | 5520 |
| `batch_size` | 64 |
| `gradient_accumulation_steps` | 4 |
| `num_workers` | 4 |
| `dataloader_prefetch_factor` | 1 |
| `dataloader_persistent_workers` | False |
| `dataloader_pin_memory` | True |
| `save_freq` | 2760 |
| `log_freq` | 10 |
| `eval_freq` | 0 |
| `cudnn_deterministic` | False |
| `use_policy_training_preset` | True |
| `ddp_find_unused_parameters` | True |
| `profile_timing` | False |

### Dataset Pipeline

| Key | Value |
|---|---|
| `dataset.repo_id` | CoRL2026-CSI/UR7e-CaP_arrange_block_100epi |
| `dataset.root` | `null` |
| `dataset.episodes` | `null` |
| `dataset.revision` | `null` |
| `dataset.use_imagenet_stats` | True |
| `dataset.video_backend` | torchcodec |
| `dataset.streaming` | False |

Image augmentation settings:

```json
{
  "enable": true,
  "max_num_transforms": 2,
  "random_order": true,
  "tfs": {
    "brightness": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "brightness": [
          0.8,
          1.2
        ]
      }
    },
    "contrast": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "contrast": [
          0.8,
          1.2
        ]
      }
    },
    "saturation": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "saturation": [
          0.5,
          1.5
        ]
      }
    },
    "hue": {
      "weight": 1.0,
      "type": "ColorJitter",
      "kwargs": {
        "hue": [
          -0.05,
          0.05
        ]
      }
    },
    "sharpness": {
      "weight": 1.0,
      "type": "SharpnessJitter",
      "kwargs": {
        "sharpness": [
          0.5,
          1.5
        ]
      }
    },
    "affine": {
      "weight": 1.0,
      "type": "RandomAffine",
      "kwargs": {
        "degrees": [
          -5.0,
          5.0
        ],
        "translate": [
          0.05,
          0.05
        ]
      }
    }
  }
}
```
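
One way to read this config is that each training image receives two of the six transforms, drawn without replacement in proportion to `weight` and applied in random order. The pure-Python sketch below only illustrates that sampling step under this assumption; it is not LeRobot's implementation, and `sample_transforms` is a hypothetical helper.

```python
import random

# Transform names and weights copied from the JSON above.
WEIGHTS = {
    "brightness": 1.0,
    "contrast": 1.0,
    "saturation": 1.0,
    "hue": 1.0,
    "sharpness": 1.0,
    "affine": 1.0,
}
MAX_NUM_TRANSFORMS = 2
RANDOM_ORDER = True


def sample_transforms(rng: random.Random) -> list[str]:
    """Draw distinct transform names, weighted by WEIGHTS (hypothetical helper)."""
    remaining = dict(WEIGHTS)
    chosen = []
    for _ in range(min(MAX_NUM_TRANSFORMS, len(remaining))):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    if not RANDOM_ORDER:
        # Fall back to the order in which the transforms appear in the config.
        config_order = list(WEIGHTS)
        chosen.sort(key=config_order.index)
    return chosen


rng = random.Random(0)
for _ in range(3):
    print(sample_transforms(rng))
```

With equal weights, every pair of transforms is equally likely, so no single augmentation dominates.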

Camera rename map:

```json
{
  "observation.images.realsense_wrist": "observation.images.camera1",
  "observation.images.realsense_topview": "observation.images.camera2"
}
```
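
At inference time, observations recorded under the dataset's camera names need the same renaming before they reach the policy. A minimal sketch of that step, assuming observations arrive as a flat dictionary keyed by feature name; `rename_observation_keys` is a hypothetical helper and the frame values are placeholders.

```python
# Mapping copied from the camera rename map above.
RENAME_MAP = {
    "observation.images.realsense_wrist": "observation.images.camera1",
    "observation.images.realsense_topview": "observation.images.camera2",
}


def rename_observation_keys(observation: dict) -> dict:
    """Return a copy of the observation with camera keys renamed (hypothetical helper)."""
    return {RENAME_MAP.get(key, key): value for key, value in observation.items()}


# Placeholder values stand in for real tensors.
raw_observation = {
    "observation.state": "state_vector",
    "observation.images.realsense_wrist": "wrist_frame",
    "observation.images.realsense_topview": "topview_frame",
}
renamed = rename_observation_keys(raw_observation)
print(sorted(renamed))
```

Keys that are not in the map, such as `observation.state`, pass through unchanged.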

### Policy Configuration

```json
{
  "type": "smolvla",
  "pretrained_path": "lerobot/smolvla_base",
  "vlm_model_name": "HuggingFaceTB/SmolVLM2-500M-Video-Instruct",
  "load_vlm_weights": true,
  "num_vlm_layers": 16,
  "freeze_vision_encoder": true,
  "train_expert_only": true,
  "train_state_proj": true,
  "use_peft": false,
  "use_amp": false,
  "chunk_size": 50,
  "n_action_steps": 50,
  "num_steps": 10,
  "max_state_dim": 32,
  "max_action_dim": 32,
  "resize_imgs_with_padding": [
    512,
    512
  ],
  "tokenizer_max_length": 48,
  "attention_mode": "cross_attn",
  "pad_language_to": "max_length",
  "use_cache": true,
  "num_expert_layers": 0,
  "expert_width_multiplier": 0.75,
  "self_attn_every_n_layers": 2,
  "min_period": 0.004,
  "max_period": 4.0,
  "compile_model": false,
  "compile_mode": "max-autotune",
  "normalization_mapping": {
    "VISUAL": "IDENTITY",
    "STATE": "MEAN_STD",
    "ACTION": "MEAN_STD"
  },
  "input_features": {
    "observation.state": {
      "type": "STATE",
      "shape": [
        6
      ]
    },
    "observation.images.camera1": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    },
    "observation.images.camera2": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    },
    "observation.images.camera3": {
      "type": "VISUAL",
      "shape": [
        3,
        256,
        256
      ]
    }
  },
  "output_features": {
    "action": {
      "type": "ACTION",
      "shape": [
        7
      ]
    }
  }
}
```

### Optimizer

```json
{
  "type": "adamw",
  "lr": 0.0001,
  "weight_decay": 1e-10,
  "grad_clip_norm": 10.0,
  "betas": [
    0.9,
    0.95
  ],
  "eps": 1e-08
}
```

### Scheduler

```json
{
  "type": "cosine_decay_with_warmup",
  "num_warmup_steps": 1000,
  "num_decay_steps": 30000,
  "peak_lr": 0.0001,
  "decay_lr": 2.5e-06
}
```
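
The reported `checkpoint_lr` of `2.5e-06` equals `decay_lr`, even though the run stopped at step 5520, well short of `num_decay_steps`. That is consistent with the warmup and decay lengths being rescaled to the run length. The sketch below makes that rescaling an explicit assumption and is an illustration only, not LeRobot's scheduler code. Without rescaling, the same cosine curve would still be near `9e-05` at step 5520.

```python
import math

# Values copied from the scheduler JSON and the training tables above.
PEAK_LR = 1e-4
DECAY_LR = 2.5e-6
NUM_WARMUP_STEPS = 1000
NUM_DECAY_STEPS = 30000
TRAINING_STEPS = 5520

# Assumption: when the run is shorter than num_decay_steps, the schedule is
# compressed so that the decay finishes exactly at the last training step.
if TRAINING_STEPS < NUM_DECAY_STEPS:
    warmup_steps = NUM_WARMUP_STEPS * TRAINING_STEPS // NUM_DECAY_STEPS
    decay_steps = TRAINING_STEPS
else:
    warmup_steps, decay_steps = NUM_WARMUP_STEPS, NUM_DECAY_STEPS


def learning_rate(step: int) -> float:
    """Linear warmup followed by cosine decay from PEAK_LR to DECAY_LR."""
    if step < warmup_steps:
        return PEAK_LR * (step + 1) / (warmup_steps + 1)
    progress = min(step, decay_steps) / decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return DECAY_LR + (PEAK_LR - DECAY_LR) * cosine


for step in (0, warmup_steps, TRAINING_STEPS // 2, TRAINING_STEPS):
    print(f"step {step:5d}: lr = {learning_rate(step):.3e}")
```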

### Logging

```json
{
  "enable": true,
  "disable_artifact": false,
  "project": "lerobot-smolvla-ur7e",
  "entity": null,
  "notes": null,
  "run_id": "e1h98rll",
  "mode": null
}
```

## Usage

Use this model as a LeRobot policy checkpoint:

```bash
python -m lerobot.scripts.lerobot_eval \
  --policy.path=CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep
```

For Python loading inside LeRobot code, use the SmolVLA policy loader with this repository id as the pretrained path.
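
A minimal loading sketch follows. The import path `lerobot.policies.smolvla.modeling_smolvla` is an assumption that depends on the installed LeRobot version (older releases place policies under a different package), and `load_policy` is a hypothetical wrapper. The import is deferred into the function so the snippet can be pasted into a module without importing LeRobot at load time.

```python
REPO_ID = "CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep"


def load_policy(repo_id: str = REPO_ID, device: str = "cuda"):
    """Download the checkpoint from the Hub and return it in eval mode (hypothetical wrapper)."""
    # Assumption: this module path matches the installed LeRobot version.
    from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

    policy = SmolVLAPolicy.from_pretrained(repo_id)
    policy.to(device)
    policy.eval()
    return policy


if __name__ == "__main__":
    policy = load_policy()
    print(type(policy).__name__)
```

Observations passed to the loaded policy should already use the `camera1`/`camera2` key names from the camera rename map.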

## Evaluation and Limitations

This model card reports training checkpoint information only. No rollout success rate or task-level evaluation metric is included in this repository.

The checkpoint assumes a compatible observation/action schema and the camera remapping shown above. The optimizer and RNG `training_state` files are not included, so the run cannot be resumed exactly from this checkpoint; only the loadable `pretrained_model` artifact is uploaded.
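
Before deployment, it can help to compare the features a robot or dataset provides against the `input_features` in the policy configuration. The configuration above lists an `observation.state` shape of `[6]` and three camera inputs, while the dataset table lists a state shape of `[7]` and two camera streams. The sketch below only reports such differences for review; whether a given difference matters depends on how the policy pads or ignores inputs. `find_schema_mismatches` is a hypothetical helper, and the image resolution in the example is assumed.

```python
# Feature shapes copied from the policy configuration JSON above.
EXPECTED_INPUTS = {
    "observation.state": (6,),
    "observation.images.camera1": (3, 256, 256),
    "observation.images.camera2": (3, 256, 256),
    "observation.images.camera3": (3, 256, 256),
}


def find_schema_mismatches(available: dict) -> list[str]:
    """List inputs that are missing or whose shape differs (hypothetical helper)."""
    problems = []
    for key, expected in EXPECTED_INPUTS.items():
        if key not in available:
            problems.append(f"missing input: {key}")
        elif tuple(available[key]) != expected:
            problems.append(f"{key}: expected {expected}, got {tuple(available[key])}")
    return problems


# Dataset features after the camera rename map; the image resolution is assumed.
available_features = {
    "observation.state": (7,),
    "observation.images.camera1": (3, 256, 256),
    "observation.images.camera2": (3, 256, 256),
}
for problem in find_schema_mismatches(available_features):
    print(problem)
```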

## Provenance

- VLM backbone: [`HuggingFaceTB/SmolVLM2-500M-Video-Instruct`](https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct)
- Fine-tuning run: `smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552`
- Source training script: `lerobot/scripts/train_smolvla_ur7e.sh`