vpraise00 committed on
Commit 6f08a7d · verified · 1 Parent(s): c5fe0f3

Expand model card with detailed training hyperparameters

Files changed (1)
  1. README.md +319 -39
README.md CHANGED
@@ -8,73 +8,353 @@ tags:
8
  - smolvla
9
  - robotics
10
  - ur7e
11
  - code-as-policies
12
  - imitation-learning
13
  - CoRL2026
14
  ---
15
 
16
- # SmolVLA UR7e Arrange Block 100epi (10 epochs)
17
 
18
- This repository contains a SmolVLA policy fine-tuned for the UR7e arrange-block task using the LeRobot dataset [`CoRL2026-CSI/UR7e-CaP_arrange_block_100epi`](https://huggingface.co/datasets/CoRL2026-CSI/UR7e-CaP_arrange_block_100epi).
19
 
20
- ## Model Details
21
 
22
- - **Policy:** SmolVLA
23
- - **Base checkpoint:** [`lerobot/smolvla_base`](https://huggingface.co/lerobot/smolvla_base)
24
- - **Training dataset:** [`CoRL2026-CSI/UR7e-CaP_arrange_block_100epi`](https://huggingface.co/datasets/CoRL2026-CSI/UR7e-CaP_arrange_block_100epi)
25
- - **Robot:** UR7e
26
- - **Checkpoint:** step 5520, approximately 10 epochs
27
- - **Reported training loss at checkpoint:** 0.009
28
 
29
- ## Dataset
30
 
31
- The policy was trained on 100 episodes with 141,253 frames at 30 FPS. The dataset contains two RGB camera streams:
32
 
33
- - `observation.images.realsense_wrist`
34
- - `observation.images.realsense_topview`
35
 
36
- The action space is 7-dimensional: six UR7e joint positions plus gripper position.
37
 
38
- ## Training Configuration
39
 
40
- - **Micro batch size:** 64
41
- - **Gradient accumulation:** 4
42
- - **Effective batch size:** 256
43
- - **Total run length:** 5,520 optimizer steps for the 10 epoch run
44
- - **Optimizer:** AdamW
45
- - **Peak learning rate:** 1e-4
46
- - **Final logged learning rate for this checkpoint:** 2.5e-06
47
- - **Image augmentation:** enabled, up to 2 transforms per frame
48
- - **Final logged gradient norm for this checkpoint:** 0.095
49
 
50
- Camera keys were remapped during training:
51
 
52
- ```json
53
  {
54
  "observation.images.realsense_wrist": "observation.images.camera1",
55
  "observation.images.realsense_topview": "observation.images.camera2"
56
  }
57
  ```
58
 
59
- ## Usage
60
 
61
- Use this model as a LeRobot policy checkpoint:
62
 
63
- ```bash
64
- python -m lerobot.scripts.lerobot_eval \
65
- --policy.path=CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep
66
  ```
67
 
68
- For Python loading inside LeRobot code, use the SmolVLA policy loader with this repository id as the pretrained path.
69
 
70
- ## Evaluation and Limitations
71
 
72
- This model card reports training checkpoint information only. No rollout success rate or real-robot evaluation metric is included in this repository.
73
 
74
- The checkpoint is intended for the UR7e arrange-block setup and assumes a compatible observation/action schema, including the camera remapping described above.
75
 
76
- ## Provenance
77
 
78
- - Dataset license: Apache-2.0, as declared by the dataset repository.
79
- - VLM backbone: [`HuggingFaceTB/SmolVLM2-500M-Video-Instruct`](https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct).
80
- - Fine-tuning run: `smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552`.
 
8
  - smolvla
9
  - robotics
10
  - ur7e
11
+ - ur7e
12
  - code-as-policies
13
  - imitation-learning
14
  - CoRL2026
15
  ---
16
 
17
+ # SmolVLA UR7e Arrange Block 100epi (10 epochs)
18
+
19
+ This repository contains a SmolVLA policy checkpoint fine-tuned with LeRobot. The model card is intentionally detailed so the training run can be reproduced or debugged from the uploaded artifact.
20
+
21
+ ## Model Details
22
+
23
+ - **Policy:** SmolVLA
24
+ - **Base checkpoint:** [`lerobot/smolvla_base`](https://huggingface.co/lerobot/smolvla_base)
25
+ - **Training dataset:** [`CoRL2026-CSI/UR7e-CaP_arrange_block_100epi`](https://huggingface.co/datasets/CoRL2026-CSI/UR7e-CaP_arrange_block_100epi)
26
+ - **Training script:** `lerobot/scripts/train_smolvla_ur7e.sh`
27
+ - **Checkpoint:** step `5520`, approximately `10.00` epochs
28
+ - **Reported training loss at checkpoint:** `0.009`
29
+ - **Resolved config:** [`train_config.json`](train_config.json)
30
+
31
+ Related checkpoints from the same run:
32
+
33
+ - [5ep checkpoint](https://huggingface.co/CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_5ep)
34
+ - [10ep checkpoint](https://huggingface.co/CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep)
35
+
36
+ ## Dataset
37
+
38
+ | Key | Value |
39
+ |---|---|
40
+ | `Robot` | UR7e |
41
+ | `Episodes` | 100 |
42
+ | `Frames` | 141,253 |
43
+ | `Tasks` | 1 |
44
+ | `FPS` | 30 |
45
+ | `Camera streams` | `observation.images.realsense_wrist`, `observation.images.realsense_topview` |
46
+ | `Dataset state/action shape` | [7] / [7] |
47
+
48
+ ## Reproduction
49
 
50
+ The uploaded [`train_config.json`](train_config.json) is the authoritative serialized LeRobot config for this checkpoint. The table below mirrors the key values for quick inspection.
51
 
52
+ | Key | Value |
53
+ |---|---|
54
+ | `script` | lerobot/scripts/train_smolvla_ur7e.sh |
55
+ | `job_name` | smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552 |
56
+ | `output_dir` | /home/work/hscho/corl_2026/AutoDataCollector/lerobot/outputs/train/smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552 |
57
+ | `seed` | 1000 |
58
+ | `launch` | single-process CUDA training via `python -m lerobot.scripts.lerobot_train` |
59
+ | `checkpoint_step` | 5520 |
60
+ | `checkpoint_epoch` | 10.00 |
61
+ | `checkpoint_train_loss` | 0.009 |
62
+ | `checkpoint_grad_norm` | 0.095 |
63
+ | `checkpoint_lr` | 2.5e-06 |
64
+ | `effective_batch` | 64 x 1 x 4 = 256 |
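The effective batch size and the checkpoint's epoch count follow from the values in the tables above. A minimal Python sketch of that arithmetic, assuming one optimizer step consumes `batch_size x num_gpus x gradient_accumulation_steps` frames (the helper name `epochs_at_step` is hypothetical and not part of LeRobot):

```python
import math

FRAMES = 141_253   # frames in the dataset
BATCH_SIZE = 64    # micro batch per process
NUM_GPUS = 1
GRAD_ACCUM = 4     # gradient accumulation steps

# Frames consumed per optimizer step.
effective_batch = BATCH_SIZE * NUM_GPUS * GRAD_ACCUM

# Optimizer steps needed to see every frame once (last partial batch rounded up).
steps_per_epoch = math.ceil(FRAMES / effective_batch)


def epochs_at_step(step: int) -> float:
    """Approximate number of dataset passes completed after `step` optimizer steps."""
    return step * effective_batch / FRAMES


print(effective_batch, steps_per_epoch)                                  # 256 552
print(round(epochs_at_step(2760), 2), round(epochs_at_step(5520), 2))    # 5.0 10.0
```

Under these assumptions, `SAVE_FREQ=2760` falls at roughly 5 epochs and step `5520` at roughly 10 epochs, which is consistent with the two related checkpoints listed in this model card.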
65
 
66
+ Approximate script invocation:
67
 
68
+ ```bash
69
+ cd /home/work/hscho/corl_2026/AutoDataCollector/lerobot
70
+ CONDA_ENV="lerobot" POLICY_TYPE="smolvla" POLICY_PATH="lerobot/smolvla_base" DATASET_REPO_ID="CoRL2026-CSI/UR7e-CaP_arrange_block_100epi" BATCH_SIZE="64" GRADIENT_ACCUMULATION_STEPS="4" STEPS="5520" NUM_WORKERS="4" DATALOADER_PREFETCH_FACTOR="1" CUDA_VISIBLE_DEVICES="0" NUM_GPUS="1" MIXED_PRECISION="bf16" SAVE_FREQ="2760" LOG_FREQ="10" EVAL_FREQ="0" WANDB_PROJECT="lerobot-smolvla-ur7e" OMP_NUM_THREADS="4" MKL_NUM_THREADS="4" PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" bash train_smolvla_ur7e.sh
71
+ ```
72
 
73
+ ## Detailed Hyperparameters
74
 
75
+ ### Script Defaults and Environment
76
 
77
+ | Key | Value |
78
+ |---|---|
79
+ | `CONDA_ENV` | lerobot |
80
+ | `POLICY_TYPE` | smolvla |
81
+ | `POLICY_PATH` | lerobot/smolvla_base |
82
+ | `DATASET_REPO_ID` | CoRL2026-CSI/UR7e-CaP_arrange_block_100epi |
83
+ | `BATCH_SIZE` | 64 |
84
+ | `GRADIENT_ACCUMULATION_STEPS` | 4 |
85
+ | `STEPS` | 5520 |
86
+ | `NUM_WORKERS` | 4 |
87
+ | `DATALOADER_PREFETCH_FACTOR` | 1 |
88
+ | `CUDA_VISIBLE_DEVICES` | 0 |
89
+ | `NUM_GPUS` | 1 |
90
+ | `MIXED_PRECISION` | bf16 |
91
+ | `SAVE_FREQ` | 2760 |
92
+ | `LOG_FREQ` | 10 |
93
+ | `EVAL_FREQ` | 0 |
94
+ | `WANDB_PROJECT` | lerobot-smolvla-ur7e |
95
+ | `OMP_NUM_THREADS` | 4 |
96
+ | `MKL_NUM_THREADS` | 4 |
97
+ | `PYTORCH_CUDA_ALLOC_CONF` | expandable_segments:True |
98
 
99
+ ### Training Loop and Dataloader
100
 
101
+ | Key | Value |
102
+ |---|---|
103
+ | `steps` | 5520 |
104
+ | `batch_size` | 64 |
105
+ | `gradient_accumulation_steps` | 4 |
106
+ | `num_workers` | 4 |
107
+ | `dataloader_prefetch_factor` | 1 |
108
+ | `dataloader_persistent_workers` | False |
109
+ | `dataloader_pin_memory` | True |
110
+ | `save_freq` | 2760 |
111
+ | `log_freq` | 10 |
112
+ | `eval_freq` | 0 |
113
+ | `cudnn_deterministic` | False |
114
+ | `use_policy_training_preset` | True |
115
+ | `ddp_find_unused_parameters` | True |
116
+ | `profile_timing` | False |
117
 
118
+ ### Dataset Pipeline
119
 
120
+ | Key | Value |
121
+ |---|---|
122
+ | `dataset.repo_id` | CoRL2026-CSI/UR7e-CaP_arrange_block_100epi |
123
+ | `dataset.root` | `null` |
124
+ | `dataset.episodes` | `null` |
125
+ | `dataset.revision` | `null` |
126
+ | `dataset.use_imagenet_stats` | True |
127
+ | `dataset.video_backend` | torchcodec |
128
+ | `dataset.streaming` | False |
129
+
130
+ Image augmentation settings:
131
+
132
+ ```json
133
+ {
134
+ "enable": true,
135
+ "max_num_transforms": 2,
136
+ "random_order": true,
137
+ "tfs": {
138
+ "brightness": {
139
+ "weight": 1.0,
140
+ "type": "ColorJitter",
141
+ "kwargs": {
142
+ "brightness": [
143
+ 0.8,
144
+ 1.2
145
+ ]
146
+ }
147
+ },
148
+ "contrast": {
149
+ "weight": 1.0,
150
+ "type": "ColorJitter",
151
+ "kwargs": {
152
+ "contrast": [
153
+ 0.8,
154
+ 1.2
155
+ ]
156
+ }
157
+ },
158
+ "saturation": {
159
+ "weight": 1.0,
160
+ "type": "ColorJitter",
161
+ "kwargs": {
162
+ "saturation": [
163
+ 0.5,
164
+ 1.5
165
+ ]
166
+ }
167
+ },
168
+ "hue": {
169
+ "weight": 1.0,
170
+ "type": "ColorJitter",
171
+ "kwargs": {
172
+ "hue": [
173
+ -0.05,
174
+ 0.05
175
+ ]
176
+ }
177
+ },
178
+ "sharpness": {
179
+ "weight": 1.0,
180
+ "type": "SharpnessJitter",
181
+ "kwargs": {
182
+ "sharpness": [
183
+ 0.5,
184
+ 1.5
185
+ ]
186
+ }
187
+ },
188
+ "affine": {
189
+ "weight": 1.0,
190
+ "type": "RandomAffine",
191
+ "kwargs": {
192
+ "degrees": [
193
+ -5.0,
194
+ 5.0
195
+ ],
196
+ "translate": [
197
+ 0.05,
198
+ 0.05
199
+ ]
200
+ }
201
+ }
202
+ }
203
+ }
204
+ ```
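To make the `max_num_transforms`, `random_order`, and `weight` settings concrete, the following is a rough pure-Python sketch of how a weighted subset of transform names could be drawn per frame. It is an illustration only: `sample_transform_names` is a hypothetical helper, it draws exactly `max_num_transforms` distinct names, and LeRobot's actual sampling code may behave differently depending on the release.

```python
import random

# Per-transform sampling weights, copied from the JSON above (all equal).
TRANSFORM_WEIGHTS = {
    "brightness": 1.0,
    "contrast": 1.0,
    "saturation": 1.0,
    "hue": 1.0,
    "sharpness": 1.0,
    "affine": 1.0,
}


def sample_transform_names(weights, max_num_transforms=2, random_order=True, rng=None):
    """Draw distinct transform names, weighted, without replacement (hypothetical helper)."""
    rng = rng or random.Random()
    declared_order = list(weights)
    remaining = dict(weights)
    chosen = []
    for _ in range(min(max_num_transforms, len(remaining))):
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # without replacement: a transform is used at most once
    if not random_order:
        # Fall back to the order in which the transforms are declared in the config.
        chosen.sort(key=declared_order.index)
    return chosen


print(sample_transform_names(TRANSFORM_WEIGHTS, rng=random.Random(0)))
```

Because all weights are `1.0`, every transform is equally likely to be selected in this sketch.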
205
+
206
+ Camera rename map:
207
+
208
+ ```json
209
  {
210
  "observation.images.realsense_wrist": "observation.images.camera1",
211
  "observation.images.realsense_topview": "observation.images.camera2"
212
  }
213
  ```
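At inference time, observations need the same key names the policy was trained with. A minimal Python sketch of applying the rename map to an observation dictionary, assuming observations arrive as a flat `dict` keyed by the dataset's camera names (`rename_observation_keys` is a hypothetical helper, not a LeRobot function):

```python
RENAME_MAP = {
    "observation.images.realsense_wrist": "observation.images.camera1",
    "observation.images.realsense_topview": "observation.images.camera2",
}


def rename_observation_keys(observation: dict, rename_map: dict = RENAME_MAP) -> dict:
    """Return a copy of `observation` with dataset camera keys mapped to policy keys."""
    # Keys that are not in the map (e.g. observation.state) pass through unchanged.
    return {rename_map.get(key, key): value for key, value in observation.items()}


# Placeholder strings stand in for image tensors and the state vector.
raw = {
    "observation.images.realsense_wrist": "wrist_frame",
    "observation.images.realsense_topview": "top_frame",
    "observation.state": "joint_state",
}
print(sorted(rename_observation_keys(raw)))
```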
214
 
215
+ ### Policy Configuration
216
 
217
+ ```json
218
+ {
219
+ "type": "smolvla",
220
+ "pretrained_path": "lerobot/smolvla_base",
221
+ "vlm_model_name": "HuggingFaceTB/SmolVLM2-500M-Video-Instruct",
222
+ "load_vlm_weights": true,
223
+ "num_vlm_layers": 16,
224
+ "freeze_vision_encoder": true,
225
+ "train_expert_only": true,
226
+ "train_state_proj": true,
227
+ "use_peft": false,
228
+ "use_amp": false,
229
+ "chunk_size": 50,
230
+ "n_action_steps": 50,
231
+ "num_steps": 10,
232
+ "max_state_dim": 32,
233
+ "max_action_dim": 32,
234
+ "resize_imgs_with_padding": [
235
+ 512,
236
+ 512
237
+ ],
238
+ "tokenizer_max_length": 48,
239
+ "attention_mode": "cross_attn",
240
+ "pad_language_to": "max_length",
241
+ "use_cache": true,
242
+ "num_expert_layers": 0,
243
+ "expert_width_multiplier": 0.75,
244
+ "self_attn_every_n_layers": 2,
245
+ "min_period": 0.004,
246
+ "max_period": 4.0,
247
+ "compile_model": false,
248
+ "compile_mode": "max-autotune",
249
+ "normalization_mapping": {
250
+ "VISUAL": "IDENTITY",
251
+ "STATE": "MEAN_STD",
252
+ "ACTION": "MEAN_STD"
253
+ },
254
+ "input_features": {
255
+ "observation.state": {
256
+ "type": "STATE",
257
+ "shape": [
258
+ 6
259
+ ]
260
+ },
261
+ "observation.images.camera1": {
262
+ "type": "VISUAL",
263
+ "shape": [
264
+ 3,
265
+ 256,
266
+ 256
267
+ ]
268
+ },
269
+ "observation.images.camera2": {
270
+ "type": "VISUAL",
271
+ "shape": [
272
+ 3,
273
+ 256,
274
+ 256
275
+ ]
276
+ },
277
+ "observation.images.camera3": {
278
+ "type": "VISUAL",
279
+ "shape": [
280
+ 3,
281
+ 256,
282
+ 256
283
+ ]
284
+ }
285
+ },
286
+ "output_features": {
287
+ "action": {
288
+ "type": "ACTION",
289
+ "shape": [
290
+ 7
291
+ ]
292
+ }
293
+ }
294
+ }
295
+ ```
296
+
297
+ ### Optimizer
298
 
299
+ ```json
300
+ {
301
+ "type": "adamw",
302
+ "lr": 0.0001,
303
+ "weight_decay": 1e-10,
304
+ "grad_clip_norm": 10.0,
305
+ "betas": [
306
+ 0.9,
307
+ 0.95
308
+ ],
309
+ "eps": 1e-08
310
+ }
311
+ ```
312
+
313
+ ### Scheduler
314
+
315
+ ```json
316
+ {
317
+ "type": "cosine_decay_with_warmup",
318
+ "num_warmup_steps": 1000,
319
+ "num_decay_steps": 30000,
320
+ "peak_lr": 0.0001,
321
+ "decay_lr": 2.5e-06
322
+ }
323
  ```
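For intuition about the scheduler values, here is a generic linear-warmup plus cosine-decay schedule in Python. This is a textbook formulation under stated assumptions, not LeRobot's exact implementation: `lr_at_step` is a hypothetical helper, warmup is assumed to start from zero, and the cosine phase is assumed to run from the end of warmup to `num_decay_steps`.

```python
import math


def lr_at_step(step, peak_lr=1e-4, decay_lr=2.5e-6,
               num_warmup_steps=1000, num_decay_steps=30000):
    """Generic warmup + cosine decay learning rate (illustrative sketch)."""
    if step < num_warmup_steps:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / num_warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    span = max(1, num_decay_steps - num_warmup_steps)
    progress = min(1.0, (step - num_warmup_steps) / span)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return decay_lr + (peak_lr - decay_lr) * cosine


# Learning rate at the final step with the configured horizon,
# and with a horizon equal to the run length (5520 steps).
print(lr_at_step(5520))
print(lr_at_step(5520, num_decay_steps=5520))
```

With a 30,000-step horizon this generic formula leaves the learning rate close to its peak at step 5520, whereas the logged `checkpoint_lr` of `2.5e-06` equals `decay_lr`. That suggests the decay horizon was effectively shortened to the run length during training; this is an inference from the logged value, and `train_config.json` remains the reference for the configured numbers.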
324
 
325
+ ### Logging
326
+
327
+ ```json
328
+ {
329
+ "enable": true,
330
+ "disable_artifact": false,
331
+ "project": "lerobot-smolvla-ur7e",
332
+ "entity": null,
333
+ "notes": null,
334
+ "run_id": "e1h98rll",
335
+ "mode": null
336
+ }
337
+ ```
338
+
339
+ ## Usage
340
+
341
+ Use this model as a LeRobot policy checkpoint:
342
+
343
+ ```bash
344
+ python -m lerobot.scripts.lerobot_eval \
345
+ --policy.path=CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep
346
+ ```
347
+
348
+ For Python loading inside LeRobot code, use the SmolVLA policy loader with this repository id as the pretrained path.
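A minimal sketch of the Python loading path described above. The import path `lerobot.policies.smolvla.modeling_smolvla` is an assumption that matches recent LeRobot releases and may differ in the installed version; `load_policy` is a hypothetical wrapper, and the import is kept inside the function so the snippet can be read and imported without LeRobot installed.

```python
REPO_ID = "CoRL2026-CSI/smolvla_ur7e_arrange_block_100epi_10ep"


def load_policy(repo_id: str = REPO_ID, device: str = "cuda"):
    """Download the checkpoint from the Hub and return it in eval mode."""
    # Assumed import path; check it against the installed LeRobot version.
    from lerobot.policies.smolvla.modeling_smolvla import SmolVLAPolicy

    policy = SmolVLAPolicy.from_pretrained(repo_id)
    policy.to(device)
    policy.eval()
    return policy
```

Calling `load_policy()` requires LeRobot, network access to the Hub, and a CUDA device unless `device="cpu"` is passed. Observations given to the returned policy must use the renamed camera keys (`observation.images.camera1`, `observation.images.camera2`).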
349
 
350
+ ## Evaluation and Limitations
351
 
352
+ This model card reports training checkpoint information only. No rollout success rate or task-level evaluation metric is included in this repository.
353
 
354
+ The checkpoint assumes a compatible observation/action schema and the camera remapping shown above. The optimizer/RNG `training_state` files are not included; only the loadable `pretrained_model` artifact is uploaded.
355
 
356
+ ## Provenance
357
 
358
+ - VLM backbone: [`HuggingFaceTB/SmolVLM2-500M-Video-Instruct`](https://huggingface.co/HuggingFaceTB/SmolVLM2-500M-Video-Instruct)
359
+ - Fine-tuning run: `smolvla_ur7e_arrange_block_100epi_bs64_acc4_ep10_20260509_130552`
360
+ - Source training script: `lerobot/scripts/train_smolvla_ur7e.sh`