---
license: apache-2.0
library_name: lerobot
pipeline_tag: robotics
tags:
  - robotics
  - imitation-learning
  - mujoco
  - lerobot
  - act
  - so101
---

# ACT SO101 PickCube (chunk40, 250k) — v1

This repository contains an **Action Chunking Transformer (ACT)** policy trained for the SO-101 MuJoCo simulation pick-and-place task:
**pick up the red cube and place it in the blue bin**.

## Task

- **Environment**: `SO101PickCube-v0` (`lerobot.envs.so101_sim`)
- **Randomization**: randomized cube position + randomized drop-zone/bin position

## Inputs / Outputs

- **Observations**
  - `observation.images.front`: RGB image (3×128×128)
  - `observation.images.wrist`: RGB image (3×128×128)
  - `observation.state`: 10D state vector (`agent_pos` from the env)
- **Action**: 4D action `[dx, dy, dz, gripper]`
  - `dx, dy, dz` are end-effector delta commands, each in `[-1, 1]`
  - `gripper` is in `[0, 2]`, where `0=open`, `1=stay`, `2=close`
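
As a rough illustration of the action layout above, the sketch below clamps a raw 4D action to the documented ranges and maps the gripper value to its label. The helper names (`clamp`, `sanitize_action`, `gripper_label`) are hypothetical and not part of this repo, and the nearest-integer rounding of the gripper value is an assumption; the environment may interpret intermediate values differently.

```python
# Hypothetical helpers, not part of this repo or of lerobot.
GRIPPER_LABELS = {0: "open", 1: "stay", 2: "close"}


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def sanitize_action(action):
    """Clamp a raw [dx, dy, dz, gripper] action to the documented ranges."""
    if len(action) != 4:
        raise ValueError(f"expected 4 values, got {len(action)}")
    dx, dy, dz, gripper = (float(v) for v in action)
    return [
        clamp(dx, -1.0, 1.0),
        clamp(dy, -1.0, 1.0),
        clamp(dz, -1.0, 1.0),
        clamp(gripper, 0.0, 2.0),
    ]


def gripper_label(gripper):
    """Map a gripper value to its label.

    Assumption: the nearest integer (rounding halves up) selects the command.
    """
    return GRIPPER_LABELS[int(clamp(float(gripper), 0.0, 2.0) + 0.5)]


print(sanitize_action([1.5, -2, 0.25, 3]))
print(gripper_label(1.8))
```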

## Normalization

This policy was trained with:

- **Images**: ImageNet mean/std
- **State & Action**: MEAN_STD using the included stats files:
  - `policy_preprocessor_step_3_normalizer_processor.safetensors`
  - `policy_postprocessor_step_0_unnormalizer_processor.safetensors`
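
For intuition, MEAN_STD normalization can be sketched in plain Python as below. This is a minimal sketch, not the processor code shipped with the checkpoint: the statistics are made-up illustrative numbers (the real ones live in the stats files listed above), and the small `eps` added to the standard deviation is an assumption.

```python
def normalize(values, mean, std, eps=1e-8):
    """Map raw values to zero-mean, unit-variance space, element by element."""
    return [(v - m) / (s + eps) for v, m, s in zip(values, mean, std)]


def unnormalize(values, mean, std, eps=1e-8):
    """Invert normalize(): map normalized values back to raw units."""
    return [v * (s + eps) + m for v, m, s in zip(values, mean, std)]


# Illustrative statistics only; NOT the values stored in the safetensors files.
action_mean = [0.1, -0.2, 0.0, 1.0]
action_std = [0.5, 0.5, 0.25, 0.8]

raw_action = [0.6, -0.7, 0.25, 1.8]
normalized = normalize(raw_action, action_mean, action_std)
restored = unnormalize(normalized, action_mean, action_std)
print(normalized)
print(restored)
```

State vectors are normalized before they reach the policy, and the policy's normalized action outputs are mapped back to environment units by the inverse transform.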

## Model

- **Policy**: ACT (Action Chunking Transformer)
- **Vision backbone**: ResNet-18 (`ResNet18_Weights.IMAGENET1K_V1`)
- **Chunking**: `chunk_size=40`, `n_action_steps=40`
- **Transformer**: `dim_model=512`, `n_heads=8`, `n_encoder_layers=4`, `n_decoder_layers=1`
- **VAE**: enabled (`latent_dim=32`, `kl_weight=10.0`)
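
With `chunk_size=40` and `n_action_steps=40`, each forward pass yields 40 actions and all of them are executed before the policy is queried again. The sketch below illustrates that open-loop pattern with a hypothetical `ChunkedController` and a dummy predictor standing in for the network; it assumes re-planning happens only when the queue of pending actions is empty, and it is not the actual policy implementation.

```python
from collections import deque


class ChunkedController:
    """Hypothetical wrapper: run a predicted chunk open-loop, then re-plan."""

    def __init__(self, predict_chunk, n_action_steps=40):
        self.predict_chunk = predict_chunk
        self.n_action_steps = n_action_steps
        self.queue = deque()
        self.num_predictions = 0

    def reset(self):
        """Drop pending actions, e.g. at the start of a new episode."""
        self.queue.clear()

    def select_action(self, observation):
        # Query the predictor only when no pending actions remain.
        if not self.queue:
            chunk = self.predict_chunk(observation)
            self.queue.extend(chunk[: self.n_action_steps])
            self.num_predictions += 1
        return self.queue.popleft()


def dummy_predict_chunk(observation, chunk_size=40):
    """Stand-in for the network: a chunk of 'do nothing, gripper stay' actions."""
    return [[0.0, 0.0, 0.0, 1.0] for _ in range(chunk_size)]


controller = ChunkedController(dummy_predict_chunk)
for step in range(250):
    controller.select_action(observation=None)

# 250 environment steps at 40 actions per chunk need ceil(250 / 40) = 7 predictions.
print(controller.num_predictions)
```

One consequence of this setting is that the policy cannot react to new observations in the middle of a chunk.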

## Training

Key settings (see `train_config.json` for full config):

- **Steps**: 250,000
- **Batch size**: 8
- **Optimizer**: AdamW (`lr=1e-5`, `weight_decay=1e-4`)
- **Dataset**: recorded locally and referenced as `local/so101_safe_worker1` during training (not published on the Hub)

## Usage

### Load

```python
from lerobot.policies.act.modeling_act import ACTPolicy

policy = ACTPolicy.from_pretrained("gpudad/act_so101_chunk40_250k_v1")
policy.eval()
```

### Evaluate in the SO101 simulator

If you’re using the evaluation scripts in this repo (like `eval_so101.py`), download the snapshot locally first so the script can read `config.json` and the normalization stats files:

```python
from huggingface_hub import snapshot_download

local_dir = snapshot_download("gpudad/act_so101_chunk40_250k_v1")
print(local_dir)
```

Then run:

```bash
python eval_so101.py --model <local_dir> --episodes 10 --max-steps 250 --no-viewer
```

Note: `SO101PickCube-v0` reports success as `info["succeed"]`.
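
If you collect the final `info` dict of each episode yourself, a success rate can be computed along these lines. The `success_rate` helper is hypothetical (it is not part of `eval_so101.py`), and treating a missing `"succeed"` key as a failure is an assumption.

```python
def success_rate(final_infos):
    """Fraction of episodes whose final info dict reports success.

    Assumption: an episode without a "succeed" key counts as a failure.
    """
    if not final_infos:
        return 0.0
    successes = sum(1 for info in final_infos if bool(info.get("succeed", False)))
    return successes / len(final_infos)


# Made-up example: final info dicts from four episodes.
episodes = [{"succeed": True}, {"succeed": False}, {}, {"succeed": True}]
print(success_rate(episodes))
```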

## Limitations

- This policy is intended for the specific observation layout + environment settings used by `SO101PickCube-v0`.
- Performance can vary with MuJoCo version, rendering settings, and random seeds.

## Citation

If you use ACT, please cite:

```bibtex
@article{zhao2023learning,
  title   = {Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware},
  author  = {Zhao, Tony Z. and others},
  journal = {arXiv preprint arXiv:2304.13705},
  year    = {2023}
}
```