---
license: mit
tags:
  - robotics
  - imitation-learning
  - act
  - action-chunking-transformer
  - lerobot
  - so101
datasets:
  - so101_safe_worker1
pipeline_tag: robotics
---

# ACT Policy for SO101 Robot Arm

An Action Chunking Transformer (ACT) policy trained for manipulation tasks on the SO101 robot arm.

## Training Environment

![Training Environment](training_env.png)
*Left: Front camera view | Right: Wrist camera view (128x128 each)*

## Model Details

| Parameter | Value |
|-----------|-------|
| **Architecture** | ACT (Action Chunking Transformer) |
| **Vision Backbone** | ResNet50 (ImageNet-1K pretrained, `IMAGENET1K_V2` weights) |
| **Parameters** | 65M |
| **Chunk Size** | 40 |
| **N Action Steps** | 15 |
| **KL Weight** | 1.0 |
| **Training Steps** | 500,000 |
| **Batch Size** | 64 |
| **Learning Rate** | 3e-5 |
| **Backbone LR** | 1e-5 |

## Training Data

- **Dataset**: SO101 Safe Worker 1
- **Episodes**: 21,557
- **Total Frames**: 1.89M
- **Cameras**: Front + Wrist (128x128)
- **Action Space**: 4D
- **State Space**: 10D
- **FPS**: 10
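
As a quick sanity check on these figures, the average episode length can be derived from the frame and episode counts. The sketch below uses the rounded 1.89M total, so the results are approximate.

```python
# Figures from the dataset summary above (total frames is rounded to 1.89M).
TOTAL_FRAMES = 1_890_000
EPISODES = 21_557
FPS = 10

# Average frames per episode and the corresponding wall-clock duration.
frames_per_episode = TOTAL_FRAMES / EPISODES
seconds_per_episode = frames_per_episode / FPS

# Total recorded demonstration time in hours.
total_hours = TOTAL_FRAMES / FPS / 3600

print(f"~{frames_per_episode:.1f} frames per episode")
print(f"~{seconds_per_episode:.1f} s per episode")
print(f"~{total_hours:.1f} h of demonstrations in total")
```

This works out to roughly 88 frames (just under 9 seconds) per episode on average.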

## Usage

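```python
import torch
from lerobot.policies.act.modeling_act import ACTPolicy

# Load the policy and switch to inference mode
policy = ACTPolicy.from_pretrained("gpudad/act-so101-chunk40-500k")
policy.eval()
policy.reset()  # clear the action queue at the start of each episode

# Run inference; `observation` is a dict of batched tensors
# (the 10D state plus the front and wrist camera images)
with torch.no_grad():
    action = policy.select_action(observation)
```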
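
Because the dataset was recorded at 10 FPS, the policy is normally stepped at the same rate during deployment. The following is a minimal sketch of a fixed-rate control loop, not part of LeRobot: `run_episode`, `get_observation`, and `send_action` are hypothetical names for caller-supplied code, and the 120-step default is an assumption based on the episode lengths quoted in this card. It relies only on the policy exposing `reset()` and `select_action()`.

```python
import time


def run_episode(policy, get_observation, send_action, max_steps=120, fps=10):
    """Run one episode at a fixed control rate and return the number of steps.

    `get_observation` and `send_action` are hypothetical callables supplied by
    the caller (robot or simulator glue code); they are not LeRobot functions.
    """
    policy.reset()  # clear actions queued from a previous episode
    period = 1.0 / fps
    for _ in range(max_steps):
        start = time.perf_counter()
        observation = get_observation()
        action = policy.select_action(observation)
        send_action(action)
        # Sleep for the rest of the control period to hold the target rate.
        remaining = period - (time.perf_counter() - start)
        if remaining > 0:
            time.sleep(remaining)
    return max_steps
```

A real deployment would also need a termination condition and safety limits on the commanded actions, which are omitted here.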

### With LeRobot Evaluation

```python
from lerobot.scripts.eval import eval_policy

eval_policy(
    policy_path="gpudad/act-so101-chunk40-500k",
    env_name="so101_pick_cube",
    n_episodes=50,
)
```

## Training Configuration

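```python
from lerobot.policies.act.configuration_act import ACTConfig

policy_cfg = ACTConfig(
    chunk_size=40,              # Predict 40 future actions
    n_action_steps=15,          # Execute 15 before re-planning
    kl_weight=1.0,              # Below the LeRobot default of 10.0, for decisive actions
    vision_backbone="resnet50",
    pretrained_backbone_weights="ResNet50_Weights.IMAGENET1K_V2",
    optimizer_lr=3e-5,          # Learning rate for all non-backbone parameters
    optimizer_lr_backbone=1e-5, # Lower learning rate for the pretrained backbone
    use_amp=True,               # Mixed-precision training
)
```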
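
The two learning rates imply that the optimizer receives separate parameter groups for the backbone and for the rest of the model. The sketch below illustrates that idea in plain Python; it is not LeRobot's code. `split_param_groups` is a hypothetical helper, and the `model.backbone` prefix is an assumed naming convention.

```python
def split_param_groups(named_params, lr=3e-5, backbone_lr=1e-5,
                       backbone_prefix="model.backbone"):
    """Return two optimizer parameter groups: non-backbone and backbone.

    `named_params` is an iterable of (name, parameter) pairs, such as the
    result of `module.named_parameters()` in PyTorch. `backbone_prefix` is an
    assumed naming convention, not something stated in this model card.
    """
    backbone, rest = [], []
    for name, param in named_params:
        if name.startswith(backbone_prefix):
            backbone.append(param)
        else:
            rest.append(param)
    return [
        {"params": rest, "lr": lr},
        {"params": backbone, "lr": backbone_lr},
    ]


# Toy example with strings standing in for parameter tensors.
groups = split_param_groups([
    ("model.backbone.conv1.weight", "w0"),
    ("model.encoder.layers.0.linear1.weight", "w1"),
    ("model.action_head.weight", "w2"),
])
print(groups)
```

With real tensors, a list in this form can be passed directly to a PyTorch optimizer such as `torch.optim.AdamW`.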

## Performance Notes

- **Chunk size 40** predicts 4 seconds of motion at 10 FPS, roughly a third to just under half of a typical episode (episodes are ~90-120 steps)
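- **N action steps 15** executes only the first 15 of the 40 predicted actions, so the policy re-plans every 1.5 seconds and can correct errors sooner
- **KL weight 1.0** is lower than LeRobot's default of 10.0; the weaker regularization is intended to produce more decisive, less hesitant actions
- **ResNet50** provides stronger visual features than ResNet18, the LeRobot ACT default, at a higher compute cost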
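
To make the interaction between chunk size and action steps concrete, the sketch below models queue-based chunk execution in plain Python and counts how many forward passes an episode needs. It is a simplified model under the settings listed above, not LeRobot's implementation; `count_policy_calls` is a hypothetical helper, and the predicted chunk is replaced by placeholder integers.

```python
from collections import deque


def count_policy_calls(episode_steps, chunk_size=40, n_action_steps=15):
    """Count forward passes when executing n_action_steps of each chunk."""
    if not 1 <= n_action_steps <= chunk_size:
        raise ValueError("n_action_steps must be between 1 and chunk_size")
    queue = deque()
    calls = 0
    for _ in range(episode_steps):
        if not queue:
            # Queue is empty: predict a new chunk and keep only its first
            # n_action_steps actions; the rest of the chunk is discarded.
            chunk = list(range(chunk_size))  # placeholder for predicted actions
            queue.extend(chunk[:n_action_steps])
            calls += 1
        queue.popleft()  # execute one action per control step
    return calls


for steps in (90, 120):
    print(steps, "steps ->", count_policy_calls(steps), "forward passes")
```

With these settings, a 90-step episode needs 6 forward passes and a 120-step episode needs 8. Raising `n_action_steps` toward the chunk size reduces compute but lets the arm run longer on stale predictions.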

## Framework

Trained using [LeRobot](https://github.com/huggingface/lerobot) v0.4.2 with [Roboport](https://github.com/Robo-Robotics/roboport).

## License

MIT