ACT Policy for SO101 Robot Arm

An Action Chunking Transformer (ACT) policy trained for manipulation tasks on the SO101 robot arm.

Training Environment

Figure: training environment. Left: front camera view. Right: wrist camera view (128x128 each).

Model Details

Parameter       | Value
Architecture    | ACT (Action Chunking Transformer)
Vision Backbone | ResNet50 (ImageNet V2 pretrained)
Parameters      | 65M
Chunk Size      | 40
N Action Steps  | 15
KL Weight       | 1.0
Training Steps  | 500,000
Batch Size      | 64
Learning Rate   | 3e-5
Backbone LR     | 1e-5

Training Data

  • Dataset: SO101 Safe Worker 1
  • Episodes: 21,557
  • Total Frames: 1.89M
  • Cameras: Front + Wrist (128x128)
  • Action Space: 4D
  • State Space: 10D
  • FPS: 10
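
The figures above, together with the batch size and step count from Model Details, imply a few useful derived quantities. The following is a rough back-of-the-envelope sketch, assuming the rounded 1.89M frame count is close to the true value; the function and key names are illustrative only.

```python
# Figures copied from this model card; 1.89M is a rounded frame count.
EPISODES = 21_557
TOTAL_FRAMES = 1_890_000
FPS = 10
BATCH_SIZE = 64
TRAINING_STEPS = 500_000


def dataset_summary():
    """Derive approximate per-episode and per-training-run statistics."""
    mean_frames = TOTAL_FRAMES / EPISODES        # average episode length in frames
    mean_seconds = mean_frames / FPS             # average episode duration
    total_hours = TOTAL_FRAMES / FPS / 3600      # total recorded time
    samples_seen = BATCH_SIZE * TRAINING_STEPS   # frames drawn during training
    approx_epochs = samples_seen / TOTAL_FRAMES  # passes over the dataset
    return {
        "mean_frames_per_episode": mean_frames,
        "mean_seconds_per_episode": mean_seconds,
        "total_hours": total_hours,
        "approx_epochs": approx_epochs,
    }


if __name__ == "__main__":
    for name, value in dataset_summary().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the dataset holds about 52.5 hours of recordings, episodes average just under 88 frames, and 500,000 steps at batch size 64 correspond to roughly 17 passes over the data.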

Usage

from lerobot.policies.act.modeling_act import ACTPolicy

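# Load the policy from the Hugging Face Hub and switch to inference mode
policy = ACTPolicy.from_pretrained("gpudad/act-so101-chunk40-500k")
policy.eval()
policy.reset()  # Clear the queued actions at the start of each episode

# Run inference. `observation` is a dict of batched tensors holding the
# front and wrist camera images and the 10D robot state.
action = policy.select_action(observation)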

With LeRobot Evaluation

from lerobot.scripts.eval import eval_policy

eval_policy(
    policy_path="gpudad/act-so101-chunk40-500k",
    env_name="so101_pick_cube",
    n_episodes=50,
)

Training Configuration

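from lerobot.policies.act.configuration_act import ACTConfig

policy_cfg = ACTConfig(
    chunk_size=40,              # Predict 40 future actions per inference
    n_action_steps=15,          # Execute 15 before re-planning
    kl_weight=1.0,              # Lower than the ACTConfig default (10.0), for decisive actions
    vision_backbone="resnet50",
    pretrained_backbone_weights="ResNet50_Weights.IMAGENET1K_V2",
    optimizer_lr=3e-5,          # Learning rate for everything except the backbone
    optimizer_lr_backbone=1e-5, # Smaller learning rate for the pretrained backbone
    use_amp=True,               # Mixed-precision training
)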
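
To make the role of kl_weight concrete, here is a minimal pure-Python sketch of how an ACT-style objective combines an L1 action reconstruction term with a weighted KL term on the latent variable. It is a simplified illustration, not LeRobot's training code; the function names are hypothetical, and the real implementation works on batched tensors.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL divergence from a diagonal Gaussian N(mu, exp(log_var)) to N(0, I),
    summed over the latent dimensions."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def act_style_loss(l1_loss, mu, log_var, kl_weight=1.0):
    """Total loss = L1 reconstruction loss + kl_weight * KL term."""
    return l1_loss + kl_weight * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    # Same latent statistics, two different weights on the KL term.
    mu, log_var = [1.0], [0.0]
    print(act_style_loss(0.2, mu, log_var, kl_weight=1.0))
    print(act_style_loss(0.2, mu, log_var, kl_weight=10.0))
```

A smaller kl_weight penalizes the latent distribution less for drifting from the standard normal prior, so the reconstruction term has more influence on the total loss.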

Performance Notes

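  • Chunk size 40 predicts 4 s of motion at 10 FPS, roughly a third to a half of a typical episode (~90-120 steps)
  • N action steps 15 means the policy re-plans every 1.5 s, which allows frequent error correction
  • KL weight 1.0, lower than LeRobot's ACT default of 10.0, produces more decisive, less hesitant actions
  • ResNet50 provides stronger visual features than ResNet18, the default ACT backbone in LeRobot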
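
The interaction between chunk size and N action steps can be sketched as a simple action queue: each inference predicts 40 actions, only the first 15 are executed, and the policy re-plans when the queue runs empty. This is a simplified illustration of that scheme, assuming no temporal ensembling; `run_episode` and `dummy_predict_chunk` are hypothetical names, and the dummy predictor stands in for the real policy.

```python
from collections import deque

CHUNK_SIZE = 40      # actions predicted per inference
N_ACTION_STEPS = 15  # actions executed before re-planning
FPS = 10


def dummy_predict_chunk(t):
    """Hypothetical stand-in for the policy: tags each action with the
    step at which it was planned and its offset within the chunk."""
    return [(t, k) for k in range(CHUNK_SIZE)]


def run_episode(episode_len, predict_chunk=dummy_predict_chunk):
    """Execute an episode, re-planning whenever the action queue is empty."""
    queue = deque()
    executed = []
    n_inferences = 0
    for t in range(episode_len):
        if not queue:
            chunk = predict_chunk(t)
            # Keep only the first N_ACTION_STEPS; the rest of the chunk is discarded.
            queue.extend(chunk[:N_ACTION_STEPS])
            n_inferences += 1
        executed.append(queue.popleft())
    return executed, n_inferences


if __name__ == "__main__":
    actions, n_inferences = run_episode(100)
    print(f"inferences: {n_inferences}")
    print(f"seconds between re-plans: {N_ACTION_STEPS / FPS}")
    print(f"seconds predicted per chunk: {CHUNK_SIZE / FPS}")
```

For a 100-step episode this loop runs inference 7 times (at steps 0, 15, ..., 90), and 25 of every 40 predicted actions are never executed.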

Framework

Trained using LeRobot v0.4.2 with Roboport.

License

MIT
