ACT Policy for SO101 Robot Arm

An Action Chunking Transformer (ACT) policy trained for manipulation tasks on the SO101 robot arm.

Training Environment

Left: Front camera view | Right: Wrist camera view (128x128 each)

Model Details

Parameter	Value
Architecture	ACT (Action Chunking Transformer)
Vision Backbone	ResNet50 (ImageNet V2 pretrained)
Total Parameters	65M
Chunk Size	40
N Action Steps	15
KL Weight	1.0
Training Steps	500,000
Batch Size	64
Learning Rate	3e-5
Backbone LR	1e-5
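For reference, the KL Weight row is the β coefficient in ACT's conditional-VAE training objective. The following sketch assumes the formulation from the original ACT paper: an L1 reconstruction loss over the predicted action chunk, plus a β-weighted KL term that pulls the latent posterior toward a standard normal. Here k = 40 is the chunk size and d = 4 is the action dimension listed on this card.

$$
\mathcal{L} = \frac{1}{k\,d}\sum_{t=1}^{k}\sum_{j=1}^{d}\bigl|\hat{a}_{t,j} - a_{t,j}\bigr| \;+\; \beta\, D_{\mathrm{KL}}\bigl(q_\phi(z \mid a_{1:k}, s)\,\big\|\,\mathcal{N}(0, I)\bigr), \qquad \beta = 1.0
$$

$$
D_{\mathrm{KL}}\bigl(q_\phi(z \mid a_{1:k}, s)\,\big\|\,\mathcal{N}(0, I)\bigr) = -\frac{1}{2}\sum_{i}\bigl(1 + \log\sigma_i^2 - \mu_i^2 - \sigma_i^2\bigr)
$$

In these equations, $s$ is the robot state and $z$ is the latent variable, whose posterior has mean $\mu$ and variance $\sigma^2$. Lowering β weakens the pull of that posterior toward the prior.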

Training Data

Dataset: SO101 Safe Worker 1
Episodes: 21,557
Total Frames: 1.89M
Cameras: Front + Wrist (128x128)
Action Space: 4D
State Space: 10D
FPS: 10
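The figures above imply a few useful quantities: average episode length, and roughly how many passes over the data 500,000 steps at batch size 64 amounts to. The sketch below computes them. It assumes each training sample is one frame, as in LeRobot's frame-indexed datasets. Because Total Frames is given rounded to 1.89M, the results are approximate.

```python
# Back-of-the-envelope statistics from the Training Data and Model Details values.
# TOTAL_FRAMES is the card's rounded 1.89M, so results are approximate.
EPISODES = 21_557
TOTAL_FRAMES = 1_890_000
FPS = 10
TRAINING_STEPS = 500_000
BATCH_SIZE = 64

avg_episode_frames = TOTAL_FRAMES / EPISODES        # frames per episode
avg_episode_seconds = avg_episode_frames / FPS      # wall-clock length at 10 FPS
samples_seen = TRAINING_STEPS * BATCH_SIZE          # frames drawn during training
approx_epochs = samples_seen / TOTAL_FRAMES         # passes over the dataset

print(f"average episode: {avg_episode_frames:.1f} frames ({avg_episode_seconds:.1f} s)")
print(f"samples seen: {samples_seen:,} (~{approx_epochs:.1f} epochs)")
```

This gives an average of about 87.7 frames (8.8 s) per episode and roughly 17 epochs of training.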

Usage

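import torch
from lerobot.policies.act.modeling_act import ACTPolicy

policy = ACTPolicy.from_pretrained("gpudad/act-so101-chunk40-500k")
policy.eval()   # inference mode
policy.reset()  # clear queued actions; call at the start of every episode

# `observation` maps each input feature (state, front and wrist images) to a batched
# tensor, normalized as during training. Zero placeholders are used here.
observation = {
    key: torch.zeros(1, *feat.shape, device=policy.config.device)
    for key, feat in policy.config.input_features.items()
}
action = policy.select_action(observation)  # one action per call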
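Each call to `select_action` returns a single action. With `chunk_size=40` and `n_action_steps=15`, the model is queried once, the first 15 actions of the predicted chunk are queued, and the model is queried again only once that queue is empty. The sketch below reproduces that scheduling in plain Python. `ChunkedActionQueue` and `dummy_predict` are hypothetical stand-ins that illustrate the idea; this is not LeRobot's implementation.

```python
from collections import deque


class ChunkedActionQueue:
    """Illustrative action-chunking scheduler (hypothetical, not LeRobot code)."""

    def __init__(self, predict_chunk, chunk_size=40, n_action_steps=15):
        if n_action_steps > chunk_size:
            raise ValueError("n_action_steps cannot exceed chunk_size")
        self.predict_chunk = predict_chunk  # callable: observation -> list of actions
        self.chunk_size = chunk_size
        self.n_action_steps = n_action_steps
        self.queue = deque()
        self.model_calls = 0

    def reset(self):
        """Drop any queued actions, e.g. at the start of a new episode."""
        self.queue.clear()

    def select_action(self, observation):
        if not self.queue:
            chunk = self.predict_chunk(observation)
            assert len(chunk) == self.chunk_size
            self.model_calls += 1
            # Keep only the first n_action_steps; the rest of the chunk is discarded.
            self.queue.extend(chunk[: self.n_action_steps])
        return self.queue.popleft()


def dummy_predict(step):
    """Hypothetical predictor: 40 actions tagged with the step that produced them."""
    return [(step, i) for i in range(40)]


policy = ChunkedActionQueue(dummy_predict)
actions = [policy.select_action(step) for step in range(100)]
print(policy.model_calls)  # 7 model calls for 100 control steps
print(actions[15])         # (15, 0): a fresh chunk starts at step 15
```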

With LeRobot Evaluation

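from lerobot.scripts.eval import eval_policy

# Assumption: this call matches a project-specific (e.g. Roboport) wrapper. Upstream
# LeRobot's eval_policy takes an instantiated vectorized env and a loaded policy,
# and "so101_pick_cube" is not one of LeRobot's built-in environments.
eval_policy(
    policy_path="gpudad/act-so101-chunk40-500k",
    env_name="so101_pick_cube",
    n_episodes=50,  # number of evaluation rollouts
)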

Training Configuration

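from lerobot.policies.act.configuration_act import ACTConfig

policy_cfg = ACTConfig(
    chunk_size=40,              # Predict 40 future actions
    n_action_steps=15,          # Execute 15 before re-planning
    kl_weight=1.0,              # Below the default of 10.0, for more decisive actions
    vision_backbone="resnet50",
    pretrained_backbone_weights="ResNet50_Weights.IMAGENET1K_V2",
    optimizer_lr=3e-5,          # Transformer and action head
    optimizer_lr_backbone=1e-5, # Pretrained ResNet50 backbone
    use_amp=True,               # Mixed-precision training
)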
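The two learning rates are applied through separate optimizer parameter groups, so the pretrained ResNet50 backbone is updated more gently than the newly initialized transformer. The sketch below shows the idea with a hypothetical helper, `build_param_groups`. It assumes backbone parameters share the name prefix `model.backbone`; that prefix is an assumption about parameter naming, not a confirmed detail of this checkpoint.

```python
# Hypothetical helper: split named parameters into two optimizer param groups so the
# pretrained vision backbone trains with a smaller learning rate than the rest of ACT.
# The "model.backbone" prefix is an assumed naming convention.


def build_param_groups(named_params, lr=3e-5, lr_backbone=1e-5,
                       backbone_prefix="model.backbone"):
    backbone, other = [], []
    for name, param in named_params:
        (backbone if name.startswith(backbone_prefix) else other).append(param)
    return [
        {"params": other, "lr": lr},
        {"params": backbone, "lr": lr_backbone},
    ]


# Placeholder strings stand in for tensors. With PyTorch you would pass
# model.named_parameters() and give the result to torch.optim.AdamW(groups).
named = [
    ("model.backbone.conv1.weight", "w0"),
    ("model.backbone.layer1.0.conv1.weight", "w1"),
    ("model.encoder.layers.0.linear1.weight", "w2"),
    ("model.action_head.weight", "w3"),
]
groups = build_param_groups(named)
for g in groups:
    print(g["lr"], g["params"])
```

A real setup would normally also skip frozen parameters (`requires_grad=False`). That filtering is omitted here to keep the example dependency-free.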

Performance Notes

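Chunk size 40 spans 4 s at 10 FPS, roughly a third to just under half of a typical episode (~90-120 steps)
N action steps 15 executes 1.5 s of each chunk before re-planning, allowing regular correction of accumulated errors
KL weight 1.0 is lower than LeRobot's ACT default of 10.0 and is intended to produce more decisive, less hesitant actions
ResNet50 replaces ACT's default ResNet18 backbone to provide higher-capacity visual features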
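The timing behind these notes follows directly from the configuration. The sketch below works it out, assuming the 10 FPS control rate from Training Data and the ~90-120 step episode range quoted in the notes.

```python
import math

# Planning horizons implied by the configuration (10 FPS control rate assumed).
FPS = 10
CHUNK_SIZE = 40
N_ACTION_STEPS = 15

chunk_horizon_s = CHUNK_SIZE / FPS              # how far ahead each prediction looks
replan_interval_s = N_ACTION_STEPS / FPS        # time between model queries
executed_fraction = N_ACTION_STEPS / CHUNK_SIZE # share of each chunk actually executed

for episode_steps in (90, 120):
    coverage = CHUNK_SIZE / episode_steps
    queries = math.ceil(episode_steps / N_ACTION_STEPS)
    print(f"{episode_steps} steps: one chunk covers {coverage:.0%}, {queries} model queries")

print(chunk_horizon_s, replan_interval_s, executed_fraction)
```

Only 37.5% of each predicted chunk is executed. The remainder is discarded in favor of a fresh prediction.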

Framework

Trained using LeRobot v0.4.2 with Roboport.

License

MIT
