---
license: mit
tags:
- robotics
- imitation-learning
- act
- action-chunking-transformer
- lerobot
- so101
datasets:
- so101_safe_worker1
pipeline_tag: robotics
---

# ACT Policy for SO101 Robot Arm

An Action Chunking Transformer (ACT) policy trained for manipulation tasks on the SO101 robot arm.

## Training Environment

![Training Environment](training_env.png)

*Left: Front camera view | Right: Wrist camera view (128x128 each)*

## Model Details

| Parameter | Value |
|-----------|-------|
| **Architecture** | ACT (Action Chunking Transformer) |
| **Vision Backbone** | ResNet50 (pretrained, `IMAGENET1K_V2` weights) |
| **Parameters** | 65M |
| **Chunk Size** | 40 |
| **N Action Steps** | 15 |
| **KL Weight** | 1.0 |
| **Training Steps** | 500,000 |
| **Batch Size** | 64 |
| **Learning Rate** | 3e-5 |
| **Backbone LR** | 1e-5 |

## Training Data

- **Dataset**: SO101 Safe Worker 1
- **Episodes**: 21,557
- **Total Frames**: 1.89M
- **Cameras**: Front + Wrist (128x128)
- **Action Space**: 4D
- **State Space**: 10D
- **FPS**: 10

## Usage

```python
from lerobot.policies.act.modeling_act import ACTPolicy

# Load the policy
policy = ACTPolicy.from_pretrained("gpudad/act-so101-chunk40-500k")
policy.eval()
policy.reset()  # Clear the queued actions at the start of each episode

# Run inference; `observation` is a dict of batched tensors keyed by the
# policy's input features (robot state plus front and wrist camera images)
action = policy.select_action(observation)
```

### With LeRobot Evaluation

```python
from lerobot.scripts.eval import eval_policy

eval_policy(
    policy_path="gpudad/act-so101-chunk40-500k",
    env_name="so101_pick_cube",
    n_episodes=50,
)
```

## Training Configuration

```python
from lerobot.policies.act.configuration_act import ACTConfig

policy_cfg = ACTConfig(
    chunk_size=40,        # Predict 40 future actions
    n_action_steps=15,    # Execute 15 before re-planning
    kl_weight=1.0,        # Low KL weight for decisive actions
    vision_backbone="resnet50",
    pretrained_backbone_weights="ResNet50_Weights.IMAGENET1K_V2",
    optimizer_lr=3e-5,
    optimizer_lr_backbone=1e-5,
    use_amp=True,
)
```

## Performance Notes

- **Chunk size 40** spans 4 s of motion at 10 FPS, roughly a third to just under half of an episode (episodes are ~90-120 steps)
- **N action steps 15** means the policy re-plans every 1.5 s, which allows frequent error correction
- **KL weight 1.0** produces more decisive, less hesitant actions
- **ResNet50** provides stronger visual features than ResNet18

## Framework

Trained using [LeRobot](https://github.com/huggingface/lerobot) v0.4.2 with [Roboport](https://github.com/Robo-Robotics/roboport).

## License

MIT
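
## Appendix: Chunked Execution Sketch

The interaction between chunk size 40 and N action steps 15 can be illustrated with a small simulation. This is a minimal sketch, not the LeRobot implementation: it assumes a simple action queue with no temporal ensembling (the card does not say which mode was used), and `replan_steps` and the string placeholders for actions are hypothetical names introduced only for illustration. No model is loaded.

```python
from collections import deque

CHUNK_SIZE = 40       # actions predicted per policy query (from this card)
N_ACTION_STEPS = 15   # actions executed before the next query (from this card)
FPS = 10              # control rate of the dataset (from this card)


def replan_steps(episode_len, chunk_size=CHUNK_SIZE, n_action_steps=N_ACTION_STEPS):
    """Return the timesteps at which a queue-based policy would be queried.

    Hypothetical helper: a stand-in "policy" emits placeholder actions so the
    queue logic can be followed without any model or robot.
    """
    queue = deque()
    queried_at = []
    for t in range(episode_len):
        if not queue:
            # Stand-in for a forward pass that predicts `chunk_size` actions.
            chunk = [f"action_{t + i}" for i in range(chunk_size)]
            # Only the first `n_action_steps` are kept; the rest are discarded.
            queue.extend(chunk[:n_action_steps])
            queried_at.append(t)
        queue.popleft()  # execute one action per control step
    return queried_at


if __name__ == "__main__":
    steps = replan_steps(100)
    print("policy queried at steps:", steps)
    print("seconds between re-plans:", N_ACTION_STEPS / FPS)
    print("seconds covered by one chunk:", CHUNK_SIZE / FPS)
```

Under these assumptions a 100-step episode triggers seven policy queries, and 25 of the 40 predicted actions in each chunk are never executed. That trade-off favors fresher observations over fewer forward passes.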