ACT Policy: Right Hand Camera, Left Hand Pen Touch

This is an ACT (Action Chunking with Transformers) policy checkpoint for an AGIBOT O10 dual-arm task. The task prompt is:

right hand picks up the camera, left hand picks up the pen, then left hand uses the pen to touch

The checkpoint comes from the local training job act_right_hand_camera_left_hand_pen_touch_20260427_rerun, specifically checkpoints/last/pretrained_model. It is intended for the arm-hand-teleop inference config:

infer:
  policy: act
  model_path: <this-model-repo-or-local-pretrained_model>

Model Details

  • Policy type: ACT
  • Vision backbone: ResNet-18 with ImageNet pretrained weights
  • Observation steps: 1
  • Action chunk size: 50
  • Action steps: 50
  • Training device config: CUDA
  • Training steps config: 90000
  • Batch size: 16
  • Seed: 1000
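
To make the relationship between the chunk size and the action steps concrete, here is a minimal sketch of chunked execution. It assumes the usual ACT behaviour, where the policy predicts a chunk of actions and the runtime consumes a fixed number of them before querying the policy again. `ChunkedExecutor` and `fake_predict_chunk` are hypothetical names used only for illustration; they are not part of arm-hand-teleop or LeRobot.

```python
from collections import deque
from typing import Callable, Deque, List

CHUNK_SIZE = 50      # actions predicted per forward pass (from the list above)
N_ACTION_STEPS = 50  # actions executed before the policy is queried again
ACTION_DIM = 14      # length of one action vector (from this model card)

Action = List[float]


class ChunkedExecutor:
    """Serves one action per control step, refilling from the policy when empty."""

    def __init__(self, predict_chunk: Callable[[], List[Action]]) -> None:
        if N_ACTION_STEPS > CHUNK_SIZE:
            raise ValueError("n_action_steps cannot exceed chunk_size")
        self._predict_chunk = predict_chunk
        self._queue: Deque[Action] = deque()
        self.policy_calls = 0

    def next_action(self) -> Action:
        if not self._queue:
            chunk = self._predict_chunk()
            self.policy_calls += 1
            # Keep only the first N_ACTION_STEPS actions of the predicted chunk.
            self._queue.extend(chunk[:N_ACTION_STEPS])
        return self._queue.popleft()


def fake_predict_chunk() -> List[Action]:
    """Hypothetical stand-in for the ACT forward pass: returns all-zero actions."""
    return [[0.0] * ACTION_DIM for _ in range(CHUNK_SIZE)]


executor = ChunkedExecutor(fake_predict_chunk)
actions = [executor.next_action() for _ in range(120)]
```

With both values set to 50, the 120 control steps in this example trigger three forward passes, at steps 0, 50, and 100.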

Inputs and Outputs

Input features:

Name                            Type    Shape
observation.state               STATE   [14]
observation.images.top          VISUAL  [3, 480, 640]
observation.images.left_wrist   VISUAL  [3, 480, 640]
observation.images.right_wrist  VISUAL  [3, 480, 640]

Output features:

Name    Type    Shape
action  ACTION  [14]
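
The shapes above can be checked before inference with a small helper. This is a sketch only: it assumes the shapes are per-sample (no batch dimension) and channel-first, so a 640x480 camera frame maps to `[3, 480, 640]`. The names `image_shape` and `shape_mismatches` are hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]

# Feature shapes copied from the input and output tables above.
EXPECTED_INPUTS: Dict[str, Shape] = {
    "observation.state": (14,),
    "observation.images.top": (3, 480, 640),
    "observation.images.left_wrist": (3, 480, 640),
    "observation.images.right_wrist": (3, 480, 640),
}
EXPECTED_OUTPUTS: Dict[str, Shape] = {"action": (14,)}


def image_shape(width: int, height: int, channels: int = 3) -> Shape:
    """Channel-first shape for an image of the given resolution."""
    return (channels, height, width)


def shape_mismatches(shapes: Dict[str, Shape], expected: Dict[str, Shape]) -> List[str]:
    """Return one message per missing or mis-shaped feature."""
    problems = []
    for name, want in expected.items():
        got = shapes.get(name)
        if got is None:
            problems.append(f"missing feature: {name}")
        elif tuple(got) != want:
            problems.append(f"{name}: expected {want}, got {tuple(got)}")
    return problems


# A correct observation, and one whose top image is still channel-last.
good = dict(EXPECTED_INPUTS)
bad = {**good, "observation.images.top": (480, 640, 3)}
```

Calling `shape_mismatches(bad, EXPECTED_INPUTS)` reports only the top camera, which is a common failure when frames arrive from the camera driver as height x width x channels.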

Dataset

The training config references a local LeRobot dataset:

  • local/agi_arm_camera_pen_touch_20260427
  • Local source path: right_hand_picks_up_the_camera_left_hand_picks_up_the_pen_then_left_hand_uses_the_pen_to_touch_20260427_merged

A larger merged dataset for the same task, merged_all, has also been uploaded and can be used for future training or evaluation.

Note: this checkpoint's training config points to the 20260427 merged dataset. This does not imply that the checkpoint was retrained on the full merged_all dataset.

Files

  • config.json: ACT policy configuration
  • model.safetensors: model weights
  • policy_preprocessor.json: input preprocessing pipeline
  • policy_preprocessor_step_3_normalizer_processor.safetensors: normalization state
  • policy_postprocessor.json: output postprocessing pipeline
  • policy_postprocessor_step_0_unnormalizer_processor.safetensors: unnormalization state
  • train_config.json: training configuration snapshot
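
A quick way to confirm that a downloaded or local model directory is complete is to compare it against the file list above. The sketch below does only that; `missing_files` is a hypothetical helper name, and the directory passed to it is whatever path `infer.model_path` points at.

```python
from pathlib import Path
from typing import List

# File names copied from the list above.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "policy_preprocessor.json",
    "policy_preprocessor_step_3_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
    "train_config.json",
]


def missing_files(model_dir: str) -> List[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty result means every listed file is present. It does not verify the contents of the files.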

Usage

In arm-hand-teleop, point infer.model_path at the directory downloaded from this repository, or at a local pretrained_model directory. The matching inference config is:

/home/phl/workspace/arm-hand-teleop/configs/dual_arm/models/act_right_hand_camera_left_hand_pen_touch_20260427.yaml

Runtime requirements:

  • infer.policy must be act.
  • The robot schema must match training: dual arm, include_eef_pose: false, tactile_mode: "none".
  • Camera keys must include top, left_wrist, and right_wrist, with 640x480 RGB images.
  • The action space is 14D and must match the runtime dual-arm joint/gripper control mapping.
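
The requirements above can be expressed as a pre-flight check. This is a sketch under stated assumptions: the nested dict layout (`infer`, `robot`, `cameras` sections, and the `action_dim`, `width`, `height` keys) is hypothetical and may not match the real arm-hand-teleop YAML schema. Only `infer.policy`, `include_eef_pose`, `tactile_mode`, the camera keys, and the numeric values come from this model card.

```python
from typing import Any, Dict, List

REQUIRED_CAMERAS = ("top", "left_wrist", "right_wrist")


def runtime_problems(cfg: Dict[str, Any]) -> List[str]:
    """Check a runtime config dict against the requirements listed above.

    The dict layout (infer / robot / cameras sections) is hypothetical.
    """
    problems: List[str] = []
    if cfg.get("infer", {}).get("policy") != "act":
        problems.append("infer.policy must be 'act'")
    robot = cfg.get("robot", {})
    if robot.get("include_eef_pose") is not False:
        problems.append("include_eef_pose must be false")
    if robot.get("tactile_mode") != "none":
        problems.append('tactile_mode must be "none"')
    if robot.get("action_dim") != 14:
        problems.append("action space must be 14D")
    cameras = cfg.get("cameras", {})
    for key in REQUIRED_CAMERAS:
        cam = cameras.get(key)
        if cam is None:
            problems.append(f"missing camera: {key}")
        elif (cam.get("width"), cam.get("height")) != (640, 480):
            problems.append(f"camera {key} must be 640x480")
    return problems


# Example config that satisfies every requirement (layout is an assumption).
example = {
    "infer": {"policy": "act", "model_path": "<this-model-repo-or-local-pretrained_model>"},
    "robot": {"include_eef_pose": False, "tactile_mode": "none", "action_dim": 14},
    "cameras": {key: {"width": 640, "height": 480} for key in REQUIRED_CAMERAS},
}
```

`runtime_problems(example)` returns an empty list, while an empty config reports one problem per requirement and per missing camera.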

Model size: 51.6M parameters, stored as Safetensors with tensor type F32.