ACT Policy: Right Hand Camera, Left Hand Pen Touch

This is an ACT policy checkpoint for an AGIBOT O10 dual-arm task. The task prompt is:

right hand picks up the camera, left hand picks up the pen, then left hand uses the pen to touch

The checkpoint is the checkpoints/last/pretrained_model directory of the local training job act_right_hand_camera_left_hand_pen_touch_20260427_rerun. It is intended for use with the following arm-hand-teleop inference config:

infer:
  policy: act
  model_path: <this-model-repo-or-local-pretrained_model>

Model Details

Policy type: ACT
Vision backbone: ResNet-18 with ImageNet pretrained weights
Observation steps: 1
Action chunk size: 50
Action steps: 50
Training device (from config): CUDA
Training steps (from config): 90000
Batch size: 16
Seed: 1000
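
With an action chunk size of 50 and 50 action steps, one reading is that each policy call predicts 50 actions and all 50 are executed before the policy is queried again. The sketch below illustrates that queueing pattern only. It is not the arm-hand-teleop or LeRobot implementation; ChunkedExecutor and dummy_predict_chunk are hypothetical names, and the dummy policy just returns zeros.

```python
from collections import deque
from typing import Callable, Deque, List

CHUNK_SIZE = 50       # actions predicted per policy call (from this card)
N_ACTION_STEPS = 50   # actions executed before the next policy call (from this card)
ACTION_DIM = 14       # length of one action vector (from this card)

Action = List[float]


class ChunkedExecutor:
    """Hypothetical helper: serves one action per control step and
    refills its queue from the policy only when the queue is empty."""

    def __init__(self, predict_chunk: Callable[[], List[Action]]) -> None:
        self._predict_chunk = predict_chunk
        self._queue: Deque[Action] = deque()
        self.policy_calls = 0

    def next_action(self) -> Action:
        if not self._queue:
            chunk = self._predict_chunk()
            if len(chunk) != CHUNK_SIZE:
                raise ValueError(f"expected {CHUNK_SIZE} actions, got {len(chunk)}")
            # Keep only the first N_ACTION_STEPS actions of the chunk.
            self._queue.extend(chunk[:N_ACTION_STEPS])
            self.policy_calls += 1
        return self._queue.popleft()


def dummy_predict_chunk() -> List[Action]:
    # Stand-in for the real policy (assumption): all-zero actions.
    return [[0.0] * ACTION_DIM for _ in range(CHUNK_SIZE)]


executor = ChunkedExecutor(dummy_predict_chunk)
for _ in range(120):
    action = executor.next_action()
print(executor.policy_calls)  # prints 3: calls at steps 1, 51, and 101
```

Under this pattern the policy runs once every 50 control steps, so the robot acts open-loop within each chunk.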

Inputs and Outputs

Input features:

Name	Type	Shape
`observation.state`	STATE	`[14]`
`observation.images.top`	VISUAL	`[3, 480, 640]`
`observation.images.left_wrist`	VISUAL	`[3, 480, 640]`
`observation.images.right_wrist`	VISUAL	`[3, 480, 640]`

Output features:

Name	Type	Shape
`action`	ACTION	`[14]`
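
A shape check before inference can catch layout mistakes such as passing height-width-channel images where the model expects channel-first tensors. The following is a minimal sketch that compares plain shape tuples against the table above; find_shape_mismatches is a hypothetical helper, not part of any library, and it assumes shapes are given without a batch dimension.

```python
from typing import Dict, List, Sequence, Tuple

# Expected per-sample input shapes, copied from the feature table above.
EXPECTED_INPUT_SHAPES: Dict[str, Tuple[int, ...]] = {
    "observation.state": (14,),
    "observation.images.top": (3, 480, 640),
    "observation.images.left_wrist": (3, 480, 640),
    "observation.images.right_wrist": (3, 480, 640),
}


def find_shape_mismatches(shapes: Dict[str, Sequence[int]]) -> List[str]:
    """Return a list of human-readable problems; empty means all shapes match."""
    problems: List[str] = []
    for key, expected in EXPECTED_INPUT_SHAPES.items():
        actual = shapes.get(key)
        if actual is None:
            problems.append(f"missing key: {key}")
        elif tuple(actual) != expected:
            problems.append(f"{key}: expected {expected}, got {tuple(actual)}")
    return problems


# Example: the top camera is mistakenly height-width-channel.
observed = {
    "observation.state": (14,),
    "observation.images.top": (480, 640, 3),
    "observation.images.left_wrist": (3, 480, 640),
    "observation.images.right_wrist": (3, 480, 640),
}
for problem in find_shape_mismatches(observed):
    print(problem)
```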

Dataset

The training config references a local LeRobot dataset:

local/agi_arm_camera_pen_touch_20260427
Local source path: right_hand_picks_up_the_camera_left_hand_picks_up_the_pen_then_left_hand_uses_the_pen_to_touch_20260427_merged

A larger merged dataset for the same task is also available for future training or evaluation:

Organization dataset: https://huggingface.co/datasets/FMC3-Robotic/agi_arm_bot_camera_pen_touch_merged_all
Personal dataset: https://huggingface.co/datasets/puheliang/agi_arm_bot_camera_pen_touch_merged_all

Note: this checkpoint's training config points to the 20260427 merged dataset. The availability of merged_all does not mean this checkpoint was retrained on it.
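
To confirm which dataset a checkpoint was trained on, the training configuration snapshot can be inspected directly. The sketch below assumes train_config.json stores the dataset identifier under a "dataset" object with a "repo_id" field; that layout is an assumption and should be checked against the actual file. recorded_dataset and load_train_config are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Dataset identifier stated in this card.
EXPECTED_DATASET = "local/agi_arm_camera_pen_touch_20260427"


def load_train_config(model_dir: str) -> Dict[str, Any]:
    """Read train_config.json from a downloaded model directory."""
    path = Path(model_dir) / "train_config.json"
    return json.loads(path.read_text(encoding="utf-8"))


def recorded_dataset(train_config: Dict[str, Any]) -> Optional[str]:
    """Return the dataset id if present under dataset.repo_id (assumed layout)."""
    dataset = train_config.get("dataset")
    if isinstance(dataset, dict):
        return dataset.get("repo_id")
    return None


# Example with an in-memory config that follows the assumed layout.
example_config = {"dataset": {"repo_id": EXPECTED_DATASET}}
print(recorded_dataset(example_config) == EXPECTED_DATASET)  # prints True
```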

Files

config.json: ACT policy configuration
model.safetensors: model weights
policy_preprocessor.json: input preprocessing pipeline
policy_preprocessor_step_3_normalizer_processor.safetensors: normalization state
policy_postprocessor.json: output postprocessing pipeline
policy_postprocessor_step_0_unnormalizer_processor.safetensors: unnormalization state
train_config.json: training configuration snapshot
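
After downloading the repository, it can be useful to verify that every file listed above is present before pointing an inference config at the directory. This is a minimal sketch using only the standard library; missing_files is a hypothetical helper name, and the directory path in the example is a placeholder.

```python
from pathlib import Path
from typing import List

# File names copied from the list above.
EXPECTED_FILES = (
    "config.json",
    "model.safetensors",
    "policy_preprocessor.json",
    "policy_preprocessor_step_3_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
    "train_config.json",
)


def missing_files(model_dir: str) -> List[str]:
    """Return the expected files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Placeholder path (assumption); replace with the real download location.
missing = missing_files("path/to/pretrained_model")
if missing:
    print("missing:", ", ".join(missing))
```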

Usage

In arm-hand-teleop, set infer.model_path to the directory where this repository was downloaded, or to a local pretrained_model directory. The matching inference config is:

/home/phl/workspace/arm-hand-teleop/configs/dual_arm/models/act_right_hand_camera_left_hand_pen_touch_20260427.yaml

Runtime requirements:

infer.policy must be act.
The robot schema must match training: dual arm, include_eef_pose: false, tactile_mode: "none".
Camera keys must include top, left_wrist, and right_wrist, and each camera must provide 640x480 RGB images (width x height, matching the [3, 480, 640] input shape).
The action space is 14-dimensional and must match the runtime dual-arm joint/gripper control mapping.
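
The requirements above can be checked programmatically before starting the robot. The sketch below works on a plain dictionary; the nesting it assumes (infer, robot, and cameras sections with width and height fields) is hypothetical and may not match the real arm-hand-teleop config schema, so the key lookups would need adapting. check_runtime_config is likewise a hypothetical name.

```python
from typing import Any, Dict, List

REQUIRED_CAMERAS = ("top", "left_wrist", "right_wrist")
REQUIRED_WIDTH, REQUIRED_HEIGHT = 640, 480


def check_runtime_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of violated requirements; empty means the config passes."""
    problems: List[str] = []
    if config.get("infer", {}).get("policy") != "act":
        problems.append("infer.policy must be 'act'")
    robot = config.get("robot", {})  # hypothetical section name
    if robot.get("include_eef_pose") is not False:
        problems.append("include_eef_pose must be false")
    if robot.get("tactile_mode") != "none":
        problems.append("tactile_mode must be 'none'")
    cameras = config.get("cameras", {})  # hypothetical section name
    for name in REQUIRED_CAMERAS:
        cam = cameras.get(name)
        if cam is None:
            problems.append(f"missing camera: {name}")
        elif (cam.get("width"), cam.get("height")) != (REQUIRED_WIDTH, REQUIRED_HEIGHT):
            problems.append(f"camera {name} must be 640x480")
    return problems


# Example config in the assumed layout, with one wrist camera missing.
example = {
    "infer": {"policy": "act"},
    "robot": {"include_eef_pose": False, "tactile_mode": "none"},
    "cameras": {
        "top": {"width": 640, "height": 480},
        "left_wrist": {"width": 640, "height": 480},
    },
}
print(check_runtime_config(example))  # prints ['missing camera: right_wrist']
```

The check does not cover the 14-dimensional action mapping, which depends on the runtime's joint and gripper ordering and has to be verified against the robot configuration itself.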

Safetensors: model size 51.6M params, tensor type F32
