GR00T-N1.7-3B-Pick-Orange

A GR00T N1.7 vision-language-action policy built on nvidia/Cosmos-Reason2-2B and fine-tuned for simulated orange pick-and-place on the LightwheelAI/leisaac-pick-orange dataset.

Model Description

GR00T-N1.7 (Gr00tN1d7) is a vision-language-action (VLA) model for robot manipulation. This checkpoint is fine-tuned for a pick-and-place task in which an SO-ARM 101 arm picks up an orange in a simulated environment.

Architecture: Gr00tN1d7 with a Cosmos-Reason2-2B (Qwen3-VL-based) vision-language backbone and a diffusion policy action head
Base model: nvidia/Cosmos-Reason2-2B
Task: Pick orange (simulation)
Robot: SO-ARM 101
Action horizon: 40 steps
Diffusion inference timesteps: 4
Model dtype: bfloat16
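The action head turns random noise into a 40-step action chunk in 4 denoising steps. The sketch below illustrates that control flow with a flow-matching-style sampler (forward-Euler integration of a velocity field from noise at tau=0 to tau=1). Whether Gr00tN1d7 samples exactly this way is an assumption, as is the 6-dimensional action (5 arm joints plus gripper). The learned network is replaced by a hypothetical analytic field, `make_toy_velocity`, so only the sampling loop is shown.

```python
import random

ACTION_HORIZON = 40      # actions per predicted chunk (from the card)
NUM_INFERENCE_STEPS = 4  # denoising steps (from the card)
ACTION_DIM = 6           # assumption: 5 arm joints + 1 gripper


def sample_chunk(velocity_fn, steps=NUM_INFERENCE_STEPS, seed=0):
    """Integrate dx/dtau = velocity_fn(x, tau) from Gaussian noise at tau=0
    to tau=1 using `steps` forward-Euler steps.

    Returns a chunk with ACTION_HORIZON rows and ACTION_DIM columns.
    """
    rng = random.Random(seed)
    x = [[rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
         for _ in range(ACTION_HORIZON)]
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [[xi + dt * vi for xi, vi in zip(row_x, row_v)]
             for row_x, row_v in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for the learned network: a velocity field that
    moves x along a straight line so that it reaches `target` at tau=1."""
    def velocity(x, tau):
        return [[(a - xi) / (1.0 - tau) for xi, a in zip(row_x, row_a)]
                for row_x, row_a in zip(x, target)]
    return velocity


# Usage: the toy field carries pure noise onto a fixed target chunk.
target = [[0.1 * j for j in range(ACTION_DIM)] for _ in range(ACTION_HORIZON)]
chunk = sample_chunk(make_toy_velocity(target))
```

Because the toy path is a straight line, four Euler steps land exactly on the target. A learned field is not straight in general, so a small step count trades some accuracy for inference speed.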

Fine-tuning Configuration

Parameter	Value
Tuned components	Diffusion model, projector, linear heads, VL-LN (layer norm on vision-language features)
Frozen components	Vision encoder, LLM backbone
Training steps	6000
Final training loss	~0.030
Action representation	Relative joints
Attention	Flash Attention 2
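Training some components while freezing the vision encoder and LLM comes down to deciding, per parameter, whether it receives gradients. A minimal stdlib sketch of that rule follows. The parameter-name prefixes are hypothetical and may not match the real GR00T module names. In PyTorch, the same decision would be applied with `param.requires_grad_(is_trainable(name))` while iterating over `model.named_parameters()`.

```python
# Hypothetical parameter-name prefixes mirroring the table above;
# the actual module names in the GR00T code base may differ.
FROZEN_PREFIXES = (
    "backbone.vision.",        # vision encoder
    "backbone.llm.",           # LLM backbone
)
TUNED_PREFIXES = (
    "action_head.diffusion.",  # diffusion model
    "action_head.projector.",  # projector
    "action_head.heads.",      # linear heads
    "action_head.vlln.",       # VL-LN
)


def is_trainable(name: str) -> bool:
    """Return True if the named parameter should receive gradients."""
    if name.startswith(FROZEN_PREFIXES):
        return False
    if name.startswith(TUNED_PREFIXES):
        return True
    # Fail loudly instead of silently training or freezing an unknown module.
    raise ValueError(f"parameter {name!r} matches no component group")


def count_trainable(params):
    """params: iterable of (name, numel) pairs.

    Returns (trainable, frozen) parameter counts.
    """
    trainable = frozen = 0
    for name, numel in params:
        if is_trainable(name):
            trainable += numel
        else:
            frozen += numel
    return trainable, frozen
```

Raising on unmatched names is a design choice: it surfaces renamed or newly added modules rather than letting them default to one group.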

Training Details

Dataset: LightwheelAI/leisaac-pick-orange
Embodiment: SO-ARM 101 (embodiment tag: new_embodiment)
Max steps: 6000 (1 epoch)
Final loss: 0.0301 at step 6000

Embodiment & Modalities

State inputs:

single_arm — arm joint positions (relative)
gripper — gripper position (absolute)

Action outputs: the same keys as the state inputs (single_arm as relative joint positions, gripper as absolute position)
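Because the arm actions are relative while the gripper action is absolute, a controller has to decode each chunk before sending targets to the robot. The sketch below assumes that single_arm has 5 joints, that gripper is a single value, and that every arm delta in a chunk is measured from the arm state at the moment the chunk was predicted (not from the previous step). None of these details are confirmed by this card.

```python
ARM_JOINTS = 5  # assumption: single_arm holds 5 joint values


def to_absolute(arm_state, arm_deltas, gripper_actions):
    """Convert a predicted chunk into absolute joint targets.

    arm_state: arm joint positions when the chunk was predicted.
    arm_deltas: per-step arm offsets relative to arm_state.
    gripper_actions: per-step absolute gripper positions, used as-is.
    Returns a list of (arm_target, gripper_target) tuples.
    """
    if len(arm_state) != ARM_JOINTS:
        raise ValueError("unexpected arm state size")
    if len(arm_deltas) != len(gripper_actions):
        raise ValueError("arm and gripper chunks differ in length")
    targets = []
    for delta, grip in zip(arm_deltas, gripper_actions):
        # Relative arm action: add the offset to the reference state.
        arm_target = [s + d for s, d in zip(arm_state, delta)]
        # Absolute gripper action: pass through unchanged.
        targets.append((arm_target, grip))
    return targets


targets = to_absolute([1.0, 2.0, 3.0, 4.0, 5.0],
                      [[0.5] * 5, [1.0] * 5],
                      [0.0, 1.0])
```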

Cameras: front, wrist

Language conditioning: annotation.human.task_description
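Each control step therefore combines two camera frames, two state vectors, and one language instruction. The sketch below packs them into a flat dictionary keyed by modality. Only annotation.human.task_description comes from this card. The "video.*" and "state.*" key names are assumptions modeled on GR00T's modality naming, and the exact array shapes and types the policy expects (for example, batch or time dimensions) are not shown here.

```python
# Only the last key is taken from the card; the others are assumed names.
REQUIRED_KEYS = (
    "video.front",
    "video.wrist",
    "state.single_arm",
    "state.gripper",
    "annotation.human.task_description",
)


def build_observation(front_image, wrist_image, arm_state, gripper_state, task):
    """Pack one timestep of inputs into a flat dict keyed by modality.

    Raises ValueError if any modality is missing (None).
    """
    obs = dict(zip(REQUIRED_KEYS,
                   (front_image, wrist_image, arm_state, gripper_state, task)))
    missing = [key for key, value in obs.items() if value is None]
    if missing:
        raise ValueError(f"missing modalities: {missing}")
    return obs


# Placeholder values; real inputs would be image arrays and joint readings,
# and the instruction text here is hypothetical.
obs = build_observation("front_frame", "wrist_frame",
                        [0.0] * 5, [0.0], "pick up the orange")
```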

Usage

from gr00t.model.gr00t_n1 import GR00TPolicy

# Load the fine-tuned checkpoint from the Hugging Face Hub.
policy = GR00TPolicy.from_pretrained("hi-space/GR00T-N1.7-3B-Pick-Orange")

Refer to the NVIDIA Isaac GR00T repository for full inference and deployment instructions.
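With a 40-step action horizon, a controller typically executes part of each chunk before querying the policy again (receding-horizon control). The loop below is a hypothetical sketch. `policy_fn` stands in for whatever inference call the Isaac GR00T repository provides. `apply_action` stands in for the simulator or robot interface. `execute_steps` (how many of the 40 actions to run before replanning) is an illustrative default, not a value from this card.

```python
from typing import Callable, List, Sequence

Action = Sequence[float]


def run_episode(policy_fn: Callable[[int], List[Action]],
                apply_action: Callable[[Action], None],
                total_steps: int,
                execute_steps: int = 16,
                horizon: int = 40) -> int:
    """Receding-horizon control: run the first `execute_steps` actions of
    each predicted chunk, then query the policy again.

    policy_fn(t) returns a chunk of `horizon` actions for control step t.
    Returns the number of policy queries made.
    """
    if not 1 <= execute_steps <= horizon:
        raise ValueError("execute_steps must be in [1, horizon]")
    t = 0
    queries = 0
    while t < total_steps:
        chunk = policy_fn(t)
        queries += 1
        if len(chunk) != horizon:
            raise ValueError("policy returned a chunk of unexpected length")
        for action in chunk[:execute_steps]:
            if t >= total_steps:
                break
            apply_action(action)
            t += 1
    return queries


# Usage with a dummy policy whose i-th action in a chunk is [t + i].
applied = []
queries = run_episode(lambda t: [[float(t + i)] for i in range(40)],
                      applied.append, total_steps=50)
```

Executing fewer actions per chunk reacts faster to new observations at the cost of more inference calls; executing the full chunk does the opposite.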

Intended Use

This model is intended for simulation-based pick-and-place of oranges with an SO-ARM 101. Zero-shot transfer to real hardware is not guaranteed; additional fine-tuning may be needed.

License

This model inherits the license from the base model nvidia/Cosmos-Reason2-2B. Please refer to NVIDIA's terms for usage restrictions.


Model Files

Format: Safetensors
Model size: 3B params
Tensor type: F32


Model Tree

Lineage: Qwen/Qwen3-VL-2B-Instruct (base model) → nvidia/Cosmos-Reason2-2B (fine-tuned) → hi-space/GR00T-N1.7-3B-Pick-Orange