GR00T-N1.7-3B-Pick-Orange
A fine-tuned version of nvidia/Cosmos-Reason2-2B for orange pick-and-place tasks in simulation, trained on the LightwheelAI/leisaac-pick-orange dataset.
Model Description
GR00T-N1.7 (Gr00tN1d7) is a vision-language-action (VLA) model for robot manipulation. This checkpoint is fine-tuned for a pick-and-place task where the robot picks up an orange in a simulated environment using a SO-ARM 101.
- Architecture: Gr00tN1d7 with Cosmos-Reason2-2B (Qwen-based) vision-language backbone + diffusion policy action head
- Base model: nvidia/Cosmos-Reason2-2B
- Task: Pick orange (simulation)
- Robot: SO-ARM 101
- Action horizon: 40 steps
- Inference timesteps: 4 (diffusion)
- Model dtype: bfloat16
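The action horizon listed above implies a chunked control loop: each policy call returns a sequence of future actions, which the controller plays back before querying the policy again. The following is a minimal sketch of that pattern, assuming the full 40-step chunk is executed before re-planning (this card does not state how many steps are executed per call) and a 6-dimensional action (5 arm joints plus gripper, also an assumption). `dummy_policy` and `run_episode` are hypothetical stand-ins, not part of the GR00T API.

```python
from typing import Callable, List, Tuple

ACTION_HORIZON = 40  # from the model card: actions predicted per policy call
ACTION_DIM = 6       # assumption: 5 arm joints + 1 gripper

Action = List[float]


def dummy_policy(observation: dict) -> List[Action]:
    """Hypothetical stand-in for the real policy: returns a chunk of zero actions."""
    return [[0.0] * ACTION_DIM for _ in range(ACTION_HORIZON)]


def run_episode(
    policy: Callable[[dict], List[Action]],
    total_steps: int,
    execute_per_call: int = ACTION_HORIZON,
) -> Tuple[List[Action], int]:
    """Play back action chunks until total_steps actions have been executed.

    Returns the executed actions and the number of policy calls made.
    """
    if not 1 <= execute_per_call <= ACTION_HORIZON:
        raise ValueError("execute_per_call must be between 1 and ACTION_HORIZON")
    executed: List[Action] = []
    calls = 0
    while len(executed) < total_steps:
        # A real loop would pass camera images, joint state, and the task text here.
        chunk = policy({"step": len(executed)})
        calls += 1
        remaining = total_steps - len(executed)
        executed.extend(chunk[: min(execute_per_call, remaining)])
    return executed, calls


actions, calls = run_episode(dummy_policy, total_steps=100)
print(len(actions), calls)  # 100 actions executed using 3 policy calls
```

Executing fewer steps per call (for example `execute_per_call=20`) re-plans more often at the cost of more inference calls; which setting suits this checkpoint is not specified here.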
Fine-tuning Configuration
| Parameter | Value |
|---|---|
| Tuned components | Diffusion model, projector, linear heads, VL-LN |
| Frozen components | Vision encoder, LLM backbone |
| Training steps | 6000 |
| Final training loss | ~0.030 |
| Action representation | Relative joints |
| Attention | Flash Attention 2 |
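The tuned/frozen split in the table can be thought of as a filter over parameter names. The sketch below is illustrative only: the prefixes are hypothetical and may not match the actual parameter names in the Gr00tN1d7 checkpoint or the flags used by the training script.

```python
# Hypothetical component prefixes; the real parameter names may differ.
TUNED_PREFIXES = (
    "action_head.diffusion_model",
    "action_head.projector",
    "action_head.linear_heads",
    "action_head.vlln",
)
FROZEN_PREFIXES = (
    "backbone.vision_encoder",
    "backbone.llm",
)


def is_trainable(param_name: str) -> bool:
    """Return True if the parameter belongs to a tuned component."""
    if param_name.startswith(FROZEN_PREFIXES):
        return False
    return param_name.startswith(TUNED_PREFIXES)


# With a PyTorch module, the filter could be applied roughly like this:
#     for name, param in model.named_parameters():
#         param.requires_grad_(is_trainable(name))
example_names = [
    "backbone.vision_encoder.blocks.0.attn.weight",
    "backbone.llm.layers.0.mlp.weight",
    "action_head.diffusion_model.blocks.0.weight",
    "action_head.projector.weight",
]
trainable = [name for name in example_names if is_trainable(name)]
print(trainable)
```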
Training Details
- Dataset: LightwheelAI/leisaac-pick-orange
- Embodiment: SO-ARM 101 (new_embodiment)
- Max steps: 6000 (1 epoch)
- Final loss: 0.0301 at step 6000
Embodiment & Modalities
State inputs:
- single_arm: arm joint positions (relative)
- gripper: gripper position (absolute)
Action outputs: same as state inputs
Cameras: front, wrist
Language conditioning: annotation.human.task_description
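To illustrate the mixed representation described above, the sketch below converts one predicted action into absolute joint targets. It assumes that a relative arm action is an offset added to the arm joint positions observed when the chunk was predicted, and that the arm has 5 joints plus a 1-DoF gripper; neither detail is stated in this card. `to_absolute_targets` is a hypothetical helper, not a GR00T function.

```python
from typing import Dict, List

JointDict = Dict[str, List[float]]


def to_absolute_targets(reference_state: JointDict, action: JointDict) -> JointDict:
    """Convert one action to absolute joint targets.

    Assumption: "single_arm" values are offsets from the reference state,
    while "gripper" values are already absolute.
    """
    arm_ref = reference_state["single_arm"]
    arm_delta = action["single_arm"]
    if len(arm_ref) != len(arm_delta):
        raise ValueError("arm state and arm action must have the same length")
    return {
        "single_arm": [q + dq for q, dq in zip(arm_ref, arm_delta)],
        "gripper": list(action["gripper"]),  # passed through unchanged
    }


# Made-up numbers for illustration only.
state = {"single_arm": [0.0, 0.5, -0.5, 1.0, 0.0], "gripper": [0.25]}
action = {"single_arm": [0.25, -0.25, 0.0, 0.5, 0.0], "gripper": [0.75]}

targets = to_absolute_targets(state, action)
print(targets["single_arm"])  # [0.25, 0.25, -0.5, 1.5, 0.0]
print(targets["gripper"])     # [0.75]
```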
Usage
from gr00t.model.gr00t_n1 import GR00TPolicy
policy = GR00TPolicy.from_pretrained("hi-space/GR00T-N1.7-3B-Pick-Orange")
Refer to the NVIDIA Isaac GR00T repository for full inference and deployment instructions.
Intended Use
This model is intended for simulation-based robotic pick-and-place tasks involving oranges on a SO-ARM 101. It is not guaranteed to work on real hardware without additional fine-tuning.
License
This model inherits the license from the base model nvidia/Cosmos-Reason2-2B. Please refer to NVIDIA's terms for usage restrictions.