---
license: mit
tags:
- reinforcement-learning
- robotics
- isaac-lab
- rtx-5090
- industrial-assembly
datasets:
- simulation
---

# Galaxea Gearbox Assembly R1 Policies

This repository contains the trained Reinforcement Learning (RL) policies for the high-precision gearbox assembly task using the Galaxea R1 robot. These models were trained using **NVIDIA Isaac Lab** on a single **NVIDIA RTX 5090**, achieving state-of-the-art simulation throughput and convergence stability.

## Model Description

The policies are trained to control a 7-DoF robotic arm (Galaxea R1) to assemble a complex planetary gearbox. The task is decomposed into sequential sub-tasks: `Approach` -> `Grasp` -> `Transport` (for each gear).

- **Algorithm**: PPO (Proximal Policy Optimization) via `rl_games`
- **Observation Space**: 69-dim (Joint pos/vel, EE pose, Relative gear targets)
- **Action Space**: 14-dim (Joint position targets + Gripper)
- **Training Framework**: Isaac Lab (DirectRL Mode)

## Performance Metrics

The models were trained with a massive throughput of **~8,200 FPS** (Frames Per Second) using full GPU vectorization.

| Policy | Stage | Avg Reward | Critic Loss | Entropy | Status |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **Approach** | 1 (Foundation) | ~241.4 | 3.8e-5 | 2.58 | **Converged** |
| **Grasp** | 2 (Manipulation) | ~240.9 | 3.3e-5 | -0.92 | **Converged** |
| **Transport 1** | 3 (Assembly) | ~282.6 | 1.7e-4 | 11.2 | **Robust** |

## Included Files

- `policy_approach.pth`: PyTorch checkpoint for the Approach phase.
- `policy_grasp.pth`: PyTorch checkpoint for the Grasping phase.
- `policy_transport_gear_1.pth`: PyTorch checkpoint for Transporting the first Sun Gear.
- `env_config.py`: The environment configuration used for training (PhysX settings, rewards).
- `agent_config.yaml`: The PPO hyperparameters.

## Usage

These policies are designed to be loaded into the Isaac Lab environment:

```python
# Pseudo-code for loading
from rl_games.torch_runner import Runner

runner = Runner()
runner.load('policy_approach.pth')
# ... run inference ...
```

## Hardware Specification

- **GPU**: NVIDIA GeForce RTX 5090 (32GB)
- **Training Time**: ~3 hours per policy (Optimized from 50+ days)
- **Simultaneous Envs**: 8,192