---
license: mit
language:
- en
library_name: stable-baselines3
tags:
- reinforcement-learning
- mujoco
- locomotion
- robotics
- curriculum-learning
- dinosaurs
- gymnasium
model-index:
- name: PPO-Velociraptor
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: MesozoicLabs/Raptor-v0
      type: MesozoicLabs/Raptor-v0
    metrics:
    - type: mean_reward
      value: 1366.19 +/- 76.29
      name: mean_reward
      verified: false
    - type: success_rate
      value: 93.3%
      name: strike_success_rate
      verified: false
---

# **PPO** Agents for Robotic Dinosaur Locomotion — **Mesozoic Labs**

This repository contains **PPO** (Proximal Policy Optimization) agents trained to control robotic dinosaurs in MuJoCo physics simulation. Each species is trained with a 3-stage curriculum: balance first, then locomotion, then a species-specific task.

- [GitHub Repository](https://github.com/kuds/mesozoic-labs)
- [Documentation](https://mesozoiclabs.com)
- [Blog: From Zero to Dino-Roar](https://www.findingtheta.com/blog/from-zero-to-dino-roar-teaching-a-t-rex-to-walk-with-mujoco-and-reinforcement-learning)

## Species & Training Results

### Velociraptor (PPO) — All 3 stages passed | 22M steps | 11:25:15 total

A bipedal predator with sickle claws, trained on 3 curriculum stages:

| Stage | Name | Best Reward (mean +/- std) | Avg Forward Velocity | Success Rate | Time (h:mm:ss) |
|-------|------|----------------------------|----------------------|--------------|----------------|
| 1 | Balance | 1964.43 +/- 27.39 | 0.11 m/s | — | 2:57:25 |
| 2 | Locomotion | 2678.68 +/- 4.07 | 3.47 m/s | — | 4:35:55 |
| 3 | Strike | 1366.19 +/- 76.29 | 2.02 m/s | 93.3% | 3:51:54 |
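
As a quick consistency check on the table, the per-stage wall-clock times can be summed and compared with the total in the heading. The sketch below uses only the figures reported above; `parse_hms` is a small helper written for this example, and the throughput estimate assumes the rounded 22M step count, so treat it as approximate.

```python
from datetime import timedelta


def parse_hms(text):
    """Convert an 'h:mm:ss' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Per-stage training times copied from the table above.
stage_times = {"Balance": "2:57:25", "Locomotion": "4:35:55", "Strike": "3:51:54"}

total = sum((parse_hms(t) for t in stage_times.values()), timedelta())
print(total)  # 11:25:14

# Approximate throughput, assuming the rounded total of 22M environment steps.
steps_per_second = 22_000_000 / total.total_seconds()
print(f"{steps_per_second:.0f} steps/s")
```

The sum comes to 11:25:14, one second short of the 11:25:15 in the heading, which is consistent with the per-stage times having been rounded to whole seconds.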

## Training Details

- **Algorithm:** PPO (Proximal Policy Optimization) via [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3)
- **Physics Engine:** [MuJoCo](https://mujoco.org/) (>= 3.0)
- **Environment Framework:** [Gymnasium](https://gymnasium.farama.org/) (>= 0.29)
- **Hardware:** NVIDIA L4 GPU on Google Colab
- **Seed:** 42
- **Parallel Envs:** 4
- **Curriculum:** 3-stage progressive training (Balance → Locomotion → Species-specific task)

## Environment Details

| Species | Observation Dims | Action Dims | Gymnasium ID |
|---------|-----------------|-------------|--------------|
| Velociraptor | 67 | 22 | `MesozoicLabs/Raptor-v0` |
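
Before loading a checkpoint, it can help to confirm that the environment you created matches the dimensions in the table, since a policy trained on 67 observations and 22 actions will not work with differently shaped spaces. The helper below is a minimal sketch: `check_spaces` is a hypothetical name, not part of the repository, and it assumes both spaces are flat (1-D) with a `.shape` attribute, as Gymnasium `Box` spaces have.

```python
def check_spaces(env, obs_dim=67, act_dim=22):
    """Raise ValueError if the env's flat space sizes differ from the table.

    Assumes 1-D observation and action spaces exposing a `.shape` tuple.
    """
    obs_shape = tuple(env.observation_space.shape)
    act_shape = tuple(env.action_space.shape)
    if obs_shape != (obs_dim,):
        raise ValueError(f"expected observation shape ({obs_dim},), got {obs_shape}")
    if act_shape != (act_dim,):
        raise ValueError(f"expected action shape ({act_dim},), got {act_shape}")
    return obs_shape, act_shape
```

With the repository installed, this would be called as `check_spaces(gym.make("MesozoicLabs/Raptor-v0"))` after `import environments`.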

## Usage

### Installation

```bash
git clone https://github.com/kuds/mesozoic-labs.git
cd mesozoic-labs

python -m venv venv
source venv/bin/activate

# Install with training dependencies
pip install -e ".[train]"
```

### Loading a Trained Model

```python
from stable_baselines3 import PPO
import gymnasium as gym

# Register Mesozoic Labs environments
import environments

# Load the trained model (e.g., velociraptor stage 3)
model = PPO.load("path/to/best_model.zip")

# Create the environment
env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")

# Run the trained agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
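
To get a figure comparable to the reported `mean_reward` (a mean plus or minus a standard deviation), the same loop can be run for whole episodes and the returns aggregated. The sketch below is an illustration, not the project's evaluation script: `evaluate_episodes` is a hypothetical helper, the episode count is an assumption (the card does not say how many were used), and it relies only on the `model.predict` and Gymnasium `reset`/`step` calls shown above. Stable-Baselines3 also ships `evaluate_policy` in `stable_baselines3.common.evaluation`, which serves the same purpose.

```python
import statistics


def evaluate_episodes(model, env, n_episodes=10, deterministic=True):
    """Return (mean, std) of undiscounted episode returns.

    Uses the population standard deviation, which matches the NumPy default.
    """
    returns = []
    for _ in range(n_episodes):
        obs, info = env.reset()
        done = False
        episode_return = 0.0
        while not done:
            action, _states = model.predict(obs, deterministic=deterministic)
            obs, reward, terminated, truncated, info = env.step(action)
            episode_return += float(reward)
            # An episode ends on either termination or truncation.
            done = terminated or truncated
        returns.append(episode_return)
    return statistics.fmean(returns), statistics.pstdev(returns)
```

With the `model` and `env` from the block above, `mean, std = evaluate_episodes(model, env)` gives values that can be printed as `f"{mean:.2f} +/- {std:.2f}"`.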

### Training from Scratch

```bash
# Full 3-stage curriculum for velociraptor
cd environments/velociraptor
python scripts/train_sb3.py curriculum --algorithm ppo

# Single-stage training (here: stage 1, 6M timesteps, 4 parallel envs)
python scripts/train_sb3.py train --stage 1 --timesteps 6000000 --n-envs 4
```

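### Loading from Hugging Face Hub

The trained weights can also be downloaded with `huggingface_hub`. The `environments` package from the Installation step is still required, because importing it registers the Gymnasium environment.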

```bash
pip install huggingface_hub
```

```python
from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO
import gymnasium as gym

# Register Mesozoic Labs environments
import environments

# Download the model from the Hub
model_path = hf_hub_download(
    repo_id="kuds/mesozoic-labs",
    filename="results/velociraptor/ppo/best_model.zip"
)

# Load the model
model = PPO.load(model_path)

# Create the environment
env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")

# Run the trained agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```

## Citation

```bibtex
@software{mesozoic_labs,
  title     = {Mesozoic Labs: Dinosaur Locomotion via Reinforcement Learning},
  author    = {Michael Kudlaty},
  year      = {2025},
  url       = {https://github.com/kuds/mesozoic-labs},
  license   = {MIT}
}
```

## License

MIT License