---
license: mit
language:
- en
library_name: stable-baselines3
tags:
- reinforcement-learning
- mujoco
- locomotion
- robotics
- curriculum-learning
- dinosaurs
- gymnasium
model-index:
- name: PPO-Velociraptor
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: MesozoicLabs/Raptor-v0
      type: MesozoicLabs/Raptor-v0
    metrics:
    - type: mean_reward
      value: 1366.19 +/- 76.29
      name: mean_reward
      verified: false
    - type: success_rate
      value: 93.3%
      name: strike_success_rate
      verified: false
---

# **PPO** Agents for Robotic Dinosaur Locomotion — **Mesozoic Labs**

This repository contains **PPO** (Proximal Policy Optimization) agents trained to control robotic dinosaurs in MuJoCo physics simulation. Each species is trained using a 3-stage curriculum learning approach.

- [GitHub Repository](https://github.com/kuds/mesozoic-labs)
- [Documentation](https://mesozoiclabs.com)
- [Blog: From Zero to Dino-Roar](https://www.findingtheta.com/blog/from-zero-to-dino-roar-teaching-a-t-rex-to-walk-with-mujoco-and-reinforcement-learning)

## Species & Training Results

### Velociraptor (PPO) — All 3 stages passed | 22M steps | 11:25:15 total

A bipedal predator with sickle claws, trained on 3 curriculum stages:

| Stage | Name | Best Reward | Avg Forward Vel | Success Rate | Time |
|-------|------|-------------|-----------------|--------------|------|
| 1 | Balance | 1964.43 +/- 27.39 | 0.11 m/s | — | 2:57:25 |
| 2 | Locomotion | 2678.68 +/- 4.07 | 3.47 m/s | — | 4:35:55 |
| 3 | Strike | 1366.19 +/- 76.29 | 2.02 m/s | 93.3% | 3:51:54 |

## Training Details

- **Algorithm:** PPO (Proximal Policy Optimization) via [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3)
- **Physics Engine:** [MuJoCo](https://mujoco.org/) (>= 3.0)
- **Environment Framework:** [Gymnasium](https://gymnasium.farama.org/) (>= 0.29)
- **Hardware:** Google Colab L4 GPU
- **Seed:** 42
- **Parallel Envs:** 4
- **Curriculum:** 3-stage progressive training (Balance → Locomotion → Species-specific task)

## Environment Details

| Species | Observation Dims | Action Dims | Gymnasium ID |
|---------|-----------------|-------------|--------------|
| Velociraptor | 67 | 22 | `MesozoicLabs/Raptor-v0` |

## Usage

### Installation

```bash
git clone https://github.com/kuds/mesozoic-labs.git
cd mesozoic-labs
python -m venv venv
source venv/bin/activate

# Install with training dependencies
pip install -e ".[train]"
```

### Loading a Trained Model

```python
from stable_baselines3 import PPO
import gymnasium as gym

# Register Mesozoic Labs environments
import environments

# Load the trained model (e.g., velociraptor stage 3)
model = PPO.load("path/to/best_model.zip")

# Create the environment
env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")

# Run the trained agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```

### Training from Scratch

```bash
# Full 3-stage curriculum for velociraptor
cd environments/velociraptor
python scripts/train_sb3.py curriculum --algorithm ppo

# Single stage training
python scripts/train_sb3.py train --stage 1 --timesteps 6000000 --n-envs 4
```

### Loading from Hugging Face Hub

```bash
pip install huggingface_hub
```

```python
from huggingface_hub import hf_hub_download
from stable_baselines3 import PPO
import gymnasium as gym
import environments

# Download the model from the Hub
model_path = hf_hub_download(
    repo_id="kuds/mesozoic-labs",
    filename="results/velociraptor/ppo/best_model.zip"
)

# Load the model
model = PPO.load(model_path)

# Create the environment
env = gym.make("MesozoicLabs/Raptor-v0", render_mode="human")

# Run the trained agent
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()
```

## Citation

```bibtex
@software{mesozoic_labs,
  title = {Mesozoic Labs: Dinosaur Locomotion via Reinforcement Learning},
  author = {Michael Kudlaty},
  year = {2025},
  url = {https://github.com/kuds/mesozoic-labs},
  license = {MIT}
}
```

## License

MIT License