---
license: mit
tags:
- reinforcement-learning
- sac
- ant-v5
- mujoco
- gymnasium
- spinning-up
- openai
- monigarr
library_name: pytorch
datasets: []
metrics:
- AverageEpRet
- StdEpRet
- MaxEpRet
- MinEpRet
- AverageTestEpRet
model-index:
- name: SAC Agent on Ant-v5 (Modernized Spinning Up)
  results: []
metadata:
  training:
    framework: spinning-up
    engine: mujoco
    environment: Ant-v5
    epochs: 250
    trained_by: MoniGarr
  intended_use:
    purpose: RL control benchmark, educational research, and reproducibility
    usage: Load agent and run in gymnasium Ant-v5 for analysis or baseline
  limitations:
  - Trained on a fixed seed and MuJoCo physics version
  - Not intended for real-world robotics out-of-the-box
  license: mit
---

# 🤖 Soft Actor-Critic (SAC) on Ant-v5: Modernized OpenAI Spinning Up

This repository presents a fully trained **Soft Actor-Critic (SAC)** agent on the `Ant-v5` environment, trained with a **modernized PyTorch-based version of OpenAI's Spinning Up in Deep RL** educational framework.

> Developed, trained, and maintained by [MoniGarr](https://huggingface.co/MoniGarr), a self-directed AI researcher focused on NLP, multimodal systems, and RL control frameworks.

---

## Project Mission

This work contributes to the revitalization of OpenAI’s highly respected [Spinning Up in Deep RL](https://spinningup.openai.com/) codebase. The original repo does not support Python 3.8+, the latest MuJoCo, or `gymnasium`. This project patches those limitations and showcases a reproducible, high-performing SAC agent for the modern `Ant-v5` benchmark.

It also supports my broader mission: to demonstrate technical excellence and creativity in deep reinforcement learning and AI research while advancing open and inclusive access to intelligent systems. Some of my online students and clients use my demos for learning purposes.


---

## Model Details

| Attribute         | Value                                 |
|-------------------|---------------------------------------|
| Algorithm         | Soft Actor-Critic (SAC)               |
| Framework         | PyTorch (Modernized Spinning Up)      |
| Environment       | `Ant-v5` via `gymnasium[mujoco]`      |
| Epochs            | 250                                   |
| Action Space      | Continuous (Box)                      |
| Observation Space | Continuous (Box)                      |
| Command Used      | `python -m spinup.run sac --env Ant-v5 --epochs 250 --exp_name experiment_sac_antv5_july_20_2025` |

---

## Training Metrics Summary

| Metric             | Description                                 |
|--------------------|---------------------------------------------|
| `AverageEpRet`     | Average return per episode (training)       |
| `StdEpRet`         | Standard deviation of episode return        |
| `MaxEpRet`         | Maximum episode return in this run          |
| `MinEpRet`         | Minimum episode return in this run          |
| `AverageTestEpRet` | Average return on test episodes             |

> Full logs: `https://github.com/monigarr/spinningup/tree/monigarr-dev/data/experiment_sac_antv5_july_20_2025/progress.txt`

---

## 🔍 Research Observations

- Policy performance stabilized after ~200 epochs
- Reward-to-noise ratio improved with a tuned entropy coefficient (α = 0.2)
- Robust gait developed for complex terrain and perturbations

---

## 🧪 Research Context

This experiment is part of a broader initiative to:

- Modernize and benchmark deep RL frameworks
- Create reproducible SAC baselines for MuJoCo control tasks
- Prepare high-quality artifacts for hybrid/remote AI research roles (RL, multimodal AI, language models)

I am currently pursuing research roles, residencies, and collaborations with a focus on intelligent control systems and language-grounded agents. I bring 30+ years of technical experience (previously lead mobile software architect, engineer, and developer; XR producer; 3D technical artist), speak Kanien’kéha (Mohawk language) dialects, and have a long-standing record of building ethical, useful, and inclusive AI.

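
---

## Inspecting Training Metrics

The metrics listed in the Training Metrics Summary can be checked with a short script once `progress.txt` has been downloaded. The following is a minimal sketch, assuming the file is a tab-separated table with a header row and an `Epoch` column, which is the layout the Spinning Up logger normally writes. The function name `summarize_progress` and the sample rows are hypothetical and are not results from this run.

```python
import csv
import io

# Hypothetical sample rows for illustration only; NOT results from this run.
SAMPLE = (
    "Epoch\tAverageEpRet\tAverageTestEpRet\n"
    "0\t-10.5\t-8.0\n"
    "1\t120.0\t150.25\n"
    "2\t95.0\t140.0\n"
)


def summarize_progress(text, columns=("AverageEpRet", "AverageTestEpRet")):
    """Return the final and best value (with its epoch) for each metric column.

    Assumes `text` is tab-separated with a header row and an `Epoch` column.
    """
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    summary = {}
    for col in columns:
        values = [float(row[col]) for row in rows]
        # Index of the row with the highest value for this metric.
        best_idx = max(range(len(values)), key=values.__getitem__)
        summary[col] = {
            "final": values[-1],
            "best": values[best_idx],
            "best_epoch": int(float(rows[best_idx]["Epoch"])),
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize_progress(SAMPLE).items():
        print(name, stats)
```

To run it on the real log, read the downloaded file and pass its contents in, for example `summarize_progress(open("progress.txt").read())`, adjusting the path to wherever the file was saved.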

---

## 🚀 Quickstart: Run the Model

```bash
# Install required libraries (quotes keep shells such as zsh from expanding the brackets)
pip install torch "gymnasium[mujoco]"

# Clone this repo (or download model + config)
git clone https://huggingface.co/MoniGarr/sac-antv5-modernized
cd sac-antv5-modernized

# Launch the SAC agent (interactive render)
python run_agent.py --env Ant-v5 --model_path ./pyt_save/model.pt
```

---

## Author & Contact

MoniGarr

- AI Researcher: NLP · RL · Multimodal AI
- Based in Akwesasne / Massena, New York
- monigarr@monigarr.com | github.com/monigarr

I’m looking to collaborate with ethical AI teams, remote research labs, and mission-driven builders of intelligent systems.