---
license: mit
tags:
- reinforcement-learning
- mujoco
- halfcheetah
- sac
- stable-baselines3
model-index:
- name: Your_Model_Name_SAC_HalfCheetah-v4
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: HalfCheetah-v4
      type: HalfCheetah-v4
    metrics:
    - type: mean_reward
      value: 9692.192
      name: Avg reward
    - type: max_reward
      value: 9969.899
      name: Max reward
    - type: min_reward
      value: 9408.777
      name: Min reward
---

# Reward Rush: HalfCheetah SAC

This repository contains a Soft Actor-Critic (SAC) agent trained on the `HalfCheetah-v4` environment.

## Model Architecture

The SAC actor is a multi-layer perceptron with the following specification:

- **Input:** 17-dimensional observation vector
- **Output:** 6 continuous actions
- **Architecture:**
  - Linear(17, 256) -> ReLU
  - Linear(256, 256) -> ReLU
  - Linear(256, 6) for `mean` + Linear(256, 6) for `log_std`
- **Note:** The actor outputs a mean and a log standard deviation for each action dimension. For inference, only the mean is used; it is passed through `tanh` to bound actions to [-1, 1], which matches HalfCheetah's action space.

## Common Mistakes to Avoid

- **Layer Names:** The checkpoint's state dict uses the layer names `net` (an `nn.Sequential`), `mean`, and `log_std`. Do not define layers under different names (e.g. `fc1`, `fc2`) unless you also remap the state-dict keys.
- **Output Dimensions:** Make sure the actor's output size matches the checkpoint (6 actions).
- **Continuous Actions:** HalfCheetah expects each action as a NumPy array of shape `(6,)`. Remove the batch dimension from the output tensor and convert it to NumPy before calling `env.step`.
- **Episode Evaluation:** Always evaluate over complete episodes (100 are recommended) to get a reliable estimate of performance.
- **Checkpoint Loading:** Pass `weights_only=True` to `torch.load` when loading `.pth` state dicts, for safety.
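If you prefer different layer names, the **Layer Names** point can be handled by renaming keys before loading. The sketch below is a minimal illustration under stated assumptions: `FCActor`, `remap_keys`, and `KEY_MAP` are hypothetical names, and a randomly initialised `SACActorRef` stands in for `ckpt["actor_state_dict"]` so the snippet runs offline. Only `net.0.*` and `net.2.*` need renaming, because the ReLU modules at indices 1 and 3 of `net` have no parameters.

```python
import torch
import torch.nn as nn


class SACActorRef(nn.Module):
    """Layout matching the checkpoint keys: net.0, net.2, mean, log_std."""

    def __init__(self, obs_dim=17, act_dim=6, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden_dim, act_dim)
        self.log_std = nn.Linear(hidden_dim, act_dim)


class FCActor(nn.Module):
    """Hypothetical actor using different names (fc1, fc2) for the hidden layers."""

    def __init__(self, obs_dim=17, act_dim=6, hidden_dim=256):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.mean = nn.Linear(hidden_dim, act_dim)
        self.log_std = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs):
        x = torch.relu(self.fc1(obs))
        x = torch.relu(self.fc2(x))
        return torch.tanh(self.mean(x))


# Hypothetical prefix map: checkpoint prefix -> new layer prefix.
KEY_MAP = {"net.0.": "fc1.", "net.2.": "fc2."}


def remap_keys(state_dict, key_map=KEY_MAP):
    """Return a copy of state_dict with matching key prefixes renamed."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in key_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = value
    return remapped


# Stand-in for ckpt["actor_state_dict"] so the sketch runs without downloading.
reference_state = SACActorRef().state_dict()

actor = FCActor()
# strict=True (the default) raises if any key is missing or unexpected.
actor.load_state_dict(remap_keys(reference_state))
actor.eval()
```

With the real checkpoint, replace `reference_state` with `ckpt["actor_state_dict"]`. Keeping `strict=True` is deliberate: a typo in the prefix map fails loudly instead of silently leaving layers randomly initialised.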
## Download and Test Code

```python
import torch
import torch.nn as nn
import gymnasium as gym
import numpy as np
from huggingface_hub import hf_hub_download

# Load stripped checkpoint (map to CPU so it also loads on machines without a GPU)
ckpt = torch.load(
    hf_hub_download("Nharen/Reward_Rush_SAC_Half_Cheetah", "half_cheetah.pth"),
    map_location="cpu",
    weights_only=True,
)
obs_dim = ckpt["obs_dim"]
act_dim = ckpt["act_dim"]
hidden_dim = ckpt.get("hidden_dim", 256)


# SAC Gaussian Actor
class SACActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.mean = nn.Linear(hidden_dim, act_dim)
        self.log_std = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs):
        # Deterministic action: tanh of the mean (log_std is unused at inference)
        x = self.net(obs)
        mean = self.mean(x)
        return torch.tanh(mean)


# Instantiate actor
actor = SACActor(obs_dim, act_dim, hidden_dim)
actor.load_state_dict(ckpt["actor_state_dict"])
actor.eval()

# Environment
env = gym.make("HalfCheetah-v4")
num_episodes = 100
episode_rewards = []

# Run evaluation
for ep in range(num_episodes):
    obs, _ = env.reset()
    done = False
    ep_reward = 0.0
    while not done:
        with torch.no_grad():
            obs_t = torch.tensor(obs, dtype=torch.float32).unsqueeze(0)
            action = actor(obs_t).squeeze(0).cpu().numpy()
        obs, reward, terminated, truncated, _ = env.step(action)
        ep_reward += reward
        done = terminated or truncated
    episode_rewards.append(ep_reward)
    print(f"Episode {ep+1:3d} | Reward: {ep_reward:.2f}")

env.close()

# Results
episode_rewards = np.array(episode_rewards)
print("\n===== Evaluation Summary =====")
print(f"Episodes run: {num_episodes}")
print(f"Mean reward: {episode_rewards.mean():.2f}")
print(f"Std reward: {episode_rewards.std():.2f}")
print(f"Min reward: {episode_rewards.min():.2f}")
print(f"Max reward: {episode_rewards.max():.2f}")
```
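The test code acts deterministically through `tanh(mean)`, so the `log_std` head is never used. For context on what that head is for, here is a minimal sketch of the standard SAC stochastic policy: a reparameterised Gaussian sample squashed by `tanh`, with the change-of-variables correction to the log-probability. This is an assumption about how such an actor is typically trained, not this repository's training code. The `sample` and `act_deterministic` method names and the `log_std` clamp bounds of -20 and 2 are hypothetical choices; the layer names match the checkpoint, so `load_state_dict(ckpt["actor_state_dict"])` would still apply.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

# Assumed clamp range for log_std (a common SAC convention, not confirmed for this checkpoint).
LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0


class SACActor(nn.Module):
    def __init__(self, obs_dim=17, act_dim=6, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden_dim, act_dim)
        self.log_std = nn.Linear(hidden_dim, act_dim)

    def sample(self, obs):
        """Return a tanh-squashed Gaussian action and its summed log-probability."""
        x = self.net(obs)
        mean = self.mean(x)
        log_std = self.log_std(x).clamp(LOG_STD_MIN, LOG_STD_MAX)
        dist = Normal(mean, log_std.exp())
        u = dist.rsample()        # reparameterised pre-squash sample
        action = torch.tanh(u)    # bounded to [-1, 1]
        # tanh change-of-variables correction; 1e-6 guards against log(0).
        log_prob = dist.log_prob(u) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(dim=-1)

    def act_deterministic(self, obs):
        """Evaluation-time action: tanh of the mean, as in the test code."""
        return torch.tanh(self.mean(self.net(obs)))


torch.manual_seed(0)
actor = SACActor()
obs = torch.randn(4, 17)  # batch of 4 random observations, for illustration only
action, log_prob = actor.sample(obs)
```

Sampling keeps exploration during training, while evaluation uses the deterministic mean, which is why only `tanh(mean)` appears in the test loop.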