---
license: mit
tags:
- reinforcement-learning
- mujoco
- halfcheetah
- sac
- stable-baselines3
model-index:
- name: Reward_Rush_SAC_Half_Cheetah
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: HalfCheetah-v4
      type: HalfCheetah-v4
    metrics:
    - type: mean_reward
      value: 9692.192
      name: Avg reward
    - type: max_reward
      value: 9969.899
      name: Max reward
    - type: min_reward
      value: 9408.777
      name: Min reward
---

# Reward Rush: HalfCheetah SAC

This repository contains a Soft Actor-Critic (SAC) agent trained on the Gymnasium `HalfCheetah-v4` environment.

## Model Architecture

The SAC actor is a multi-layer perceptron with the following specifications:

- **Input:** 17 state observations  
- **Output:** 6 continuous actions  
- **Architecture:**
  - Linear(17, 256) -> ReLU
  - Linear(256, 256) -> ReLU
  - Linear(256, 6) for `mean` + Linear(256, 6) for `log_std`  
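- **Note:** The actor outputs a mean and a log standard deviation for each action dimension. At inference time only the mean is used: it is passed through `tanh`, which bounds each action to [-1, 1]. The `log_std` head is unused during deterministic evaluation but stays in the model so that its parameter names match the checkpoint.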
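
To make the difference between deterministic and stochastic action selection concrete, here is a minimal sketch using the layer layout listed above. The weights are randomly initialised stand-ins rather than the trained checkpoint, `select_action` is a hypothetical helper, and the `log_std` clamp range of [-20, 2] is an assumption borrowed from common SAC implementations, not something this checkpoint confirms.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN_DIM = 17, 6, 256

# Randomly initialised stand-in with the same layer layout as the actor.
net = nn.Sequential(
    nn.Linear(OBS_DIM, HIDDEN_DIM), nn.ReLU(),
    nn.Linear(HIDDEN_DIM, HIDDEN_DIM), nn.ReLU(),
)
mean_head = nn.Linear(HIDDEN_DIM, ACT_DIM)
log_std_head = nn.Linear(HIDDEN_DIM, ACT_DIM)


def select_action(obs, deterministic=True):
    """Return a tanh-bounded action for a batch of observations."""
    with torch.no_grad():
        features = net(obs)
        mean = mean_head(features)
        if deterministic:
            # Evaluation path: squash the mean, ignore log_std.
            return torch.tanh(mean)
        # Sampling path. The clamp range is an assumption (common SAC default).
        log_std = log_std_head(features).clamp(-20.0, 2.0)
        noise = torch.randn_like(mean)
        return torch.tanh(mean + log_std.exp() * noise)


obs = torch.zeros(1, OBS_DIM)
greedy = select_action(obs)
sampled = select_action(obs, deterministic=False)
print(greedy.shape, sampled.shape)
```

Both paths return a tensor of shape `(1, 6)` with every entry in [-1, 1]; only the deterministic path is used for the evaluation reported in this card.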

## Common Mistakes to Avoid

- **Layer Names:** The checkpoint stores parameters under `net`, `mean`, and `log_std`. If your actor class uses different names (for example `fc1`, `fc2`), remap the state-dict keys before calling `load_state_dict`.
- **Output Dimensions:** The actor must match the checkpoint dimensions: 17 observations in, 6 actions out.
- **Continuous Actions:** `env.step` expects a NumPy array of shape `(6,)`. Remove the batch dimension from the actor output and convert the tensor to NumPy.
- **Episode Evaluation:** Evaluate over full episodes (100 recommended), since returns vary from one episode to the next.
- **Checkpoint Loading:** Pass `weights_only=True` to `torch.load` when loading `.pth` state dicts so that unpickling cannot execute arbitrary code.
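
The key remapping mentioned under **Layer Names** could look roughly like the sketch below. It assumes the checkpoint keys follow the `nn.Sequential` layout used by the actor in this card (`net.0.*`, `net.2.*`, `mean.*`, `log_std.*`); the target names `fc1` and `fc2`, the `KEY_MAP` table, and the `remap_keys` helper are all hypothetical. Placeholder values stand in for the tensors so the example runs without loading the checkpoint.

```python
# Hypothetical prefix map: checkpoint names -> names used by a custom actor class.
KEY_MAP = {"net.0.": "fc1.", "net.2.": "fc2."}


def remap_keys(state_dict, key_map=KEY_MAP):
    """Return a copy of state_dict with matching key prefixes renamed."""
    remapped = {}
    for key, value in state_dict.items():
        for old_prefix, new_prefix in key_map.items():
            if key.startswith(old_prefix):
                key = new_prefix + key[len(old_prefix):]
                break
        remapped[key] = value
    return remapped


# Key names assumed from the actor layout; None stands in for each tensor.
checkpoint_keys = {
    name: None
    for name in [
        "net.0.weight", "net.0.bias",
        "net.2.weight", "net.2.bias",
        "mean.weight", "mean.bias",
        "log_std.weight", "log_std.bias",
    ]
}

print(list(remap_keys(checkpoint_keys)))
```

With a real checkpoint, the remapped dictionary would be passed to `load_state_dict` on the custom class. Keys that match no prefix (`mean.*`, `log_std.*`) pass through unchanged.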

## Download and Test Code

```python
import torch
import torch.nn as nn
import gymnasium as gym
import numpy as np
from huggingface_hub import hf_hub_download

# Load stripped checkpoint (mapped to CPU so it also loads on machines without a GPU)
ckpt = torch.load(
    hf_hub_download("Nharen/Reward_Rush_SAC_Half_Cheetah", "half_cheetah.pth"),
    map_location="cpu",
    weights_only=True,
)

obs_dim = ckpt["obs_dim"]
act_dim = ckpt["act_dim"]
hidden_dim = ckpt.get("hidden_dim", 256)

# SAC Gaussian Actor
class SACActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU()
        )
        self.mean = nn.Linear(hidden_dim, act_dim)
        self.log_std = nn.Linear(hidden_dim, act_dim)

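    def forward(self, obs):
        # Deterministic action: tanh-squashed mean. The log_std head is kept
        # only so parameter names match the checkpoint; it is unused here.
        x = self.net(obs)
        mean = self.mean(x)
        return torch.tanh(mean)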

# Instantiate actor
actor = SACActor(obs_dim, act_dim, hidden_dim)
actor.load_state_dict(ckpt["actor_state_dict"])
actor.eval()

# Environment
env = gym.make("HalfCheetah-v4")
num_episodes = 100
episode_rewards = []

# Run evaluation
for ep in range(num_episodes):
    obs, _ = env.reset()
    done = False
    ep_reward = 0.0

    while not done:
        with torch.no_grad():
            obs_t = torch.tensor(obs, dtype=torch.float32).unsqueeze(0)
            action = actor(obs_t).squeeze(0).cpu().numpy()
        obs, reward, terminated, truncated, _ = env.step(action)
        ep_reward += reward
        done = terminated or truncated

    episode_rewards.append(ep_reward)
    print(f"Episode {ep+1:3d} | Reward: {ep_reward:.2f}")

env.close()

# Results
episode_rewards = np.array(episode_rewards)
print("\n===== Evaluation Summary =====")
print(f"Episodes run: {num_episodes}")
print(f"Mean reward: {episode_rewards.mean():.2f}")
print(f"Std reward:  {episode_rewards.std():.2f}")
print(f"Min reward:  {episode_rewards.min():.2f}")
print(f"Max reward:  {episode_rewards.max():.2f}")
```
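
The tensor-to-action conversion inside the evaluation loop can also be factored into a small helper that validates the result before it reaches `env.step`. This is an optional sketch: `to_env_action` is a hypothetical name, and the default bounds assume the HalfCheetah action space of six values in [-1, 1].

```python
import numpy as np
import torch


def to_env_action(action_t, act_dim=6, low=-1.0, high=1.0):
    """Convert an actor output tensor to a flat float32 NumPy action."""
    action = action_t.detach().cpu().numpy().reshape(-1).astype(np.float32)
    if action.shape != (act_dim,):
        raise ValueError(f"expected {act_dim} action values, got shape {action.shape}")
    # Clipping is a safety net; tanh outputs are already within the bounds.
    return np.clip(action, low, high)


# A batched actor output of shape (1, 6), with two out-of-range entries.
example = torch.tensor([[2.0, -2.0, 0.5, 0.0, 0.0, 0.0]])
action = to_env_action(example)
print(action.shape, action.dtype)
```

In the loop above, `to_env_action(actor(obs_t))` would replace the `.squeeze(0).cpu().numpy()` chain.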