Spaces: Prajwal782007/Gridmind

adityss committed on Apr 1

Commit 1875b13 (0 parents): Initial commit

Files changed (20)

Dockerfile +59 -0
README.md +110 -0
dashboard/server.py +66 -0
dashboard/static/dashboard.js +617 -0
dashboard/static/index.html +627 -0
data/generate_prices.py +60 -0
data/price_curves.json +997 -0
env/environment.go +679 -0
env/models.go +170 -0
env/rewards.go +143 -0
env/tasks.go +203 -0
go.mod +3 -0
main.go +376 -0
openenv.yaml +137 -0
python/inference.py +357 -0
python/models.py +221 -0
python/requirements.txt +7 -0
python/validate.py +258 -0
tests/environment_test.go +191 -0
tests/test_graders.py +173 -0

Dockerfile ADDED

	@@ -0,0 +1,59 @@

+# Stage 1: Build the Go environment server
+FROM golang:1.21-alpine AS builder
+WORKDIR /app
+COPY go.mod go.sum* ./
+RUN go mod download || true
+COPY main.go ./
+COPY env/ ./env/
+RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o gridmind-server main.go
+# Stage 2: Final image with Python runtime and Dashboard
+FROM python:3.11-slim
+WORKDIR /app
+# Install supervisor to run both servers
+RUN apt-get update && apt-get install -y supervisor && rm -rf /var/lib/apt/lists/*
+# Install Python dependencies
+COPY python/requirements.txt ./requirements.txt
+RUN pip install --no-cache-dir -r requirements.txt || \
+    pip install --no-cache-dir openai fastapi uvicorn httpx "pydantic>=2.0.0" requests
+# Copy Go binary
+COPY --from=builder /app/gridmind-server /usr/local/bin/gridmind-server
+# Copy Python layer and Dashboard
+COPY python/ ./python/
+COPY dashboard/ ./dashboard/
+COPY data/ ./data/
+COPY openenv.yaml ./
+# Configure Supervisor
+RUN echo "[supervisord]" > /etc/supervisor/conf.d/supervisord.conf && \
+    echo "nodaemon=true" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "[program:go-env]" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "command=/usr/local/bin/gridmind-server" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "environment=PORT=7860" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "stdout_logfile=/dev/stdout" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "stdout_logfile_maxbytes=0" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "stderr_logfile=/dev/stderr" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "stderr_logfile_maxbytes=0" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "[program:dashboard]" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "command=python -m uvicorn dashboard.server:app --host 0.0.0.0 --port 7861" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "stdout_logfile=/dev/stdout" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "stdout_logfile_maxbytes=0" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "stderr_logfile=/dev/stderr" >> /etc/supervisor/conf.d/supervisord.conf && \
+    echo "stderr_logfile_maxbytes=0" >> /etc/supervisor/conf.d/supervisord.conf
+# Add a non-root user (good practice and required for some HF Spaces configs)
+RUN useradd -m -u 1000 user
+RUN chown -R user:user /app
+USER user
+# 7860 = Env Server (main OpenEnv endpoint), 7861 = Dashboard
+EXPOSE 7860 7861
+CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
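The chain of `echo` commands above writes a small INI file for supervisord. As a readability aid, here is a sketch that builds an equivalent `supervisord.conf` with Python's standard `configparser`. The program names, commands, port, and log settings are copied from the Dockerfile; generating the file this way instead of with `echo` is only a suggestion, and the function name is hypothetical.

```python
import configparser
import io

# Log settings shared by both programs, as in the Dockerfile's echo chain
LOG_OPTS = {
    "stdout_logfile": "/dev/stdout",
    "stdout_logfile_maxbytes": "0",
    "stderr_logfile": "/dev/stderr",
    "stderr_logfile_maxbytes": "0",
}


def build_supervisord_conf() -> str:
    """Return supervisord.conf text equivalent to the Dockerfile's echo chain."""
    # interpolation=None so a literal '%' in a command would not be expanded
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["supervisord"] = {"nodaemon": "true"}
    cfg["program:go-env"] = {
        "command": "/usr/local/bin/gridmind-server",
        "environment": "PORT=7860",
        **LOG_OPTS,
    }
    cfg["program:dashboard"] = {
        "command": "python -m uvicorn dashboard.server:app --host 0.0.0.0 --port 7861",
        **LOG_OPTS,
    }
    buf = io.StringIO()
    # Write key=value without spaces, matching the echo output
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_supervisord_conf())
```

Each `[program:x]` section becomes one supervised process, so adding a third service would mean adding one more dictionary rather than another run of `echo` lines.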

README.md ADDED

	@@ -0,0 +1,110 @@

+# GridMind-RL
+GridMind-RL is an OpenEnv-compliant reinforcement learning environment simulating a commercial/industrial building energy management system.
+An RL agent acts as the energy controller, shaping electrical load profiles by adjusting HVAC setpoints, managing thermal storage, and scheduling batch processes. The goal is to optimize operations in response to real-time electricity prices, grid carbon intensity, and utility demand-response signals.
+## Architecture
+```text
+ ┌──────────────────────┐        ┌─────────────────────────────┐
+ │                      │        │                             │
+ │    LLM RL Agent      │◄───────┤    GridMind-RL Server       │
+ │   (Inference Script) │ POST   │    (Go OpenEnv Backend)     │
+ │                      ├───────►│  Port 7860                  │
+ └──────────────────────┘ Action │                             │
+                                 └──────────────┬──────────────┘
+                                          State │
+                                        Polling │
+                                 ┌──────────────▼──────────────┐
+                                 │                             │
+                                 │     Visualization UI        │
+                                 │    (FastAPI + HTML/JS)      │
+                                 │  Port 7861                  │
+                                 └─────────────────────────────┘
+```
+## Observation Space
+| Name | Type | Range | Description |
+|------|------|-------|-------------|
+| `indoor_temperature` | float | [15.0, 30.0] | Current indoor temperature (°C). Goal is usually 21°C. |
+| `thermal_storage_level` | float | [0.0, 1.0] | Thermal storage capacity fill level. |
+| `process_demand` | float | [0.0, 50.0] | Current uncontrolled process power demand (kW). |
+| `current_price` | float | [0.02, 0.50] | Real-time electricity price ($/kWh). |
+| `grid_stress_signal` | float | [0.0, 1.0] | Utility signal indicating grid stress. Values above 0.7 call for load shedding. |
+| `carbon_intensity` | float | [100, 700] | Grid carbon emissions intensity (gCO2/kWh). |
+| `hour_of_day` | int | [0, 23] | Current hour, useful for scheduling. |
+| `batch_queue` | list[int] | N/A | List of deadline slots for pending batch jobs. |
+| `cumulative_cost` | float | [0.0, inf) | Running energy cost in $. |
+| `step` | int | [0, 95] | Current episode timestep (15-min intervals over 24h). |
+| `building_id` | int | [0, 2] | ID of the building in multi-building federated mode. |
+## Action Space
+| Name | Type | Range | Description |
+|------|------|-------|-------------|
+| `hvac_power_level` | float | [0.0, 1.0] | Fraction of max HVAC cooling/heating power to apply. |
+| `thermal_charge_rate` | float | [-1.0, 1.0] | Charge (positive) or discharge (negative) thermal storage. |
+| `batch_job_slot` | int | [0, 4] | Delay scheduling the next batch job by 0-4 time slots. |
+| `load_shed_fraction` | float | [0.0, 0.5] | Fraction of non-critical load to shed (max 50%). |
+| `building_id` | int | [0, 2] | Select which building to apply this action to (federation). |
+## Tasks
+GridMind-RL features three tasks of increasing difficulty:
+1. **Task 1: Cost Minimization (Easy)**
+   Minimize total energy costs by moving load to off-peak periods using thermal storage. No temperature constraints.
+2. **Task 2: Temperature Management (Medium)**
+   Minimize costs while keeping indoor temperatures strictly within 19°C – 23°C.
+3. **Task 3: Full Demand Response (Hard)**
+   Minimize cost, maintain temperature, successfully schedule batch jobs before deadlines, and shed loads when the grid stress signal exceeds 0.7.
+## Reward Function
+The dense reward includes several components:
+* **Cost Savings:** Proportional to energy savings vs the baseline flat tariff policy.
+* **Temp Constraint:** Gaussian bonus for being close to the setpoint, harsh penalty for exiting [19°C, 23°C].
+* **Grid Response:** Large bonus if `load_shed_fraction` > 0 when `grid_stress_signal` > 0.7.
+* **Deadline Penalty:** Heavy negative reward for jobs that execute past their deadline slot.
+* **Efficiency Bonus:** Rewards charging thermal storage when the current price is *below* the future moving average.
+* **Stability Penalty:** Penalizes rapid oscillation of the HVAC and storage controls.
+*Exploit Detection:* The grader detects degenerate strategies (e.g. permanently shedding 40% load) and applies up to a 30% score penalty.
+## Usage
+### Local Docker Build
+```bash
+docker build -t gridmind-rl .
+docker run -p 7860:7860 -p 7861:7861 gridmind-rl
+```
+* Backend OpenEnv server: http://localhost:7860
+* Visualization Dashboard: http://localhost:7861
+### Validating the Environment
+```bash
+python python/validate.py --env-url http://localhost:7860
+```
+### Running Baseline Inference
+```bash
+export API_BASE_URL=https://api-inference.huggingface.co/v1
+export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
+export HF_TOKEN=your_token
+# Install dependencies
+pip install -r python/requirements.txt
+# Run inference
+python python/inference.py --episodes 3
+```
+## Extensions
+* **Multi-building mode:** Switch the environment to 3 buildings via `POST /reset {"num_buildings": 3}` and output action arrays for coordinated dispatch.
+* **Add new tasks:** Edit `env/tasks.go` and implement a new `gradeTaskX` component.
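To make the observation and action tables concrete, here is a hypothetical rule-based controller that maps one observation dict to one action dict using the field names and ranges documented in the README. The thresholds (the 0.08 and 0.16 $/kWh price cut-offs, the 0.3 shed fraction, the HVAC formula) are illustrative assumptions, not the environment's baseline policy, and submitting the action over HTTP is not shown.

```python
from typing import Any, Dict

# Continuous action bounds copied from the Action Space table
ACTION_BOUNDS = {
    "hvac_power_level": (0.0, 1.0),
    "thermal_charge_rate": (-1.0, 1.0),
    "load_shed_fraction": (0.0, 0.5),
}


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def heuristic_action(obs: Dict[str, Any],
                     cheap_price: float = 0.08,      # assumed threshold ($/kWh)
                     expensive_price: float = 0.16,  # assumed threshold ($/kWh)
                     target_temp: float = 21.0) -> Dict[str, Any]:
    """Illustrative policy only; not the environment's own baseline."""
    price = obs["current_price"]
    # Charge storage when power is cheap, discharge when it is expensive
    if price <= cheap_price:
        charge = 1.0
    elif price >= expensive_price:
        charge = -1.0
    else:
        charge = 0.0
    # Assumed: more HVAC effort the further the room drifts from the target
    hvac = 0.5 + 0.25 * abs(obs["indoor_temperature"] - target_temp)
    # Shed non-critical load only while the grid stress signal exceeds 0.7
    shed = 0.3 if obs["grid_stress_signal"] > 0.7 else 0.0
    # Delay batch jobs by the maximum 4 slots during expensive hours,
    # but only if every pending deadline is more than 4 slots away
    slack = [d - obs["step"] for d in obs.get("batch_queue", [])]
    delay_ok = price >= expensive_price and all(s > 4 for s in slack)
    action = {
        "hvac_power_level": hvac,
        "thermal_charge_rate": charge,
        "load_shed_fraction": shed,
    }
    # Keep every continuous field inside its documented range
    action = {k: clamp(v, *ACTION_BOUNDS[k]) for k, v in action.items()}
    action["batch_job_slot"] = 4 if delay_ok else 0
    action["building_id"] = obs.get("building_id", 0)
    return action
```

Clamping after the rules, rather than inside each rule, keeps the range checks in one place that mirrors the Action Space table.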

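The README describes the Temp Constraint reward term only qualitatively: a Gaussian bonus near the setpoint and a harsh penalty outside 19°C to 23°C. A minimal sketch of that shape follows. The 21°C setpoint comes from the observation table; `sigma`, `penalty`, and the linear growth of the penalty with the size of the violation are assumptions, and the real formula presumably lives in `env/rewards.go`.

```python
import math


def temp_constraint_reward(temp_c: float,
                           setpoint: float = 21.0,
                           t_min: float = 19.0,
                           t_max: float = 23.0,
                           sigma: float = 1.0,      # assumed width of the bonus
                           penalty: float = 2.0) -> float:  # assumed penalty scale
    """Gaussian bonus inside [t_min, t_max]; a penalty that grows with the violation outside."""
    if temp_c < t_min:
        return -penalty * (1.0 + (t_min - temp_c))
    if temp_c > t_max:
        return -penalty * (1.0 + (temp_c - t_max))
    # Peaks at 1.0 exactly at the setpoint and decays toward the band edges
    return math.exp(-((temp_c - setpoint) ** 2) / (2 * sigma ** 2))
```

The constant `1.0` inside the penalty makes leaving the band cost at least `penalty`, so even a small violation scores below any in-band temperature.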
dashboard/server.py ADDED

	@@ -0,0 +1,66 @@

+"""
+GridMind-RL Visualization Dashboard (FastAPI)
+Serves the real-time simulation dashboard at port 7861.
+Proxies /api/* to the Go environment server at port 7860.
+"""
+import os
+from pathlib import Path
+import httpx
+from fastapi import FastAPI, Request
+from fastapi.responses import HTMLResponse, JSONResponse
+from fastapi.staticfiles import StaticFiles
+app = FastAPI(title="GridMind-RL Dashboard", version="1.0.0")
+ENV_SERVER = os.getenv("ENV_SERVER_URL", "http://localhost:7860")
+STATIC_DIR = Path(__file__).parent / "static"
+# Serve static files (index.html, dashboard.js)
+app.mount("/static", StaticFiles(directory=str(STATIC_DIR)), name="static")
+@app.get("/", response_class=HTMLResponse)
+async def index():
+    """Serve the dashboard HTML."""
+    with open(STATIC_DIR / "index.html", "r", encoding="utf-8") as f:
+        return HTMLResponse(content=f.read())
+@app.get("/api/{path:path}")
+async def proxy_get(path: str, request: Request):
+    """Proxy GET requests to the Go environment server."""
+    url = f"{ENV_SERVER}/{path}"
+    params = dict(request.query_params)
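+    async with httpx.AsyncClient(timeout=10.0) as client:
+        try:
+            r = await client.get(url, params=params)
+        except httpx.HTTPError as e:
+            # Env server unreachable or timed out
+            return JSONResponse({"error": str(e)}, status_code=503)
+    try:
+        return JSONResponse(content=r.json(), status_code=r.status_code)
+    except ValueError:
+        # Non-JSON upstream body: keep the upstream status instead of a misleading 503
+        return JSONResponse({"error": r.text}, status_code=r.status_code)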
+@app.post("/api/{path:path}")
+async def proxy_post(path: str, request: Request):
+    """Proxy POST requests to the Go environment server."""
+    url = f"{ENV_SERVER}/{path}"
+    body = await request.body()
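+    async with httpx.AsyncClient(timeout=10.0) as client:
+        try:
+            r = await client.post(url, content=body,
+                                  headers={"Content-Type": "application/json"})
+        except httpx.HTTPError as e:
+            # Env server unreachable or timed out
+            return JSONResponse({"error": str(e)}, status_code=503)
+    try:
+        return JSONResponse(content=r.json(), status_code=r.status_code)
+    except ValueError:
+        # Non-JSON upstream body: keep the upstream status instead of a misleading 503
+        return JSONResponse({"error": r.text}, status_code=r.status_code)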
+@app.get("/health")
+async def health():
+    return {"status": "ok", "dashboard": "gridmind-rl"}
+if __name__ == "__main__":
+    import uvicorn
+    port = int(os.getenv("DASHBOARD_PORT", "7861"))
+    uvicorn.run(app, host="0.0.0.0", port=port, log_level="info")
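Both proxy routes forward `/api/<path>` on the dashboard to `<ENV_SERVER>/<path>` on the Go server, passing GET query parameters through. The helper below is a hypothetical, stdlib-only restatement of that mapping, useful for checking which upstream URL a dashboard call such as `fetch('/api/state')` reaches; it is not part of `dashboard/server.py`.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

DEFAULT_ENV_SERVER = "http://localhost:7860"  # same default as ENV_SERVER_URL


def upstream_url(api_path: str,
                 query: Optional[Mapping[str, str]] = None,
                 env_server: str = DEFAULT_ENV_SERVER) -> str:
    """Map a dashboard request path like '/api/state' to the Go server URL."""
    prefix = "/api/"
    if not api_path.startswith(prefix):
        raise ValueError(f"not a proxied path: {api_path!r}")
    # FastAPI's {path:path} parameter captures everything after /api/
    path = api_path[len(prefix):]
    url = f"{env_server}/{path}"
    if query:
        url += "?" + urlencode(query)
    return url
```

For example, `upstream_url("/api/state")` gives `http://localhost:7860/state`, the endpoint `dashboard.js` polls every 500 ms.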

dashboard/static/dashboard.js ADDED

	@@ -0,0 +1,617 @@

+/**
+ * GridMind-RL Dashboard — Chart.js real-time visualization
+ * Polls /api/state every 500ms and updates all charts.
+ */
+'use strict';
+// ── Config ──────────────────────────────────────────────────────────────────
+const POLL_MS        = 500;
+const HISTORY_LEN    = 96;   // 96 steps = full episode
+const API_BASE       = '/api';
+const TASK_NAMES = {
+  1: 'Task 1 — Cost Minimization (Easy)',
+  2: 'Task 2 — Temperature Management (Medium)',
+  3: 'Task 3 — Full Demand Response (Hard)',
+};
+let currentBuilding = 0;
+let pollTimer = null;
+let connected = false;
+// ── Chart.js global defaults ─────────────────────────────────────────────────
+Chart.defaults.color = '#8899b4';
+Chart.defaults.borderColor = 'rgba(56,139,253,0.1)';
+Chart.defaults.font.family = "'Inter', sans-serif";
+Chart.defaults.font.size = 11;
+Chart.defaults.plugins.legend.display = false;
+Chart.defaults.animation.duration = 300;
+const COLORS = {
+  blue:   '#388bfd',
+  green:  '#3fb950',
+  amber:  '#d29922',
+  red:    '#f85149',
+  purple: '#bc8cff',
+  cyan:   '#39d0d8',
+  orange: '#ff7c39',
+  dimBlue: 'rgba(56,139,253,0.15)',
+};
+function rgba(hex, alpha) {
+  const r = parseInt(hex.slice(1,3), 16);
+  const g = parseInt(hex.slice(3,5), 16);
+  const b = parseInt(hex.slice(5,7), 16);
+  return `rgba(${r},${g},${b},${alpha})`;
+}
+// ── Chart factory helpers ────────────────────────────────────────────────────
+function makeLineChart(id, labels, datasets, opts = {}) {
+  const ctx = document.getElementById(id).getContext('2d');
+  return new Chart(ctx, {
+    type: 'line',
+    data: { labels, datasets },
+    options: {
+      responsive: true,
+      maintainAspectRatio: false,
+      interaction: { mode: 'index', intersect: false },
+      scales: {
+        x: { grid: { color: 'rgba(56,139,253,0.06)' }, ticks: { maxTicksLimit: 8 } },
+        y: { grid: { color: 'rgba(56,139,253,0.06)' }, ...opts.yAxis },
+      },
+      plugins: {
+        legend: { display: opts.legend || false },
+        tooltip: { backgroundColor: '#0f1829', borderColor: 'rgba(56,139,253,0.3)', borderWidth: 1 },
+      },
+      ...opts.extra,
+    },
+  });
+}
+function makeAreaChart(id, labels, datasets) {
+  return makeLineChart(id, labels, datasets, {
+    extra: { elements: { line: { fill: true } } },
+  });
+}
+function makeBarChart(id, labels, datasets) {
+  const ctx = document.getElementById(id).getContext('2d');
+  return new Chart(ctx, {
+    type: 'bar',
+    data: { labels, datasets },
+    options: {
+      responsive: true,
+      maintainAspectRatio: false,
+      scales: {
+        x: { stacked: true, grid: { color: 'rgba(56,139,253,0.06)' }, ticks: { maxTicksLimit: 8 } },
+        y: { stacked: true, grid: { color: 'rgba(56,139,253,0.06)' } },
+      },
+      plugins: {
+        legend: { display: true, position: 'bottom', labels: { usePointStyle: true, padding: 10 } },
+        tooltip: { backgroundColor: '#0f1829', borderColor: 'rgba(56,139,253,0.3)', borderWidth: 1 },
+      },
+    },
+  });
+}
+// ── Initialise all charts ─────────────────────────────────────────────────────
+const emptyLabels = Array.from({ length: 24 }, (_, i) => `${i}h`);
+const emptyData   = Array(24).fill(null);
+// 1. Price curve
+const priceChart = makeLineChart('chart-price',
+  emptyLabels,
+  [
+    {
+      label: 'Price ($/kWh)',
+      data: [...emptyData],
+      borderColor: COLORS.amber,
+      backgroundColor: rgba(COLORS.amber, 0.15),
+      borderWidth: 2,
+      fill: true,
+      tension: 0.4,
+      pointRadius: 0,
+    },
+    {
+      label: 'Current',
+      data: [...emptyData],
+      borderColor: COLORS.red,
+      backgroundColor: 'transparent',
+      borderWidth: 0,
+      pointRadius: 6,
+      pointBackgroundColor: COLORS.red,
+    },
+  ],
+  { legend: true, yAxis: { title: { display: true, text: '$/kWh' } } }
+);
+// 2. Temperature
+const tempChart = makeLineChart('chart-temp',
+  [],
+  [
+    {
+      label: 'Indoor Temp (°C)',
+      data: [],
+      borderColor: COLORS.cyan,
+      backgroundColor: rgba(COLORS.cyan, 0.1),
+      borderWidth: 2,
+      fill: true,
+      tension: 0.4,
+      pointRadius: 0,
+    },
+    {
+      label: 'T_max (23°C)',
+      data: [],
+      borderColor: rgba(COLORS.red, 0.5),
+      borderWidth: 1,
+      borderDash: [5, 5],
+      pointRadius: 0,
+      fill: false,
+    },
+    {
+      label: 'T_min (19°C)',
+      data: [],
+      borderColor: rgba(COLORS.blue, 0.5),
+      borderWidth: 1,
+      borderDash: [5, 5],
+      pointRadius: 0,
+      fill: false,
+    },
+  ],
+  { legend: true, yAxis: { suggestedMin: 15, suggestedMax: 30, title: { display: true, text: '°C' } } }
+);
+// 3. Storage history (mini)
+const storageChart = makeLineChart('chart-storage',
+  [],
+  [{
+    label: 'Storage Level',
+    data: [],
+    borderColor: COLORS.cyan,
+    backgroundColor: rgba(COLORS.cyan, 0.2),
+    borderWidth: 2,
+    fill: true,
+    tension: 0.4,
+    pointRadius: 0,
+  }],
+  { yAxis: { min: 0, max: 1 } }
+);
+// 4. HVAC + Load Shed stacked bar
+const hvacChart = makeBarChart('chart-hvac',
+  [],
+  [
+    {
+      label: 'HVAC Power',
+      data: [],
+      backgroundColor: rgba(COLORS.blue, 0.7),
+      borderColor: COLORS.blue,
+      borderWidth: 1,
+    },
+    {
+      label: 'Load Shed',
+      data: [],
+      backgroundColor: rgba(COLORS.red, 0.7),
+      borderColor: COLORS.red,
+      borderWidth: 1,
+    },
+  ]
+);
+// 5. Cumulative cost vs baseline
+const costChart = makeLineChart('chart-cost',
+  [],
+  [
+    {
+      label: 'Agent Cost ($)',
+      data: [],
+      borderColor: COLORS.green,
+      backgroundColor: rgba(COLORS.green, 0.1),
+      borderWidth: 2,
+      fill: true,
+      tension: 0.4,
+      pointRadius: 0,
+    },
+    {
+      label: 'Baseline ($)',
+      data: [],
+      borderColor: rgba(COLORS.amber, 0.7),
+      borderDash: [6, 3],
+      borderWidth: 2,
+      fill: false,
+      tension: 0.4,
+      pointRadius: 0,
+    },
+  ],
+  { legend: true, yAxis: { title: { display: true, text: '$' } } }
+);
+// 6. Grid stress history (mini)
+const stressChart = makeLineChart('chart-stress',
+  [],
+  [{
+    label: 'Grid Stress',
+    data: [],
+    borderColor: COLORS.red,
+    backgroundColor: rgba(COLORS.red, 0.2),
+    borderWidth: 2,
+    fill: true,
+    tension: 0.4,
+    pointRadius: 0,
+  }],
+  { yAxis: { min: 0, max: 1 } }
+);
+// 7. Carbon curve
+const carbonChart = makeLineChart('chart-carbon',
+  emptyLabels,
+  [{
+    label: 'Carbon Intensity (gCO₂/kWh)',
+    data: [...emptyData],
+    borderColor: COLORS.orange,
+    backgroundColor: rgba(COLORS.orange, 0.15),
+    borderWidth: 2,
+    fill: true,
+    tension: 0.4,
+    pointRadius: 0,
+  }],
+  { yAxis: { title: { display: true, text: 'gCO₂/kWh' } } }
+);
+// 8. Reward components bar
+const rewardChart = makeBarChart('chart-reward',
+  [],
+  [
+    { label: 'Cost Savings',   data: [], backgroundColor: rgba(COLORS.green, 0.8) },
+    { label: 'Temp Constraint', data: [], backgroundColor: rgba(COLORS.cyan, 0.8) },
+    { label: 'Grid Response',  data: [], backgroundColor: rgba(COLORS.blue, 0.8) },
+    { label: 'Efficiency',     data: [], backgroundColor: rgba(COLORS.purple, 0.7) },
+    { label: 'Penalties',      data: [], backgroundColor: rgba(COLORS.red, 0.8) },
+  ]
+);
+// ── Stress meter bars ────────────────────────────────────────────────────────
+function buildStressMeter() {
+  const el = document.getElementById('stress-meter');
+  el.innerHTML = '';
+  for (let i = 0; i < 20; i++) {
+    const bar = document.createElement('div');
+    bar.className = 'stress-bar';
+    bar.id = `sm-${i}`;
+    el.appendChild(bar);
+  }
+}
+buildStressMeter();
+function updateStressMeter(stress) {
+  const bars = 20;
+  const active = Math.round(stress * bars);
+  for (let i = 0; i < bars; i++) {
+    const bar = document.getElementById(`sm-${i}`);
+    if (!bar) continue;
+    const pct = (i / bars) * 100;
+    bar.style.height = `${20 + pct * 0.8}%`;
+    if (i < active) {
+      const color = stress > 0.7 ? COLORS.red : stress > 0.4 ? COLORS.amber : COLORS.green;
+      bar.style.background = color;
+      bar.style.opacity = '1';
+    } else {
+      bar.style.background = 'rgba(255,255,255,0.05)';
+      bar.style.opacity = '1';
+    }
+  }
+}
+// ── Batch Gantt renderer ─────────────────────────────────────────────────────
+function renderGantt(jobs, currentStep) {
+  const wrap = document.getElementById('gantt-wrap');
+  if (!jobs || jobs.length === 0) {
+    wrap.innerHTML = '<div style="color:var(--text-dim);font-size:0.8rem">No batch jobs in this episode.</div>';
+    return;
+  }
+  const totalSlots = 96;
+  wrap.innerHTML = '';
+  jobs.forEach(job => {
+    const row = document.createElement('div');
+    row.className = 'gantt-row';
+    const label = document.createElement('div');
+    label.className = 'gantt-label';
+    label.textContent = `J${job.id}`;
+    row.appendChild(label);
+    const track = document.createElement('div');
+    track.className = 'gantt-track';
+    // Deadline marker
+    const deadlinePct = (job.deadline_slot / totalSlots) * 100;
+    const deadline = document.createElement('div');
+    deadline.className = 'gantt-deadline';
+    deadline.style.left = `${deadlinePct}%`;
+    deadline.title = `Deadline: step ${job.deadline_slot}`;
+    track.appendChild(deadline);
+    // Job block
+    if (job.scheduled) {
+      const startPct = (job.scheduled_at / totalSlots) * 100;
+      const widthPct = (job.duration / totalSlots) * 100;
+      const block = document.createElement('div');
+      block.className = 'gantt-block ' + (job.completed ? 'completed' : job.missed_deadline ? 'missed' : 'scheduled');
+      block.style.left = `${startPct}%`;
+      block.style.width = `${Math.max(widthPct, 1)}%`;
+      track.appendChild(block);
+    }
+    // Current step marker
+    const curPct = (currentStep / totalSlots) * 100;
+    const curMarker = document.createElement('div');
+    curMarker.style.cssText = `position:absolute;top:0;bottom:0;width:1px;background:rgba(56,139,253,0.6);left:${curPct}%`;
+    track.appendChild(curMarker);
+    row.appendChild(track);
+    // Status badge
+    const statusWrap = document.createElement('div');
+    statusWrap.className = 'gantt-status';
+    let badgeClass = 'pending', badgeText = 'pending';
+    if (job.completed)       { badgeClass = 'ok';      badgeText = 'done'; }
+    else if (job.missed_deadline) { badgeClass = 'missed';  badgeText = 'missed'; }
+    else if (job.scheduled && !job.completed) { badgeClass = 'running'; badgeText = 'running'; }
+    statusWrap.innerHTML = `<span class="badge ${badgeClass}">${badgeText}</span>`;
+    row.appendChild(statusWrap);
+    wrap.appendChild(row);
+  });
+}
+// ── Reward breakdown rows ─────────────────────────────────────────────────────
+function renderRewardRows(rc) {
+  if (!rc) return;
+  const container = document.getElementById('reward-rows');
+  const components = [
+    { key: 'cost_savings',      label: 'Cost Savings',  color: COLORS.green,  sign: 1 },
+    { key: 'temp_constraint',   label: 'Temp Constr.',  color: COLORS.cyan,   sign: 1 },
+    { key: 'grid_response',     label: 'Grid DR',       color: COLORS.blue,   sign: 1 },
+    { key: 'efficiency_bonus',  label: 'Efficiency',    color: COLORS.purple, sign: 1 },
+    { key: 'stability_penalty', label: 'Stability',     color: COLORS.amber,  sign: -1 },
+    { key: 'deadline_penalty',  label: 'Deadlines',     color: COLORS.red,    sign: -1 },
+    { key: 'carbon_reward',     label: 'Carbon',        color: COLORS.orange, sign: 1 },
+  ];
+  container.innerHTML = '';
+  components.forEach(c => {
+    const val = rc[c.key] || 0;
+    const absVal = Math.abs(val);
+    const pct = Math.min(100, absVal * 30); // scale 0–~3 reward to 0–100%
+    container.innerHTML += `
+      <div class="reward-row">
+        <div class="reward-label">${c.label}</div>
+        <div class="reward-bar-wrap">
+          <div class="reward-bar" style="width:${pct}%;background:${c.color};opacity:0.8"></div>
+        </div>
+        <div class="reward-val" style="color:${val >= 0 ? COLORS.green : COLORS.red}">${val.toFixed(3)}</div>
+      </div>`;
+  });
+}
+// ── KPI color logic ──────────────────────────────────────────────────────────
+function colorClass(val, good, bad) {
+  if (val <= good) return 'good';
+  if (val >= bad)  return 'bad';
+  return 'warn';
+}
+// ── Main state update ─────────────────────────────────────────────────────────
+let lastStep = -1;
+async function fetchAndUpdate() {
+  try {
+    const res = await fetch(`${API_BASE}/state`);
+    if (!res.ok) throw new Error(`HTTP ${res.status}`);
+    const state = await res.json();
+    connected = true;
+    document.getElementById('conn-banner').classList.remove('show');
+    document.getElementById('status-dot').style.background = 'var(--accent-green)';
+    const b = state.buildings && state.buildings[currentBuilding];
+    if (!b) return;
+    const step = state.step;
+    const hourOfDay = b.hour_of_day || 0;
+    // ── Header ──
+    document.getElementById('ep-step').textContent = `ep:${state.episode} step:${step}/95`;
+    document.getElementById('task-badge').textContent = TASK_NAMES[state.task_id] || 'Task 1';
+    // ── KPIs ──
+    const priceEl = document.getElementById('kpi-price');
+    priceEl.textContent = `$${b.current_price.toFixed(4)}`;
+    priceEl.className = 'kpi-value ' + colorClass(b.current_price, 0.08, 0.16);
+    const tempEl = document.getElementById('kpi-temp');
+    tempEl.textContent = `${b.indoor_temperature.toFixed(1)}°C`;
+    const inBounds = b.indoor_temperature >= 19 && b.indoor_temperature <= 23;
+    tempEl.className = 'kpi-value ' + (inBounds ? 'good' : 'bad');
+    const stressEl = document.getElementById('kpi-stress');
+    stressEl.textContent = b.grid_stress_signal.toFixed(3);
+    stressEl.className = 'kpi-value ' + colorClass(b.grid_stress_signal, 0.4, 0.7);
+    const costEl = document.getElementById('kpi-cost');
+    const savings = b.baseline_cost - b.cumulative_cost;
+    costEl.textContent = `$${b.cumulative_cost.toFixed(2)}`;
+    costEl.className = 'kpi-value ' + (savings > 0 ? 'good' : 'warn');
+    document.getElementById('kpi-baseline').textContent = `$${b.baseline_cost.toFixed(2)}`;
+    document.getElementById('kpi-carbon').textContent = `${b.carbon_intensity.toFixed(0)}`;
+    document.getElementById('kpi-demand').textContent = `${b.process_demand.toFixed(1)}`;
+    document.getElementById('kpi-storage').textContent = `${(b.thermal_storage_level * 100).toFixed(1)}`;
+    // ── Price curve chart ──
+    if (state.price_curve_24h && state.price_curve_24h.length === 24) {
+      const labels = Array.from({ length: 24 }, (_, i) => `${i}:00`);
+      priceChart.data.labels = labels;
+      priceChart.data.datasets[0].data = state.price_curve_24h;
+      // Current position marker
+      const marker = Array(24).fill(null);
+      marker[hourOfDay] = state.price_curve_24h[hourOfDay];
+      priceChart.data.datasets[1].data = marker;
+      priceChart.update('none');
+    }
+    // ── Carbon curve ──
+    if (state.carbon_curve_24h && state.carbon_curve_24h.length === 24) {
+      carbonChart.data.labels = Array.from({ length: 24 }, (_, i) => `${i}:00`);
+      carbonChart.data.datasets[0].data = state.carbon_curve_24h;
+      carbonChart.update('none');
+    }
+    // ── Grid stress ──
+    document.getElementById('stress-big').textContent = b.grid_stress_signal.toFixed(3);
+    updateStressMeter(b.grid_stress_signal);
+    const cardStress = document.getElementById('card-stress');
+    if (b.grid_stress_signal > 0.7) {
+      cardStress.classList.add('alert-active');
+    } else {
+      cardStress.classList.remove('alert-active');
+    }
+    // ── Thermal storage bar ──
+    const storagePct = (b.thermal_storage_level * 100).toFixed(1);
+    document.getElementById('storage-pct').textContent = storagePct;
+    document.getElementById('storage-fill').style.width = `${storagePct}%`;
+    // ── History-based charts (only update when step changes) ──
+    if (step !== lastStep) {
+      lastStep = step;
+      const tempHistory = b.temp_history || [];  // avoid a TypeError if temp_history is missing
+      const stepLabels = Array.from({ length: tempHistory.length }, (_, i) => i);
+      // Temperature chart
+      if (tempHistory.length > 0) {
+        tempChart.data.labels = stepLabels;
+        tempChart.data.datasets[0].data = b.temp_history;
+        tempChart.data.datasets[1].data = b.temp_history.map(() => 23);
+        tempChart.data.datasets[2].data = b.temp_history.map(() => 19);
+        tempChart.update('none');
+      }
+      // Storage history
+      if (b.hvac_history && b.hvac_history.length > 0) {
+        storageChart.data.labels = stepLabels;
+        storageChart.data.datasets[0].data = Array.from({ length: b.hvac_history.length }, (_, i) =>
+          b.thermal_storage_level // simplify: use current level as placeholder
+        );
+        storageChart.update('none');
+      }
+      // HVAC + load shed (bar)
+      if (b.hvac_history && b.load_shed_history) {
+        const n = Math.min(b.hvac_history.length, HISTORY_LEN);
+        hvacChart.data.labels = Array.from({ length: n }, (_, i) => i);
+        hvacChart.data.datasets[0].data = b.hvac_history.slice(0, n);
+        hvacChart.data.datasets[1].data = b.load_shed_history.slice(0, n);
+        hvacChart.update('none');
+      }
+      // Cost vs baseline
+      if (b.cost_history && b.cost_history.length > 0) {
+        const n = b.cost_history.length;
+        costChart.data.labels = Array.from({ length: n }, (_, i) => i);
+        costChart.data.datasets[0].data = b.cost_history;
+        // Approximate the baseline as a linear ramp whose last point equals b.baseline_cost
+        const baselineStep = b.baseline_cost / n;
+        costChart.data.datasets[1].data = b.cost_history.map((_, i) => baselineStep * (i + 1));
+        costChart.update('none');
+      }
+      // Grid stress history (plotted from the grid_response reward term as a proxy)
+      if (b.reward_history && b.reward_history.length > 0) {
+        const n = b.reward_history.length;
+        stressChart.data.labels = Array.from({ length: n }, (_, i) => i);
+        stressChart.data.datasets[0].data = b.reward_history.map(r => Math.max(0, r.grid_response || 0));
+        stressChart.update('none');
+        // Reward breakdown chart (last 20 steps)
+        const recent = b.reward_history.slice(-20);
+        rewardChart.data.labels = Array.from({ length: recent.length }, (_, i) => n - recent.length + i);
+        rewardChart.data.datasets[0].data = recent.map(r => Math.max(0, r.cost_savings || 0));
+        rewardChart.data.datasets[1].data = recent.map(r => Math.max(0, r.temp_constraint || 0));
+        rewardChart.data.datasets[2].data = recent.map(r => Math.max(0, r.grid_response || 0));
+        rewardChart.data.datasets[3].data = recent.map(r => Math.max(0, r.efficiency_bonus || 0));
+        rewardChart.data.datasets[4].data = recent.map(r =>
+          Math.abs(r.deadline_penalty || 0) + Math.abs(r.stability_penalty || 0)
+        );
+        rewardChart.update('none');
+        // Reward rows (last step)
+        renderRewardRows(b.reward_history[b.reward_history.length - 1]);
+      }
+      // Batch Gantt
+      renderGantt(b.jobs || [], step);
+    }
+  } catch (err) {
+    connected = false;
+    document.getElementById('conn-banner').classList.add('show');
+    document.getElementById('status-dot').style.background = 'var(--accent-red)';
+    // console.error('Poll error:', err);
+  }
+}
+// ── Episode controls ─────────────────────────────────────────────────────────
+async function doReset() {
+  const taskId = parseInt(document.getElementById('task-select').value, 10);
+  const btn = document.getElementById('btn-reset');
+  btn.textContent = 'Resetting...';
+  btn.disabled = true;
+  lastStep = -1;
+  try {
+    await fetch(`${API_BASE}/reset`, {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json' },
+      body: JSON.stringify({ task_id: taskId, num_buildings: 1 }),
+    });
+  } catch (e) {
+    console.error(e);
+  }
+  btn.textContent = '↺ New Episode';
+  btn.disabled = false;
+  document.getElementById('grade-result').textContent = '';
+}
+async function doGrade() {
+  try {
+    const res = await fetch(`${API_BASE}/grade`);
+    const grade = await res.json();
+    const score = (grade.score * 100).toFixed(2);
+    const el = document.getElementById('grade-result');
+    el.textContent = `Score: ${score}% ${grade.exploit_detected ? '⚠ exploit!' : ''}`;
+    el.style.color = grade.score > 0.6 ? 'var(--accent-green)' : grade.score > 0.3 ? 'var(--accent-amber)' : 'var(--accent-red)';
+  } catch (e) {
+    console.error(e);
+  }
+}
+function onTaskChange() {
+  // Reset chart histories on task change
+  [tempChart, storageChart, hvacChart, costChart, stressChart, rewardChart].forEach(c => {
+    c.data.labels = [];
+    c.data.datasets.forEach(d => d.data = []);
+    c.update('none');
+  });
+}
+function onBuildingChange() {
+  currentBuilding = parseInt(document.getElementById('building-select').value, 10);
+  lastStep = -1;
+}
+// ── Start polling ────────────────────────────────────────────────────────────
+function startPolling() {
+  if (pollTimer) clearInterval(pollTimer);
+  fetchAndUpdate(); // immediate first fetch
+  pollTimer = setInterval(fetchAndUpdate, POLL_MS);
+}
+startPolling();
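
The `doReset()` and `doGrade()` handlers drive the environment over plain HTTP, so the same episode loop can be scripted outside the browser. The sketch below is a minimal Python client under stated assumptions: the base URL is whatever `API_BASE` resolves to (not shown in this file, so it is passed in), `/reset` accepts the `{"task_id", "num_buildings"}` JSON body the dashboard sends, and `/grade` returns JSON with `score` and `exploit_detected`. The helper names (`post_reset`, `fetch_grade`, `grade_tier`, `format_grade`) are hypothetical, and the colour thresholds simply mirror the dashboard's (above 0.6 green, above 0.3 amber, otherwise red).

```python
import json
import urllib.request


def reset_payload(task_id: int, num_buildings: int = 1) -> bytes:
    """Encode the JSON body the dashboard sends to /reset."""
    return json.dumps({"task_id": task_id, "num_buildings": num_buildings}).encode("utf-8")


def grade_tier(score: float) -> str:
    """Map a score in [0, 1] to the dashboard's colour tier."""
    if score > 0.6:
        return "green"
    if score > 0.3:
        return "amber"
    return "red"


def format_grade(result: dict) -> str:
    """Render a /grade result like the dashboard's grade label."""
    text = f"Score: {result['score'] * 100:.2f}%"
    if result.get("exploit_detected"):
        text += " ⚠ exploit!"
    return text


def post_reset(base_url: str, task_id: int, num_buildings: int = 1) -> None:
    """POST /reset (hypothetical client); urlopen raises HTTPError on a 4xx/5xx status."""
    req = urllib.request.Request(
        f"{base_url}/reset",
        data=reset_payload(task_id, num_buildings),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10):
        pass


def fetch_grade(base_url: str) -> dict:
    """GET /grade and decode the JSON response body."""
    with urllib.request.urlopen(f"{base_url}/grade", timeout=10) as resp:
        return json.load(resp)
```

Usage, assuming a running server at a hypothetical `base`: call `post_reset(base, 3)`, run an episode, then `print(format_grade(fetch_grade(base)))`. Only the standard library is used, and `urlopen` raises `HTTPError` on 4xx/5xx responses, so failed requests surface as exceptions instead of being parsed as a grade.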

dashboard/static/index.html ADDED

	@@ -0,0 +1,627 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+  <title>GridMind-RL | Real-Time Energy Dashboard</title>
+  <meta name="description" content="Real-time visualization dashboard for the GridMind-RL Industrial Load-Shaping and Demand-Response RL environment." />
+  <!-- Chart.js CDN -->
+  <script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.0/dist/chart.umd.min.js"></script>
+  <!-- Google Fonts -->
+  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet" />
+  <style>
+    /* ── Design System ─────────────────────────────────────────────── */
+    :root {
+      --bg-base:      #0a0f1e;
+      --bg-surface:   #0f1829;
+      --bg-card:      #141f35;
+      --bg-card-h:    #1a2840;
+      --border:       rgba(56, 139, 253, 0.15);
+      --border-glow:  rgba(56, 139, 253, 0.4);
+      --text-primary: #e2e8f4;
+      --text-secondary: #8899b4;
+      --text-dim:     #4d6080;
+      --accent-blue:  #388bfd;
+      --accent-green: #3fb950;
+      --accent-amber: #d29922;
+      --accent-red:   #f85149;
+      --accent-purple: #bc8cff;
+      --accent-cyan:  #39d0d8;
+      --accent-orange: #ff7c39;
+      --gradient-hero: linear-gradient(135deg, #0d1b33 0%, #0a0f1e 100%);
+      --glow-blue:    0 0 20px rgba(56,139,253,0.25), 0 0 40px rgba(56,139,253,0.1);
+      --glow-green:   0 0 20px rgba(63,185,80,0.25);
+      --font-mono:    'JetBrains Mono', monospace;
+      --radius:       12px;
+      --radius-lg:    16px;
+    }
+    *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
+    html { scroll-behavior: smooth; }
+    body {
+      font-family: 'Inter', sans-serif;
+      background: var(--bg-base);
+      color: var(--text-primary);
+      min-height: 100vh;
+      overflow-x: hidden;
+    }
+    /* ── Static background grid overlay ── */
+    body::before {
+      content: '';
+      position: fixed;
+      inset: 0;
+      background-image:
+        linear-gradient(rgba(56,139,253,0.03) 1px, transparent 1px),
+        linear-gradient(90deg, rgba(56,139,253,0.03) 1px, transparent 1px);
+      background-size: 40px 40px;
+      pointer-events: none;
+      z-index: 0;
+    }
+    /* ── Header ─────────────────────────────────────────────────────── */
+    header {
+      position: sticky;
+      top: 0;
+      z-index: 100;
+      background: rgba(10,15,30,0.85);
+      backdrop-filter: blur(16px);
+      border-bottom: 1px solid var(--border);
+      padding: 0 2rem;
+      height: 64px;
+      display: flex;
+      align-items: center;
+      justify-content: space-between;
+    }
+    .logo {
+      display: flex;
+      align-items: center;
+      gap: 10px;
+    }
+    .logo-icon {
+      width: 32px; height: 32px;
+      background: linear-gradient(135deg, var(--accent-blue), var(--accent-cyan));
+      border-radius: 8px;
+      display: flex; align-items: center; justify-content: center;
+      font-size: 16px;
+    }
+    .logo-text { font-size: 1.1rem; font-weight: 700; letter-spacing: -0.3px; }
+    .logo-text span { color: var(--accent-blue); }
+    .header-status {
+      display: flex;
+      align-items: center;
+      gap: 1.5rem;
+    }
+    .status-dot {
+      width: 8px; height: 8px;
+      border-radius: 50%;
+      background: var(--accent-green);
+      box-shadow: 0 0 8px var(--accent-green);
+      animation: pulse 2s infinite;
+    }
+    @keyframes pulse {
+      0%, 100% { opacity: 1; transform: scale(1); }
+      50% { opacity: 0.6; transform: scale(0.9); }
+    }
+    .status-label { font-size: 0.8rem; color: var(--text-secondary); }
+    .task-badge {
+      padding: 4px 12px;
+      border-radius: 20px;
+      font-size: 0.75rem;
+      font-weight: 600;
+      background: rgba(56,139,253,0.15);
+      border: 1px solid var(--border);
+      color: var(--accent-blue);
+    }
+    /* ── KPI Bar ────────────────────────────────────────────────────── */
+    .kpi-bar {
+      position: relative; z-index: 1;
+      display: grid;
+      grid-template-columns: repeat(auto-fit, minmax(160px, 1fr));
+      gap: 1px;
+      background: var(--border);
+      border-bottom: 1px solid var(--border);
+    }
+    .kpi {
+      background: var(--bg-surface);
+      padding: 1rem 1.5rem;
+      display: flex; flex-direction: column; gap: 2px;
+      transition: background 0.2s;
+    }
+    .kpi:hover { background: var(--bg-card); }
+    .kpi-label { font-size: 0.7rem; text-transform: uppercase; letter-spacing: 0.8px; color: var(--text-dim); }
+    .kpi-value {
+      font-family: var(--font-mono);
+      font-size: 1.5rem;
+      font-weight: 600;
+      color: var(--text-primary);
+      transition: color 0.3s;
+    }
+    .kpi-value.good { color: var(--accent-green); }
+    .kpi-value.warn { color: var(--accent-amber); }
+    .kpi-value.bad  { color: var(--accent-red); }
+    .kpi-delta { font-size: 0.72rem; color: var(--text-secondary); font-family: var(--font-mono); }
+    /* ── Main Grid ──────────────────────────────────────────────────── */
+    main {
+      position: relative; z-index: 1;
+      max-width: 1600px;
+      margin: 0 auto;
+      padding: 1.5rem;
+      display: grid;
+      grid-template-columns: repeat(12, 1fr);
+      gap: 1rem;
+    }
+    /* ── Card ───────────────────────────────────────────────────────── */
+    .card {
+      background: var(--bg-card);
+      border: 1px solid var(--border);
+      border-radius: var(--radius-lg);
+      padding: 1.25rem;
+      transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s;
+      position: relative;
+      overflow: hidden;
+    }
+    .card::before {
+      content: '';
+      position: absolute;
+      inset: 0;
+      background: linear-gradient(135deg, rgba(56,139,253,0.03) 0%, transparent 60%);
+      pointer-events: none;
+    }
+    .card:hover {
+      border-color: var(--border-glow);
+      box-shadow: var(--glow-blue);
+      transform: translateY(-1px);
+    }
+    .card.alert-active {
+      border-color: rgba(248,81,73,0.5);
+      box-shadow: 0 0 20px rgba(248,81,73,0.2);
+      animation: alertPulse 1.5s infinite;
+    }
+    @keyframes alertPulse {
+      0%, 100% { box-shadow: 0 0 20px rgba(248,81,73,0.2); }
+      50% { box-shadow: 0 0 35px rgba(248,81,73,0.4); }
+    }
+    .card-title {
+      font-size: 0.78rem;
+      font-weight: 600;
+      text-transform: uppercase;
+      letter-spacing: 0.8px;
+      color: var(--text-secondary);
+      margin-bottom: 0.75rem;
+      display: flex;
+      align-items: center;
+      gap: 0.5rem;
+    }
+    .card-title .icon { font-size: 0.9rem; }
+    /* ── Grid layout spans ─ */
+    .col-12 { grid-column: span 12; }
+    .col-8  { grid-column: span 8; }
+    .col-6  { grid-column: span 6; }
+    .col-4  { grid-column: span 4; }
+    .col-3  { grid-column: span 3; }
+    @media (max-width: 1200px) {
+      .col-8  { grid-column: span 12; }
+      .col-4  { grid-column: span 12; }
+      .col-6  { grid-column: span 12; }
+      .col-3  { grid-column: span 6; }
+    }
+    @media (max-width: 768px) {
+      .col-3 { grid-column: span 12; }
+      main { padding: 0.75rem; gap: 0.75rem; }
+    }
+    /* ── Charts ─────────────────────────────────────────────────────── */
+    .chart-wrap { position: relative; height: 200px; }
+    .chart-wrap.tall { height: 260px; }
+    .chart-wrap.short { height: 150px; }
+    /* ── Thermal Storage Bar ─────────────────────────────────────────── */
+    .storage-bar-wrap {
+      height: 28px;
+      background: rgba(255,255,255,0.05);
+      border-radius: 14px;
+      overflow: hidden;
+      margin-top: 0.5rem;
+      position: relative;
+    }
+    .storage-bar-fill {
+      height: 100%;
+      border-radius: 14px;
+      background: linear-gradient(90deg, var(--accent-cyan), var(--accent-blue));
+      transition: width 0.6s cubic-bezier(0.4, 0, 0.2, 1);
+      position: relative;
+    }
+    .storage-bar-fill::after {
+      content: '';
+      position: absolute;
+      inset: 0;
+      background: linear-gradient(90deg, transparent 0%, rgba(255,255,255,0.15) 50%, transparent 100%);
+      animation: shimmer 2s infinite;
+    }
+    @keyframes shimmer {
+      0% { transform: translateX(-100%); }
+      100% { transform: translateX(100%); }
+    }
+    .storage-label {
+      font-family: var(--font-mono);
+      font-size: 1.8rem;
+      font-weight: 700;
+      color: var(--accent-cyan);
+      margin-top: 0.5rem;
+    }
+    /* Shrink only the trailing "%" unit, not the #storage-pct value */
+    .storage-label span:last-child { font-size: 1rem; color: var(--text-secondary); }
+    /* ── Batch Gantt ─────────────────────────────────────────────────── */
+    .gantt-wrap {
+      display: flex;
+      flex-direction: column;
+      gap: 6px;
+      margin-top: 0.25rem;
+    }
+    .gantt-row {
+      display: flex;
+      align-items: center;
+      gap: 8px;
+      font-size: 0.75rem;
+    }
+    .gantt-label {
+      width: 40px;
+      color: var(--text-secondary);
+      font-family: var(--font-mono);
+      flex-shrink: 0;
+    }
+    .gantt-track {
+      flex: 1;
+      height: 18px;
+      background: rgba(255,255,255,0.05);
+      border-radius: 4px;
+      position: relative;
+      overflow: hidden;
+    }
+    .gantt-block {
+      position: absolute;
+      top: 0; bottom: 0;
+      border-radius: 4px;
+      transition: width 0.3s, left 0.3s;
+    }
+    .gantt-block.scheduled { background: linear-gradient(90deg, var(--accent-blue), var(--accent-purple)); }
+    .gantt-block.completed { background: var(--accent-green); opacity: 0.7; }
+    .gantt-block.missed    { background: var(--accent-red); opacity: 0.8; }
+    .gantt-deadline {
+      position: absolute;
+      top: 0; bottom: 0;
+      width: 2px;
+      background: var(--accent-amber);
+      border-radius: 1px;
+    }
+    .gantt-status {
+      width: 60px;
+      text-align: right;
+      flex-shrink: 0;
+    }
+    .badge {
+      padding: 2px 8px;
+      border-radius: 10px;
+      font-size: 0.7rem;
+      font-weight: 600;
+    }
+    .badge.ok      { background: rgba(63,185,80,0.2); color: var(--accent-green); }
+    .badge.pending { background: rgba(56,139,253,0.2); color: var(--accent-blue); }
+    .badge.missed  { background: rgba(248,81,73,0.2); color: var(--accent-red); }
+    .badge.running { background: rgba(188,140,255,0.2); color: var(--accent-purple); }
+    /* ── Reward breakdown mini-bars ──────────────────────────────── */
+    .reward-row {
+      display: flex; align-items: center; gap: 8px;
+      font-size: 0.75rem; margin-bottom: 4px;
+    }
+    .reward-label { width: 100px; color: var(--text-secondary); }
+    .reward-bar-wrap { flex: 1; height: 10px; background: rgba(255,255,255,0.05); border-radius: 5px; overflow: hidden; }
+    .reward-bar { height: 100%; border-radius: 5px; transition: width 0.5s; }
+    .reward-val { width: 55px; text-align: right; font-family: var(--font-mono); color: var(--text-primary); }
+    /* ── Grid stress indicator ──────────────────────────────────── */
+    .stress-meter {
+      display: flex; align-items: flex-end; gap: 3px;
+      height: 40px;
+      margin-top: 0.5rem;
+    }
+    .stress-bar {
+      flex: 1;
+      background: rgba(255,255,255,0.05);
+      border-radius: 2px 2px 0 0;
+      transition: height 0.4s, background 0.4s;
+    }
+    /* ── Big number display ─────────────────────────────────────── */
+    .big-num {
+      font-family: var(--font-mono);
+      font-weight: 700;
+    }
+    .big-num.xl  { font-size: 2.8rem; }
+    .big-num.lg  { font-size: 2rem; }
+    .big-num.md  { font-size: 1.4rem; }
+    .big-num.green  { color: var(--accent-green); }
+    .big-num.blue   { color: var(--accent-blue); }
+    .big-num.amber  { color: var(--accent-amber); }
+    .big-num.red    { color: var(--accent-red); }
+    .big-num.purple { color: var(--accent-purple); }
+    .sub-label { font-size: 0.75rem; color: var(--text-secondary); margin-top: 2px; }
+    /* ── Price ticker ──────────────────────────────────────────── */
+    .price-row {
+      display: flex; align-items: baseline; gap: 8px; flex-wrap: wrap;
+    }
+    .price-tier {
+      display: flex; align-items: center; gap: 6px;
+      padding: 6px 10px;
+      border-radius: 8px;
+      background: rgba(255,255,255,0.04);
+      border: 1px solid rgba(255,255,255,0.06);
+      font-size: 0.8rem;
+    }
+    .price-tier .dot { width: 8px; height: 8px; border-radius: 50%; }
+    /* ── Control panel ──────────────────────────────────────────── */
+    .ctrl-row {
+      display: flex; gap: 8px; align-items: center; flex-wrap: wrap;
+      margin-top: 0.5rem;
+    }
+    .btn {
+      padding: 8px 16px;
+      border-radius: 8px;
+      border: 1px solid var(--border);
+      background: rgba(56,139,253,0.1);
+      color: var(--accent-blue);
+      font-size: 0.82rem;
+      font-weight: 600;
+      cursor: pointer;
+      transition: all 0.2s;
+      font-family: 'Inter', sans-serif;
+    }
+    .btn:hover {
+      background: rgba(56,139,253,0.2);
+      border-color: var(--accent-blue);
+      box-shadow: 0 0 12px rgba(56,139,253,0.3);
+      transform: translateY(-1px);
+    }
+    .btn.primary {
+      background: var(--accent-blue);
+      color: #fff;
+    }
+    .btn.primary:hover { background: #4da3ff; }
+    .btn.danger {
+      background: rgba(248,81,73,0.15);
+      color: var(--accent-red);
+      border-color: rgba(248,81,73,0.3);
+    }
+    select {
+      padding: 8px 12px;
+      background: var(--bg-card);
+      border: 1px solid var(--border);
+      border-radius: 8px;
+      color: var(--text-primary);
+      font-size: 0.82rem;
+      font-family: 'Inter', sans-serif;
+      cursor: pointer;
+    }
+    select:focus { outline: none; border-color: var(--accent-blue); }
+    /* ── Connection banner ──────────────────────────────────────── */
+    #conn-banner {
+      display: none;
+      position: fixed;
+      top: 64px; left: 0; right: 0;
+      z-index: 200;
+      background: rgba(248,81,73,0.15);
+      border-bottom: 1px solid rgba(248,81,73,0.3);
+      text-align: center;
+      padding: 10px;
+      font-size: 0.85rem;
+      color: var(--accent-red);
+    }
+    #conn-banner.show { display: block; }
+    /* ── Footer ─────────────────────────────────────────────────── */
+    footer {
+      position: relative; z-index: 1;
+      text-align: center;
+      padding: 1.5rem;
+      color: var(--text-dim);
+      font-size: 0.75rem;
+      border-top: 1px solid var(--border);
+    }
+  </style>
+</head>
+<body>
+<!-- Connection error banner -->
+<div id="conn-banner">
+  ⚡ Environment server unreachable — retrying...
+</div>
+<!-- ── Header ─── -->
+<header>
+  <div class="logo">
+    <div class="logo-icon">⚡</div>
+    <div class="logo-text">Grid<span>Mind</span>-RL</div>
+  </div>
+  <div class="header-status">
+    <span id="task-badge" class="task-badge">Task 1 — Cost Minimization</span>
+    <div style="display:flex;align-items:center;gap:6px">
+      <div class="status-dot" id="status-dot"></div>
+      <span class="status-label" id="status-label">Live</span>
+    </div>
+    <span id="ep-step" style="font-family:var(--font-mono);font-size:0.8rem;color:var(--text-secondary)">ep:— step:—</span>
+  </div>
+</header>
+<!-- ── KPI Bar ─── -->
+<div class="kpi-bar">
+  <div class="kpi">
+    <span class="kpi-label">Current Price</span>
+    <span class="kpi-value" id="kpi-price">—</span>
+    <span class="kpi-delta">$/kWh</span>
+  </div>
+  <div class="kpi">
+    <span class="kpi-label">Indoor Temp</span>
+    <span class="kpi-value" id="kpi-temp">—</span>
+    <span class="kpi-delta">°C (target 21°C)</span>
+  </div>
+  <div class="kpi">
+    <span class="kpi-label">Grid Stress</span>
+    <span class="kpi-value" id="kpi-stress">—</span>
+    <span class="kpi-delta">0=normal 1=critical</span>
+  </div>
+  <div class="kpi">
+    <span class="kpi-label">Cumulative Cost</span>
+    <span class="kpi-value" id="kpi-cost">—</span>
+    <span class="kpi-delta">vs baseline: <span id="kpi-baseline">—</span></span>
+  </div>
+  <div class="kpi">
+    <span class="kpi-label">Carbon Intensity</span>
+    <span class="kpi-value" id="kpi-carbon">—</span>
+    <span class="kpi-delta">gCO₂/kWh</span>
+  </div>
+  <div class="kpi">
+    <span class="kpi-label">Process Demand</span>
+    <span class="kpi-value" id="kpi-demand">—</span>
+    <span class="kpi-delta">kW</span>
+  </div>
+  <div class="kpi">
+    <span class="kpi-label">Thermal Storage</span>
+    <span class="kpi-value" id="kpi-storage">—</span>
+    <span class="kpi-delta">% capacity</span>
+  </div>
+</div>
+<!-- ── Main Content ─── -->
+<main>
+  <!-- Row 1: Price curve + Grid stress -->
+  <div class="card col-8">
+    <div class="card-title"><span class="icon">💰</span> 24h Electricity Price Curve</div>
+    <div class="chart-wrap">
+      <canvas id="chart-price"></canvas>
+    </div>
+  </div>
+  <div class="card col-4" id="card-stress">
+    <div class="card-title"><span class="icon">⚠️</span> Grid Stress Signal</div>
+    <div class="big-num xl" id="stress-big">0.00</div>
+    <div class="sub-label">Demand-response urgency</div>
+    <div class="stress-meter" id="stress-meter"></div>
+    <div style="margin-top:0.75rem;">
+      <div class="chart-wrap short">
+        <canvas id="chart-stress"></canvas>
+      </div>
+    </div>
+  </div>
+  <!-- Row 2: Temperature + Storage + HVAC -->
+  <div class="card col-6">
+    <div class="card-title"><span class="icon">🌡️</span> Temperature Timeline</div>
+    <div class="chart-wrap tall">
+      <canvas id="chart-temp"></canvas>
+    </div>
+  </div>
+  <div class="card col-3">
+    <div class="card-title"><span class="icon">🔋</span> Thermal Storage Level</div>
+    <div class="storage-label"><span id="storage-pct">—</span><span>%</span></div>
+    <div class="storage-bar-wrap">
+      <div class="storage-bar-fill" id="storage-fill" style="width:0%"></div>
+    </div>
+    <div style="margin-top:1rem">
+      <div class="chart-wrap short">
+        <canvas id="chart-storage"></canvas>
+      </div>
+    </div>
+  </div>
+  <div class="card col-3">
+    <div class="card-title"><span class="icon">❄️</span> HVAC + Load Shed</div>
+    <div class="chart-wrap tall">
+      <canvas id="chart-hvac"></canvas>
+    </div>
+  </div>
+  <!-- Row 3: Cost comparison + Reward breakdown -->
+  <div class="card col-8">
+    <div class="card-title"><span class="icon">📊</span> Cumulative Cost vs Baseline</div>
+    <div class="chart-wrap tall">
+      <canvas id="chart-cost"></canvas>
+    </div>
+  </div>
+  <div class="card col-4">
+    <div class="card-title"><span class="icon">🏆</span> Reward Breakdown</div>
+    <div id="reward-rows" style="margin-top:0.5rem"></div>
+    <div style="margin-top:1rem">
+      <div class="chart-wrap short">
+        <canvas id="chart-reward"></canvas>
+      </div>
+    </div>
+  </div>
+  <!-- Row 4: Batch Gantt + Carbon -->
+  <div class="card col-6">
+    <div class="card-title"><span class="icon">⚙️</span> Batch Job Timeline</div>
+    <div class="gantt-wrap" id="gantt-wrap">
+      <div style="color:var(--text-dim);font-size:0.8rem">No batch jobs queued.</div>
+    </div>
+  </div>
+  <div class="card col-6">
+    <div class="card-title"><span class="icon">🌍</span> Carbon Intensity Curve (24h)</div>
+    <div class="chart-wrap">
+      <canvas id="chart-carbon"></canvas>
+    </div>
+  </div>
+  <!-- Row 5: Controls -->
+  <div class="card col-12">
+    <div class="card-title"><span class="icon">🎮</span> Episode Controls</div>
+    <div class="ctrl-row">
+      <select id="task-select" onchange="onTaskChange()">
+        <option value="1">Task 1 — Cost Minimization (Easy)</option>
+        <option value="2">Task 2 — Temperature Management (Medium)</option>
+        <option value="3">Task 3 — Full Demand Response (Hard)</option>
+      </select>
+      <select id="building-select" onchange="onBuildingChange()">
+        <option value="0">Building 1 (Primary)</option>
+        <option value="1">Building 2</option>
+        <option value="2">Building 3</option>
+      </select>
+      <button id="btn-reset" class="btn primary" onclick="doReset()">↺ New Episode</button>
+      <button class="btn" onclick="doGrade()">📋 Grade Episode</button>
+      <button class="btn" onclick="window.open('/api/replay')">📥 Export Replay</button>
+      <span id="grade-result" style="font-family:var(--font-mono);font-size:0.9rem;color:var(--accent-green)"></span>
+    </div>
+  </div>
+</main>
+<footer>
+  GridMind-RL &nbsp;·&nbsp; OpenEnv-compliant RL environment for industrial demand response &nbsp;·&nbsp;
+  <a href="/api/health" target="_blank" style="color:var(--accent-blue);text-decoration:none">API Health</a> &nbsp;·&nbsp;
+  <a href="/api/metrics" target="_blank" style="color:var(--accent-blue);text-decoration:none">Metrics</a>
+</footer>
+<script src="/static/dashboard.js"></script>
+</body>
+</html>

data/generate_prices.py ADDED

	@@ -0,0 +1,60 @@

+"""Generate 30 days of synthetic, ISO New England-style hourly electricity prices."""
+import json
+import math
+import random
+random.seed(12345)
+def generate_day(day_type: str = "weekday") -> list[float]:
+    """Generate 24 hourly prices for one day."""
+    prices = []
+    for hour in range(24):
+        # Base time-of-use price
+        if hour < 6:  # overnight off-peak
+            base = 0.035 + random.gauss(0, 0.005)
+        elif hour < 8:  # morning ramp
+            base = 0.07 + random.gauss(0, 0.008)
+        elif hour < 12:  # morning peak
+            base = 0.16 + random.gauss(0, 0.020) + (0.05 if day_type == "weekday" else 0)
+        elif hour < 14:  # midday moderate
+            base = 0.10 + random.gauss(0, 0.012)
+        elif hour < 17:  # afternoon
+            base = 0.12 + random.gauss(0, 0.015)
+        elif hour < 21:  # evening super-peak
+            base = 0.22 + random.gauss(0, 0.025) + (0.08 if day_type == "weekday" else 0.02)
+        elif hour < 23:  # evening wind-down
+            base = 0.09 + random.gauss(0, 0.010)
+        else:  # late night
+            base = 0.04 + random.gauss(0, 0.006)
+        # Seasonal multiplier (placeholder: 1.0 applies no seasonal adjustment;
+        # a value > 1.0 would model higher summer peaks)
+        season_mult = 1.0
+        prices.append(round(max(0.02, base * season_mult), 4))
+    return prices
+# Generate 30 days: 22 weekdays + 8 weekend days
+days = []
+day_types = (["weekday"] * 5 + ["weekend"] * 2) * 5  # 5 weeks → 35 days → take 30
+for i, dt in enumerate(day_types[:30]):
+    day_data = {
+        "day": i + 1,
+        "day_type": dt,
+        "prices_usd_per_kwh": generate_day(dt),
+        "source": "synthetic_iso_new_england_style",
+        "region": "ISONE",
+        "currency": "USD",
+    }
+    days.append(day_data)
+output = {
+    "description": "30 days of synthetic ISO New England-style hourly electricity price data",
+    "units": "USD/kWh",
+    "hours_per_day": 24,
+    "days": days,
+}
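+# Writes to the current working directory; run this script from data/ so the
+# output lands at data/price_curves.json as the message below reports.
+with open("price_curves.json", "w") as f: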
+    json.dump(output, f, indent=2)
+print(f"Generated {len(days)} days of price data -> data/price_curves.json")
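
Each entry in `days` holds `hours_per_day` hourly prices, which is the shape a load-shifting agent or a baseline policy would consume. As a minimal sketch (the helpers `check_curves`, `cheapest_window`, and `load_curves` are hypothetical, not part of the repository), the code below validates that shape, including the 0.02 USD/kWh floor that `generate_day` applies, and finds the cheapest run of consecutive hours for a deferrable job using a sliding-window sum. It assumes the job must run within a single day, so windows do not wrap past midnight.

```python
import json
from pathlib import Path


def check_curves(curves: dict) -> None:
    """Raise ValueError if the price data does not have the expected shape."""
    hours = curves["hours_per_day"]
    for day in curves["days"]:
        prices = day["prices_usd_per_kwh"]
        if len(prices) != hours:
            raise ValueError(f"day {day['day']}: expected {hours} prices, got {len(prices)}")
        # Mirrors the max(0.02, ...) floor used by generate_day()
        if min(prices) < 0.02:
            raise ValueError(f"day {day['day']}: price below the 0.02 USD/kWh floor")


def cheapest_window(prices: list[float], length: int) -> tuple[int, float]:
    """Return (start_hour, total_price) of the cheapest run of `length` consecutive hours.

    Uses a sliding-window sum, so it runs in O(len(prices)).
    """
    if not 1 <= length <= len(prices):
        raise ValueError("length must be between 1 and len(prices)")
    window = sum(prices[:length])
    best_start, best_sum = 0, window
    for start in range(1, len(prices) - length + 1):
        # Slide right by one hour: add the new hour, drop the oldest one.
        window += prices[start + length - 1] - prices[start - 1]
        if window < best_sum:
            best_start, best_sum = start, window
    return best_start, best_sum


def load_curves(path: Path) -> dict:
    """Read a price_curves.json file and validate it before use."""
    with open(path) as f:
        curves = json.load(f)
    check_curves(curves)
    return curves
```

For instance, `cheapest_window(load_curves(Path("data/price_curves.json"))["days"][0]["prices_usd_per_kwh"], 4)` returns the start hour and total price of the cheapest four-hour block on day 1; with the time-of-use shape above, that block falls in the overnight off-peak hours.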

data/price_curves.json ADDED

	@@ -0,0 +1,997 @@

+{
+  "description": "30 days of synthetic ISO New England-style hourly electricity price data",
+  "units": "USD/kWh",
+  "hours_per_day": 24,
+  "days": [
+    {
+      "day": 1,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0344,
+        0.0354,
+        0.0369,
+        0.0313,
+        0.0328,
+        0.0374,
+        0.0657,
+        0.0681,
+        0.2251,
+        0.225,
+        0.1985,
+        0.2053,
+        0.0894,
+        0.0963,
+        0.1263,
+        0.1183,
+        0.1353,
+        0.2966,
+        0.2858,
+        0.2996,
+        0.2969,
+        0.0836,
+        0.0888,
+        0.0411
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 2,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.028,
+        0.0461,
+        0.0432,
+        0.0339,
+        0.0469,
+        0.0353,
+        0.0794,
+        0.06,
+        0.214,
+        0.2183,
+        0.1742,
+        0.2313,
+        0.0827,
+        0.0908,
+        0.122,
+        0.1382,
+        0.1129,
+        0.2548,
+        0.3064,
+        0.283,
+        0.3122,
+        0.085,
+        0.0941,
+        0.0328
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 3,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0376,
+        0.0411,
+        0.0225,
+        0.0336,
+        0.0379,
+        0.0358,
+        0.085,
+        0.0684,
+        0.2185,
+        0.2217,
+        0.1987,
+        0.2072,
+        0.0983,
+        0.1034,
+        0.1186,
+        0.1141,
+        0.1537,
+        0.2981,
+        0.3104,
+        0.2583,
+        0.2746,
+        0.1011,
+        0.0908,
+        0.0331
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 4,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0415,
+        0.0376,
+        0.0319,
+        0.0411,
+        0.0242,
+        0.0409,
+        0.0741,
+        0.0831,
+        0.2019,
+        0.1858,
+        0.2205,
+        0.2158,
+        0.1085,
+        0.0982,
+        0.1264,
+        0.1306,
+        0.1023,
+        0.293,
+        0.302,
+        0.3,
+        0.284,
+        0.0902,
+        0.076,
+        0.0333
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 5,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0399,
+        0.0384,
+        0.0277,
+        0.0399,
+        0.04,
+        0.0335,
+        0.0707,
+        0.0864,
+        0.2277,
+        0.2131,
+        0.2001,
+        0.2177,
+        0.0938,
+        0.0869,
+        0.1201,
+        0.1355,
+        0.1153,
+        0.2793,
+        0.3253,
+        0.3115,
+        0.3118,
+        0.1031,
+        0.0916,
+        0.0431
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 6,
+      "day_type": "weekend",
+      "prices_usd_per_kwh": [
+        0.0385,
+        0.0322,
+        0.0334,
+        0.041,
+        0.0324,
+        0.028,
+        0.0719,
+        0.0641,
+        0.1429,
+        0.1684,
+        0.134,
+        0.1573,
+        0.0969,
+        0.0967,
+        0.1037,
+        0.137,
+        0.1442,
+        0.2491,
+        0.2301,
+        0.2167,
+        0.2214,
+        0.0925,
+        0.0891,
+        0.0332
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 7,
+      "day_type": "weekend",
+      "prices_usd_per_kwh": [
+        0.0349,
+        0.0362,
+        0.0287,
+        0.0443,
+        0.0319,
+        0.0287,
+        0.0718,
+        0.0692,
+        0.1497,
+        0.1523,
+        0.1499,
+        0.1676,
+        0.0849,
+        0.121,
+        0.1155,
+        0.0913,
+        0.1313,
+        0.2497,
+        0.2415,
+        0.233,
+        0.2438,
+        0.0938,
+        0.093,
+        0.036
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 8,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0343,
+        0.0354,
+        0.0395,
+        0.0389,
+        0.0362,
+        0.0306,
+        0.069,
+        0.0721,
+        0.2192,
+        0.2166,
+        0.2171,
+        0.235,
+        0.0791,
+        0.1257,
+        0.1137,
+        0.1489,
+        0.1354,
+        0.3167,
+        0.2652,
+        0.2888,
+        0.2743,
+        0.0965,
+        0.091,
+        0.0366
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 9,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0285,
+        0.0337,
+        0.0356,
+        0.0321,
+        0.034,
+        0.0356,
+        0.0616,
+        0.0563,
+        0.1877,
+        0.2393,
+        0.208,
+        0.2069,
+        0.0964,
+        0.1007,
+        0.1096,
+        0.1448,
+        0.1337,
+        0.3229,
+        0.3113,
+        0.3042,
+        0.3108,
+        0.0887,
+        0.1009,
+        0.0526
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 10,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0337,
+        0.0383,
+        0.0398,
+        0.0383,
+        0.0383,
+        0.0363,
+        0.0723,
+        0.083,
+        0.224,
+        0.2327,
+        0.2133,
+        0.2737,
+        0.1272,
+        0.0984,
+        0.1354,
+        0.1088,
+        0.1394,
+        0.285,
+        0.3219,
+        0.3342,
+        0.3025,
+        0.0886,
+        0.0959,
+        0.0499
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 11,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0357,
+        0.0286,
+        0.0378,
+        0.0377,
+        0.0363,
+        0.049,
+        0.0712,
+        0.0776,
+        0.199,
+        0.1826,
+        0.1832,
+        0.199,
+        0.1176,
+        0.0801,
+        0.1229,
+        0.1223,
+        0.1482,
+        0.2716,
+        0.2829,
+        0.3186,
+        0.2896,
+        0.0804,
+        0.0958,
+        0.0429
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 12,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0329,
+        0.0411,
+        0.0388,
+        0.0421,
+        0.0398,
+        0.0341,
+        0.0597,
+        0.0651,
+        0.2184,
+        0.1831,
+        0.1782,
+        0.2067,
+        0.1281,
+        0.1042,
+        0.1386,
+        0.1063,
+        0.1263,
+        0.3334,
+        0.3248,
+        0.3044,
+        0.3069,
+        0.1025,
+        0.09,
+        0.0426
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 13,
+      "day_type": "weekend",
+      "prices_usd_per_kwh": [
+        0.036,
+        0.0313,
+        0.0342,
+        0.0347,
+        0.0269,
+        0.0316,
+        0.0716,
+        0.063,
+        0.179,
+        0.1798,
+        0.1543,
+        0.1754,
+        0.103,
+        0.1116,
+        0.1137,
+        0.1062,
+        0.1053,
+        0.2079,
+        0.252,
+        0.2098,
+        0.2695,
+        0.0893,
+        0.0757,
+        0.038
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 14,
+      "day_type": "weekend",
+      "prices_usd_per_kwh": [
+        0.0427,
+        0.0336,
+        0.0375,
+        0.0335,
+        0.0281,
+        0.0387,
+        0.0684,
+        0.055,
+        0.1582,
+        0.1773,
+        0.143,
+        0.1704,
+        0.11,
+        0.1066,
+        0.1046,
+        0.1008,
+        0.1226,
+        0.2745,
+        0.2401,
+        0.2255,
+        0.2547,
+        0.0905,
+        0.0804,
+        0.028
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 15,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0317,
+        0.0378,
+        0.032,
+        0.0319,
+        0.0386,
+        0.0359,
+        0.0629,
+        0.0648,
+        0.1986,
+        0.2264,
+        0.2322,
+        0.2359,
+        0.0871,
+        0.1131,
+        0.0907,
+        0.1067,
+        0.1223,
+        0.2872,
+        0.2964,
+        0.3011,
+        0.3006,
+        0.0828,
+        0.0971,
+        0.0394
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 16,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0292,
+        0.0347,
+        0.0381,
+        0.0291,
+        0.0382,
+        0.032,
+        0.0742,
+        0.0729,
+        0.24,
+        0.219,
+        0.1969,
+        0.1791,
+        0.1012,
+        0.0939,
+        0.1402,
+        0.1286,
+        0.1336,
+        0.3002,
+        0.269,
+        0.281,
+        0.2543,
+        0.0738,
+        0.0985,
+        0.0392
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 17,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0346,
+        0.036,
+        0.0389,
+        0.0387,
+        0.0327,
+        0.0322,
+        0.0799,
+        0.0771,
+        0.1989,
+        0.1892,
+        0.2184,
+        0.2203,
+        0.0973,
+        0.1233,
+        0.1242,
+        0.1179,
+        0.1505,
+        0.2786,
+        0.3048,
+        0.2824,
+        0.3445,
+        0.0863,
+        0.0917,
+        0.0412
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 18,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0303,
+        0.0396,
+        0.0473,
+        0.0369,
+        0.0322,
+        0.0297,
+        0.0662,
+        0.0702,
+        0.2095,
+        0.1771,
+        0.2459,
+        0.2151,
+        0.083,
+        0.1221,
+        0.1143,
+        0.1287,
+        0.1245,
+        0.3212,
+        0.3134,
+        0.3064,
+        0.2688,
+        0.0922,
+        0.085,
+        0.0389
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 19,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0378,
+        0.0354,
+        0.0372,
+        0.0334,
+        0.0324,
+        0.0292,
+        0.0548,
+        0.0589,
+        0.2089,
+        0.2411,
+        0.1909,
+        0.2079,
+        0.0934,
+        0.1047,
+        0.1473,
+        0.14,
+        0.1142,
+        0.2882,
+        0.3052,
+        0.2849,
+        0.2826,
+        0.0787,
+        0.093,
+        0.0341
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 20,
+      "day_type": "weekend",
+      "prices_usd_per_kwh": [
+        0.0427,
+        0.0328,
+        0.0326,
+        0.0358,
+        0.0349,
+        0.0377,
+        0.0556,
+        0.0681,
+        0.1586,
+        0.1316,
+        0.1455,
+        0.1672,
+        0.1059,
+        0.0831,
+        0.1101,
+        0.1036,
+        0.1251,
+        0.2077,
+        0.238,
+        0.2641,
+        0.2324,
+        0.0977,
+        0.1027,
+        0.0344
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 21,
+      "day_type": "weekend",
+      "prices_usd_per_kwh": [
+        0.0309,
+        0.0314,
+        0.0327,
+        0.0352,
+        0.0284,
+        0.0361,
+        0.0644,
+        0.0777,
+        0.2042,
+        0.1598,
+        0.2028,
+        0.1825,
+        0.1061,
+        0.0789,
+        0.1235,
+        0.1103,
+        0.1345,
+        0.2664,
+        0.2455,
+        0.2298,
+        0.3003,
+        0.0922,
+        0.0966,
+        0.0375
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 22,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.037,
+        0.0313,
+        0.0354,
+        0.0343,
+        0.0322,
+        0.0328,
+        0.0688,
+        0.0686,
+        0.2045,
+        0.2027,
+        0.2029,
+        0.2126,
+        0.0948,
+        0.1047,
+        0.1121,
+        0.1349,
+        0.1085,
+        0.2702,
+        0.2516,
+        0.2928,
+        0.2882,
+        0.093,
+        0.0863,
+        0.0348
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 23,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0396,
+        0.033,
+        0.0396,
+        0.0354,
+        0.0405,
+        0.0336,
+        0.0734,
+        0.0631,
+        0.2038,
+        0.205,
+        0.2128,
+        0.219,
+        0.0913,
+        0.1061,
+        0.0997,
+        0.1123,
+        0.1207,
+        0.3197,
+        0.3126,
+        0.302,
+        0.2606,
+        0.1045,
+        0.0939,
+        0.0457
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 24,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0416,
+        0.0355,
+        0.0311,
+        0.0371,
+        0.0319,
+        0.0375,
+        0.0657,
+        0.0728,
+        0.1992,
+        0.1648,
+        0.2259,
+        0.1994,
+        0.1205,
+        0.1015,
+        0.1017,
+        0.1245,
+        0.1047,
+        0.2621,
+        0.2735,
+        0.297,
+        0.305,
+        0.0922,
+        0.0929,
+        0.043
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 25,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.041,
+        0.0326,
+        0.0393,
+        0.0358,
+        0.0269,
+        0.04,
+        0.0614,
+        0.0752,
+        0.2084,
+        0.2276,
+        0.2407,
+        0.2268,
+        0.0806,
+        0.1013,
+        0.1142,
+        0.1172,
+        0.1184,
+        0.3218,
+        0.3048,
+        0.2774,
+        0.3436,
+        0.0862,
+        0.0855,
+        0.0331
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 26,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0334,
+        0.0307,
+        0.0429,
+        0.0409,
+        0.0228,
+        0.0382,
+        0.0688,
+        0.0596,
+        0.1827,
+        0.2235,
+        0.2109,
+        0.2286,
+        0.0906,
+        0.1201,
+        0.1264,
+        0.1179,
+        0.1338,
+        0.3036,
+        0.289,
+        0.2755,
+        0.3041,
+        0.0638,
+        0.0826,
+        0.0323
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 27,
+      "day_type": "weekend",
+      "prices_usd_per_kwh": [
+        0.0391,
+        0.0402,
+        0.0327,
+        0.0326,
+        0.0424,
+        0.0362,
+        0.0815,
+        0.0695,
+        0.1723,
+        0.1985,
+        0.1525,
+        0.1767,
+        0.1083,
+        0.0803,
+        0.1483,
+        0.1069,
+        0.1308,
+        0.2403,
+        0.2243,
+        0.231,
+        0.2642,
+        0.0769,
+        0.0909,
+        0.0256
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 28,
+      "day_type": "weekend",
+      "prices_usd_per_kwh": [
+        0.0316,
+        0.0226,
+        0.0324,
+        0.0356,
+        0.0229,
+        0.0337,
+        0.0647,
+        0.0654,
+        0.1687,
+        0.1662,
+        0.1826,
+        0.1819,
+        0.0939,
+        0.1113,
+        0.1333,
+        0.1179,
+        0.1121,
+        0.221,
+        0.2202,
+        0.2574,
+        0.2283,
+        0.0917,
+        0.082,
+        0.0341
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 29,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0453,
+        0.0437,
+        0.039,
+        0.0347,
+        0.0303,
+        0.0423,
+        0.0597,
+        0.0696,
+        0.213,
+        0.1768,
+        0.2078,
+        0.233,
+        0.1092,
+        0.0769,
+        0.1379,
+        0.0999,
+        0.1181,
+        0.2952,
+        0.2739,
+        0.3293,
+        0.3004,
+        0.0936,
+        0.0758,
+        0.0343
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    },
+    {
+      "day": 30,
+      "day_type": "weekday",
+      "prices_usd_per_kwh": [
+        0.0431,
+        0.0348,
+        0.0338,
+        0.0354,
+        0.0305,
+        0.0254,
+        0.0598,
+        0.0688,
+        0.2247,
+        0.2196,
+        0.1798,
+        0.1903,
+        0.0898,
+        0.1202,
+        0.1038,
+        0.11,
+        0.127,
+        0.3367,
+        0.317,
+        0.3198,
+        0.338,
+        0.0768,
+        0.0962,
+        0.0387
+      ],
+      "source": "synthetic_iso_new_england_style",
+      "region": "ISONE",
+      "currency": "USD"
+    }
+  ]
+}
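The `generatePriceCurve` function in env/environment.go builds its own 96-step curve each episode. It does not index these hourly values. To replay one of these daily curves instead, its 24 hourly prices would have to be stretched to the environment's 15-minute resolution. Below is a minimal sketch that decodes one day entry and repeats each hourly price for its four steps. It matches the `hour := s / 4` indexing the environment uses. The `DayCurve` type and the `expandToSteps` helper are hypothetical and do not exist in the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DayCurve is a hypothetical type mirroring one entry of the daily price list;
// its JSON tags match the keys used in data/price_curves.json.
type DayCurve struct {
	Day      int       `json:"day"`
	DayType  string    `json:"day_type"`
	Prices   []float64 `json:"prices_usd_per_kwh"`
	Source   string    `json:"source"`
	Region   string    `json:"region"`
	Currency string    `json:"currency"`
}

// stepsPerHour matches the 15-minute steps of EpisodeSteps = 96.
const stepsPerHour = 4

// expandToSteps (hypothetical) repeats each hourly price for the four
// 15-minute steps of that hour, so step s maps to hour s/4.
func expandToSteps(hourly []float64) []float64 {
	steps := make([]float64, 0, len(hourly)*stepsPerHour)
	for _, p := range hourly {
		for k := 0; k < stepsPerHour; k++ {
			steps = append(steps, p)
		}
	}
	return steps
}

func main() {
	// A truncated copy of the day-11 entry (first three hours only).
	raw := `{"day": 11, "day_type": "weekday",
		"prices_usd_per_kwh": [0.0357, 0.0286, 0.0378],
		"source": "synthetic_iso_new_england_style", "region": "ISONE", "currency": "USD"}`
	var d DayCurve
	if err := json.Unmarshal([]byte(raw), &d); err != nil {
		panic(err)
	}
	steps := expandToSteps(d.Prices)
	fmt.Println(d.Day, d.DayType, len(steps), steps[4])
}
```

Repeating the hourly value is the simplest choice. Interpolating between hours would smooth the sharp jump into hour 8 that every day above shows, but the curve would then no longer match the source data exactly.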

env/environment.go ADDED

	@@ -0,0 +1,679 @@

+// Package env implements the GridMind-RL simulation core.
+// It models a multi-building industrial/commercial energy management system
+// with stochastic electricity prices, thermal dynamics, and batch job scheduling.
+package env
+import (
+	"math"
+	"math/rand"
+	"sync"
+	"time"
+)
+const (
+	EpisodeSteps     = 96    // 24 hours × 15-min intervals
+	StepDurationHrs  = 0.25  // each step = 15 minutes = 0.25 h
+	MaxBuildings     = 3
+	DefaultSetpoint  = 21.0  // °C comfortable indoor temp
+	TMinDefault      = 19.0  // °C lower bound
+	TMaxDefault      = 23.0  // °C upper bound
+	MaxHVACPowerKW   = 50.0  // kW per building
+	MaxStorageKWh    = 100.0 // kWh thermal storage capacity
+	StorageLossRate  = 0.005 // fraction lost per step (thermal dissipation)
+	MaxBatchJobs     = 5     // max concurrent batch jobs per building
+)
+// Environment is the thread-safe top-level simulation manager.
+type Environment struct {
+	mu           sync.RWMutex
+	rng          *rand.Rand
+	seed         int64
+	episode      int
+	step         int
+	done         bool
+	taskID       int
+	difficulty   string
+	numBuildings int
+	Buildings    []*BuildingState
+	PriceCurve   [EpisodeSteps]float64 // $/kWh for each step
+	CarbonCurve  [EpisodeSteps]float64 // gCO2/kWh for each step
+	Replay       []ReplayEntry
+	LastActions  []ActionModel
+	// History for dashboard rendering (per building)
+	TempHistory     [][]float64
+	CostHistory     [][]float64
+	HVACHistory     [][]float64
+	LoadShedHistory [][]float64
+	RewardHistory   [][]RewardComponents
+	// Exploit detection counters
+	totalShedSteps       []int    // steps where load_shed > 0.4
+	thermalCycleCounts   []int    // rapid thermal storage reversals
+	prevChargeRates      []float64
+}
+// NewEnvironment creates an initialised (but not reset) environment.
+func NewEnvironment() *Environment {
+	seed := time.Now().UnixNano()
+	return &Environment{
+		rng:          rand.New(rand.NewSource(seed)),
+		seed:         seed,
+		taskID:       1,
+		difficulty:   "easy",
+		numBuildings: 1,
+	}
+}
+// Reset initializes a new episode. Thread-safe.
+func (e *Environment) Reset(req ResetRequest) ResetResponse {
+	e.mu.Lock()
+	defer e.mu.Unlock()
+	// Apply seed
+	if req.Seed != nil {
+		e.seed = *req.Seed
+	} else {
+		e.seed = time.Now().UnixNano()
+	}
+	e.rng = rand.New(rand.NewSource(e.seed))
+	// Apply task and difficulty
+	e.taskID = req.TaskID
+	if e.taskID < 1 || e.taskID > 3 {
+		e.taskID = 1
+	}
+	e.difficulty = req.Difficulty
+	if e.difficulty == "" {
+		switch e.taskID {
+		case 1:
+			e.difficulty = "easy"
+		case 2:
+			e.difficulty = "medium"
+		case 3:
+			e.difficulty = "hard"
+		}
+	}
+	// Number of buildings (federation)
+	e.numBuildings = req.NumBuildings
+	if e.numBuildings < 1 {
+		e.numBuildings = 1
+	}
+	if e.numBuildings > MaxBuildings {
+		e.numBuildings = MaxBuildings
+	}
+	e.episode++
+	e.step = 0
+	e.done = false
+	e.Replay = make([]ReplayEntry, 0, EpisodeSteps)
+	e.LastActions = make([]ActionModel, e.numBuildings)
+	// Generate price and carbon curves for this episode
+	e.generatePriceCurve()
+	e.generateCarbonCurve()
+	// Initialise buildings
+	e.Buildings = make([]*BuildingState, e.numBuildings)
+	e.TempHistory = make([][]float64, e.numBuildings)
+	e.CostHistory = make([][]float64, e.numBuildings)
+	e.HVACHistory = make([][]float64, e.numBuildings)
+	e.LoadShedHistory = make([][]float64, e.numBuildings)
+	e.RewardHistory = make([][]RewardComponents, e.numBuildings)
+	e.totalShedSteps = make([]int, e.numBuildings)
+	e.thermalCycleCounts = make([]int, e.numBuildings)
+	e.prevChargeRates = make([]float64, e.numBuildings)
+	for i := 0; i < e.numBuildings; i++ {
+		e.Buildings[i] = e.newBuildingState(i)
+		e.TempHistory[i] = make([]float64, 0, EpisodeSteps)
+		e.CostHistory[i] = make([]float64, 0, EpisodeSteps)
+		e.HVACHistory[i] = make([]float64, 0, EpisodeSteps)
+		e.LoadShedHistory[i] = make([]float64, 0, EpisodeSteps)
+		e.RewardHistory[i] = make([]RewardComponents, 0, EpisodeSteps)
+	}
+	obs := make([]ObservationModel, e.numBuildings)
+	for i, b := range e.Buildings {
+		obs[i] = e.buildObservation(b)
+	}
+	return ResetResponse{
+		Observations: obs,
+		Episode:      e.episode,
+		TaskID:       e.taskID,
+		Seed:         e.seed,
+	}
+}
+// Step advances the simulation by one timestep for all buildings. Thread-safe.
+func (e *Environment) Step(actions []ActionModel) ([]StepResponse, bool) {
+	e.mu.Lock()
+	defer e.mu.Unlock()
+	if e.done {
+		return nil, true
+	}
+	// Validate and clamp actions
+	for i := range actions {
+		e.clampAction(&actions[i])
+	}
+	responses := make([]StepResponse, e.numBuildings)
+	for i, b := range e.Buildings {
+		// Find the action for this building (by building_id, else by index) and
+		// record the action actually applied, so replay matches the simulation
+		act := e.findAction(actions, i)
+		e.LastActions[i] = act
+		responses[i] = e.stepBuilding(b, act, i)
+	}
+	e.step++
+	if e.step >= EpisodeSteps {
+		e.done = true
+	}
+	// Record a replay entry for the first building only (other buildings are not replayed)
+	if len(responses) > 0 {
+		entry := ReplayEntry{
+			Step:        e.step - 1,
+			Observation: responses[0].Observation,
+			Action:      e.LastActions[0],
+			Reward:      responses[0].Reward,
+			Components:  responses[0].Info.RewardComponents,
+			Done:        e.done,
+		}
+		e.Replay = append(e.Replay, entry)
+	}
+	return responses, e.done
+}
+// GetState returns a full snapshot of environment state. Thread-safe (read lock).
+func (e *Environment) GetState() StateResponse {
+	e.mu.RLock()
+	defer e.mu.RUnlock()
+	buildings := make([]BuildingStatePublic, e.numBuildings)
+	for i, b := range e.Buildings {
+		pub := BuildingStatePublic{
+			ObservationModel:    e.buildObservation(b),
+			OutdoorTemperature:  b.OutdoorTemperature,
+			SetpointTemperature: b.SetpointTemperature,
+			BaselineCost:        b.BaselineCost,
+			CumulativeCarbon:    b.CumulativeCarbon,
+			Jobs:                b.Jobs,
+		}
+		if i < len(e.TempHistory) {
+			pub.TempHistory = e.TempHistory[i]
+			pub.CostHistory = e.CostHistory[i]
+			pub.HVACHistory = e.HVACHistory[i]
+			pub.LoadShedHistory = e.LoadShedHistory[i]
+			pub.RewardHistory = e.RewardHistory[i]
+		}
+		buildings[i] = pub
+	}
+	priceCurve := make([]float64, 24)
+	carbonCurve := make([]float64, 24)
+	for h := 0; h < 24; h++ {
+		stepIdx := h * 4
+		if stepIdx < EpisodeSteps {
+			priceCurve[h] = e.PriceCurve[stepIdx]
+			carbonCurve[h] = e.CarbonCurve[stepIdx]
+		}
+	}
+	return StateResponse{
+		Buildings:   buildings,
+		PriceCurve:  priceCurve,
+		CarbonCurve: carbonCurve,
+		Episode:     e.episode,
+		Step:        e.step,
+		TaskID:      e.taskID,
+		Done:        e.done,
+		Seed:        e.seed,
+	}
+}
+// GetReplay returns the full episode replay. Thread-safe.
+func (e *Environment) GetReplay() []ReplayEntry {
+	e.mu.RLock()
+	defer e.mu.RUnlock()
+	result := make([]ReplayEntry, len(e.Replay))
+	copy(result, e.Replay)
+	return result
+}
+// ──────────────────────────────────────────────
+// Internal helpers
+// ──────────────────────────────────────────────
+func (e *Environment) newBuildingState(id int) *BuildingState {
+	// Randomise initial conditions slightly
+	initTemp := DefaultSetpoint + (e.rng.Float64()-0.5)*2.0
+	storageLevel := 0.3 + e.rng.Float64()*0.4 // start 30–70% full
+	outdoorTemp := 15.0 + e.rng.Float64()*15.0 // 15–30 °C
+	b := &BuildingState{
+		BuildingID:          id,
+		IndoorTemperature:   initTemp,
+		ThermalStorageLevel: storageLevel,
+		ProcessDemand:       10.0 + e.rng.Float64()*20.0,
+		CurrentPrice:        e.PriceCurve[0],
+		GridStressSignal:    0.0,
+		CarbonIntensity:     e.CarbonCurve[0],
+		HourOfDay:           0,
+		Step:                0,
+		BatchQueue:          []int{},
+		CumulativeCost:      0.0,
+		CumulativeCarbon:    0.0,
+		OutdoorTemperature:  outdoorTemp,
+		PrevHVACLevel:       0.5,
+		BaselineCost:        0.0,
+		SetpointTemperature: DefaultSetpoint,
+		MaxHVACPower:        MaxHVACPowerKW,
+		MaxStorageCapacity:  MaxStorageKWh,
+		ThermalLossRate:     StorageLossRate,
+	}
+	// Spawn batch jobs based on difficulty
+	b.Jobs = e.generateBatchJobs()
+	b.BatchQueue = pendingDeadlines(b.Jobs)
+	return b
+}
+func (e *Environment) generateBatchJobs() []BatchJob {
+	numJobs := 3
+	switch e.difficulty {
+	case "medium":
+		numJobs = 4
+	case "hard":
+		numJobs = 5
+	}
+	jobs := make([]BatchJob, numJobs)
+	for i := range jobs {
+		// Deadlines fall in steps 20–79; with at most 5 jobs of at most 3 steps each,
+		// every job can still finish in time
+		deadline := 20 + e.rng.Intn(60)
+		jobs[i] = BatchJob{
+			ID:             i + 1,
+			DeadlineSlot:   deadline,
+			Duration:       1 + e.rng.Intn(3),
+			PowerDraw:      5.0 + e.rng.Float64()*15.0,
+			Scheduled:      false,
+			ScheduledAt:    -1,
+			Completed:      false,
+			MissedDeadline: false,
+		}
+	}
+	return jobs
+}
+// generatePriceCurve creates a stochastic Time-of-Use price curve for the episode.
+func (e *Environment) generatePriceCurve() {
+	// Base ToU: low overnight, moderate morning, high peak (8-12, 17-21), low night
+	volatility := 0.1
+	switch e.difficulty {
+	case "medium":
+		volatility = 0.2
+	case "hard":
+		volatility = 0.35
+	}
+	// Random peak window shift (±2 hours) for stochasticity
+	morningPeakShift := e.rng.Intn(5) - 2
+	eveningPeakShift := e.rng.Intn(5) - 2
+	for s := 0; s < EpisodeSteps; s++ {
+		hour := (s / 4)
+		base := touPrice(hour, morningPeakShift, eveningPeakShift)
+		noise := (e.rng.Float64()*2 - 1) * volatility * base
+		price := math.Max(0.02, base+noise)
+		e.PriceCurve[s] = price
+	}
+}
+// touPrice returns the base time-of-use price for a given hour.
+func touPrice(hour, morningShift, eveningShift int) float64 {
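+	// Base prices ($/kWh): overnight 0.04, shoulder 0.08–0.10, morning peak 0.18,
+	// evening peak 0.22 (the evening window includes its end hour). Elsewhere 0.32 is
+	// used as the normalisation ceiling, leaving headroom for noise.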
+	morningPeakStart := 8 + morningShift
+	morningPeakEnd := 12 + morningShift
+	eveningPeakStart := 17 + eveningShift
+	eveningPeakEnd := 21 + eveningShift
+	switch {
+	case hour >= morningPeakStart && hour < morningPeakEnd:
+		return 0.18
+	case hour >= eveningPeakStart && hour <= eveningPeakEnd:
+		return 0.22
+	case (hour >= 9 && hour < morningPeakStart) || (hour >= morningPeakEnd && hour < eveningPeakStart):
+		return 0.10
+	case hour >= 23 || hour < 6:
+		return 0.04
+	default:
+		return 0.08
+	}
+}
+// generateCarbonCurve creates a realistic carbon intensity curve (gCO2/kWh).
+// Correlates roughly with price: higher price = more peaker plants = higher carbon.
+func (e *Environment) generateCarbonCurve() {
+	for s := 0; s < EpisodeSteps; s++ {
+		price := e.PriceCurve[s]
+		// Map price range [0.04, 0.32] → carbon [150, 600] gCO2/kWh
+		carbon := 150.0 + (price-0.04)/(0.32-0.04)*(600.0-150.0)
+		noise := (e.rng.Float64()*2 - 1) * 30.0
+		e.CarbonCurve[s] = math.Max(100.0, carbon+noise)
+	}
+}
+// stepBuilding advances a single building by one timestep.
+func (e *Environment) stepBuilding(b *BuildingState, act ActionModel, idx int) StepResponse {
+	s := e.step
+	// Update environmental signals from curves
+	b.CurrentPrice = e.PriceCurve[s]
+	b.CarbonIntensity = e.CarbonCurve[s]
+	b.HourOfDay = (s / 4) % 24
+	b.Step = s // step index reported in the observation
+	// Stochastic grid stress events (more frequent in hard mode)
+	b.GridStressSignal = e.updateGridStress(b, s)
+	// Weather perturbation: outdoor temp drifts sinusoidally + noise
+	b.OutdoorTemperature = e.updateOutdoorTemp(b, s)
+	// Process demand fluctuation
+	b.ProcessDemand = e.updateProcessDemand(b, s)
+	// ----- Apply actions -----
+	// 1. HVAC: heats/cools building toward setpoint
+	hvacPower := act.HVACPowerLevel * b.MaxHVACPower // kW
+	// 2. Thermal storage: charge or discharge
+	chargeKW := act.ThermalChargeRate * b.MaxHVACPower * 0.3 // max 30% of HVAC for storage
+	newStorageEnergy := b.ThermalStorageLevel*b.MaxStorageCapacity + chargeKW*StepDurationHrs
+	// Apply thermal losses
+	newStorageEnergy *= (1.0 - b.ThermalLossRate)
+	newStorageEnergy = math.Max(0, math.Min(b.MaxStorageCapacity, newStorageEnergy))
+	b.ThermalStorageLevel = newStorageEnergy / b.MaxStorageCapacity
+	// 3. Load shedding
+	clampedShed := math.Max(0, math.Min(0.5, act.LoadShedFraction))
+	shedKW := clampedShed * b.ProcessDemand
+	// 4. Batch job scheduling
+	batchCompleted, batchMissed := e.updateBatchJobs(b, act.BatchJobSlot, s)
+	// ----- Thermal dynamics -----
+	// Simple first-order thermal model:
+	// ΔT per step = HVAC effect + outdoor infiltration + storage discharge effect - process heat term
+	hvacEffect := (act.HVACPowerLevel - 0.5) * 2.0 * 1.5 // within ±1.5 °C per step (3 °C total range)
+	infiltration := (b.OutdoorTemperature - b.IndoorTemperature) * 0.03
+	storageEffect := 0.0
+	if act.ThermalChargeRate < 0 { // discharging storage raises indoor temp by up to 0.5 °C per step
+		storageEffect = math.Abs(act.ThermalChargeRate) * 0.5
+	}
+	processHeat := b.ProcessDemand * 0.002 // kW→°C rough factor
+	deltaT := hvacEffect + infiltration + storageEffect - processHeat
+	b.IndoorTemperature += deltaT
+	// ----- Energy & cost accounting -----
+	batchPowerDraw := e.batchRunningPower(b)
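+	// Process load is billed net of shedding (shedKW is a fraction of ProcessDemand),
+	// consistent with the baseline below, which also includes ProcessDemand
+	processKW := b.ProcessDemand - shedKW
+	totalKW := hvacPower + math.Max(0, chargeKW) + batchPowerDraw + processKW
+	totalKW = math.Max(0, totalKW)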
+	energyKWh := totalKW * StepDurationHrs
+	stepCost := energyKWh * b.CurrentPrice
+	stepCarbon := energyKWh * b.CarbonIntensity
+	b.CumulativeCost += stepCost
+	b.CumulativeCarbon += stepCarbon
+	// Baseline (always-on at 70% HVAC, no storage/shedding)
+	baselineKW := 0.7*b.MaxHVACPower + b.ProcessDemand
+	baselineEnergy := baselineKW * StepDurationHrs
+	b.BaselineCost += baselineEnergy * b.CurrentPrice
+	// ----- Reward computation -----
+	rc := ComputeReward(ComputeRewardInput{
+		B:               b,
+		Act:             act,
+		StepCost:        stepCost,
+		EnergyKWh:       energyKWh,
+		TMin:            TMinDefault,
+		TMax:            TMaxDefault,
+		StepCarbon:      stepCarbon,
+		BatchMissed:     len(batchMissed),
+		GridStress:      b.GridStressSignal,
+		ShedFraction:    clampedShed,
+		TaskID:          e.taskID,
+		PrevHVACLevel:   b.PrevHVACLevel,
+		ChargeRate:      act.ThermalChargeRate,
+		PrevChargeRate:  e.prevChargeRates[idx],
+		StorageDelta:    act.ThermalChargeRate,
+		PriceCurve:      e.PriceCurve[:],
+		CurrentStep:     s,
+	})
+	// Exploit detection. Read the previous charge rate before it is overwritten
+	// below; otherwise prev always equals the current rate and no reversal is seen.
+	prevCharge := e.prevChargeRates[idx]
+	if clampedShed > 0.4 {
+		e.totalShedSteps[idx]++
+	}
+	if (prevCharge > 0.3 && act.ThermalChargeRate < -0.3) || (prevCharge < -0.3 && act.ThermalChargeRate > 0.3) {
+		e.thermalCycleCounts[idx]++ // rapid charge/discharge reversal
+	}
+	b.PrevHVACLevel = act.HVACPowerLevel
+	e.prevChargeRates[idx] = act.ThermalChargeRate
+	// Update batch queue
+	b.BatchQueue = pendingDeadlines(b.Jobs)
+	// Record history
+	if idx < len(e.TempHistory) {
+		e.TempHistory[idx] = append(e.TempHistory[idx], b.IndoorTemperature)
+		e.CostHistory[idx] = append(e.CostHistory[idx], b.CumulativeCost)
+		e.HVACHistory[idx] = append(e.HVACHistory[idx], act.HVACPowerLevel)
+		e.LoadShedHistory[idx] = append(e.LoadShedHistory[idx], clampedShed)
+		e.RewardHistory[idx] = append(e.RewardHistory[idx], rc)
+	}
+	obs := e.buildObservation(b)
+	return StepResponse{
+		Observation: obs,
+		Reward:      rc.Total,
+		Done:        e.done || s+1 >= EpisodeSteps,
+		Info: StepInfo{
+			RewardComponents: rc,
+			EnergyUsed:       energyKWh,
+			CarbonEmitted:    stepCarbon,
+			PriceSignal:      b.CurrentPrice,
+			GridStress:       b.GridStressSignal,
+			BatchCompleted:   batchCompleted,
+			BatchMissed:      batchMissed,
+			Episode:          e.episode,
+			Step:             s,
+		},
+	}
+}
+func (e *Environment) updateGridStress(b *BuildingState, s int) float64 {
+	// Grid stress is elevated during price peaks and stochastic demand spikes
+	price := e.PriceCurve[s]
+	priceNorm := (price - 0.04) / (0.32 - 0.04)
+	// Random stress events
+	stressProb := 0.05
+	switch e.difficulty {
+	case "medium":
+		stressProb = 0.1
+	case "hard":
+		stressProb = 0.2
+	}
+	spike := 0.0
+	if e.rng.Float64() < stressProb {
+		spike = 0.3 + e.rng.Float64()*0.5
+	}
+	stress := math.Min(1.0, priceNorm*0.6+spike)
+	return math.Max(0, stress)
+}
+func (e *Environment) updateOutdoorTemp(b *BuildingState, s int) float64 {
+	// Sinusoidal daily temperature cycle + noise
+	hour := float64(s) / 4.0
+	baseTemp := 15.0 + 8.0*math.Sin(2*math.Pi*(hour-6)/24.0)
+	noise := (e.rng.Float64()*2 - 1) * 1.5
+	return baseTemp + noise
+}
+func (e *Environment) updateProcessDemand(b *BuildingState, s int) float64 {
+	// Process demand shifts with business hours
+	hour := s / 4
+	base := 10.0
+	if hour >= 8 && hour <= 18 {
+		base = 20.0 + 10.0*math.Sin(math.Pi*float64(hour-8)/10.0)
+	}
+	noise := (e.rng.Float64()*2 - 1) * 3.0
+	return math.Max(0, base+noise)
+}
+func (e *Environment) updateBatchJobs(b *BuildingState, slot int, step int) (completed []int, missed []int) {
+	completed = []int{}
+	missed = []int{}
+	// Schedule the first pending job into the chosen slot
+	for i := range b.Jobs {
+		job := &b.Jobs[i]
+		if !job.Scheduled && !job.Completed && !job.MissedDeadline {
+			schedAt := step + slot
+			job.Scheduled = true
+			job.ScheduledAt = schedAt
+			break // only schedule one job per step
+		}
+	}
+	// Advance running or completed jobs
+	for i := range b.Jobs {
+		job := &b.Jobs[i]
+		if job.Completed || job.MissedDeadline {
+			continue
+		}
+		// Check deadline miss
+		if step >= job.DeadlineSlot && !job.Completed {
+			job.MissedDeadline = true
+			missed = append(missed, job.ID)
+			continue
+		}
+		// Mark as completed if scheduled and past its start
+		if job.Scheduled && step >= job.ScheduledAt {
+			if step >= job.ScheduledAt+job.Duration-1 {
+				job.Completed = true
+				completed = append(completed, job.ID)
+			}
+		}
+	}
+	return
+}
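+// batchRunningPower sums the draw of jobs inside their run window
+// [ScheduledAt, ScheduledAt+Duration). updateBatchJobs runs first and marks a job
+// Completed during its final step, so Completed must not exclude it here.
+func (e *Environment) batchRunningPower(b *BuildingState) float64 {
+	total := 0.0
+	for _, job := range b.Jobs {
+		if job.Scheduled && !job.MissedDeadline {
+			if e.step >= job.ScheduledAt && e.step < job.ScheduledAt+job.Duration {
+				total += job.PowerDraw
+			}
+		}
+	}
+	return total
+}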
+func (e *Environment) buildObservation(b *BuildingState) ObservationModel {
+	return ObservationModel{
+		IndoorTemperature:   math.Round(b.IndoorTemperature*100) / 100,
+		ThermalStorageLevel: math.Round(b.ThermalStorageLevel*1000) / 1000,
+		ProcessDemand:       math.Round(b.ProcessDemand*100) / 100,
+		CurrentPrice:        math.Round(b.CurrentPrice*10000) / 10000,
+		GridStressSignal:    math.Round(b.GridStressSignal*1000) / 1000,
+		CarbonIntensity:     math.Round(b.CarbonIntensity*10) / 10,
+		HourOfDay:           b.HourOfDay,
+		BatchQueue:          pendingDeadlines(b.Jobs),
+		CumulativeCost:      math.Round(b.CumulativeCost*10000) / 10000,
+		Step:                b.Step,
+		BuildingID:          b.BuildingID,
+	}
+}
+func (e *Environment) clampAction(a *ActionModel) {
+	a.HVACPowerLevel = math.Max(0, math.Min(1.0, a.HVACPowerLevel))
+	a.ThermalChargeRate = math.Max(-1.0, math.Min(1.0, a.ThermalChargeRate))
+	a.BatchJobSlot = max(0, min(4, a.BatchJobSlot))
+	a.LoadShedFraction = math.Max(0, math.Min(0.5, a.LoadShedFraction))
+}
+func (e *Environment) findAction(actions []ActionModel, buildingIdx int) ActionModel {
+	// Try to find an action with matching building_id, else use positional
+	for _, a := range actions {
+		if a.BuildingID == buildingIdx {
+			return a
+		}
+	}
+	if buildingIdx < len(actions) {
+		return actions[buildingIdx]
+	}
+	// Default: do-nothing action
+	return ActionModel{HVACPowerLevel: 0.5, ThermalChargeRate: 0.0, BatchJobSlot: 0, LoadShedFraction: 0.0}
+}
+// pendingDeadlines returns the deadline slots of all jobs that have neither completed
+// nor missed their deadline (scheduled jobs are included).
+func pendingDeadlines(jobs []BatchJob) []int {
+	result := []int{}
+	for _, j := range jobs {
+		if !j.Completed && !j.MissedDeadline {
+			result = append(result, j.DeadlineSlot)
+		}
+	}
+	return result
+}
+func max(a, b int) int {
+	if a > b {
+		return a
+	}
+	return b
+}
+func min(a, b int) int {
+	if a < b {
+		return a
+	}
+	return b
+}
+// ExploitDetected returns whether the current episode shows signs of degenerate strategies.
+func (e *Environment) ExploitDetected(buildingIdx int) (bool, float64) {
+	e.mu.RLock()
+	defer e.mu.RUnlock()
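+	if buildingIdx < 0 || buildingIdx >= len(e.totalShedSteps) {
+		return false, 0.0
+	}
+	// Flag if the agent sheds > 40% of process load on more than 70% of elapsed
+	// steps, or reverses storage direction on more than 40% of them
+	elapsed := float64(max(1, e.step)) // steps taken so far; avoids dividing by zero
+	shedRatio := float64(e.totalShedSteps[buildingIdx]) / elapsed
+	cycleRatio := float64(e.thermalCycleCounts[buildingIdx]) / elapsed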
+	exploited := shedRatio > 0.7 || cycleRatio > 0.4
+	penalty := 0.0
+	if exploited {
+		penalty = math.Max(shedRatio-0.7, 0)*0.5 + math.Max(cycleRatio-0.4, 0)*0.3
+	}
+	return exploited, penalty
+}
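The storage and indoor-temperature updates in `stepBuilding` can be restated as two pure functions, which makes it easier to see the magnitudes involved. This is a standalone sketch. `nextStorageLevel` and `indoorDeltaT` are hypothetical names, and the constants are copied from the `const` block of env/environment.go. Take full heating (`HVACPowerLevel` = 1.0), 21 °C indoors, 25 °C outdoors and 20 kW of process demand. One 15-minute step then changes the indoor temperature by 1.5 + 0.12 - 0.04 = 1.58 °C.

```go
package main

import (
	"fmt"
	"math"
)

// Constants copied from env/environment.go.
const (
	stepDurationHrs = 0.25
	maxHVACPowerKW  = 50.0
	maxStorageKWh   = 100.0
	storageLossRate = 0.005
)

// nextStorageLevel (hypothetical) restates the storage update in stepBuilding:
// charging uses up to 30% of HVAC power, losses apply after charging, and the
// stored energy is clamped to [0, capacity] before normalising to 0–1.
func nextStorageLevel(level, chargeRate float64) float64 {
	chargeKW := chargeRate * maxHVACPowerKW * 0.3
	energy := level*maxStorageKWh + chargeKW*stepDurationHrs
	energy *= 1.0 - storageLossRate
	energy = math.Max(0, math.Min(maxStorageKWh, energy))
	return energy / maxStorageKWh
}

// indoorDeltaT (hypothetical) restates the first-order thermal update in stepBuilding.
func indoorDeltaT(hvacLevel, chargeRate, indoor, outdoor, processDemand float64) float64 {
	hvacEffect := (hvacLevel - 0.5) * 2.0 * 1.5 // within ±1.5 °C per step
	infiltration := (outdoor - indoor) * 0.03
	storageEffect := 0.0
	if chargeRate < 0 { // discharging adds up to 0.5 °C
		storageEffect = math.Abs(chargeRate) * 0.5
	}
	processHeat := processDemand * 0.002
	return hvacEffect + infiltration + storageEffect - processHeat
}

func main() {
	fmt.Printf("storage after full charge from 50%%: %.4f\n", nextStorageLevel(0.5, 1.0))
	fmt.Printf("indoor ΔT at full heating: %.2f °C\n", indoorDeltaT(1.0, 0.0, 21.0, 25.0, 20.0))
}
```

Pulling the physics out of the locked `Environment` methods like this would also let the model be unit-tested without a full `Reset`. That is a suggested refactor, not something the repository currently does.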

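The base tariff and the price-to-carbon mapping can be checked in isolation. The sketch below copies `touPrice` unchanged. It also moves the noise-free part of `generateCarbonCurve` into a hypothetical `carbonFromPrice` helper. That helper maps 0.04 $/kWh to 150 gCO2/kWh and 0.32 $/kWh to 600 gCO2/kWh, with a linear scale in between. The program prints the unshifted base price and the implied carbon intensity for a few hours. The evening window includes its end hour, so hour 21 is still priced at 0.22.

```go
package main

import "fmt"

// touPrice is copied from env/environment.go.
func touPrice(hour, morningShift, eveningShift int) float64 {
	morningPeakStart := 8 + morningShift
	morningPeakEnd := 12 + morningShift
	eveningPeakStart := 17 + eveningShift
	eveningPeakEnd := 21 + eveningShift
	switch {
	case hour >= morningPeakStart && hour < morningPeakEnd:
		return 0.18
	case hour >= eveningPeakStart && hour <= eveningPeakEnd:
		return 0.22
	case (hour >= 9 && hour < morningPeakStart) || (hour >= morningPeakEnd && hour < eveningPeakStart):
		return 0.10
	case hour >= 23 || hour < 6:
		return 0.04
	default:
		return 0.08
	}
}

// carbonFromPrice (hypothetical) is the linear map used by generateCarbonCurve,
// without its ±30 gCO2/kWh noise and 100 gCO2/kWh floor.
func carbonFromPrice(price float64) float64 {
	return 150.0 + (price-0.04)/(0.32-0.04)*(600.0-150.0)
}

func main() {
	for _, h := range []int{2, 7, 10, 14, 21} {
		p := touPrice(h, 0, 0)
		fmt.Printf("hour %2d: %.2f $/kWh -> %.0f gCO2/kWh\n", h, p, carbonFromPrice(p))
	}
}
```

With the random shifts of up to ±2 hours, the 0.10 shoulder before the morning peak only exists when the peak starts after hour 9. With no shift, hours 6 to 7 fall through to the 0.08 default.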
env/models.go ADDED

	@@ -0,0 +1,170 @@

+// Package env defines all data models for the GridMind-RL environment.
+package env
+// BatchJob represents a pending industrial/commercial batch process job.
+type BatchJob struct {
+	ID             int     `json:"id"`              // Unique job identifier
+	DeadlineSlot   int     `json:"deadline_slot"`   // Job is marked missed if not completed before this step
+	Duration       int     `json:"duration"`        // Steps the job takes to complete
+	PowerDraw      float64 `json:"power_draw"`      // kW drawn when job is running
+	Scheduled      bool    `json:"scheduled"`       // Whether a start slot has been assigned
+	ScheduledAt    int     `json:"scheduled_at"`    // Step at which the job is set to start (-1 if not yet)
+	Completed      bool    `json:"completed"`       // Whether the job finished execution
+	MissedDeadline bool    `json:"missed_deadline"` // True if deadline was exceeded
+}
+// BuildingState holds the full simulation state for a single building.
+type BuildingState struct {
+	// Core physical state
+	IndoorTemperature    float64 `json:"indoor_temperature"`     // °C
+	ThermalStorageLevel  float64 `json:"thermal_storage_level"`  // 0.0–1.0 normalized
+	ProcessDemand        float64 `json:"process_demand"`          // kW current process need
+	// Market & grid signals
+	CurrentPrice         float64 `json:"current_price"`          // $/kWh
+	GridStressSignal     float64 `json:"grid_stress_signal"`     // 0.0–1.0 (DR urgency)
+	CarbonIntensity      float64 `json:"carbon_intensity"`       // gCO2/kWh
+	// Temporal
+	HourOfDay            int     `json:"hour_of_day"`            // 0–23
+	Step                 int     `json:"step"`                   // 0–95 within episode
+	// Batch job queue: pending deadlines (raw slots)
+	BatchQueue           []int   `json:"batch_queue"`            // deadline slots of pending jobs
+	// Running cost tracker
+	CumulativeCost       float64 `json:"cumulative_cost"`        // $ total this episode
+	CumulativeCarbon     float64 `json:"cumulative_carbon"`      // gCO2 total this episode
+	// Internal tracking (not exposed in observation)
+	Jobs                 []BatchJob `json:"-"`
+	OutdoorTemperature   float64    `json:"-"` // °C for weather perturbation
+	PrevHVACLevel        float64    `json:"-"` // for stability penalty
+	BaselineCost         float64    `json:"-"` // always-on policy running cost
+	SetpointTemperature  float64    `json:"-"` // target indoor temp (°C)
+	MaxHVACPower         float64    `json:"-"` // kW
+	MaxStorageCapacity   float64    `json:"-"` // kWh
+	ThermalLossRate      float64    `json:"-"` // fraction lost per step
+	BuildingID           int        `json:"-"` // which building in federation
+}
+// ObservationModel is the JSON-serializable observation returned on each step/state.
+type ObservationModel struct {
+	IndoorTemperature   float64 `json:"indoor_temperature"`
+	ThermalStorageLevel float64 `json:"thermal_storage_level"`
+	ProcessDemand       float64 `json:"process_demand"`
+	CurrentPrice        float64 `json:"current_price"`
+	GridStressSignal    float64 `json:"grid_stress_signal"`
+	CarbonIntensity     float64 `json:"carbon_intensity"`
+	HourOfDay           int     `json:"hour_of_day"`
+	BatchQueue          []int   `json:"batch_queue"`
+	CumulativeCost      float64 `json:"cumulative_cost"`
+	Step                int     `json:"step"`
+	BuildingID          int     `json:"building_id"`
+}
+// ActionModel is the parsed agent action for a single step.
+type ActionModel struct {
+	HVACPowerLevel     float64 `json:"hvac_power_level"`    // 0.0–1.0
+	ThermalChargeRate  float64 `json:"thermal_charge_rate"` // -1.0 to 1.0
+	BatchJobSlot       int     `json:"batch_job_slot"`      // 0–4 (0=now, 1–4=defer)
+	LoadShedFraction   float64 `json:"load_shed_fraction"`  // 0.0–0.5
+	BuildingID         int     `json:"building_id"`         // which building to act on
+}
+// RewardComponents holds the individual components of the dense reward signal.
+type RewardComponents struct {
+	CostSavings      float64 `json:"cost_savings"`       // negative = expensive
+	TempConstraint   float64 `json:"temp_constraint"`    // positive = within bounds
+	GridResponse     float64 `json:"grid_response"`      // bonus for DR compliance
+	DeadlinePenalty  float64 `json:"deadline_penalty"`   // negative for missed jobs
+	EfficiencyBonus  float64 `json:"efficiency_bonus"`   // storage arbitrage
+	StabilityPenalty float64 `json:"stability_penalty"`  // HVAC oscillation penalty
+	CarbonReward     float64 `json:"carbon_reward"`      // low-carbon bonus
+	Total            float64 `json:"total"`
+}
+// StepResponse is the full HTTP body returned from POST /step.
+type StepResponse struct {
+	Observation ObservationModel `json:"observation"`
+	Reward      float64          `json:"reward"`
+	Done        bool             `json:"done"`
+	Info        StepInfo         `json:"info"`
+}
+// StepInfo carries auxiliary information per step.
+type StepInfo struct {
+	RewardComponents RewardComponents `json:"reward_components"`
+	EnergyUsed       float64          `json:"energy_used_kwh"`
+	CarbonEmitted    float64          `json:"carbon_emitted_gco2"`
+	PriceSignal      float64          `json:"price_signal"`
+	GridStress       float64          `json:"grid_stress"`
+	BatchCompleted   []int            `json:"batch_completed"`   // IDs completed this step
+	BatchMissed      []int            `json:"batch_missed"`      // IDs that missed deadline
+	Episode          int              `json:"episode"`
+	Step             int              `json:"step"`
+}
+// ResetRequest is the JSON body for POST /reset.
+type ResetRequest struct {
+	Seed         *int64 `json:"seed,omitempty"`          // optional random seed
+	TaskID       int    `json:"task_id"`                 // 1, 2, or 3
+	Difficulty   string `json:"difficulty,omitempty"`    // "easy", "medium", "hard" or "" (auto)
+	NumBuildings int    `json:"num_buildings,omitempty"` // 1–3 for federation
+}
+// ResetResponse is returned from POST /reset.
+type ResetResponse struct {
+	Observations []ObservationModel `json:"observations"` // one per building
+	Episode      int                `json:"episode"`
+	TaskID       int                `json:"task_id"`
+	Seed         int64              `json:"seed"`
+}
+// StateResponse is returned from GET /state.
+type StateResponse struct {
+	Buildings    []BuildingStatePublic `json:"buildings"`
+	PriceCurve   []float64            `json:"price_curve_24h"`    // full 24h ToU prices
+	CarbonCurve  []float64            `json:"carbon_curve_24h"`   // full 24h carbon intensities
+	Episode      int                  `json:"episode"`
+	Step         int                  `json:"step"`
+	TaskID       int                  `json:"task_id"`
+	Done         bool                 `json:"done"`
+	Seed         int64                `json:"seed"`
+}
+// BuildingStatePublic is the dashboard-friendly full state per building.
+type BuildingStatePublic struct {
+	ObservationModel
+	OutdoorTemperature  float64    `json:"outdoor_temperature"`
+	SetpointTemperature float64    `json:"setpoint_temperature"`
+	BaselineCost        float64    `json:"baseline_cost"`
+	CumulativeCarbon    float64    `json:"cumulative_carbon"`
+	Jobs                []BatchJob `json:"jobs"`
+	// History for chart rendering
+	TempHistory         []float64  `json:"temp_history"`
+	CostHistory         []float64  `json:"cost_history"`
+	HVACHistory         []float64  `json:"hvac_history"`
+	LoadShedHistory     []float64  `json:"load_shed_history"`
+	RewardHistory       []RewardComponents `json:"reward_history"`
+}
+// ReplayEntry records a single timestep for episode replay export.
+type ReplayEntry struct {
+	Step        int              `json:"step"`
+	Observation ObservationModel `json:"observation"`
+	Action      ActionModel      `json:"action"`
+	Reward      float64          `json:"reward"`
+	Components  RewardComponents `json:"components"`
+	Done        bool             `json:"done"`
+}
+// EpisodeGrade is the final grade returned for a completed episode.
+type EpisodeGrade struct {
+	TaskID          int                    `json:"task_id"`
+	Score           float64                `json:"score"`           // 0.0–1.0
+	SubScores       map[string]float64     `json:"sub_scores"`
+	ExploitDetected bool                   `json:"exploit_detected"`
+	PenaltyApplied  float64                `json:"penalty_applied"`
+	Details         map[string]interface{} `json:"details"`
+}

env/rewards.go ADDED

	@@ -0,0 +1,143 @@

+// Package env implements the multi-component dense reward function for GridMind-RL.
+package env
+import "math"
+// ComputeRewardInput bundles all inputs needed to compute the reward for one step.
+type ComputeRewardInput struct {
+	B               *BuildingState
+	Act             ActionModel
+	StepCost        float64   // $ cost incurred this step
+	EnergyKWh       float64   // kWh consumed this step
+	TMin            float64   // lower temperature bound (°C)
+	TMax            float64   // upper temperature bound (°C)
+	StepCarbon      float64   // gCO2 emitted this step
+	BatchMissed     int       // number of batch jobs that missed deadline this step
+	GridStress      float64   // 0.0–1.0 grid stress signal
+	ShedFraction    float64   // clamped load shed fraction
+	TaskID          int       // 1, 2, or 3
+	PrevHVACLevel   float64   // previous step's HVAC power level (for stability)
+	ChargeRate      float64   // current thermal charge rate
+	PrevChargeRate  float64   // previous step's thermal charge rate
+	StorageDelta    float64   // change in storage level (+ = charging)
+	PriceCurve      []float64 // full episode price curve for arbitrage calc
+	CurrentStep     int       // current step index
+}
+// ComputeReward returns a dense RewardComponents struct from the current step inputs.
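+// The reward is task-aware: task 1 scores cost, task 2 adds the temperature term,
+// and task 3 adds grid response and carbon. The deadline penalty (nonzero only when
+// batch jobs miss their deadlines), storage arbitrage bonus, and stability penalty
+// are not gated by task.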
+func ComputeReward(inp ComputeRewardInput) RewardComponents {
+	rc := RewardComponents{}
+	// ── 1. Cost Savings ─────────────────────────────────────────────────────
+	// Negative reward proportional to energy cost. Normalised by typical step cost.
+	// Typical step cost at full load, peak price: 50kW * 0.25h * 0.32 = $4.00.
+	typicalCost := 4.0
+	rc.CostSavings = -(inp.StepCost / typicalCost) * 2.0
+	// ── 2. Temperature Constraint ────────────────────────────────────────────
+	// Only active for task 2 and 3.
+	if inp.TaskID >= 2 {
+		temp := inp.B.IndoorTemperature
+		rc.TempConstraint = computeTempReward(temp, inp.B.SetpointTemperature, inp.TMin, inp.TMax)
+	}
+	// ── 3. Grid Stress Response ──────────────────────────────────────────────
+	// Only active for task 3.
+	if inp.TaskID >= 3 {
+		rc.GridResponse = computeGridResponse(inp.GridStress, inp.ShedFraction)
+	}
+	// ── 4. Deadline Penalty ──────────────────────────────────────────────────
+	if inp.BatchMissed > 0 {
+		rc.DeadlinePenalty = -float64(inp.BatchMissed) * 1.5
+	}
+	// ── 5. Efficiency Bonus (thermal storage arbitrage) ───────────────────────
+	// Reward for charging storage during cheap periods and discharging during expensive ones.
+	if len(inp.PriceCurve) > inp.CurrentStep {
+		rc.EfficiencyBonus = computeArbitrageBonus(
+			inp.ChargeRate,
+			inp.PriceCurve[inp.CurrentStep],
+			inp.PriceCurve,
+			inp.CurrentStep,
+		)
+	}
+	// ── 6. Stability Penalty ─────────────────────────────────────────────────
+	// Penalise rapid oscillation in HVAC setpoint and thermal charge rate.
+	hvacDelta := math.Abs(inp.Act.HVACPowerLevel - inp.PrevHVACLevel)
+	chargeDelta := math.Abs(inp.ChargeRate - inp.PrevChargeRate)
+	oscillation := hvacDelta*0.5 + chargeDelta*0.3
+	if oscillation > 0.3 {
+		rc.StabilityPenalty = -(oscillation - 0.3) * 0.8
+	}
+	// ── 7. Carbon Reward ─────────────────────────────────────────────────────
+	// Active for task 3 only.
+	if inp.TaskID >= 3 {
+		// Normalise carbon intensity over the ISO-NE range of roughly 100–700 gCO2/kWh.
+		carbonNorm := (inp.B.CarbonIntensity - 100.0) / 600.0
+		// Penalise energy use in proportion to carbon intensity; below 100 gCO2/kWh
+		// carbonNorm turns negative and the term becomes a small bonus.
+		rc.CarbonReward = -inp.EnergyKWh * carbonNorm * 0.3
+	}
+	// ── Aggregate ────────────────────────────────────────────────────────────
+	rc.Total = rc.CostSavings + rc.TempConstraint + rc.GridResponse +
+		rc.DeadlinePenalty + rc.EfficiencyBonus + rc.StabilityPenalty + rc.CarbonReward
+	return rc
+}
+// computeTempReward returns a reward based on how close the indoor temperature
+// is to the setpoint, with a hard penalty outside [TMin, TMax].
+func computeTempReward(temp, setpoint, tMin, tMax float64) float64 {
+	if temp >= tMin && temp <= tMax {
+		// Gaussian-shaped bonus: maximum at setpoint, degrades toward bounds
+		deviation := math.Abs(temp - setpoint)
+		sigma := (tMax - tMin) / 4.0
+		return math.Exp(-0.5*(deviation/sigma)*(deviation/sigma)) * 0.5
+	}
+	// Outside bounds: proportional penalty
+	excess := math.Max(temp-tMax, tMin-temp)
+	return -excess * 0.4
+}
+// computeGridResponse returns a bonus for shedding load during high grid stress,
+// and a mild penalty for shedding when the grid is fine.
+func computeGridResponse(stress, shedFraction float64) float64 {
+	if stress > 0.7 {
+		// Bonus proportional to shed fraction
+		return shedFraction * stress * 1.5
+	}
+	// Mild penalty for unnecessary shedding (reduces productivity without benefit)
+	return -shedFraction * (0.7 - stress) * 0.3
+}
+// computeArbitrageBonus rewards charging storage during cheap periods and
+// discharging during expensive periods.
+func computeArbitrageBonus(chargeRate, currentPrice float64, curve []float64, step int) float64 {
+	// Average the prices over the next 8 steps (2 hours at 15-minute steps)
+	lookAhead := 8
+	futureSum := 0.0
+	count := 0
+	for i := step + 1; i <= step+lookAhead && i < len(curve); i++ {
+		futureSum += curve[i]
+		count++
+	}
+	if count == 0 {
+		return 0.0
+	}
+	futureAvg := futureSum / float64(count)
+	// If current price is lower than future avg → charging is smart → reward
+	if chargeRate > 0 && currentPrice < futureAvg {
+		return chargeRate * (futureAvg - currentPrice) * 2.0
+	}
+	// If current price is higher than future avg → discharging is smart → reward
+	if chargeRate < 0 && currentPrice > futureAvg {
+		return math.Abs(chargeRate) * (currentPrice - futureAvg) * 2.0
+	}
+	return 0.0
+}

env/tasks.go ADDED

	@@ -0,0 +1,203 @@

+// Package env defines the three GridMind-RL tasks and their deterministic graders.
+package env
+import "math"
+// TaskConfig describes a single task.
+type TaskConfig struct {
+	ID          int    `json:"id"`
+	Name        string `json:"name"`
+	Description string `json:"description"`
+	Difficulty  string `json:"difficulty"`
+	Weights     map[string]float64 `json:"weights"`
+}
+// AllTasks returns the ordered list of task configurations.
+func AllTasks() []TaskConfig {
+	return []TaskConfig{
+		{
+			ID:          1,
+			Name:        "Cost Minimization",
+			Description: "Minimize total energy cost over a 24-hour episode with no process constraints. Beat the always-on flat policy baseline.",
+			Difficulty:  "easy",
+			Weights:     map[string]float64{"cost": 1.0},
+		},
+		{
+			ID:          2,
+			Name:        "Constrained Temperature Management",
+			Description: "Minimize cost while keeping indoor temperature within ±2°C of setpoint at all times.",
+			Difficulty:  "medium",
+			Weights:     map[string]float64{"cost": 0.6, "temperature": 0.4},
+		},
+		{
+			ID:          3,
+			Name:        "Full Demand-Response with Batch Scheduling",
+			Description: "Minimize cost, maintain temperature, respond to grid stress events, schedule all batch jobs before their deadlines, and minimize carbon emissions.",
+			Difficulty:  "hard",
+			Weights:     map[string]float64{"cost": 0.35, "temperature": 0.25, "grid_response": 0.25, "batch_deadline": 0.15},
+		},
+	}
+}
+// GradeEpisodeInput collects all data needed to score a completed episode.
+type GradeEpisodeInput struct {
+	TaskID           int
+	Buildings        []*BuildingState
+	Replay           []ReplayEntry
+	TempHistory      [][]float64 // per building, per step
+	TMin             float64
+	TMax             float64
+	ExploitPenalties []float64
+}
+// GradeEpisode computes a deterministic 0.0–1.0 score for a completed episode.
+// It uses no randomness, so replaying an episode with the same seed and actions yields the same score.
+func GradeEpisode(inp GradeEpisodeInput) EpisodeGrade {
+	grade := EpisodeGrade{
+		TaskID:    inp.TaskID,
+		SubScores: map[string]float64{},
+		Details:   map[string]interface{}{},
+	}
+	switch inp.TaskID {
+	case 1:
+		grade = gradeTask1(inp, grade)
+	case 2:
+		grade = gradeTask2(inp, grade)
+	case 3:
+		grade = gradeTask3(inp, grade)
+	default:
+		grade = gradeTask1(inp, grade)
+	}
+	// Exploit detection: reduce score by penalty
+	totalPenalty := 0.0
+	for i := range inp.Buildings {
+		if i < len(inp.ExploitPenalties) {
+			totalPenalty += inp.ExploitPenalties[i]
+		}
+	}
+	if totalPenalty > 0 {
+		grade.ExploitDetected = true
+		grade.PenaltyApplied = math.Min(totalPenalty, 0.3) // max 30% penalty
+		grade.Score = math.Max(0, grade.Score-grade.PenaltyApplied)
+	}
+	grade.Score = math.Round(grade.Score*10000) / 10000 // 4 decimal places
+	return grade
+}
+// ── Task 1: Cost Minimization ───────────────────────────────────────────────
+func gradeTask1(inp GradeEpisodeInput, grade EpisodeGrade) EpisodeGrade {
+	agentCost := 0.0
+	baselineCost := 0.0
+	for _, b := range inp.Buildings {
+		agentCost += b.CumulativeCost
+		baselineCost += b.BaselineCost
+	}
+	var costScore float64
+	if baselineCost > 0 {
+		// score = max(0, 1 - agent_cost / baseline_cost)
+		// 0.0 if agent costs same or more, 1.0 if agent costs nothing
+		ratio := agentCost / baselineCost
+		costScore = math.Max(0, 1.0-ratio)
+	}
+	grade.SubScores["cost"] = math.Min(1.0, costScore)
+	grade.Score = grade.SubScores["cost"]
+	grade.Details["agent_cost"] = agentCost
+	grade.Details["baseline_cost"] = baselineCost
+	grade.Details["cost_ratio"] = agentCost / math.Max(baselineCost, 0.01)
+	return grade
+}
+// ── Task 2: Constrained Temperature Management ──────────────────────────────
+func gradeTask2(inp GradeEpisodeInput, grade EpisodeGrade) EpisodeGrade {
+	// Cost sub-score (same as task 1)
+	grade = gradeTask1(inp, grade)
+	costScore := grade.SubScores["cost"]
+	// Temperature constraint sub-score
+	totalSteps := 0
+	withinBounds := 0
+	for _, history := range inp.TempHistory {
+		for _, temp := range history {
+			totalSteps++
+			if temp >= inp.TMin && temp <= inp.TMax {
+				withinBounds++
+			}
+		}
+	}
+	constraintScore := 0.0
+	if totalSteps > 0 {
+		constraintScore = float64(withinBounds) / float64(totalSteps)
+	}
+	grade.SubScores["cost"] = costScore
+	grade.SubScores["temperature"] = constraintScore
+	grade.Score = costScore*0.6 + constraintScore*0.4
+	grade.Details["within_bounds_steps"] = withinBounds
+	grade.Details["total_steps"] = totalSteps
+	return grade
+}
+// ── Task 3: Full Demand-Response with Batch Scheduling ──────────────────────
+func gradeTask3(inp GradeEpisodeInput, grade EpisodeGrade) EpisodeGrade {
+	// Reuse task 2 for cost + temperature scores
+	grade = gradeTask2(inp, grade)
+	costScore := grade.SubScores["cost"]
+	tempScore := grade.SubScores["temperature"]
+	// Grid response sub-score
+	// Count steps where stress > 0.7 and shed_fraction > 0.15
+	gridStressSteps := 0
+	gridResponseSteps := 0
+	for _, entry := range inp.Replay {
+		if entry.Observation.GridStressSignal > 0.7 {
+			gridStressSteps++
+			if entry.Action.LoadShedFraction > 0.15 {
+				gridResponseSteps++
+			}
+		}
+	}
+	gridScore := 0.5 // default neutral if no stress events
+	if gridStressSteps > 0 {
+		gridScore = float64(gridResponseSteps) / float64(gridStressSteps)
+	}
+	// Batch deadline sub-score
+	totalJobs := 0
+	completedOnTime := 0
+	for _, b := range inp.Buildings {
+		for _, job := range b.Jobs {
+			totalJobs++
+			if job.Completed && !job.MissedDeadline {
+				completedOnTime++
+			}
+		}
+	}
+	batchScore := 0.0
+	if totalJobs > 0 {
+		batchScore = float64(completedOnTime) / float64(totalJobs)
+	}
+	grade.SubScores["cost"] = costScore
+	grade.SubScores["temperature"] = tempScore
+	grade.SubScores["grid_response"] = gridScore
+	grade.SubScores["batch_deadline"] = batchScore
+	// Weighted composite score
+	grade.Score = costScore*0.35 + tempScore*0.25 + gridScore*0.25 + batchScore*0.15
+	grade.Details["grid_stress_steps"] = gridStressSteps
+	grade.Details["grid_response_steps"] = gridResponseSteps
+	grade.Details["total_jobs"] = totalJobs
+	grade.Details["completed_on_time"] = completedOnTime
+	return grade
+}

go.mod ADDED

	@@ -0,0 +1,3 @@


+module gridmind-rl
+
+go 1.21

main.go ADDED

	@@ -0,0 +1,376 @@

+// main.go — GridMind-RL HTTP server (OpenEnv-compliant)
+// Exposes: POST /step, POST /reset, GET /state, GET /health, GET /replay, GET /grade, GET /tasks, GET /metrics
+// Port: 7860 (Hugging Face Spaces compatible)
+package main
+import (
+	"encoding/json"
+	"fmt"
+	"log"
+	"math"
+	"net/http"
+	"os"
+	"strconv"
+	"sync"
+	"sync/atomic"
+	"time"
+	"gridmind-rl/env"
+)
+// ──────────────────────────────────────────────
+// Prometheus text-format metrics (hand-rolled, no client library)
+// ──────────────────────────────────────────────
+type Metrics struct {
+	mu               sync.Mutex
+	stepCount        int64
+	stepLatencySum   float64
+	stepLatencyCount int64
+	rewardSum        float64
+	rewardCount      int64
+	rewardMin        float64
+	rewardMax        float64
+	// Action distribution (plain counters, not Prometheus histograms)
+	actionBuckets    map[string]int64 // hvac bucket counts
+	errorCount       int64
+}
+var metrics = &Metrics{
+	rewardMin:     math.MaxFloat64,
+	rewardMax:     -math.MaxFloat64,
+	actionBuckets: map[string]int64{"low": 0, "mid": 0, "high": 0},
+}
+func (m *Metrics) recordStep(latencyMs float64, reward float64) {
+	atomic.AddInt64(&m.stepCount, 1)
+	m.mu.Lock()
+	defer m.mu.Unlock()
+	m.stepLatencySum += latencyMs
+	m.stepLatencyCount++
+	m.rewardSum += reward
+	m.rewardCount++
+	if reward < m.rewardMin {
+		m.rewardMin = reward
+	}
+	if reward > m.rewardMax {
+		m.rewardMax = reward
+	}
+}
+func (m *Metrics) recordAction(hvac float64) {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+	switch {
+	case hvac < 0.33:
+		m.actionBuckets["low"]++
+	case hvac < 0.66:
+		m.actionBuckets["mid"]++
+	default:
+		m.actionBuckets["high"]++
+	}
+}
+func (m *Metrics) prometheus() string {
+	m.mu.Lock()
+	defer m.mu.Unlock()
+	avgLatency := 0.0
+	if m.stepLatencyCount > 0 {
+		avgLatency = m.stepLatencySum / float64(m.stepLatencyCount)
+	}
+	avgReward := 0.0
+	// Export 0 for min/max until the first step is recorded; otherwise the
+	// ±math.MaxFloat64 sentinels would be printed as huge literal values.
+	rewardMin, rewardMax := 0.0, 0.0
+	if m.rewardCount > 0 {
+		avgReward = m.rewardSum / float64(m.rewardCount)
+		rewardMin, rewardMax = m.rewardMin, m.rewardMax
+	}
+	return fmt.Sprintf(`# HELP gridmind_steps_total Total environment steps taken
+# TYPE gridmind_steps_total counter
+gridmind_steps_total %d
+# HELP gridmind_step_latency_ms_avg Average step latency (ms)
+# TYPE gridmind_step_latency_ms_avg gauge
+gridmind_step_latency_ms_avg %.4f
+# HELP gridmind_reward_avg Average reward per step
+# TYPE gridmind_reward_avg gauge
+gridmind_reward_avg %.4f
+# HELP gridmind_reward_min Minimum reward seen
+# TYPE gridmind_reward_min gauge
+gridmind_reward_min %.4f
+# HELP gridmind_reward_max Maximum reward seen
+# TYPE gridmind_reward_max gauge
+gridmind_reward_max %.4f
+# HELP gridmind_action_hvac_bucket HVAC power level distribution
+# TYPE gridmind_action_hvac_bucket counter
+gridmind_action_hvac_bucket{bin="low"} %d
+gridmind_action_hvac_bucket{bin="mid"} %d
+gridmind_action_hvac_bucket{bin="high"} %d
+# HELP gridmind_errors_total Total request errors
+# TYPE gridmind_errors_total counter
+gridmind_errors_total %d
+`,
+		atomic.LoadInt64(&m.stepCount),
+		avgLatency, avgReward,
+		rewardMin, rewardMax,
+		m.actionBuckets["low"], m.actionBuckets["mid"], m.actionBuckets["high"],
+		atomic.LoadInt64(&m.errorCount),
+	)
+}
+// ──────────────────────────────────────────────
+// Server
+// ──────────────────────────────────────────────
+type Server struct {
+	envMgr *env.Environment
+}
+func newServer() *Server {
+	return &Server{envMgr: env.NewEnvironment()}
+}
+func (s *Server) routes() *http.ServeMux {
+	mux := http.NewServeMux()
+	mux.HandleFunc("/health", s.handleHealth)
+	mux.HandleFunc("/reset", s.handleReset)
+	mux.HandleFunc("/step", s.handleStep)
+	mux.HandleFunc("/state", s.handleState)
+	mux.HandleFunc("/replay", s.handleReplay)
+	mux.HandleFunc("/grade", s.handleGrade)
+	mux.HandleFunc("/tasks", s.handleTasks)
+	mux.HandleFunc("/metrics", s.handleMetrics)
+	return mux
+}
+// ── /health ──────────────────────────────────────────────────────────────────
+func (s *Server) handleHealth(w http.ResponseWriter, r *http.Request) {
+	w.Header().Set("Content-Type", "application/json")
+	w.WriteHeader(http.StatusOK)
+	json.NewEncoder(w).Encode(map[string]string{"status": "ok", "version": "1.0.0"})
+}
+// ── /reset ───────────────────────────────────────────────────────────────────
+func (s *Server) handleReset(w http.ResponseWriter, r *http.Request) {
+	if r.Method != http.MethodPost {
+		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
+		return
+	}
+	var req env.ResetRequest
+	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+		// An empty or malformed body falls back to the defaults (task 1).
+		req = env.ResetRequest{TaskID: 1}
+	}
+	if req.TaskID == 0 {
+		req.TaskID = 1
+	}
+	resp := s.envMgr.Reset(req)
+	w.Header().Set("Content-Type", "application/json")
+	json.NewEncoder(w).Encode(resp)
+}
+// ── /step ────────────────────────────────────────────────────────────────────
+func (s *Server) handleStep(w http.ResponseWriter, r *http.Request) {
+	if r.Method != http.MethodPost {
+		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
+		return
+	}
+	start := time.Now()
+	// Accept both single action and array of actions
+	var actions []env.ActionModel
+	var single env.ActionModel
+	body := make([]byte, 0, 512)
+	buf := make([]byte, 512)
+	for {
+		n, err := r.Body.Read(buf)
+		body = append(body, buf[:n]...)
+		if err != nil {
+			break
+		}
+	}
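+	// Skip leading JSON whitespace so a pretty-printed array body is still
+	// recognised as a batch of per-building actions.
+	trimmed := body
+	for len(trimmed) > 0 && (trimmed[0] == ' ' || trimmed[0] == '\t' || trimmed[0] == '\n' || trimmed[0] == '\r') {
+		trimmed = trimmed[1:]
+	}
+	if len(trimmed) > 0 && trimmed[0] == '[' {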
+		if err := json.Unmarshal(body, &actions); err != nil {
+			atomic.AddInt64(&metrics.errorCount, 1)
+			http.Error(w, "invalid action array: "+err.Error(), http.StatusBadRequest)
+			return
+		}
+	} else {
+		if err := json.Unmarshal(body, &single); err != nil {
+			atomic.AddInt64(&metrics.errorCount, 1)
+			http.Error(w, "invalid action: "+err.Error(), http.StatusBadRequest)
+			return
+		}
+		actions = []env.ActionModel{single}
+	}
+	responses, done := s.envMgr.Step(actions)
+	latency := float64(time.Since(start).Microseconds()) / 1000.0
+	for _, resp := range responses {
+		metrics.recordStep(latency, resp.Reward)
+	}
+	if len(actions) > 0 {
+		metrics.recordAction(actions[0].HVACPowerLevel)
+	}
+	w.Header().Set("Content-Type", "application/json")
+	if done && len(responses) == 1 {
+		responses[0].Done = true
+	}
+	// Return single response if single building, array otherwise
+	if len(responses) == 1 {
+		json.NewEncoder(w).Encode(responses[0])
+	} else {
+		json.NewEncoder(w).Encode(responses)
+	}
+}
+// ── /state ───────────────────────────────────────────────────────────────────
+func (s *Server) handleState(w http.ResponseWriter, r *http.Request) {
+	if r.Method != http.MethodGet {
+		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
+		return
+	}
+	state := s.envMgr.GetState()
+	// Add CORS for dashboard
+	w.Header().Set("Access-Control-Allow-Origin", "*")
+	w.Header().Set("Content-Type", "application/json")
+	json.NewEncoder(w).Encode(state)
+}
+// ── /replay ──────────────────────────────────────────────────────────────────
+func (s *Server) handleReplay(w http.ResponseWriter, r *http.Request) {
+	if r.Method != http.MethodGet {
+		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
+		return
+	}
+	replay := s.envMgr.GetReplay()
+	w.Header().Set("Content-Type", "application/json")
+	w.Header().Set("Access-Control-Allow-Origin", "*")
+	json.NewEncoder(w).Encode(map[string]interface{}{
+		"replay": replay,
+		"steps":  len(replay),
+	})
+}
+// ── /grade ───────────────────────────────────────────────────────────────────
+func (s *Server) handleGrade(w http.ResponseWriter, r *http.Request) {
+	state := s.envMgr.GetState()
+	replay := s.envMgr.GetReplay()
+	// Collect per-building exploit penalties
+	penalties := make([]float64, len(state.Buildings))
+	for i := range state.Buildings {
+		_, pen := s.envMgr.ExploitDetected(i)
+		penalties[i] = pen
+	}
+	// Build building states from public state
+	buildings := make([]*env.BuildingState, len(state.Buildings))
+	for i, pub := range state.Buildings {
+		buildings[i] = &env.BuildingState{
+			CumulativeCost: pub.CumulativeCost,
+			BaselineCost:   pub.BaselineCost,
+		}
+	}
+	// Reconstruct temp history from public state
+	tempHistory := make([][]float64, len(state.Buildings))
+	for i, pub := range state.Buildings {
+		tempHistory[i] = pub.TempHistory
+	}
+	grade := env.GradeEpisode(env.GradeEpisodeInput{
+		TaskID:           state.TaskID,
+		Buildings:        buildings,
+		Replay:           replay,
+		TempHistory:      tempHistory,
+		TMin:             env.TMinDefault,
+		TMax:             env.TMaxDefault,
+		ExploitPenalties: penalties,
+	})
+	w.Header().Set("Content-Type", "application/json")
+	w.Header().Set("Access-Control-Allow-Origin", "*")
+	json.NewEncoder(w).Encode(grade)
+}
+// ── /tasks ───────────────────────────────────────────────────────────────────
+func (s *Server) handleTasks(w http.ResponseWriter, r *http.Request) {
+	w.Header().Set("Content-Type", "application/json")
+	json.NewEncoder(w).Encode(env.AllTasks())
+}
+// ── /metrics ─────────────────────────────────────────────────────────────────
+func (s *Server) handleMetrics(w http.ResponseWriter, r *http.Request) {
+	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
+	fmt.Fprint(w, metrics.prometheus())
+}
+// ──────────────────────────────────────────────
+// Entry point
+// ──────────────────────────────────────────────
+func main() {
+	port := os.Getenv("PORT")
+	if port == "" {
+		port = "7860"
+	}
+	// Validate port
+	if _, err := strconv.Atoi(port); err != nil {
+		log.Fatalf("invalid PORT: %s", port)
+	}
+	srv := newServer()
+	// Perform initial reset so /state is always valid
+	var seed int64 = 42
+	srv.envMgr.Reset(env.ResetRequest{Seed: &seed, TaskID: 1, NumBuildings: 1})
+	log.Printf("GridMind-RL environment server starting on :%s", port)
+	log.Printf("Endpoints: GET /health /state /replay /grade /tasks /metrics | POST /reset /step")
+	mux := withCORS(withLogging(srv.routes()))
+	if err := http.ListenAndServe(":"+port, mux); err != nil {
+		log.Fatalf("server error: %v", err)
+	}
+}
+// ──────────────────────────────────────────────
+// Middleware
+// ──────────────────────────────────────────────
+func withLogging(next http.Handler) http.Handler {
+	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		start := time.Now()
+		next.ServeHTTP(w, r)
+		log.Printf("%s %s %s", r.Method, r.URL.Path, time.Since(start))
+	})
+}
+func withCORS(next http.Handler) http.Handler {
+	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		w.Header().Set("Access-Control-Allow-Origin", "*")
+		w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
+		w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization")
+		if r.Method == http.MethodOptions {
+			w.WriteHeader(http.StatusNoContent)
+			return
+		}
+		next.ServeHTTP(w, r)
+	})
+}
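`handleStep` accepts either one action object or an array of them and decides which by peeking at the first byte of the body. A sketch of that dispatch pulled out into a hypothetical `decodeActions` helper, with a trimmed stand-in `action` type instead of `env.ActionModel`, makes it testable without an HTTP server; it skips leading whitespace before peeking:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// action is a hypothetical, trimmed stand-in for env.ActionModel.
type action struct {
	HVACPowerLevel float64 `json:"hvac_power_level"`
	BuildingID     int     `json:"building_id"`
}

// decodeActions accepts either one JSON object or a JSON array of objects,
// the two shapes POST /step accepts, and always returns a slice.
func decodeActions(body []byte) ([]action, error) {
	// Skip leading JSON whitespace before inspecting the first byte.
	i := 0
	for i < len(body) && (body[i] == ' ' || body[i] == '\t' || body[i] == '\n' || body[i] == '\r') {
		i++
	}
	if i < len(body) && body[i] == '[' {
		var many []action
		if err := json.Unmarshal(body, &many); err != nil {
			return nil, fmt.Errorf("invalid action array: %w", err)
		}
		return many, nil
	}
	var one action
	if err := json.Unmarshal(body, &one); err != nil {
		return nil, fmt.Errorf("invalid action: %w", err)
	}
	return []action{one}, nil
}

func main() {
	single, err := decodeActions([]byte(`{"hvac_power_level":0.4}`))
	if err != nil {
		panic(err)
	}
	multi, err := decodeActions([]byte(` [{"building_id":0},{"building_id":1}]`))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(single), len(multi))
}
```

An empty body falls through to the single-object branch and fails with `unexpected end of JSON input`, which the handler reports as a 400.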

openenv.yaml ADDED

	@@ -0,0 +1,137 @@

+name: gridmind-rl
+version: 1.0.0
+description: |
+  GridMind-RL: Industrial Load-Shaping and Demand-Response Environment.
+  An RL environment simulating a real-world building energy management system.
+  Control HVAC, thermal storage, and schedule batch jobs in response to
+  stochastic time-of-use prices and grid stress events.
+author: GridMind Team
+tags:
+  - reinforcement-learning
+  - energy
+  - demand-response
+  - continuous-control
+  - discrete-control
+server:
+  host: "0.0.0.0"
+  port: 7860
+  protocol: http
+schemas:
+  observation:
+    type: object
+    properties:
+      indoor_temperature:
+        type: number
+        description: Current building/process temperature (°C)
+      thermal_storage_level:
+        type: number
+        minimum: 0.0
+        maximum: 1.0
+        description: Thermal storage tank level (0.0=empty, 1.0=full)
+      process_demand:
+        type: number
+        description: Current industrial process power demand (kW)
+      current_price:
+        type: number
+        description: Real-time electricity price ($/kWh)
+      grid_stress_signal:
+        type: number
+        minimum: 0.0
+        maximum: 1.0
+        description: Utility signal for DR urgency (0=normal, 1=critical)
+      carbon_intensity:
+        type: number
+        description: Grid carbon intensity (gCO2/kWh)
+      hour_of_day:
+        type: integer
+        minimum: 0
+        maximum: 23
+        description: Hour of the day for time-aware scheduling
+      batch_queue:
+        type: array
+        items:
+          type: integer
+        description: List of pending batch jobs (values are deadline slot indices)
+      cumulative_cost:
+        type: number
+        description: Total energy cost incurred so far this episode ($)
+      step:
+        type: integer
+        minimum: 0
+        maximum: 95
+        description: Current episode step (15-min intervals)
+      building_id:
+        type: integer
+        description: Building identifier for multi-building federation
+  action:
+    type: object
+    required: ["hvac_power_level", "thermal_charge_rate", "batch_job_slot", "load_shed_fraction"]
+    properties:
+      hvac_power_level:
+        type: number
+        minimum: 0.0
+        maximum: 1.0
+        description: Fraction of max HVAC power to apply (0.0-1.0)
+      thermal_charge_rate:
+        type: number
+        minimum: -1.0
+        maximum: 1.0
+        description: Thermal storage charge (+) or discharge (-) rate
+      batch_job_slot:
+        type: integer
+        minimum: 0
+        maximum: 4
+        description: Which time slot to schedule the next batch job (0=now, 1-4=defer)
+      load_shed_fraction:
+        type: number
+        minimum: 0.0
+        maximum: 0.5
+        description: Fraction of non-critical load to shed during grid stress (0.0-0.5)
+      building_id:
+        type: integer
+        description: Building identifier for multi-building federation
+  reward:
+    type: number
+    description: Dense multi-component reward combining cost savings, temperature constraints, grid response, and deadlines.
+tasks:
+  - id: 1
+    name: "Cost Minimization"
+    description: "Minimize total energy cost over a 24-hour episode with no process constraints."
+    difficulty: "easy"
+  - id: 2
+    name: "Constrained Temperature Management"
+    description: "Minimize cost while keeping indoor temperature within ±2°C of setpoint at all times."
+    difficulty: "medium"
+  - id: 3
+    name: "Full Demand-Response with Batch Scheduling"
+    description: "Minimize cost, maintain temperature, respond to grid stress events, schedule all batch jobs, and minimize carbon."
+    difficulty: "hard"
+endpoints:
+  health:
+    path: /health
+    method: GET
+  reset:
+    path: /reset
+    method: POST
+  step:
+    path: /step
+    method: POST
+  state:
+    path: /state
+    method: GET
+  grade:
+    path: /grade
+    method: GET
+  replay:
+    path: /replay
+    method: GET
+  tasks:
+    path: /tasks
+    method: GET
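The `action` schema above sets a fixed range for each control. An agent could run a check like the one below before calling `POST /step`. This is a stdlib-only sketch: `ACTION_BOUNDS` and `schema_errors` are hypothetical names, and the bounds are copied from the schema. `building_id` is not checked because the schema gives it no range and does not list it as required. The sketch also treats any Python float as a non-integer, which is stricter than some JSON Schema validators.

```python
# (min, max, expected type) copied from the `action` schema in openenv.yaml.
ACTION_BOUNDS = {
    "hvac_power_level": (0.0, 1.0, float),
    "thermal_charge_rate": (-1.0, 1.0, float),
    "batch_job_slot": (0, 4, int),
    "load_shed_fraction": (0.0, 0.5, float),
}


def schema_errors(action: dict) -> list[str]:
    """Return human-readable violations of the action schema (empty list = valid)."""
    errors = []
    for name, (lo, hi, kind) in ACTION_BOUNDS.items():
        if name not in action:
            errors.append(f"missing required field {name!r}")
            continue
        value = action[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name} must be numeric, got {type(value).__name__}")
        elif kind is int and not isinstance(value, int):
            errors.append(f"{name} must be an integer, got {value!r}")
        elif not lo <= value <= hi:
            errors.append(f"{name}={value} outside [{lo}, {hi}]")
    return errors
```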

python/inference.py ADDED

	@@ -0,0 +1,357 @@

+"""
+GridMind-RL Baseline Inference Script
+--------------------------------------
+Runs an LLM agent against all 3 tasks for N episodes each.
+Uses an OpenAI-compatible API configured via the API_BASE_URL, MODEL_NAME, and HF_TOKEN environment variables.
+Usage:
+    export API_BASE_URL=https://api-inference.huggingface.co/v1
+    export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
+    export HF_TOKEN=hf_xxxx
+    python python/inference.py [--episodes 3] [--env-url http://localhost:7860] [--verbose] [--output baseline_scores.json]
+"""
+import argparse
+import json
+import os
+import random
+import re
+import sys
+import time
+from typing import Any
+import requests
+from openai import OpenAI
+# ── Constants ──────────────────────────────────────────────────────────────
+ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
+API_BASE_URL = os.getenv("API_BASE_URL", "https://api-inference.huggingface.co/v1")
+MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")
+HF_TOKEN = os.getenv("HF_TOKEN", "")
+DEFAULT_EPISODES = 3
+DEFAULT_SEED_BASE = 1000  # episode seed = BASE + task_id*100 + episode_idx (see main) for reproducibility
+MAX_RETRIES = 3
+SYSPROMPT = """You are GridMind, an expert industrial energy management controller.
+You control a building's HVAC, thermal storage, batch job scheduling, and load shedding.
+Your goal is to minimize electricity costs while maintaining comfort and meeting grid demand-response signals.
+Always respond with a single valid JSON object matching the action schema. No explanation needed."""
+TASK_DESCRIPTIONS = {
+    1: "Task 1 (Easy - Cost Minimization): Minimize total energy cost over 24 hours. No temperature constraints. Use cheap off-peak periods and thermal storage arbitrage.",
+    2: "Task 2 (Medium - Temperature Management): Minimize cost AND keep indoor temperature within 19-23°C at all times. Balance comfort vs cost.",
+    3: "Task 3 (Hard - Full Demand Response): Minimize cost, maintain temperature, respond to grid stress events by shedding load when grid_stress_signal > 0.7, AND schedule all batch jobs before their deadlines.",
+}
+ACTION_SCHEMA_STR = """{
+  "hvac_power_level": <float 0.0-1.0>,
+  "thermal_charge_rate": <float -1.0 to 1.0>,
+  "batch_job_slot": <int 0-4>,
+  "load_shed_fraction": <float 0.0-0.5>,
+  "building_id": 0
+}"""
+# ── Environment client ───────────────────────────────────────────────────────
+class GridMindEnvClient:
+    """Simple HTTP client for the GridMind-RL Go environment server."""
+    def __init__(self, base_url: str = ENV_URL, timeout: int = 30):
+        self.base = base_url.rstrip("/")
+        self.timeout = timeout
+    def health(self) -> bool:
+        try:
+            r = requests.get(f"{self.base}/health", timeout=5)
+            return r.status_code == 200
+        except Exception:
+            return False
+    def reset(self, task_id: int = 1, seed: int = 42, num_buildings: int = 1) -> dict:
+        payload = {"task_id": task_id, "seed": seed, "num_buildings": num_buildings}
+        r = requests.post(f"{self.base}/reset", json=payload, timeout=self.timeout)
+        r.raise_for_status()
+        return r.json()
+    def step(self, action: dict) -> dict:
+        r = requests.post(f"{self.base}/step", json=action, timeout=self.timeout)
+        r.raise_for_status()
+        return r.json()
+    def grade(self) -> dict:
+        r = requests.get(f"{self.base}/grade", timeout=self.timeout)
+        r.raise_for_status()
+        return r.json()
+    def state(self) -> dict:
+        r = requests.get(f"{self.base}/state", timeout=self.timeout)
+        r.raise_for_status()
+        return r.json()
+# ── LLM agent ───────────────────────────────────────────────────────────────
+class LLMAgent:
+    """OpenAI-compatible LLM agent that chooses actions given observations."""
+    def __init__(self):
+        self.client = OpenAI(
+            base_url=API_BASE_URL,
+            api_key=HF_TOKEN if HF_TOKEN else "none",
+        )
+        self.model = MODEL_NAME
+    def choose_action(self, obs: dict, task_id: int) -> dict:
+        """Prompt the LLM with current observation, return parsed action dict."""
+        task_desc = TASK_DESCRIPTIONS.get(task_id, TASK_DESCRIPTIONS[1])
+        prompt = f"""{task_desc}
+Current observation:
+- Indoor temperature: {obs.get('indoor_temperature', 21):.1f}°C (target: 21°C, bounds: 19-23°C)
+- Thermal storage level: {obs.get('thermal_storage_level', 0.5):.2f} (0=empty, 1=full)
+- Process demand: {obs.get('process_demand', 15):.1f} kW
+- Current electricity price: ${obs.get('current_price', 0.10):.4f}/kWh
+- Grid stress signal: {obs.get('grid_stress_signal', 0):.3f} (>0.7 = critical, shed load!)
+- Carbon intensity: {obs.get('carbon_intensity', 300):.0f} gCO2/kWh
+- Hour of day: {obs.get('hour_of_day', 12)} (0=midnight, peak prices 8-12 and 17-21)
+- Pending batch job deadlines: {obs.get('batch_queue', [])}
+- Cumulative cost so far: ${obs.get('cumulative_cost', 0):.4f}
+- Episode step: {obs.get('step', 0)}/95
+Strategy hints:
+- Charge thermal storage when price < $0.08/kWh, discharge when price > $0.15/kWh
+- Set HVAC low during peak prices (0.3-0.4) and use storage for temperature control
+- Shed 30-50% load if grid_stress_signal > 0.7
+- Schedule batch jobs early if deadline is close (slot 0 or 1)
+Respond with ONLY a JSON action:
+{ACTION_SCHEMA_STR}"""
+        for attempt in range(MAX_RETRIES):
+            try:
+                completion = self.client.chat.completions.create(
+                    model=self.model,
+                    messages=[
+                        {"role": "system", "content": SYSPROMPT},
+                        {"role": "user", "content": prompt},
+                    ],
+                    max_tokens=128,
+                    temperature=0.1,
+                )
+                content = completion.choices[0].message.content.strip()
+                return self._parse_action(content)
+            except Exception as e:
+                print(f"  [LLM attempt {attempt+1}/{MAX_RETRIES}] error: {e}")
+                time.sleep(1)
+        # Fallback: rule-based heuristic
+        return self._heuristic_action(obs)
+    def _parse_action(self, content: str) -> dict:
+        """Extract and validate JSON action from LLM response."""
+        # Try direct JSON parse
+        try:
+            action = json.loads(content)
+            return self._clamp_action(action)
+        except json.JSONDecodeError:
+            pass
+        # Try to extract JSON block from text
+        match = re.search(r"\{[^}]+\}", content, re.DOTALL)
+        if match:
+            try:
+                action = json.loads(match.group())
+                return self._clamp_action(action)
+            except json.JSONDecodeError:
+                pass
+        # Fallback
+        print(f"  [WARN] could not parse LLM response: {content[:100]}")
+        return self._default_action()
+    def _clamp_action(self, action: dict) -> dict:
+        return {
+            "hvac_power_level": max(0.0, min(1.0, float(action.get("hvac_power_level", 0.5)))),
+            "thermal_charge_rate": max(-1.0, min(1.0, float(action.get("thermal_charge_rate", 0.0)))),
+            "batch_job_slot": max(0, min(4, int(action.get("batch_job_slot", 0)))),
+            "load_shed_fraction": max(0.0, min(0.5, float(action.get("load_shed_fraction", 0.0)))),
+            "building_id": int(action.get("building_id", 0)),
+        }
+    def _heuristic_action(self, obs: dict) -> dict:
+        """Simple rule-based heuristic when LLM is unavailable."""
+        price = obs.get("current_price", 0.10)
+        stress = obs.get("grid_stress_signal", 0.0)
+        temp = obs.get("indoor_temperature", 21.0)
+        storage = obs.get("thermal_storage_level", 0.5)
+        queue = obs.get("batch_queue", [])
+        # HVAC: reduce during peak
+        hvac = 0.7 if price < 0.08 else (0.3 if price > 0.15 else 0.5)
+        # Adjust for temperature: raise HVAC when too warm, cap it when too cold (HVAC acts as cooling)
+        if temp > 23.0:
+            hvac = max(hvac, 0.8)
+        elif temp < 19.0:
+            hvac = min(hvac, 0.2)
+        # Storage arbitrage
+        charge = 0.0
+        if price < 0.07 and storage < 0.8:
+            charge = 0.5
+        elif price > 0.15 and storage > 0.3:
+            charge = -0.5
+        # Load shedding
+        shed = 0.0
+        if stress > 0.7:
+            shed = 0.4
+        elif stress > 0.5:
+            shed = 0.2
+        # Batch jobs: schedule soon if deadline approaching
+        slot = 2
+        if queue and min(queue) < 10:
+            slot = 0
+        return {
+            "hvac_power_level": hvac,
+            "thermal_charge_rate": charge,
+            "batch_job_slot": slot,
+            "load_shed_fraction": shed,
+            "building_id": 0,
+        }
+    def _default_action(self) -> dict:
+        return {"hvac_power_level": 0.5, "thermal_charge_rate": 0.0,
+                "batch_job_slot": 0, "load_shed_fraction": 0.0, "building_id": 0}
+# ── Episode runner ───────────────────────────────────────────────────────────
+def run_episode(env_client: GridMindEnvClient, agent: LLMAgent,
+                task_id: int, seed: int, verbose: bool = False) -> dict[str, Any]:
+    """Run a single episode and return grade + metadata."""
+    reset_resp = env_client.reset(task_id=task_id, seed=seed)
+    obs = reset_resp["observations"][0]
+    total_reward = 0.0
+    total_steps = 0
+    start_time = time.time()
+    for _step in range(96):
+        action = agent.choose_action(obs, task_id)
+        step_resp = env_client.step(action)
+        obs = step_resp["observation"]
+        total_reward += step_resp["reward"]
+        total_steps += 1
+        if verbose and _step % 16 == 0:
+            print(f"    step={_step:02d} price=${obs['current_price']:.3f} "
+                  f"temp={obs['indoor_temperature']:.1f}°C "
+                  f"stress={obs['grid_stress_signal']:.2f} "
+                  f"cost=${obs['cumulative_cost']:.2f} "
+                  f"reward={step_resp['reward']:.3f}")
+        if step_resp.get("done", False):
+            break
+    elapsed = time.time() - start_time
+    grade = env_client.grade()
+    return {
+        "task_id": task_id,
+        "seed": seed,
+        "total_reward": total_reward,
+        "total_steps": total_steps,
+        "elapsed_sec": elapsed,
+        "score": grade.get("score", 0.0),
+        "sub_scores": grade.get("sub_scores", {}),
+        "exploit_detected": grade.get("exploit_detected", False),
+    }
+# ── Main ─────────────────────────────────────────────────────────────────────
+def main():
+    parser = argparse.ArgumentParser(description="GridMind-RL baseline inference")
+    parser.add_argument("--episodes", type=int, default=DEFAULT_EPISODES)
+    parser.add_argument("--env-url", type=str, default=ENV_URL)
+    parser.add_argument("--verbose", action="store_true")
+    parser.add_argument("--output", type=str, default="baseline_scores.json")
+    args = parser.parse_args()
+    print("=" * 60)
+    print("GridMind-RL Baseline Inference")
+    print(f"  Model: {MODEL_NAME}")
+    print(f"  API:   {API_BASE_URL}")
+    print(f"  Env:   {args.env_url}")
+    print(f"  Episodes per task: {args.episodes}")
+    print("=" * 60)
+    env_client = GridMindEnvClient(base_url=args.env_url)
+    # Wait for env server to be healthy
+    print("\nWaiting for environment server...")
+    for attempt in range(30):
+        if env_client.health():
+            print("  ✓ Environment server is healthy")
+            break
+        time.sleep(2)
+        if attempt == 29:
+            print("  ✗ Environment server not reachable. Exiting.")
+            sys.exit(1)
+    agent = LLMAgent()
+    all_results = []
+    for task_id in [1, 2, 3]:
+        print(f"\n── Task {task_id}: {TASK_DESCRIPTIONS[task_id][:60]}...")
+        task_scores = []
+        for ep in range(args.episodes):
+            seed = DEFAULT_SEED_BASE + task_id * 100 + ep
+            print(f"  Episode {ep+1}/{args.episodes} (seed={seed})")
+            result = run_episode(env_client, agent, task_id=task_id, seed=seed, verbose=args.verbose)
+            task_scores.append(result["score"])
+            all_results.append(result)
+            print(f"    → score={result['score']:.4f} | reward={result['total_reward']:.3f} | {result['elapsed_sec']:.1f}s")
+        avg_score = sum(task_scores) / len(task_scores)
+        print(f"  Task {task_id} average score: {avg_score:.4f}")
+    # Score summary table
+    print("\n" + "=" * 60)
+    print("BASELINE SCORES SUMMARY")
+    print("=" * 60)
+    print(f"{'Task':<10} {'Model':<30} {'Score':<10} {'Episodes':<10}")
+    print("-" * 60)
+    task_avgs = {}
+    for task_id in [1, 2, 3]:
+        scores = [r["score"] for r in all_results if r["task_id"] == task_id]
+        avg = sum(scores) / len(scores) if scores else 0.0
+        task_avgs[task_id] = avg
+        print(f"Task {task_id:<6} {MODEL_NAME:<30} {avg:<10.4f} {len(scores)}")
+    print("-" * 60)
+    overall = sum(task_avgs.values()) / len(task_avgs)
+    print(f"{'Overall':<10} {'':<30} {overall:<10.4f}")
+    # Save results
+    output = {
+        "model": MODEL_NAME,
+        "api_base": API_BASE_URL,
+        "episodes_per_task": args.episodes,
+        "seed_base": DEFAULT_SEED_BASE,
+        "task_averages": {str(k): v for k, v in task_avgs.items()},
+        "overall_average": overall,
+        "all_results": all_results,
+    }
+    with open(args.output, "w") as f:
+        json.dump(output, f, indent=2)
+    print(f"\n✓ Results saved to {args.output}")
+if __name__ == "__main__":
+    main()
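`_parse_action` falls back to the regex `\{[^}]+\}`. That pattern stops at the first `}`, so it cannot recover a reply whose JSON contains a nested object, such as an extra `"meta": {...}` field. One alternative uses the standard library's `json.JSONDecoder.raw_decode`: scan each `{` and keep the first position that decodes to a complete object. The sketch below does this. `extract_first_json_object` is a hypothetical helper and is not part of the script.

```python
import json
from typing import Optional


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first complete JSON object embedded in free text, or None."""
    decoder = json.JSONDecoder()
    idx = text.find("{")
    while idx != -1:
        try:
            # raw_decode parses one JSON value starting at idx and ignores trailing text.
            obj, _end = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not a valid object here; try the next opening brace.
        idx = text.find("{", idx + 1)
    return None
```

You can pass the result to `_clamp_action`, which reads only the known action keys and ignores any extras.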

python/models.py ADDED

	@@ -0,0 +1,221 @@

+"""
+GridMind-RL OpenEnv Pydantic models.
+These types mirror the Go structs exactly for full schema compliance.
+"""
+from __future__ import annotations
+from typing import List, Optional, Dict, Any
+from pydantic import BaseModel, Field, field_validator
+class BatchJob(BaseModel):
+    id: int
+    deadline_slot: int
+    duration: int
+    power_draw: float
+    scheduled: bool
+    scheduled_at: int
+    completed: bool
+    missed_deadline: bool
+class ObservationModel(BaseModel):
+    """Full observation returned on each step / GET /state."""
+    indoor_temperature: float = Field(..., description="Current building indoor temperature (°C)")
+    thermal_storage_level: float = Field(..., ge=0.0, le=1.0, description="Thermal storage fill level (0–1)")
+    process_demand: float = Field(..., ge=0.0, description="Current process power demand (kW)")
+    current_price: float = Field(..., gt=0.0, description="Real-time electricity price ($/kWh)")
+    grid_stress_signal: float = Field(..., ge=0.0, le=1.0, description="Utility demand-response urgency (0–1)")
+    carbon_intensity: float = Field(..., ge=0.0, description="Grid carbon intensity (gCO2/kWh)")
+    hour_of_day: int = Field(..., ge=0, le=23, description="Current hour of day (0–23)")
+    batch_queue: List[int] = Field(default_factory=list, description="Deadline slots of pending batch jobs")
+    cumulative_cost: float = Field(..., ge=0.0, description="Running energy cost this episode ($)")
+    step: int = Field(..., ge=0, description="Current timestep (0–95)")
+    building_id: int = Field(default=0, description="Building index in federation")
+class ActionModel(BaseModel):
+    """Agent action for a single timestep."""
+    hvac_power_level: float = Field(..., ge=0.0, le=1.0, description="HVAC fraction of max power (0–1)")
+    thermal_charge_rate: float = Field(..., ge=-1.0, le=1.0, description="Storage charge (+) or discharge (-) rate")
+    batch_job_slot: int = Field(..., ge=0, le=4, description="Time slot offset for next batch job (0=now, 1–4=defer)")
+    load_shed_fraction: float = Field(..., ge=0.0, le=0.5, description="Fraction of non-critical load to shed (0–0.5)")
+    building_id: int = Field(default=0, description="Building index this action targets")
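+    # mode="before" clamps ahead of the ge/le Field constraints; with the default
+    # mode="after", out-of-range input is rejected before the clamp ever runs.
+    # Non-numeric input passes through unchanged so pydantic reports its own type error.
+    @field_validator("hvac_power_level", mode="before")
+    @classmethod
+    def clamp_hvac(cls, v: Any) -> Any:
+        return max(0.0, min(1.0, v)) if isinstance(v, (int, float)) else v
+    @field_validator("thermal_charge_rate", mode="before")
+    @classmethod
+    def clamp_charge(cls, v: Any) -> Any:
+        return max(-1.0, min(1.0, v)) if isinstance(v, (int, float)) else v
+    @field_validator("load_shed_fraction", mode="before")
+    @classmethod
+    def clamp_shed(cls, v: Any) -> Any:
+        return max(0.0, min(0.5, v)) if isinstance(v, (int, float)) else v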
+class RewardComponents(BaseModel):
+    """Individual reward signal components."""
+    cost_savings: float = Field(..., description="Negative reward for energy cost")
+    temp_constraint: float = Field(..., description="Positive if temperature within bounds")
+    grid_response: float = Field(..., description="Bonus for shedding during high grid stress")
+    deadline_penalty: float = Field(..., description="Negative for missed batch deadlines")
+    efficiency_bonus: float = Field(..., description="Storage arbitrage bonus")
+    stability_penalty: float = Field(..., description="Penalty for rapid HVAC oscillation")
+    carbon_reward: float = Field(..., description="Low-carbon operation bonus")
+    total: float = Field(..., description="Weighted sum of all components")
+class StepInfo(BaseModel):
+    """Auxiliary information returned at each step."""
+    reward_components: RewardComponents
+    energy_used_kwh: float
+    carbon_emitted_gco2: float
+    price_signal: float
+    grid_stress: float
+    batch_completed: List[int] = Field(default_factory=list)
+    batch_missed: List[int] = Field(default_factory=list)
+    episode: int
+    step: int
+class StepResponse(BaseModel):
+    """Full response from POST /step."""
+    observation: ObservationModel
+    reward: float
+    done: bool
+    info: StepInfo
+class ResetRequest(BaseModel):
+    """Request body for POST /reset."""
+    seed: Optional[int] = Field(None, description="Random seed for reproducibility")
+    task_id: int = Field(1, ge=1, le=3, description="Task to run (1=easy, 2=medium, 3=hard)")
+    difficulty: Optional[str] = Field(None, description="Override difficulty: easy/medium/hard")
+    num_buildings: int = Field(1, ge=1, le=3, description="Number of buildings in federation")
+class ResetResponse(BaseModel):
+    """Response from POST /reset."""
+    observations: List[ObservationModel]
+    episode: int
+    task_id: int
+    seed: int
+class BuildingStatePublic(BaseModel):
+    """Full building state including history for dashboard rendering."""
+    # ObservationModel fields (flattened)
+    indoor_temperature: float
+    thermal_storage_level: float
+    process_demand: float
+    current_price: float
+    grid_stress_signal: float
+    carbon_intensity: float
+    hour_of_day: int
+    batch_queue: List[int] = Field(default_factory=list)
+    cumulative_cost: float
+    step: int
+    building_id: int
+    # Extended state
+    outdoor_temperature: float
+    setpoint_temperature: float
+    baseline_cost: float
+    cumulative_carbon: float
+    jobs: List[BatchJob] = Field(default_factory=list)
+    # History arrays
+    temp_history: List[float] = Field(default_factory=list)
+    cost_history: List[float] = Field(default_factory=list)
+    hvac_history: List[float] = Field(default_factory=list)
+    load_shed_history: List[float] = Field(default_factory=list)
+    reward_history: List[RewardComponents] = Field(default_factory=list)
+class StateResponse(BaseModel):
+    """Full environment state from GET /state."""
+    buildings: List[BuildingStatePublic]
+    price_curve_24h: List[float]
+    carbon_curve_24h: List[float]
+    episode: int
+    step: int
+    task_id: int
+    done: bool
+    seed: int
+class TaskConfig(BaseModel):
+    """Task configuration."""
+    id: int
+    name: str
+    description: str
+    difficulty: str
+    weights: Dict[str, float]
+class EpisodeGrade(BaseModel):
+    """Graded episode result."""
+    task_id: int
+    score: float = Field(..., ge=0.0, le=1.0)
+    sub_scores: Dict[str, float]
+    exploit_detected: bool
+    penalty_applied: float
+    details: Dict[str, Any]
+# ── Action space schema (for LLM prompting) ────────────────────────────────
+ACTION_SCHEMA = {
+    "type": "object",
+    "properties": {
+        "hvac_power_level": {
+            "type": "number",
+            "minimum": 0.0,
+            "maximum": 1.0,
+            "description": "Fraction of max HVAC power (0=off, 1=full power)"
+        },
+        "thermal_charge_rate": {
+            "type": "number",
+            "minimum": -1.0,
+            "maximum": 1.0,
+            "description": "Charge (+) or discharge (-) thermal storage at this fraction of max rate"
+        },
+        "batch_job_slot": {
+            "type": "integer",
+            "minimum": 0,
+            "maximum": 4,
+            "description": "Schedule next batch job: 0=run now, 1-4=defer by N 15-min intervals"
+        },
+        "load_shed_fraction": {
+            "type": "number",
+            "minimum": 0.0,
+            "maximum": 0.5,
+            "description": "Fraction of non-critical load to shed during this step (0=no shedding)"
+        },
+        "building_id": {
+            "type": "integer",
+            "minimum": 0,
+            "description": "Which building to apply this action to (0 for single-building mode)"
+        }
+    },
+    "required": ["hvac_power_level", "thermal_charge_rate", "batch_job_slot", "load_shed_fraction"]
+}
+# ── Observation space schema ───────────────────────────────────────────────
+OBSERVATION_SCHEMA = {
+    "type": "object",
+    "properties": {
+        "indoor_temperature": {"type": "number", "description": "Indoor temperature °C"},
+        "thermal_storage_level": {"type": "number", "minimum": 0, "maximum": 1},
+        "process_demand": {"type": "number", "description": "Process power demand kW"},
+        "current_price": {"type": "number", "description": "Electricity price $/kWh"},
+        "grid_stress_signal": {"type": "number", "minimum": 0, "maximum": 1},
+        "carbon_intensity": {"type": "number", "description": "Grid carbon intensity gCO2/kWh"},
+        "hour_of_day": {"type": "integer", "minimum": 0, "maximum": 23},
+        "batch_queue": {"type": "array", "items": {"type": "integer"}},
+        "cumulative_cost": {"type": "number"},
+        "step": {"type": "integer"},
+        "building_id": {"type": "integer"}
+    }
+}
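The schemas express time in three ways:

- `step` counts 15-minute intervals from 0 to 95.
- `hour_of_day` runs from 0 to 23.
- `batch_job_slot` defers a job by N 15-minute intervals.

The stdlib sketch below converts between these. It assumes step 0 corresponds to 00:00, which this excerpt does not state. `step_to_hour` and `deferred_step` are hypothetical helpers. `deferred_step` can return a step past 95 near the end of an episode, and this excerpt does not show how the server handles that case.

```python
STEP_MINUTES = 15        # each step is one 15-minute interval
STEPS_PER_EPISODE = 96   # 96 * 15 min = 24 h


def step_to_hour(step: int) -> int:
    """Hour of day for a step, assuming step 0 starts at 00:00 (assumption)."""
    if not 0 <= step < STEPS_PER_EPISODE:
        raise ValueError(f"step {step} outside 0..{STEPS_PER_EPISODE - 1}")
    return (step * STEP_MINUTES) // 60


def deferred_step(step: int, batch_job_slot: int) -> int:
    """Step at which a job lands when deferred by batch_job_slot intervals (0 = now)."""
    if not 0 <= batch_job_slot <= 4:
        raise ValueError("batch_job_slot must be in 0..4")
    return step + batch_job_slot
```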

python/requirements.txt ADDED

	@@ -0,0 +1,7 @@

+openai>=1.0.0
+fastapi>=0.100.0
+uvicorn>=0.23.0
+pydantic>=2.0.0
+requests>=2.31.0
+httpx>=0.24.0
+pytest>=7.0.0
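`python/models.py` uses `field_validator`, which is a Pydantic v2 API, so the `pydantic>=2.0.0` floor matters. Below is a rough stdlib sketch for reporting installed versions against the minimums listed above. It compares only leading numeric components, so pre-release tags are ignored. `packaging.version.Version` is the more robust option if that package is available. `numeric_parts`, `meets_minimum`, and `report` are hypothetical names.

```python
import re
from importlib import metadata

# Minimums copied from python/requirements.txt.
MINIMUMS = {
    "openai": "1.0.0", "fastapi": "0.100.0", "uvicorn": "0.23.0", "pydantic": "2.0.0",
    "requests": "2.31.0", "httpx": "0.24.0", "pytest": "7.0.0",
}


def numeric_parts(version: str) -> tuple[int, ...]:
    """Leading numeric components, e.g. '2.5.3' -> (2, 5, 3); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if installed >= minimum, comparing zero-padded numeric tuples."""
    a, b = numeric_parts(installed), numeric_parts(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def report() -> None:
    for dist, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            print(f"{dist}: not installed (need >={minimum})")
            continue
        status = "ok" if meets_minimum(installed, minimum) else "TOO OLD"
        print(f"{dist} {installed} (need >={minimum}): {status}")


if __name__ == "__main__":
    report()
```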

python/validate.py ADDED

	@@ -0,0 +1,258 @@

+"""
+GridMind-RL Pre-Submission Validator
+--------------------------------------
+Validates the Go environment server against all OpenEnv spec requirements.
+Run with: python python/validate.py [--env-url http://localhost:7860]
+"""
+import argparse
+import json
+import sys
+import time
+import traceback
+from typing import Any
+import requests
+ENV_URL = "http://localhost:7860"
+PASS = "✓"
+FAIL = "✗"
+WARN = "⚠"
+def check(label: str, condition: bool, detail: str = "") -> bool:
+    icon = PASS if condition else FAIL
+    line = f"  {icon} {label}"
+    if detail:
+        line += f" — {detail}"
+    print(line)
+    return condition
+def get(url: str, timeout: int = 10) -> requests.Response:
+    return requests.get(url, timeout=timeout)
+def post(url: str, payload: Any = None, timeout: int = 10) -> requests.Response:
+    return requests.post(url, json=payload, timeout=timeout)
+def validate(env_url: str) -> bool:
+    base = env_url.rstrip("/")
+    results = []
+    print("\n══════════════════════════════════════════")
+    print("  GridMind-RL OpenEnv Validation Report")
+    print("══════════════════════════════════════════\n")
+    # ── 1. Health endpoint ──────────────────────────────────────────────────
+    print("1. Health Endpoint")
+    try:
+        r = get(f"{base}/health")
+        results.append(check("GET /health returns 200", r.status_code == 200, f"got {r.status_code}"))
+        data = r.json()
+        results.append(check("Response has 'status' field", "status" in data))
+    except Exception as e:
+        results.append(check("GET /health reachable", False, str(e)))
+        print(f"\n  ✗ Cannot reach server at {base}. Is it running?\n")
+        return False
+    # ── 2. Reset endpoint ───────────────────────────────────────────────────
+    print("\n2. Reset Endpoint")
+    reset_resp = None
+    try:
+        r = post(f"{base}/reset", {"task_id": 1, "seed": 42, "num_buildings": 1})
+        results.append(check("POST /reset returns 200", r.status_code == 200, f"got {r.status_code}"))
+        reset_resp = r.json()
+        results.append(check("Response has 'observations'", "observations" in reset_resp))
+        results.append(check("Response has 'episode'", "episode" in reset_resp))
+        results.append(check("Response has 'seed'", "seed" in reset_resp))
+        results.append(check("Response has 'task_id'", "task_id" in reset_resp))
+        obs_list = reset_resp.get("observations", [])
+        results.append(check("observations is a list", isinstance(obs_list, list)))
+        results.append(check("At least 1 observation returned", len(obs_list) >= 1))
+        if obs_list:
+            obs = obs_list[0]
+            obs_fields = ["indoor_temperature", "thermal_storage_level", "process_demand",
+                          "current_price", "grid_stress_signal", "carbon_intensity",
+                          "hour_of_day", "batch_queue", "cumulative_cost", "step"]
+            for field in obs_fields:
+                results.append(check(f"obs has '{field}'", field in obs))
+        # Seed reproducibility
+        r2 = post(f"{base}/reset", {"task_id": 1, "seed": 42})
+        d2 = r2.json()
+        obs1 = reset_resp.get("observations", [{}])[0]
+        obs2 = d2.get("observations", [{}])[0]
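+        # Compare every observation field, not just temperature (numbers within tolerance)
+        same = obs1.keys() == obs2.keys() and all(
+            abs(obs1[k] - obs2[k]) < 1e-6 if isinstance(obs1[k], (int, float)) else obs1[k] == obs2[k]
+            for k in obs1
+        )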
+        results.append(check("Same seed produces same initial obs", same))
+    except Exception as e:
+        results.append(check("POST /reset succeeds", False, str(e)))
+        traceback.print_exc()
+    # ── 3. Step endpoint ────────────────────────────────────────────────────
+    print("\n3. Step Endpoint")
+    try:
+        # Reset fresh
+        post(f"{base}/reset", {"task_id": 1, "seed": 100})
+        action = {
+            "hvac_power_level": 0.5,
+            "thermal_charge_rate": 0.1,
+            "batch_job_slot": 1,
+            "load_shed_fraction": 0.0,
+            "building_id": 0,
+        }
+        r = post(f"{base}/step", action)
+        results.append(check("POST /step returns 200", r.status_code == 200))
+        step_resp = r.json()
+        step_fields = ["observation", "reward", "done", "info"]
+        for f in step_fields:
+            results.append(check(f"step response has '{f}'", f in step_resp))
+        results.append(check("reward is numeric", isinstance(step_resp.get("reward"), (int, float))))
+        results.append(check("done is boolean", isinstance(step_resp.get("done"), bool)))
+        info = step_resp.get("info", {})
+        results.append(check("info has 'reward_components'", "reward_components" in info))
+        results.append(check("info has 'energy_used_kwh'", "energy_used_kwh" in info))
+        rc = info.get("reward_components", {})
+        rc_fields = ["cost_savings", "temp_constraint", "grid_response",
+                     "deadline_penalty", "efficiency_bonus", "stability_penalty", "total"]
+        for f in rc_fields:
+            results.append(check(f"reward_components has '{f}'", f in rc))
+        # Test array action format
+        r2 = post(f"{base}/step", [action])
+        results.append(check("POST /step accepts array of actions", r2.status_code == 200))
+    except Exception as e:
+        results.append(check("POST /step succeeds", False, str(e)))
+        traceback.print_exc()
+    # ── 4. State endpoint ───────────────────────────────────────────────────
+    print("\n4. State Endpoint")
+    try:
+        r = get(f"{base}/state")
+        results.append(check("GET /state returns 200", r.status_code == 200))
+        state = r.json()
+        state_fields = ["buildings", "price_curve_24h", "carbon_curve_24h",
+                        "episode", "step", "task_id", "done", "seed"]
+        for f in state_fields:
+            results.append(check(f"state has '{f}'", f in state))
+        results.append(check("price_curve_24h has 24 entries",
+                             len(state.get("price_curve_24h", [])) == 24))
+        results.append(check("carbon_curve_24h has 24 entries",
+                             len(state.get("carbon_curve_24h", [])) == 24))
+    except Exception as e:
+        results.append(check("GET /state succeeds", False, str(e)))
+    # ── 5. Replay endpoint ──────────────────────────────────────────────────
+    print("\n5. Replay Endpoint")
+    try:
+        r = get(f"{base}/replay")
+        results.append(check("GET /replay returns 200", r.status_code == 200))
+        replay = r.json()
+        results.append(check("response has 'replay' list", "replay" in replay))
+        results.append(check("response has 'steps' count", "steps" in replay))
+    except Exception as e:
+        results.append(check("GET /replay succeeds", False, str(e)))
+    # ── 6. Grade endpoint ───────────────────────────────────────────────────
+    print("\n6. Grade Endpoint")
+    try:
+        # Run quick 10-step episode
+        post(f"{base}/reset", {"task_id": 1, "seed": 777})
+        action = {"hvac_power_level": 0.3, "thermal_charge_rate": 0.0,
+                  "batch_job_slot": 0, "load_shed_fraction": 0.0}
+        for _ in range(10):
+            post(f"{base}/step", action)
+        r = get(f"{base}/grade")
+        results.append(check("GET /grade returns 200", r.status_code == 200))
+        grade = r.json()
+        grade_fields = ["task_id", "score", "sub_scores", "exploit_detected"]
+        for f in grade_fields:
+            results.append(check(f"grade has '{f}'", f in grade))
+        score = grade.get("score", -1)
+        results.append(check("score in [0.0, 1.0]", 0.0 <= score <= 1.0, f"score={score:.4f}"))
+    except Exception as e:
+        results.append(check("GET /grade succeeds", False, str(e)))
+    # ── 7. Tasks endpoint ───────────────────────────────────────────────────
+    print("\n7. Tasks Endpoint")
+    try:
+        r = get(f"{base}/tasks")
+        results.append(check("GET /tasks returns 200", r.status_code == 200))
+        tasks = r.json()
+        results.append(check("returns list of 3 tasks", len(tasks) == 3))
+        task_fields = ["id", "name", "description", "difficulty", "weights"]
+        for f in task_fields:
+            results.append(check(f"task has '{f}'", f in tasks[0]))
+    except Exception as e:
+        results.append(check("GET /tasks succeeds", False, str(e)))
+    # ── 8. Metrics endpoint ─────────────────────────────────────────────────
+    print("\n8. Metrics Endpoint (Prometheus)")
+    try:
+        r = get(f"{base}/metrics")
+        results.append(check("GET /metrics returns 200", r.status_code == 200))
+        content = r.text
+        results.append(check("metrics contain step counter",
+                             "gridmind_steps_total" in content))
+        results.append(check("metrics contain latency gauge",
+                             "gridmind_step_latency_ms_avg" in content))
+    except Exception as e:
+        results.append(check("GET /metrics succeeds", False, str(e)))
+    # ── 9. Grader score variation ───────────────────────────────────────────
+    print("\n9. Grader Score Variation (non-trivial scores)")
+    scores_nonzero = []
+    scores_nonone = []
+    for seed in [10, 20, 30]:
+        try:
+            # Two constant HVAC policies, each run from a fresh reset with the same seed
+            for a in [0.1, 0.9]:
+                post(f"{base}/reset", {"task_id": 1, "seed": seed})
+                for _ in range(96):
+                    post(f"{base}/step", {"hvac_power_level": a, "thermal_charge_rate": 0,
+                                          "batch_job_slot": 0, "load_shed_fraction": 0})
+                g = requests.get(f"{base}/grade", timeout=10).json()
+                sc = g.get("score", 0)
+                scores_nonzero.append(sc > 0.01)
+                scores_nonone.append(sc < 0.999)
+        except Exception:
+            pass
+    results.append(check("Scores are not always 0.0", any(scores_nonzero)))
+    results.append(check("Scores are not always 1.0", any(scores_nonone)))
+    # ── Summary ─────────────────────────────────────────────────────────────
+    passed = sum(results)
+    total = len(results)
+    pct = 100 * passed // total if total > 0 else 0
+    print("\n══════════════════════════════════════════")
+    print(f"  Result: {passed}/{total} checks passed ({pct}%)")
+    if passed == total:
+        print("  🎉 ALL CHECKS PASSED — Ready for submission!")
+    else:
+        print(f"  ⚠  {total - passed} checks failed. Fix errors above.")
+    print("══════════════════════════════════════════\n")
+    return passed == total
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--env-url", type=str, default=ENV_URL)
+    args = parser.parse_args()
+    ok = validate(args.env_url)
+    sys.exit(0 if ok else 1)
+if __name__ == "__main__":
+    main()
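
Most of `validate()` above repeats one pattern: call an endpoint, record one boolean per required JSON key, and finally report `passed/total` with an integer percentage. Section 9 adds a second pattern: across several runs, require at least one score above 0.01 and at least one below 0.999. The sketch below restates that bookkeeping as plain functions so it can be exercised without a running server. `field_checks`, `summarize`, and `scores_are_nontrivial` are hypothetical helper names, not part of this commit.

```python
def field_checks(obj: dict, fields: list[str]) -> dict[str, bool]:
    """Map each required key to whether it is present in a JSON response (hypothetical helper)."""
    return {f: f in obj for f in fields}


def summarize(results: list[bool]) -> tuple[int, int, int]:
    """Return (passed, total, integer percent), mirroring the summary block in validate()."""
    passed = sum(results)
    total = len(results)
    pct = 100 * passed // total if total > 0 else 0
    return passed, total, pct


def scores_are_nontrivial(scores: list[float]) -> tuple[bool, bool]:
    """Section 9's check: (some score > 0.01, some score < 0.999)."""
    return any(s > 0.01 for s in scores), any(s < 0.999 for s in scores)


if __name__ == "__main__":
    # Made-up /grade-style payload that is missing 'exploit_detected'.
    grade = {"task_id": 1, "score": 0.42, "sub_scores": {"cost": 0.42}}
    checks = field_checks(grade, ["task_id", "score", "sub_scores", "exploit_detected"])
    print(checks)
    print(summarize(list(checks.values())))
```

Keeping the checks as plain booleans is what lets the summary use `sum(results)` directly, as `validate()` does.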

tests/environment_test.go ADDED

	@@ -0,0 +1,191 @@

+// tests/environment_test.go: unit tests for the GridMind-RL environment
+package tests
+import (
+	"testing"
+	"math"
+	"gridmind-rl/env"
+)
+// TestResetProducesValidObservation checks that reset returns sane initial observations.
+func TestResetProducesValidObservation(t *testing.T) {
+	e := env.NewEnvironment()
+	var seed int64 = 42
+	resp := e.Reset(env.ResetRequest{Seed: &seed, TaskID: 1, NumBuildings: 1})
+	if len(resp.Observations) != 1 {
+		t.Fatalf("expected 1 observation, got %d", len(resp.Observations))
+	}
+	obs := resp.Observations[0]
+	if obs.IndoorTemperature < 10 || obs.IndoorTemperature > 40 {
+		t.Errorf("indoor_temperature out of range: %.2f", obs.IndoorTemperature)
+	}
+	if obs.ThermalStorageLevel < 0 || obs.ThermalStorageLevel > 1 {
+		t.Errorf("thermal_storage_level out of [0,1]: %.3f", obs.ThermalStorageLevel)
+	}
+	if obs.CurrentPrice <= 0 {
+		t.Errorf("current_price must be positive, got %.4f", obs.CurrentPrice)
+	}
+	if obs.HourOfDay < 0 || obs.HourOfDay > 23 {
+		t.Errorf("hour_of_day out of [0,23]: %d", obs.HourOfDay)
+	}
+	if obs.GridStressSignal < 0 || obs.GridStressSignal > 1 {
+		t.Errorf("grid_stress_signal out of [0,1]: %.3f", obs.GridStressSignal)
+	}
+}
+// TestStepAdvancesState verifies that step increments the step counter.
+func TestStepAdvancesState(t *testing.T) {
+	e := env.NewEnvironment()
+	var seed int64 = 1
+	e.Reset(env.ResetRequest{Seed: &seed, TaskID: 1, NumBuildings: 1})
+	action := []env.ActionModel{{HVACPowerLevel: 0.5, ThermalChargeRate: 0.0, BatchJobSlot: 0}}
+	resps, done := e.Step(action)
+	if done {
+		t.Error("episode should not be done after first step")
+	}
+	if len(resps) != 1 {
+		t.Fatalf("expected 1 step response, got %d", len(resps))
+	}
+	state := e.GetState()
+	if state.Step != 1 {
+		t.Errorf("expected step=1 after one step, got %d", state.Step)
+	}
+}
+// TestEpisodeLengthIs96 verifies the episode terminates at step 96.
+func TestEpisodeLengthIs96(t *testing.T) {
+	e := env.NewEnvironment()
+	var seed int64 = 99
+	e.Reset(env.ResetRequest{Seed: &seed, TaskID: 1, NumBuildings: 1})
+	action := []env.ActionModel{{HVACPowerLevel: 0.5}}
+	var lastDone bool
+	for i := 0; i < 96; i++ {
+		_, lastDone = e.Step(action)
+		if lastDone && i < 95 {
+			t.Fatalf("episode ended early after %d steps", i+1)
+		}
+	}
+	if !lastDone {
+		t.Errorf("episode should be done after 96 steps")
+	}
+}
+// TestDeterministicWithSeed verifies that two runs with the same seed produce identical rewards.
+func TestDeterministicWithSeed(t *testing.T) {
+	action := []env.ActionModel{{HVACPowerLevel: 0.4, ThermalChargeRate: 0.1, BatchJobSlot: 1}}
+	var seed int64 = 1337
+	run := func() float64 {
+		e := env.NewEnvironment()
+		e.Reset(env.ResetRequest{Seed: &seed, TaskID: 2, NumBuildings: 1})
+		resps, _ := e.Step(action)
+		return resps[0].Reward
+	}
+	r1 := run()
+	r2 := run()
+	if math.Abs(r1-r2) > 1e-9 {
+		t.Errorf("non-deterministic rewards with same seed: %.6f vs %.6f", r1, r2)
+	}
+}
+// TestActionClamping verifies out-of-range actions are clamped.
+func TestActionClamping(t *testing.T) {
+	e := env.NewEnvironment()
+	var seed int64 = 7
+	e.Reset(env.ResetRequest{Seed: &seed, TaskID: 1})
+	// Over-range action
+	action := []env.ActionModel{{HVACPowerLevel: 2.0, ThermalChargeRate: -5.0, LoadShedFraction: 0.9}}
+	resps, _ := e.Step(action)
+	if len(resps) == 0 {
+		t.Fatal("no responses returned")
+	}
+	// After step, state should still be valid
+	state := e.GetState()
+	if len(state.Buildings) == 0 {
+		t.Fatal("no buildings in state")
+	}
+	b := state.Buildings[0]
+	if b.ThermalStorageLevel < 0 || b.ThermalStorageLevel > 1 {
+		t.Errorf("thermal storage out of bounds: %.3f", b.ThermalStorageLevel)
+	}
+}
+// TestMultiBuildingFederation checks that 3-building reset + step works.
+func TestMultiBuildingFederation(t *testing.T) {
+	e := env.NewEnvironment()
+	var seed int64 = 5
+	resp := e.Reset(env.ResetRequest{Seed: &seed, TaskID: 3, NumBuildings: 3})
+	if len(resp.Observations) != 3 {
+		t.Fatalf("expected 3 observations for 3 buildings, got %d", len(resp.Observations))
+	}
+	actions := []env.ActionModel{
+		{HVACPowerLevel: 0.3, BuildingID: 0},
+		{HVACPowerLevel: 0.5, BuildingID: 1},
+		{HVACPowerLevel: 0.7, BuildingID: 2},
+	}
+	resps, _ := e.Step(actions)
+	if len(resps) != 3 {
+		t.Fatalf("expected 3 step responses, got %d", len(resps))
+	}
+}
+// TestRewardComponentsAreFinite verifies no NaN/Inf in rewards.
+func TestRewardComponentsAreFinite(t *testing.T) {
+	e := env.NewEnvironment()
+	var seed int64 = 42
+	e.Reset(env.ResetRequest{Seed: &seed, TaskID: 3})
+	action := []env.ActionModel{{HVACPowerLevel: 0.5, ThermalChargeRate: 0.2, BatchJobSlot: 2, LoadShedFraction: 0.3}}
+	resps, _ := e.Step(action)
+	rc := resps[0].Info.RewardComponents
+	vals := []float64{rc.CostSavings, rc.TempConstraint, rc.GridResponse,
+		rc.DeadlinePenalty, rc.EfficiencyBonus, rc.StabilityPenalty, rc.CarbonReward, rc.Total}
+	for i, v := range vals {
+		if math.IsNaN(v) || math.IsInf(v, 0) {
+			t.Errorf("reward component %d is not finite: %v", i, v)
+		}
+	}
+}
+// TestGraderTask1ScoreRange verifies Task 1 score is always in [0, 1].
+func TestGraderTask1ScoreRange(t *testing.T) {
+	e := env.NewEnvironment()
+	var seed int64 = 101
+	e.Reset(env.ResetRequest{Seed: &seed, TaskID: 1})
+	action := []env.ActionModel{{HVACPowerLevel: 0.3}}
+	for i := 0; i < 96; i++ {
+		e.Step(action)
+	}
+	state := e.GetState()
+	replay := e.GetReplay()
+	buildings := make([]*env.BuildingState, len(state.Buildings))
+	for i, pub := range state.Buildings {
+		buildings[i] = &env.BuildingState{
+			CumulativeCost: pub.CumulativeCost,
+			BaselineCost:   pub.BaselineCost,
+		}
+	}
+	grade := env.GradeEpisode(env.GradeEpisodeInput{
+		TaskID:    1,
+		Buildings: buildings,
+		Replay:    replay,
+		TMin:      env.TMinDefault,
+		TMax:      env.TMaxDefault,
+	})
+	if grade.Score < 0 || grade.Score > 1 {
+		t.Errorf("Task 1 score out of [0,1]: %.4f", grade.Score)
+	}
+}
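
The grader tests in tests/test_graders.py pin each multi-objective task's score to a fixed weighted sum of its sub-scores: 0.6*cost + 0.4*temperature for Task 2, and 0.35*cost + 0.25*temperature + 0.25*grid_response + 0.15*batch_deadline for Task 3. The tolerance is 0.01, and the check is waived when `exploit_detected` is set. Below is a minimal sketch of that check as a pure function. It assumes the `/grade` response carries `task_id`, `score`, `sub_scores`, and `exploit_detected`, which are the fields validate.py expects. `TASK_WEIGHTS`, `expected_score`, and `matches_weighted_sum` are illustrative names. The real scoring is done by `env.GradeEpisode` in Go.

```python
# Weights as asserted by tests/test_graders.py (Task 1 is only checked for a "cost" sub-score).
TASK_WEIGHTS: dict[int, dict[str, float]] = {
    2: {"cost": 0.6, "temperature": 0.4},
    3: {"cost": 0.35, "temperature": 0.25, "grid_response": 0.25, "batch_deadline": 0.15},
}


def expected_score(task_id: int, sub_scores: dict[str, float]) -> float:
    """Weighted sum of sub-scores for a task (raises KeyError for unknown tasks or keys)."""
    weights = TASK_WEIGHTS[task_id]
    return sum(w * sub_scores[key] for key, w in weights.items())


def matches_weighted_sum(grade: dict, tol: float = 0.01) -> bool:
    """Mirror of test_weights_sum_correct: score must match the weighted sum unless an exploit is flagged."""
    if grade.get("exploit_detected"):
        # The tests accept any score once an exploit has been flagged.
        return True
    expected = expected_score(grade["task_id"], grade["sub_scores"])
    return abs(grade["score"] - expected) < tol


if __name__ == "__main__":
    # Made-up Task 3 grade for illustration.
    g = {
        "task_id": 3,
        "score": 0.70,
        "sub_scores": {"cost": 0.8, "temperature": 0.6, "grid_response": 0.7, "batch_deadline": 0.6},
        "exploit_detected": False,
    }
    print(expected_score(3, g["sub_scores"]), matches_weighted_sum(g))
```

Because each weight set sums to 1.0 and every sub-score lies in [0, 1], the weighted score also stays in [0, 1], which is what the `test_score_in_range` cases check end to end.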

tests/test_graders.py ADDED

	@@ -0,0 +1,173 @@

+"""
+GridMind-RL Grader Validation Tests (pytest)
+Run with: pytest tests/test_graders.py -v
+"""
+import json
+import time
+import pytest
+import requests
+ENV_URL = "http://localhost:7860"
+BASE = ENV_URL
+def wait_for_server(url: str, timeout: int = 15):
+    for _ in range(timeout):
+        try:
+            r = requests.get(f"{url}/health", timeout=2)
+            if r.status_code == 200:
+                return True
+        except Exception:
+            pass
+        time.sleep(1)
+    return False
+@pytest.fixture(scope="session", autouse=True)
+def server_running():
+    if not wait_for_server(ENV_URL):
+        pytest.skip("GridMind-RL server not running at " + ENV_URL)
+def reset(task_id=1, seed=42):
+    r = requests.post(f"{BASE}/reset", json={"task_id": task_id, "seed": seed, "num_buildings": 1}, timeout=10)
+    r.raise_for_status()
+    return r.json()
+def step(action: dict) -> dict:
+    r = requests.post(f"{BASE}/step", json=action, timeout=10)
+    r.raise_for_status()
+    return r.json()
+def grade() -> dict:
+    r = requests.get(f"{BASE}/grade", timeout=10)
+    r.raise_for_status()
+    return r.json()
+def run_full_episode(task_id: int, seed: int, hvac: float = 0.5) -> dict:
+    reset(task_id=task_id, seed=seed)
+    action = {"hvac_power_level": hvac, "thermal_charge_rate": 0, "batch_job_slot": 0, "load_shed_fraction": 0}
+    for _ in range(96):
+        resp = step(action)
+        if resp.get("done"):
+            break
+    return grade()
+# ── Task 1 ──────────────────────────────────────────────────────────────────
+class TestTask1:
+    def test_score_in_range(self):
+        g = run_full_episode(task_id=1, seed=1)
+        assert 0.0 <= g["score"] <= 1.0, f"Score {g['score']} out of [0,1]"
+    def test_score_not_always_zero(self):
+        g = run_full_episode(task_id=1, seed=2, hvac=0.2)
+        assert g["score"] > 0.01, "Low HVAC policy should score > 0"
+    def test_score_not_always_one(self):
+        g = run_full_episode(task_id=1, seed=3, hvac=1.0)
+        assert g["score"] < 0.999, "Always-on policy should not score 1.0"
+    def test_deterministic(self):
+        g1 = run_full_episode(task_id=1, seed=42)
+        g2 = run_full_episode(task_id=1, seed=42)
+        assert abs(g1["score"] - g2["score"]) < 1e-6, "Grader not deterministic with same seed"
+    def test_sub_scores_present(self):
+        g = run_full_episode(task_id=1, seed=5)
+        assert "cost" in g["sub_scores"], "Task 1 grade missing 'cost' sub-score"
+    def test_exploit_shedding_penalized(self):
+        """Always shedding 50% should be detected and penalized."""
+        reset(task_id=1, seed=10)
+        action = {"hvac_power_level": 0.5, "thermal_charge_rate": 0, "batch_job_slot": 0, "load_shed_fraction": 0.5}
+        for _ in range(96):
+            step(action)
+        g = grade()
+        # Score should be reduced OR exploit flagged
+        assert g["exploit_detected"] or g["score"] < 0.9
+# ── Task 2 ──────────────────────────────────────────────────────────────────
+class TestTask2:
+    def test_score_in_range(self):
+        g = run_full_episode(task_id=2, seed=20)
+        assert 0.0 <= g["score"] <= 1.0
+    def test_has_temp_sub_score(self):
+        g = run_full_episode(task_id=2, seed=21)
+        assert "temperature" in g["sub_scores"]
+    def test_temp_score_range(self):
+        g = run_full_episode(task_id=2, seed=22)
+        ts = g["sub_scores"].get("temperature", -1)
+        assert 0.0 <= ts <= 1.0, f"Temperature sub-score {ts} out of [0,1]"
+    def test_weights_sum_correct(self):
+        """Task 2 score = 0.6*cost + 0.4*temp."""
+        g = run_full_episode(task_id=2, seed=23)
+        expected = g["sub_scores"]["cost"] * 0.6 + g["sub_scores"]["temperature"] * 0.4
+        assert abs(g["score"] - expected) < 0.01 or g["exploit_detected"]
+    def test_score_varies_with_policy(self):
+        g_low  = run_full_episode(task_id=2, seed=24, hvac=0.1)
+        g_high = run_full_episode(task_id=2, seed=24, hvac=0.9)
+        # Scores should differ (policy matters)
+        assert abs(g_low["score"] - g_high["score"]) > 0.001
+# ── Task 3 ──────────────────────────────────────────────────────────────────
+class TestTask3:
+    def test_score_in_range(self):
+        g = run_full_episode(task_id=3, seed=30)
+        assert 0.0 <= g["score"] <= 1.0
+    def test_has_all_sub_scores(self):
+        g = run_full_episode(task_id=3, seed=31)
+        for key in ["cost", "temperature", "grid_response", "batch_deadline"]:
+            assert key in g["sub_scores"], f"Missing sub-score: {key}"
+    def test_all_sub_scores_in_range(self):
+        g = run_full_episode(task_id=3, seed=32)
+        for key, val in g["sub_scores"].items():
+            assert 0.0 <= val <= 1.0, f"Sub-score '{key}' = {val} out of [0,1]"
+    def test_weights_sum_correct(self):
+        """Task 3 score = 0.35*cost + 0.25*temp + 0.25*grid_response + 0.15*batch_deadline."""
+        g = run_full_episode(task_id=3, seed=33)
+        ss = g["sub_scores"]
+        expected = ss["cost"]*0.35 + ss["temperature"]*0.25 + ss["grid_response"]*0.25 + ss["batch_deadline"]*0.15
+        assert abs(g["score"] - expected) < 0.01 or g["exploit_detected"]
+    def test_grid_response_sub_score(self):
+        g = run_full_episode(task_id=3, seed=34)
+        gs = g["sub_scores"].get("grid_response", -1)
+        assert 0.0 <= gs <= 1.0, f"grid_response={gs} out of [0,1]"
+    def test_batch_deadline_sub_score(self):
+        g = run_full_episode(task_id=3, seed=35)
+        bd = g["sub_scores"].get("batch_deadline", -1)
+        assert 0.0 <= bd <= 1.0
+# ── Multi-building grading ────────────────────────────────────────────────────
+class TestMultiBuilding:
+    def test_2_building_grade(self):
+        requests.post(f"{BASE}/reset", json={"task_id": 1, "seed": 50, "num_buildings": 2}).raise_for_status()
+        action = [
+            {"hvac_power_level": 0.4, "thermal_charge_rate": 0, "batch_job_slot": 0, "load_shed_fraction": 0, "building_id": 0},
+            {"hvac_power_level": 0.6, "thermal_charge_rate": 0, "batch_job_slot": 0, "load_shed_fraction": 0, "building_id": 1},
+        ]
+        for _ in range(96):
+            r = requests.post(f"{BASE}/step", json=action, timeout=10)
+            r.raise_for_status()
+            if r.json()[0].get("done"):
+                break
+        g = grade()
+        assert 0.0 <= g["score"] <= 1.0