adityss committed on
Commit
1875b13
·
0 Parent(s):

Initial commit

Dockerfile ADDED
@@ -0,0 +1,59 @@
+ # Stage 1: Build the Go environment server
+ FROM golang:1.21-alpine AS builder
+
+ WORKDIR /app
+ COPY go.mod go.sum* ./
+ RUN go mod download || true
+
+ COPY main.go ./
+ COPY env/ ./env/
+ RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o gridmind-server main.go
+
+ # Stage 2: Final image with Python runtime and Dashboard
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install supervisor to run both servers
+ RUN apt-get update && apt-get install -y supervisor && rm -rf /var/lib/apt/lists/*
+
+ # Install Python dependencies
+ COPY python/requirements.txt ./requirements.txt
+ RUN pip install --no-cache-dir -r requirements.txt || \
+     pip install --no-cache-dir openai fastapi uvicorn httpx "pydantic>=2.0.0" requests
+
+ # Copy Go binary
+ COPY --from=builder /app/gridmind-server /usr/local/bin/gridmind-server
+
+ # Copy Python layer and Dashboard
+ COPY python/ ./python/
+ COPY dashboard/ ./dashboard/
+ COPY data/ ./data/
+ COPY openenv.yaml ./
+
+ # Configure Supervisor
+ RUN echo "[supervisord]" > /etc/supervisor/conf.d/supervisord.conf && \
+     echo "nodaemon=true" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "[program:go-env]" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "command=/usr/local/bin/gridmind-server" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "environment=PORT=7860" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "stdout_logfile=/dev/stdout" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "stdout_logfile_maxbytes=0" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "stderr_logfile=/dev/stderr" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "stderr_logfile_maxbytes=0" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "[program:dashboard]" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "command=python -m uvicorn dashboard.server:app --host 0.0.0.0 --port 7861" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "stdout_logfile=/dev/stdout" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "stdout_logfile_maxbytes=0" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "stderr_logfile=/dev/stderr" >> /etc/supervisor/conf.d/supervisord.conf && \
+     echo "stderr_logfile_maxbytes=0" >> /etc/supervisor/conf.d/supervisord.conf
+
+ # Add a non-root user (good practice and required for some HF Spaces configs)
+ RUN useradd -m -u 1000 user
+ RUN chown -R user:user /app
+ USER user
+
+ # 7860 = Env Server (main OpenEnv endpoint), 7861 = Dashboard
+ EXPOSE 7860 7861
+
+ CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]
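Reviewer note (not part of the commit): the `echo` chain in the Dockerfile writes a two-program Supervisor config one line at a time. As a rough sketch of what ends up in `/etc/supervisor/conf.d/supervisord.conf`, the Python below assembles the same sections and keys with the standard-library `configparser`. The helper name `build_supervisor_config` is hypothetical and does not exist in the repository.

```python
import configparser
import io

# Log options that the Dockerfile repeats for both programs.
LOG_OPTS = {
    "stdout_logfile": "/dev/stdout",
    "stdout_logfile_maxbytes": "0",
    "stderr_logfile": "/dev/stderr",
    "stderr_logfile_maxbytes": "0",
}


def build_supervisor_config() -> str:
    """Return INI text with the sections and keys the echo chain writes."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names exactly as written
    cfg["supervisord"] = {"nodaemon": "true"}
    cfg["program:go-env"] = {
        "command": "/usr/local/bin/gridmind-server",
        "environment": "PORT=7860",
        **LOG_OPTS,
    }
    cfg["program:dashboard"] = {
        "command": "python -m uvicorn dashboard.server:app --host 0.0.0.0 --port 7861",
        **LOG_OPTS,
    }
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_supervisor_config())
```

Printing the result makes it easier to eyeball the final file than reading the shell chain; checking an equivalent static file into the repository and copying it with `COPY` would be one alternative to generating it at build time.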
README.md ADDED
@@ -0,0 +1,110 @@
+ # GridMind-RL
+
+ GridMind-RL is an OpenEnv-compliant reinforcement learning environment simulating a commercial/industrial building energy management system.
+
+ An RL agent acts as the energy controller, shaping electrical load profiles by adjusting HVAC setpoints, managing thermal storage, and scheduling batch processes. The goal is to optimize operations in response to real-time electricity prices, grid carbon intensity, and utility demand-response signals.
+
+ ## Architecture
+
+ ```text
+ ┌──────────────────────┐        ┌─────────────────────────────┐
+ │                      │        │                             │
+ │     LLM RL Agent     │◄───────┤     GridMind-RL Server      │
+ │  (Inference Script)  │  POST  │    (Go OpenEnv Backend)     │
+ │                      ├───────►│          Port 7860          │
+ └──────────────────────┘ Action │                             │
+                                 └──────────────┬──────────────┘
+                                          State │
+                                        Polling │
+                                 ┌──────────────▼──────────────┐
+                                 │                             │
+                                 │      Visualization UI       │
+                                 │     (FastAPI + HTML/JS)     │
+                                 │          Port 7861          │
+                                 └─────────────────────────────┘
+ ```
+
+ ## Observation Space
+
+ | Name | Type | Range | Description |
+ |------|------|-------|-------------|
+ | `indoor_temperature` | float | [15.0, 30.0] | Current indoor temperature (°C). The target setpoint is usually 21°C. |
+ | `thermal_storage_level` | float | [0.0, 1.0] | Fill level of the thermal storage, as a fraction of capacity. |
+ | `process_demand` | float | [0.0, 50.0] | Current uncontrolled process power demand (kW). |
+ | `current_price` | float | [0.02, 0.50] | Real-time electricity price ($/kWh). |
+ | `grid_stress_signal` | float | [0.0, 1.0] | Utility signal indicating grid stress. Values above 0.7 require load shedding. |
+ | `carbon_intensity` | float | [100, 700] | Grid carbon emissions intensity (gCO2/kWh). |
+ | `hour_of_day` | int | [0, 23] | Current hour, useful for scheduling. |
+ | `batch_queue` | list[int] | N/A | List of deadline slots for pending batch jobs. |
+ | `cumulative_cost` | float | [0.0, inf) | Running energy cost ($). |
+ | `step` | int | [0, 95] | Current episode timestep (15-min intervals over 24h). |
+ | `building_id` | int | [0, 2] | ID of the building in multi-building federated mode. |
+
+ ## Action Space
+
+ | Name | Type | Range | Description |
+ |------|------|-------|-------------|
+ | `hvac_power_level` | float | [0.0, 1.0] | Fraction of max HVAC cooling/heating power to apply. |
+ | `thermal_charge_rate` | float | [-1.0, 1.0] | Charge (positive) or discharge (negative) thermal storage. |
+ | `batch_job_slot` | int | [0, 4] | Delay scheduling the next batch job by 0 to 4 time slots. |
+ | `load_shed_fraction` | float | [0.0, 0.5] | Fraction of non-critical load to shed (max 50%). |
+ | `building_id` | int | [0, 2] | Select which building to apply this action to (federation). |
+
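Reviewer note (not part of the commit): since the agent is an LLM, its raw output may fall outside the ranges in the Action Space table. A minimal sketch of client-side clamping follows. The `Action` dataclass and `clamp_action` helper are hypothetical names for illustration, and whether the Go server also clamps or rejects out-of-range values is not stated in this commit.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Field names and ranges follow the README's Action Space table."""
    hvac_power_level: float
    thermal_charge_rate: float
    batch_job_slot: int
    load_shed_fraction: float
    building_id: int = 0


def _clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def clamp_action(a: Action) -> Action:
    """Return a copy of the action with every field forced into its documented range."""
    return Action(
        hvac_power_level=_clamp(float(a.hvac_power_level), 0.0, 1.0),
        thermal_charge_rate=_clamp(float(a.thermal_charge_rate), -1.0, 1.0),
        batch_job_slot=_clamp(int(a.batch_job_slot), 0, 4),
        load_shed_fraction=_clamp(float(a.load_shed_fraction), 0.0, 0.5),
        building_id=_clamp(int(a.building_id), 0, 2),
    )


if __name__ == "__main__":
    print(clamp_action(Action(1.7, -3.0, 9, 0.8, 5)))
```

Clamping silently hides bad model output, so logging the original values before clamping may be worth considering.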
+ ## Tasks
+
+ GridMind-RL features three tasks of increasing difficulty:
+
+ 1. **Task 1: Cost Minimization (Easy)**
+    Minimize total energy costs by moving load to off-peak periods using thermal storage. No temperature constraints.
+ 2. **Task 2: Temperature Management (Medium)**
+    Minimize costs while keeping indoor temperatures strictly within 19°C – 23°C.
+ 3. **Task 3: Full Demand Response (Hard)**
+    Minimize cost, maintain temperature, schedule batch jobs before their deadlines, and shed load when the grid stress signal exceeds 0.7.
+
+ ## Reward Function
+
+ The dense reward includes several components:
+ * **Cost Savings:** Proportional to energy savings vs. the baseline flat-tariff policy.
+ * **Temp Constraint:** Gaussian bonus for being close to the setpoint, harsh penalty for exiting [19°C, 23°C].
+ * **Grid Response:** Large bonus if `load_shed_fraction` > 0 when `grid_stress_signal` > 0.7.
+ * **Deadline Penalty:** Heavy negative reward for jobs that execute past their deadline slot.
+ * **Efficiency Bonus:** Rewards charging thermal storage when the current price is *below* the future moving average.
+ * **Stability Penalty:** Penalizes rapid oscillation of the HVAC and storage controls.
+
+ *Exploit Detection:* The grader detects degenerate strategies (e.g., permanently shedding 40% of load) and applies up to a 30% score penalty.
+
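Reviewer note (not part of the commit): the README describes the temperature term only in words, as a Gaussian bonus near the setpoint and a harsh penalty outside [19°C, 23°C]. The sketch below shows one shape such a term could take. The width `SIGMA_C` and the penalty value are assumptions chosen for illustration; the actual constants live in the Go environment, which is not shown here.

```python
import math

SETPOINT_C = 21.0              # README: the goal is usually 21°C
T_MIN_C, T_MAX_C = 19.0, 23.0  # README: comfort band
SIGMA_C = 1.0                  # ASSUMPTION: width of the Gaussian bonus
OUT_OF_BAND_PENALTY = -2.0     # ASSUMPTION: size of the "harsh penalty"


def temp_constraint_reward(indoor_temp_c: float) -> float:
    """Gaussian bonus inside the comfort band, flat penalty outside it."""
    if indoor_temp_c < T_MIN_C or indoor_temp_c > T_MAX_C:
        return OUT_OF_BAND_PENALTY
    deviation = indoor_temp_c - SETPOINT_C
    return math.exp(-(deviation ** 2) / (2.0 * SIGMA_C ** 2))


if __name__ == "__main__":
    for t in (18.0, 19.0, 21.0, 22.0, 24.0):
        print(t, round(temp_constraint_reward(t), 3))
```

With these assumed constants the bonus peaks at 1.0 at the setpoint and decays smoothly toward the band edges, which gives the agent a gradient to follow before the penalty takes over.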
+ ## Usage
+
+ ### Local Docker Build
+
+ ```bash
+ docker build -t gridmind-rl .
+ docker run -p 7860:7860 -p 7861:7861 gridmind-rl
+ ```
+
+ * Backend OpenEnv server: http://localhost:7860
+ * Visualization Dashboard: http://localhost:7861
+
+ ### Validating the Environment
+
+ ```bash
+ python python/validate.py --env-url http://localhost:7860
+ ```
+
+ ### Running Baseline Inference
+
+ ```bash
+ export API_BASE_URL=https://api-inference.huggingface.co/v1
+ export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
+ export HF_TOKEN=your_token
+
+ # Install dependencies
+ pip install -r python/requirements.txt
+
+ # Run inference
+ python python/inference.py --episodes 3
+ ```
+
+ ## Extensions
+ * **Multi-building mode:** Switch the environment to three buildings by sending `POST /reset` with the body `{"num_buildings": 3}`, then have the agent output arrays of actions for coordinated dispatch.
+ * **Add new tasks:** Edit `env/tasks.go` and implement a new `gradeTaskX` component.
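Reviewer note (not part of the commit): for multi-building mode, the sketch below builds the JSON body for `POST /reset` and a list of per-building actions. The `task_id` and `num_buildings` keys mirror the body the dashboard sends on reset; the helper names and the "one action dict per building" layout are assumptions, since the commit does not show the exact wire format the server expects for action arrays.

```python
import json
from typing import Dict, List


def reset_payload(task_id: int, num_buildings: int) -> bytes:
    """JSON body for POST /reset, using the keys the dashboard's reset call sends."""
    return json.dumps({"task_id": task_id, "num_buildings": num_buildings}).encode("utf-8")


def idle_actions(num_buildings: int) -> List[Dict[str, float]]:
    """One do-nothing action per building (ASSUMED layout: a list of action dicts)."""
    return [
        {
            "hvac_power_level": 0.0,
            "thermal_charge_rate": 0.0,
            "batch_job_slot": 0,
            "load_shed_fraction": 0.0,
            "building_id": building_id,
        }
        for building_id in range(num_buildings)
    ]


if __name__ == "__main__":
    print(reset_payload(task_id=3, num_buildings=3).decode("utf-8"))
    print(json.dumps(idle_actions(3), indent=2))
```

Nothing here touches the network, so the payloads can be inspected before being sent with an HTTP client such as `httpx` or `requests`, both of which appear in the Dockerfile's fallback dependency list.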
dashboard/server.py ADDED
@@ -0,0 +1,66 @@
+ """
+ GridMind-RL Visualization Dashboard (FastAPI)
+ Serves the real-time simulation dashboard at port 7861.
+ Proxies /api/* to the Go environment server at port 7860.
+ """
+
+ import os
+ from pathlib import Path
+
+ import httpx
+ from fastapi import FastAPI, Request
+ from fastapi.responses import HTMLResponse, JSONResponse
+ from fastapi.staticfiles import StaticFiles
+
+ app = FastAPI(title="GridMind-RL Dashboard", version="1.0.0")
+
+ ENV_SERVER = os.getenv("ENV_SERVER_URL", "http://localhost:7860")
+ STATIC_DIR = Path(__file__).parent / "static"
+
+ # Serve static files (index.html, dashboard.js)
+ app.mount("/static", StaticFiles(directory=str(STATIC_DIR)), name="static")
+
+
+ @app.get("/", response_class=HTMLResponse)
+ async def index():
+     """Serve the dashboard HTML."""
+     with open(STATIC_DIR / "index.html", "r", encoding="utf-8") as f:
+         return HTMLResponse(content=f.read())
+
+
+ @app.get("/api/{path:path}")
+ async def proxy_get(path: str, request: Request):
+     """Proxy GET requests to the Go environment server."""
+     url = f"{ENV_SERVER}/{path}"
+     params = dict(request.query_params)
+     async with httpx.AsyncClient(timeout=10.0) as client:
+         try:
+             r = await client.get(url, params=params)
+             return JSONResponse(content=r.json(), status_code=r.status_code)
+         except Exception as e:
+             return JSONResponse({"error": str(e)}, status_code=503)
+
+
+ @app.post("/api/{path:path}")
+ async def proxy_post(path: str, request: Request):
+     """Proxy POST requests to the Go environment server."""
+     url = f"{ENV_SERVER}/{path}"
+     body = await request.body()
+     async with httpx.AsyncClient(timeout=10.0) as client:
+         try:
+             r = await client.post(url, content=body,
+                                   headers={"Content-Type": "application/json"})
+             return JSONResponse(content=r.json(), status_code=r.status_code)
+         except Exception as e:
+             return JSONResponse({"error": str(e)}, status_code=503)
+
+
+ @app.get("/health")
+ async def health():
+     return {"status": "ok", "dashboard": "gridmind-rl"}
+
+
+ if __name__ == "__main__":
+     import uvicorn
+     port = int(os.getenv("DASHBOARD_PORT", "7861"))
+     uvicorn.run(app, host="0.0.0.0", port=port, log_level="info")
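Reviewer note (not part of the commit): both proxy handlers map a dashboard path under `/api/` to the same path on the Go server. The sketch below isolates that mapping so it can be checked without starting either service. `upstream_url` is a hypothetical helper, and stripping a trailing slash from the base URL is an addition that the committed code does not perform.

```python
DEFAULT_ENV_SERVER = "http://localhost:7860"


def upstream_url(path: str, env_server: str = DEFAULT_ENV_SERVER) -> str:
    """Build the Go-server URL for a path captured by the "/api/{path:path}" route.

    For a request to /api/state, the route parameter is "state" (no leading slash).
    """
    # ADDITION: tolerate a base URL configured with a trailing slash.
    return f"{env_server.rstrip('/')}/{path}"


if __name__ == "__main__":
    print(upstream_url("state"))
    print(upstream_url("reset", "http://env:7860/"))
```

Query parameters are not part of this mapping; the GET handler forwards them separately through `params`.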
dashboard/static/dashboard.js ADDED
@@ -0,0 +1,617 @@
+ /**
+  * GridMind-RL Dashboard — Chart.js real-time visualization
+  * Polls /api/state every 500ms and updates all charts.
+  */
+
+ 'use strict';
+
+ // ── Config ──────────────────────────────────────────────────────────────────
+ const POLL_MS = 500;
+ const HISTORY_LEN = 96; // 96 steps = full episode
+ const API_BASE = '/api';
+ const TASK_NAMES = {
+   1: 'Task 1 — Cost Minimization (Easy)',
+   2: 'Task 2 — Temperature Management (Medium)',
+   3: 'Task 3 — Full Demand Response (Hard)',
+ };
+
+ let currentBuilding = 0;
+ let pollTimer = null;
+ let connected = false;
+
+ // ── Chart.js global defaults ─────────────────────────────────────────────────
+ Chart.defaults.color = '#8899b4';
+ Chart.defaults.borderColor = 'rgba(56,139,253,0.1)';
+ Chart.defaults.font.family = "'Inter', sans-serif";
+ Chart.defaults.font.size = 11;
+ Chart.defaults.plugins.legend.display = false;
+ Chart.defaults.animation.duration = 300;
+
+ const COLORS = {
+   blue: '#388bfd',
+   green: '#3fb950',
+   amber: '#d29922',
+   red: '#f85149',
+   purple: '#bc8cff',
+   cyan: '#39d0d8',
+   orange: '#ff7c39',
+   dimBlue: 'rgba(56,139,253,0.15)',
+ };
+
+ function rgba(hex, alpha) {
+   const r = parseInt(hex.slice(1,3), 16);
+   const g = parseInt(hex.slice(3,5), 16);
+   const b = parseInt(hex.slice(5,7), 16);
+   return `rgba(${r},${g},${b},${alpha})`;
+ }
+
+ // ── Chart factory helpers ────────────────────────────────────────────────────
+ function makeLineChart(id, labels, datasets, opts = {}) {
+   const ctx = document.getElementById(id).getContext('2d');
+   return new Chart(ctx, {
+     type: 'line',
+     data: { labels, datasets },
+     options: {
+       responsive: true,
+       maintainAspectRatio: false,
+       interaction: { mode: 'index', intersect: false },
+       scales: {
+         x: { grid: { color: 'rgba(56,139,253,0.06)' }, ticks: { maxTicksLimit: 8 } },
+         y: { grid: { color: 'rgba(56,139,253,0.06)' }, ...opts.yAxis },
+       },
+       plugins: {
+         legend: { display: opts.legend || false },
+         tooltip: { backgroundColor: '#0f1829', borderColor: 'rgba(56,139,253,0.3)', borderWidth: 1 },
+       },
+       ...opts.extra,
+     },
+   });
+ }
+
+ function makeAreaChart(id, labels, datasets) {
+   return makeLineChart(id, labels, datasets, {
+     extra: { fill: true },
+   });
+ }
+
+ function makeBarChart(id, labels, datasets) {
+   const ctx = document.getElementById(id).getContext('2d');
+   return new Chart(ctx, {
+     type: 'bar',
+     data: { labels, datasets },
+     options: {
+       responsive: true,
+       maintainAspectRatio: false,
+       scales: {
+         x: { stacked: true, grid: { color: 'rgba(56,139,253,0.06)' }, ticks: { maxTicksLimit: 8 } },
+         y: { stacked: true, grid: { color: 'rgba(56,139,253,0.06)' } },
+       },
+       plugins: {
+         legend: { display: true, position: 'bottom', labels: { usePointStyle: true, padding: 10 } },
+         tooltip: { backgroundColor: '#0f1829', borderColor: 'rgba(56,139,253,0.3)', borderWidth: 1 },
+       },
+     },
+   });
+ }
+
+ // ── Initialise all charts ─────────────────────────────────────────────────────
+ const emptyLabels = Array.from({ length: 24 }, (_, i) => `${i}h`);
+ const emptyData = Array(24).fill(null);
+
+ // 1. Price curve
+ const priceChart = makeLineChart('chart-price',
+   emptyLabels,
+   [
+     {
+       label: 'Price ($/kWh)',
+       data: [...emptyData],
+       borderColor: COLORS.amber,
+       backgroundColor: rgba(COLORS.amber, 0.15),
+       borderWidth: 2,
+       fill: true,
+       tension: 0.4,
+       pointRadius: 0,
+     },
+     {
+       label: 'Current',
+       data: [...emptyData],
+       borderColor: COLORS.red,
+       backgroundColor: 'transparent',
+       borderWidth: 0,
+       pointRadius: 6,
+       pointBackgroundColor: COLORS.red,
+     },
+   ],
+   { legend: true, yAxis: { title: { display: true, text: '$/kWh' } } }
+ );
+
+ // 2. Temperature
+ const tempChart = makeLineChart('chart-temp',
+   [],
+   [
+     {
+       label: 'Indoor Temp (°C)',
+       data: [],
+       borderColor: COLORS.cyan,
+       backgroundColor: rgba(COLORS.cyan, 0.1),
+       borderWidth: 2,
+       fill: true,
+       tension: 0.4,
+       pointRadius: 0,
+     },
+     {
+       label: 'T_max (23°C)',
+       data: [],
+       borderColor: rgba(COLORS.red, 0.5),
+       borderWidth: 1,
+       borderDash: [5, 5],
+       pointRadius: 0,
+       fill: false,
+     },
+     {
+       label: 'T_min (19°C)',
+       data: [],
+       borderColor: rgba(COLORS.blue, 0.5),
+       borderWidth: 1,
+       borderDash: [5, 5],
+       pointRadius: 0,
+       fill: false,
+     },
+   ],
+   { legend: true, yAxis: { suggestedMin: 15, suggestedMax: 30, title: { display: true, text: '°C' } } }
+ );
+
+ // 3. Storage history (mini)
+ const storageChart = makeLineChart('chart-storage',
+   [],
+   [{
+     label: 'Storage Level',
+     data: [],
+     borderColor: COLORS.cyan,
+     backgroundColor: rgba(COLORS.cyan, 0.2),
+     borderWidth: 2,
+     fill: true,
+     tension: 0.4,
+     pointRadius: 0,
+   }],
+   { yAxis: { min: 0, max: 1 } }
+ );
+
+ // 4. HVAC + Load Shed stacked bars
+ const hvacChart = makeBarChart('chart-hvac',
+   [],
+   [
+     {
+       label: 'HVAC Power',
+       data: [],
+       backgroundColor: rgba(COLORS.blue, 0.7),
+       borderColor: COLORS.blue,
+       borderWidth: 1,
+     },
+     {
+       label: 'Load Shed',
+       data: [],
+       backgroundColor: rgba(COLORS.red, 0.7),
+       borderColor: COLORS.red,
+       borderWidth: 1,
+     },
+   ]
+ );
+
+ // 5. Cumulative cost vs baseline
+ const costChart = makeLineChart('chart-cost',
+   [],
+   [
+     {
+       label: 'Agent Cost ($)',
+       data: [],
+       borderColor: COLORS.green,
+       backgroundColor: rgba(COLORS.green, 0.1),
+       borderWidth: 2,
+       fill: true,
+       tension: 0.4,
+       pointRadius: 0,
+     },
+     {
+       label: 'Baseline ($)',
+       data: [],
+       borderColor: rgba(COLORS.amber, 0.7),
+       borderDash: [6, 3],
+       borderWidth: 2,
+       fill: false,
+       tension: 0.4,
+       pointRadius: 0,
+     },
+   ],
+   { legend: true, yAxis: { title: { display: true, text: '$' } } }
+ );
+
+ // 6. Grid stress history (mini)
+ const stressChart = makeLineChart('chart-stress',
+   [],
+   [{
+     label: 'Grid Stress',
+     data: [],
+     borderColor: COLORS.red,
+     backgroundColor: rgba(COLORS.red, 0.2),
+     borderWidth: 2,
+     fill: true,
+     tension: 0.4,
+     pointRadius: 0,
+   }],
+   { yAxis: { min: 0, max: 1 } }
+ );
+
+ // 7. Carbon curve
+ const carbonChart = makeLineChart('chart-carbon',
+   emptyLabels,
+   [{
+     label: 'Carbon Intensity (gCO₂/kWh)',
+     data: [...emptyData],
+     borderColor: COLORS.orange,
+     backgroundColor: rgba(COLORS.orange, 0.15),
+     borderWidth: 2,
+     fill: true,
+     tension: 0.4,
+     pointRadius: 0,
+   }],
+   { yAxis: { title: { display: true, text: 'gCO₂/kWh' } } }
+ );
+
+ // 8. Reward components bar
+ const rewardChart = makeBarChart('chart-reward',
+   [],
+   [
+     { label: 'Cost Savings', data: [], backgroundColor: rgba(COLORS.green, 0.8) },
+     { label: 'Temp Constraint', data: [], backgroundColor: rgba(COLORS.cyan, 0.8) },
+     { label: 'Grid Response', data: [], backgroundColor: rgba(COLORS.blue, 0.8) },
+     { label: 'Efficiency', data: [], backgroundColor: rgba(COLORS.purple, 0.7) },
+     { label: 'Penalties', data: [], backgroundColor: rgba(COLORS.red, 0.8) },
+   ]
+ );
+
+ // ── Stress meter bars ────────────────────────────────────────────────────────
+ function buildStressMeter() {
+   const el = document.getElementById('stress-meter');
+   el.innerHTML = '';
+   for (let i = 0; i < 20; i++) {
+     const bar = document.createElement('div');
+     bar.className = 'stress-bar';
+     bar.id = `sm-${i}`;
+     el.appendChild(bar);
+   }
+ }
+ buildStressMeter();
+
+ function updateStressMeter(stress) {
+   const bars = 20;
+   const active = Math.round(stress * bars);
+   for (let i = 0; i < bars; i++) {
+     const bar = document.getElementById(`sm-${i}`);
+     if (!bar) continue;
+     const pct = (i / bars) * 100;
+     bar.style.height = `${20 + pct * 0.8}%`;
+     if (i < active) {
+       const color = stress > 0.7 ? COLORS.red : stress > 0.4 ? COLORS.amber : COLORS.green;
+       bar.style.background = color;
+       bar.style.opacity = '1';
+     } else {
+       bar.style.background = 'rgba(255,255,255,0.05)';
+       bar.style.opacity = '1';
+     }
+   }
+ }
+
+ // ── Batch Gantt renderer ─────────────────────────────────────────────────────
+ function renderGantt(jobs, currentStep) {
+   const wrap = document.getElementById('gantt-wrap');
+   if (!jobs || jobs.length === 0) {
+     wrap.innerHTML = '<div style="color:var(--text-dim);font-size:0.8rem">No batch jobs in this episode.</div>';
+     return;
+   }
+   const totalSlots = 96;
+   wrap.innerHTML = '';
+   jobs.forEach(job => {
+     const row = document.createElement('div');
+     row.className = 'gantt-row';
+
+     const label = document.createElement('div');
+     label.className = 'gantt-label';
+     label.textContent = `J${job.id}`;
+     row.appendChild(label);
+
+     const track = document.createElement('div');
+     track.className = 'gantt-track';
+
+     // Deadline marker
+     const deadlinePct = (job.deadline_slot / totalSlots) * 100;
+     const deadline = document.createElement('div');
+     deadline.className = 'gantt-deadline';
+     deadline.style.left = `${deadlinePct}%`;
+     deadline.title = `Deadline: step ${job.deadline_slot}`;
+     track.appendChild(deadline);
+
+     // Job block
+     if (job.scheduled) {
+       const startPct = (job.scheduled_at / totalSlots) * 100;
+       const widthPct = (job.duration / totalSlots) * 100;
+       const block = document.createElement('div');
+       block.className = 'gantt-block ' + (job.completed ? 'completed' : job.missed_deadline ? 'missed' : 'scheduled');
+       block.style.left = `${startPct}%`;
+       block.style.width = `${Math.max(widthPct, 1)}%`;
+       track.appendChild(block);
+     }
+
+     // Current step marker
+     const curPct = (currentStep / totalSlots) * 100;
+     const curMarker = document.createElement('div');
+     curMarker.style.cssText = `position:absolute;top:0;bottom:0;width:1px;background:rgba(56,139,253,0.6);left:${curPct}%`;
+     track.appendChild(curMarker);
+
+     row.appendChild(track);
+
+     // Status badge
+     const statusWrap = document.createElement('div');
+     statusWrap.className = 'gantt-status';
+     let badgeClass = 'pending', badgeText = 'pending';
+     if (job.completed) { badgeClass = 'ok'; badgeText = 'done'; }
+     else if (job.missed_deadline) { badgeClass = 'missed'; badgeText = 'missed'; }
+     else if (job.scheduled && !job.completed) { badgeClass = 'running'; badgeText = 'running'; }
+     statusWrap.innerHTML = `<span class="badge ${badgeClass}">${badgeText}</span>`;
+     row.appendChild(statusWrap);
+
+     wrap.appendChild(row);
+   });
+ }
+
+ // ── Reward breakdown rows ─────────────────────────────────────────────────────
+ function renderRewardRows(rc) {
+   if (!rc) return;
+   const container = document.getElementById('reward-rows');
+   const components = [
+     { key: 'cost_savings', label: 'Cost Savings', color: COLORS.green, sign: 1 },
+     { key: 'temp_constraint', label: 'Temp Constr.', color: COLORS.cyan, sign: 1 },
+     { key: 'grid_response', label: 'Grid DR', color: COLORS.blue, sign: 1 },
+     { key: 'efficiency_bonus', label: 'Efficiency', color: COLORS.purple, sign: 1 },
+     { key: 'stability_penalty', label: 'Stability', color: COLORS.amber, sign: -1 },
+     { key: 'deadline_penalty', label: 'Deadlines', color: COLORS.red, sign: -1 },
+     { key: 'carbon_reward', label: 'Carbon', color: COLORS.orange, sign: 1 },
+   ];
+   container.innerHTML = '';
+   components.forEach(c => {
+     const val = rc[c.key] || 0;
+     const absVal = Math.abs(val);
+     const pct = Math.min(100, absVal * 30); // scale 0–~3 reward to 0–100%
+     container.innerHTML += `
+       <div class="reward-row">
+         <div class="reward-label">${c.label}</div>
+         <div class="reward-bar-wrap">
+           <div class="reward-bar" style="width:${pct}%;background:${c.color};opacity:0.8"></div>
+         </div>
+         <div class="reward-val" style="color:${val >= 0 ? COLORS.green : COLORS.red}">${val.toFixed(3)}</div>
+       </div>`;
+   });
+ }
+
+ // ── KPI color logic ──────────────────────────────────────────────────────────
+ function colorClass(val, good, bad) {
+   if (val <= good) return 'good';
+   if (val >= bad) return 'bad';
+   return 'warn';
+ }
+
+ // ── Main state update ─────────────────────────────────────────────────────────
+ let lastStep = -1;
+
+ async function fetchAndUpdate() {
+   try {
+     const res = await fetch(`${API_BASE}/state`);
+     if (!res.ok) throw new Error(`HTTP ${res.status}`);
+     const state = await res.json();
+     connected = true;
+     document.getElementById('conn-banner').classList.remove('show');
+     document.getElementById('status-dot').style.background = 'var(--accent-green)';
+
+     const b = state.buildings && state.buildings[currentBuilding];
+     if (!b) return;
+
+     const step = state.step;
+     const hourOfDay = b.hour_of_day || 0;
+
+     // ── Header ──
+     document.getElementById('ep-step').textContent = `ep:${state.episode} step:${step}/95`;
+     document.getElementById('task-badge').textContent = TASK_NAMES[state.task_id] || 'Task 1';
+
+     // ── KPIs ──
+     const priceEl = document.getElementById('kpi-price');
+     priceEl.textContent = `$${b.current_price.toFixed(4)}`;
+     priceEl.className = 'kpi-value ' + colorClass(b.current_price, 0.08, 0.16);
+
+     const tempEl = document.getElementById('kpi-temp');
+     tempEl.textContent = `${b.indoor_temperature.toFixed(1)}°C`;
+     const inBounds = b.indoor_temperature >= 19 && b.indoor_temperature <= 23;
+     tempEl.className = 'kpi-value ' + (inBounds ? 'good' : 'bad');
+
+     const stressEl = document.getElementById('kpi-stress');
+     stressEl.textContent = b.grid_stress_signal.toFixed(3);
+     stressEl.className = 'kpi-value ' + colorClass(b.grid_stress_signal, 0.4, 0.7);
+
+     const costEl = document.getElementById('kpi-cost');
+     const savings = b.baseline_cost - b.cumulative_cost;
+     costEl.textContent = `$${b.cumulative_cost.toFixed(2)}`;
+     costEl.className = 'kpi-value ' + (savings > 0 ? 'good' : 'warn');
+     document.getElementById('kpi-baseline').textContent = `$${b.baseline_cost.toFixed(2)}`;
+
+     document.getElementById('kpi-carbon').textContent = `${b.carbon_intensity.toFixed(0)}`;
+     document.getElementById('kpi-demand').textContent = `${b.process_demand.toFixed(1)}`;
+     document.getElementById('kpi-storage').textContent = `${(b.thermal_storage_level * 100).toFixed(1)}`;
+
+     // ── Price curve chart ──
+     if (state.price_curve_24h && state.price_curve_24h.length === 24) {
+       const labels = Array.from({ length: 24 }, (_, i) => `${i}:00`);
+       priceChart.data.labels = labels;
+       priceChart.data.datasets[0].data = state.price_curve_24h;
+       // Current position marker
+       const marker = Array(24).fill(null);
+       marker[hourOfDay] = state.price_curve_24h[hourOfDay];
+       priceChart.data.datasets[1].data = marker;
+       priceChart.update('none');
+     }
+
+     // ── Carbon curve ──
+     if (state.carbon_curve_24h && state.carbon_curve_24h.length === 24) {
+       carbonChart.data.labels = Array.from({ length: 24 }, (_, i) => `${i}:00`);
+       carbonChart.data.datasets[0].data = state.carbon_curve_24h;
+       carbonChart.update('none');
+     }
+
+     // ── Grid stress ──
+     document.getElementById('stress-big').textContent = b.grid_stress_signal.toFixed(3);
+     updateStressMeter(b.grid_stress_signal);
+     const cardStress = document.getElementById('card-stress');
+     if (b.grid_stress_signal > 0.7) {
+       cardStress.classList.add('alert-active');
+     } else {
+       cardStress.classList.remove('alert-active');
+     }
+
+     // ── Thermal storage bar ──
+     const storagePct = (b.thermal_storage_level * 100).toFixed(1);
+     document.getElementById('storage-pct').textContent = storagePct;
+     document.getElementById('storage-fill').style.width = `${storagePct}%`;
+
+     // ── History-based charts (only update when step changes) ──
+     if (step !== lastStep) {
+       lastStep = step;
+       const stepLabels = Array.from({ length: b.temp_history.length }, (_, i) => i);
+
+       // Temperature chart
+       if (b.temp_history.length > 0) {
+         tempChart.data.labels = stepLabels;
+         tempChart.data.datasets[0].data = b.temp_history;
+         tempChart.data.datasets[1].data = b.temp_history.map(() => 23);
+         tempChart.data.datasets[2].data = b.temp_history.map(() => 19);
+         tempChart.update('none');
+       }
+
+       // Storage history
+       if (b.hvac_history && b.hvac_history.length > 0) {
+         storageChart.data.labels = stepLabels;
+         storageChart.data.datasets[0].data = Array.from({ length: b.hvac_history.length }, (_, i) =>
+           b.thermal_storage_level // placeholder: repeats the current level for every step, so the line is flat
+         );
+         storageChart.update('none');
+       }
+
+       // HVAC + load shed (bar)
+       if (b.hvac_history && b.load_shed_history) {
+         const n = Math.min(b.hvac_history.length, HISTORY_LEN);
+         hvacChart.data.labels = Array.from({ length: n }, (_, i) => i);
+         hvacChart.data.datasets[0].data = b.hvac_history.slice(0, n);
+         hvacChart.data.datasets[1].data = b.load_shed_history.slice(0, n);
+         hvacChart.update('none');
+       }
+
+       // Cost vs baseline
+       if (b.cost_history && b.cost_history.length > 0) {
+         const n = b.cost_history.length;
+         costChart.data.labels = Array.from({ length: n }, (_, i) => i);
+         costChart.data.datasets[0].data = b.cost_history;
+         // Generate approximate baseline curve (linear ramp to b.baseline_cost)
+         const baselineStep = b.baseline_cost / Math.max(step, 1);
+         costChart.data.datasets[1].data = b.cost_history.map((_, i) => baselineStep * (i + 1));
+         costChart.update('none');
+       }
+
+       // Grid stress history (plots the positive part of the grid_response
+       // reward component, not the raw grid_stress_signal)
+       if (b.reward_history && b.reward_history.length > 0) {
+         const n = b.reward_history.length;
+         stressChart.data.labels = Array.from({ length: n }, (_, i) => i);
+         stressChart.data.datasets[0].data = b.reward_history.map(r => Math.max(0, r.grid_response || 0));
+         stressChart.update('none');
+
+         // Reward breakdown chart (last 20 steps)
+         const recent = b.reward_history.slice(-20);
+         rewardChart.data.labels = Array.from({ length: recent.length }, (_, i) => n - recent.length + i);
+         rewardChart.data.datasets[0].data = recent.map(r => Math.max(0, r.cost_savings || 0));
+         rewardChart.data.datasets[1].data = recent.map(r => Math.max(0, r.temp_constraint || 0));
+         rewardChart.data.datasets[2].data = recent.map(r => Math.max(0, r.grid_response || 0));
+         rewardChart.data.datasets[3].data = recent.map(r => Math.max(0, r.efficiency_bonus || 0));
+         rewardChart.data.datasets[4].data = recent.map(r =>
+           Math.abs(r.deadline_penalty || 0) + Math.abs(r.stability_penalty || 0)
+         );
+         rewardChart.update('none');
+
+         // Reward rows (last step)
+         renderRewardRows(b.reward_history[b.reward_history.length - 1]);
+       }
+
+       // Batch Gantt
+       renderGantt(b.jobs || [], step);
+     }
+
+   } catch (err) {
+     connected = false;
+     document.getElementById('conn-banner').classList.add('show');
+     document.getElementById('status-dot').style.background = 'var(--accent-red)';
+     // console.error('Poll error:', err);
+   }
+ }
+
561
+ // ── Episode controls ─────────────────────────────────────────────────────────
562
+
563
+ async function doReset() {
564
+ const taskId = parseInt(document.getElementById('task-select').value, 10);
565
+ const btn = document.getElementById('btn-reset');
566
+ btn.textContent = 'Resetting...';
567
+ btn.disabled = true;
568
+ lastStep = -1;
569
+ try {
570
+ await fetch(`${API_BASE}/reset`, {
571
+ method: 'POST',
572
+ headers: { 'Content-Type': 'application/json' },
573
+ body: JSON.stringify({ task_id: taskId, num_buildings: 1 }),
574
+ });
575
+ } catch (e) {
576
+ console.error(e);
577
+ }
578
+ btn.textContent = '↺ New Episode';
579
+ btn.disabled = false;
580
+ document.getElementById('grade-result').textContent = '';
581
+ }
582
+
583
+ async function doGrade() {
584
+ try {
585
+ const res = await fetch(`${API_BASE}/grade`);
+ if (!res.ok) throw new Error(`Grade request failed: HTTP ${res.status}`);
+ const grade = await res.json();
587
+ const score = (grade.score * 100).toFixed(2);
588
+ const el = document.getElementById('grade-result');
589
+ el.textContent = `Score: ${score}% ${grade.exploit_detected ? '⚠ exploit!' : ''}`;
590
+ el.style.color = grade.score > 0.6 ? 'var(--accent-green)' : grade.score > 0.3 ? 'var(--accent-amber)' : 'var(--accent-red)';
591
+ } catch (e) {
592
+ console.error(e);
593
+ }
594
+ }
595
+
596
+ function onTaskChange() {
597
+ // Reset chart histories on task change
598
+ [tempChart, storageChart, hvacChart, costChart, stressChart, rewardChart].forEach(c => {
599
+ c.data.labels = [];
600
+ c.data.datasets.forEach(d => d.data = []);
601
+ c.update('none');
602
+ });
603
+ }
604
+
605
+ function onBuildingChange() {
606
+ currentBuilding = parseInt(document.getElementById('building-select').value, 10);
607
+ lastStep = -1;
608
+ }
609
+
610
+ // ── Start polling ────────────────────────────────────────────────────────────
611
+ function startPolling() {
612
+ if (pollTimer) clearInterval(pollTimer);
613
+ fetchAndUpdate(); // immediate first fetch
614
+ pollTimer = setInterval(fetchAndUpdate, POLL_MS);
615
+ }
616
+
617
+ startPolling();
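The cost chart above draws its baseline as a linear ramp toward `b.baseline_cost`. The same calculation can be sketched outside the browser as follows. This is a minimal illustration in Python, assuming the ramp should reach the baseline total exactly at the last plotted point; the function name `baseline_ramp` is hypothetical and not part of the repository.

```python
def baseline_ramp(baseline_cost: float, n_points: int) -> list[float]:
    """Return n_points values rising linearly to baseline_cost.

    Mirrors the dashboard's approximate baseline curve: point i holds
    baseline_cost * (i + 1) / n_points, so the last point equals
    baseline_cost. An empty list is returned when there is nothing to plot.
    """
    if n_points <= 0:
        return []
    step = baseline_cost / n_points
    return [step * (i + 1) for i in range(n_points)]


if __name__ == "__main__":
    # Four plotted steps with a baseline total of 10.0 USD (made-up numbers).
    print(baseline_ramp(10.0, 4))
```

Dividing by the number of plotted points rather than by a separately tracked step counter keeps the ramp aligned with the cost series even if the two ever differ in length.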
dashboard/static/index.html ADDED
@@ -0,0 +1,627 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8" />
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0" />
6
+ <title>GridMind-RL | Real-Time Energy Dashboard</title>
7
+ <meta name="description" content="Real-time visualization dashboard for the GridMind-RL Industrial Load-Shaping and Demand-Response RL environment." />
8
+ <!-- Chart.js CDN -->
9
+ <script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.0/dist/chart.umd.min.js"></script>
10
+ <!-- Google Fonts -->
11
+ <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet" />
12
+ <style>
13
+ /* ── Design System ─────────────────────────────────────────────── */
14
+ :root {
15
+ --bg-base: #0a0f1e;
16
+ --bg-surface: #0f1829;
17
+ --bg-card: #141f35;
18
+ --bg-card-h: #1a2840;
19
+ --border: rgba(56, 139, 253, 0.15);
20
+ --border-glow: rgba(56, 139, 253, 0.4);
21
+ --text-primary: #e2e8f4;
22
+ --text-secondary: #8899b4;
23
+ --text-dim: #4d6080;
24
+ --accent-blue: #388bfd;
25
+ --accent-green: #3fb950;
26
+ --accent-amber: #d29922;
27
+ --accent-red: #f85149;
28
+ --accent-purple: #bc8cff;
29
+ --accent-cyan: #39d0d8;
30
+ --accent-orange: #ff7c39;
31
+ --gradient-hero: linear-gradient(135deg, #0d1b33 0%, #0a0f1e 100%);
32
+ --glow-blue: 0 0 20px rgba(56,139,253,0.25), 0 0 40px rgba(56,139,253,0.1);
33
+ --glow-green: 0 0 20px rgba(63,185,80,0.25);
34
+ --font-mono: 'JetBrains Mono', monospace;
35
+ --radius: 12px;
36
+ --radius-lg: 16px;
37
+ }
38
+
39
+ *, *::before, *::after { box-sizing: border-box; margin: 0; padding: 0; }
40
+
41
+ html { scroll-behavior: smooth; }
42
+
43
+ body {
44
+ font-family: 'Inter', sans-serif;
45
+ background: var(--bg-base);
46
+ color: var(--text-primary);
47
+ min-height: 100vh;
48
+ overflow-x: hidden;
49
+ }
50
+
51
+ /* ── Animated background grid ── */
52
+ body::before {
53
+ content: '';
54
+ position: fixed;
55
+ inset: 0;
56
+ background-image:
57
+ linear-gradient(rgba(56,139,253,0.03) 1px, transparent 1px),
58
+ linear-gradient(90deg, rgba(56,139,253,0.03) 1px, transparent 1px);
59
+ background-size: 40px 40px;
60
+ pointer-events: none;
61
+ z-index: 0;
62
+ }
63
+
64
+ /* ── Header ─────────────────────────────────────────────────────── */
65
+ header {
66
+ position: sticky;
67
+ top: 0;
68
+ z-index: 100;
69
+ background: rgba(10,15,30,0.85);
70
+ backdrop-filter: blur(16px);
71
+ border-bottom: 1px solid var(--border);
72
+ padding: 0 2rem;
73
+ height: 64px;
74
+ display: flex;
75
+ align-items: center;
76
+ justify-content: space-between;
77
+ }
78
+
79
+ .logo {
80
+ display: flex;
81
+ align-items: center;
82
+ gap: 10px;
83
+ }
84
+ .logo-icon {
85
+ width: 32px; height: 32px;
86
+ background: linear-gradient(135deg, var(--accent-blue), var(--accent-cyan));
87
+ border-radius: 8px;
88
+ display: flex; align-items: center; justify-content: center;
89
+ font-size: 16px;
90
+ }
91
+ .logo-text { font-size: 1.1rem; font-weight: 700; letter-spacing: -0.3px; }
92
+ .logo-text span { color: var(--accent-blue); }
93
+
94
+ .header-status {
95
+ display: flex;
96
+ align-items: center;
97
+ gap: 1.5rem;
98
+ }
99
+ .status-dot {
100
+ width: 8px; height: 8px;
101
+ border-radius: 50%;
102
+ background: var(--accent-green);
103
+ box-shadow: 0 0 8px var(--accent-green);
104
+ animation: pulse 2s infinite;
105
+ }
106
+ @keyframes pulse {
107
+ 0%, 100% { opacity: 1; transform: scale(1); }
108
+ 50% { opacity: 0.6; transform: scale(0.9); }
109
+ }
110
+ .status-label { font-size: 0.8rem; color: var(--text-secondary); }
111
+
112
+ .task-badge {
113
+ padding: 4px 12px;
114
+ border-radius: 20px;
115
+ font-size: 0.75rem;
116
+ font-weight: 600;
117
+ background: rgba(56,139,253,0.15);
118
+ border: 1px solid var(--border);
119
+ color: var(--accent-blue);
120
+ }
121
+
122
+ /* ── KPI Bar ────────────────────────────────────────────────────── */
123
+ .kpi-bar {
124
+ position: relative; z-index: 1;
125
+ display: grid;
126
+ grid-template-columns: repeat(auto-fit, minmax(160px, 1fr));
127
+ gap: 1px;
128
+ background: var(--border);
129
+ border-bottom: 1px solid var(--border);
130
+ }
131
+ .kpi {
132
+ background: var(--bg-surface);
133
+ padding: 1rem 1.5rem;
134
+ display: flex; flex-direction: column; gap: 2px;
135
+ transition: background 0.2s;
136
+ }
137
+ .kpi:hover { background: var(--bg-card); }
138
+ .kpi-label { font-size: 0.7rem; text-transform: uppercase; letter-spacing: 0.8px; color: var(--text-dim); }
139
+ .kpi-value {
140
+ font-family: var(--font-mono);
141
+ font-size: 1.5rem;
142
+ font-weight: 600;
143
+ color: var(--text-primary);
144
+ transition: color 0.3s;
145
+ }
146
+ .kpi-value.good { color: var(--accent-green); }
147
+ .kpi-value.warn { color: var(--accent-amber); }
148
+ .kpi-value.bad { color: var(--accent-red); }
149
+ .kpi-delta { font-size: 0.72rem; color: var(--text-secondary); font-family: var(--font-mono); }
150
+
151
+ /* ── Main Grid ──────────────────────────────────────────────────── */
152
+ main {
153
+ position: relative; z-index: 1;
154
+ max-width: 1600px;
155
+ margin: 0 auto;
156
+ padding: 1.5rem;
157
+ display: grid;
158
+ grid-template-columns: repeat(12, 1fr);
159
+ gap: 1rem;
160
+ }
161
+
162
+ /* ── Card ───────────────────────────────────────────────────────── */
163
+ .card {
164
+ background: var(--bg-card);
165
+ border: 1px solid var(--border);
166
+ border-radius: var(--radius-lg);
167
+ padding: 1.25rem;
168
+ transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s;
169
+ position: relative;
170
+ overflow: hidden;
171
+ }
172
+ .card::before {
173
+ content: '';
174
+ position: absolute;
175
+ inset: 0;
176
+ background: linear-gradient(135deg, rgba(56,139,253,0.03) 0%, transparent 60%);
177
+ pointer-events: none;
178
+ }
179
+ .card:hover {
180
+ border-color: var(--border-glow);
181
+ box-shadow: var(--glow-blue);
182
+ transform: translateY(-1px);
183
+ }
184
+ .card.alert-active {
185
+ border-color: rgba(248,81,73,0.5);
186
+ box-shadow: 0 0 20px rgba(248,81,73,0.2);
187
+ animation: alertPulse 1.5s infinite;
188
+ }
189
+ @keyframes alertPulse {
190
+ 0%, 100% { box-shadow: 0 0 20px rgba(248,81,73,0.2); }
191
+ 50% { box-shadow: 0 0 35px rgba(248,81,73,0.4); }
192
+ }
193
+
194
+ .card-title {
195
+ font-size: 0.78rem;
196
+ font-weight: 600;
197
+ text-transform: uppercase;
198
+ letter-spacing: 0.8px;
199
+ color: var(--text-secondary);
200
+ margin-bottom: 0.75rem;
201
+ display: flex;
202
+ align-items: center;
203
+ gap: 0.5rem;
204
+ }
205
+ .card-title .icon { font-size: 0.9rem; }
206
+
207
+ /* ── Grid layout spans ─ */
208
+ .col-12 { grid-column: span 12; }
209
+ .col-8 { grid-column: span 8; }
210
+ .col-6 { grid-column: span 6; }
211
+ .col-4 { grid-column: span 4; }
212
+ .col-3 { grid-column: span 3; }
213
+
214
+ @media (max-width: 1200px) {
215
+ .col-8 { grid-column: span 12; }
216
+ .col-4 { grid-column: span 12; }
217
+ .col-6 { grid-column: span 12; }
218
+ .col-3 { grid-column: span 6; }
219
+ }
220
+ @media (max-width: 768px) {
221
+ .col-3 { grid-column: span 12; }
222
+ main { padding: 0.75rem; gap: 0.75rem; }
223
+ }
224
+
225
+ /* ── Charts ─────────────────────────────────────────────────────── */
226
+ .chart-wrap { position: relative; height: 200px; }
227
+ .chart-wrap.tall { height: 260px; }
228
+ .chart-wrap.short { height: 150px; }
229
+
230
+ /* ── Thermal Storage Bar ─────────────────────────────────────────── */
231
+ .storage-bar-wrap {
232
+ height: 28px;
233
+ background: rgba(255,255,255,0.05);
234
+ border-radius: 14px;
235
+ overflow: hidden;
236
+ margin-top: 0.5rem;
237
+ position: relative;
238
+ }
239
+ .storage-bar-fill {
240
+ height: 100%;
241
+ border-radius: 14px;
242
+ background: linear-gradient(90deg, var(--accent-cyan), var(--accent-blue));
243
+ transition: width 0.6s cubic-bezier(0.4, 0, 0.2, 1);
244
+ position: relative;
245
+ }
246
+ .storage-bar-fill::after {
247
+ content: '';
248
+ position: absolute;
249
+ inset: 0;
250
+ background: linear-gradient(90deg, transparent 0%, rgba(255,255,255,0.15) 50%, transparent 100%);
251
+ animation: shimmer 2s infinite;
252
+ }
253
+ @keyframes shimmer {
254
+ 0% { transform: translateX(-100%); }
255
+ 100% { transform: translateX(100%); }
256
+ }
257
+ .storage-label {
258
+ font-family: var(--font-mono);
259
+ font-size: 1.8rem;
260
+ font-weight: 700;
261
+ color: var(--accent-cyan);
262
+ margin-top: 0.5rem;
263
+ }
264
+ .storage-label span { font-size: 1rem; color: var(--text-secondary); }
265
+
266
+ /* ── Batch Gantt ─────────────────────────────────────────────────── */
267
+ .gantt-wrap {
268
+ display: flex;
269
+ flex-direction: column;
270
+ gap: 6px;
271
+ margin-top: 0.25rem;
272
+ }
273
+ .gantt-row {
274
+ display: flex;
275
+ align-items: center;
276
+ gap: 8px;
277
+ font-size: 0.75rem;
278
+ }
279
+ .gantt-label {
280
+ width: 40px;
281
+ color: var(--text-secondary);
282
+ font-family: var(--font-mono);
283
+ flex-shrink: 0;
284
+ }
285
+ .gantt-track {
286
+ flex: 1;
287
+ height: 18px;
288
+ background: rgba(255,255,255,0.05);
289
+ border-radius: 4px;
290
+ position: relative;
291
+ overflow: hidden;
292
+ }
293
+ .gantt-block {
294
+ position: absolute;
295
+ top: 0; bottom: 0;
296
+ border-radius: 4px;
297
+ transition: width 0.3s, left 0.3s;
298
+ }
299
+ .gantt-block.scheduled { background: linear-gradient(90deg, var(--accent-blue), var(--accent-purple)); }
300
+ .gantt-block.completed { background: var(--accent-green); opacity: 0.7; }
301
+ .gantt-block.missed { background: var(--accent-red); opacity: 0.8; }
302
+ .gantt-deadline {
303
+ position: absolute;
304
+ top: 0; bottom: 0;
305
+ width: 2px;
306
+ background: var(--accent-amber);
307
+ border-radius: 1px;
308
+ }
309
+ .gantt-status {
310
+ width: 60px;
311
+ text-align: right;
312
+ flex-shrink: 0;
313
+ }
314
+ .badge {
315
+ padding: 2px 8px;
316
+ border-radius: 10px;
317
+ font-size: 0.7rem;
318
+ font-weight: 600;
319
+ }
320
+ .badge.ok { background: rgba(63,185,80,0.2); color: var(--accent-green); }
321
+ .badge.pending { background: rgba(56,139,253,0.2); color: var(--accent-blue); }
322
+ .badge.missed { background: rgba(248,81,73,0.2); color: var(--accent-red); }
323
+ .badge.running { background: rgba(188,140,255,0.2); color: var(--accent-purple); }
324
+
325
+ /* ── Reward breakdown mini-bars ──────────────────────────────── */
326
+ .reward-row {
327
+ display: flex; align-items: center; gap: 8px;
328
+ font-size: 0.75rem; margin-bottom: 4px;
329
+ }
330
+ .reward-label { width: 100px; color: var(--text-secondary); }
331
+ .reward-bar-wrap { flex: 1; height: 10px; background: rgba(255,255,255,0.05); border-radius: 5px; overflow: hidden; }
332
+ .reward-bar { height: 100%; border-radius: 5px; transition: width 0.5s; }
333
+ .reward-val { width: 55px; text-align: right; font-family: var(--font-mono); color: var(--text-primary); }
334
+
335
+ /* ── Grid stress indicator ──────────────────────────────────── */
336
+ .stress-meter {
337
+ display: flex; align-items: flex-end; gap: 3px;
338
+ height: 40px;
339
+ margin-top: 0.5rem;
340
+ }
341
+ .stress-bar {
342
+ flex: 1;
343
+ background: rgba(255,255,255,0.05);
344
+ border-radius: 2px 2px 0 0;
345
+ transition: height 0.4s, background 0.4s;
346
+ }
347
+
348
+ /* ── Big number display ─────────────────────────────────────── */
349
+ .big-num {
350
+ font-family: var(--font-mono);
351
+ font-weight: 700;
352
+ }
353
+ .big-num.xl { font-size: 2.8rem; }
354
+ .big-num.lg { font-size: 2rem; }
355
+ .big-num.md { font-size: 1.4rem; }
356
+ .big-num.green { color: var(--accent-green); }
357
+ .big-num.blue { color: var(--accent-blue); }
358
+ .big-num.amber { color: var(--accent-amber); }
359
+ .big-num.red { color: var(--accent-red); }
360
+ .big-num.purple { color: var(--accent-purple); }
361
+
362
+ .sub-label { font-size: 0.75rem; color: var(--text-secondary); margin-top: 2px; }
363
+
364
+ /* ── Price ticker ──────────────────────────────────────────── */
365
+ .price-row {
366
+ display: flex; align-items: baseline; gap: 8px; flex-wrap: wrap;
367
+ }
368
+ .price-tier {
369
+ display: flex; align-items: center; gap: 6px;
370
+ padding: 6px 10px;
371
+ border-radius: 8px;
372
+ background: rgba(255,255,255,0.04);
373
+ border: 1px solid rgba(255,255,255,0.06);
374
+ font-size: 0.8rem;
375
+ }
376
+ .price-tier .dot { width: 8px; height: 8px; border-radius: 50%; }
377
+
378
+ /* ── Control panel ──────────────────────────────────────────── */
379
+ .ctrl-row {
380
+ display: flex; gap: 8px; align-items: center; flex-wrap: wrap;
381
+ margin-top: 0.5rem;
382
+ }
383
+ .btn {
384
+ padding: 8px 16px;
385
+ border-radius: 8px;
386
+ border: 1px solid var(--border);
387
+ background: rgba(56,139,253,0.1);
388
+ color: var(--accent-blue);
389
+ font-size: 0.82rem;
390
+ font-weight: 600;
391
+ cursor: pointer;
392
+ transition: all 0.2s;
393
+ font-family: 'Inter', sans-serif;
394
+ }
395
+ .btn:hover {
396
+ background: rgba(56,139,253,0.2);
397
+ border-color: var(--accent-blue);
398
+ box-shadow: 0 0 12px rgba(56,139,253,0.3);
399
+ transform: translateY(-1px);
400
+ }
401
+ .btn.primary {
402
+ background: var(--accent-blue);
403
+ color: #fff;
404
+ }
405
+ .btn.primary:hover { background: #4da3ff; }
406
+ .btn.danger {
407
+ background: rgba(248,81,73,0.15);
408
+ color: var(--accent-red);
409
+ border-color: rgba(248,81,73,0.3);
410
+ }
411
+
412
+ select {
413
+ padding: 8px 12px;
414
+ background: var(--bg-card);
415
+ border: 1px solid var(--border);
416
+ border-radius: 8px;
417
+ color: var(--text-primary);
418
+ font-size: 0.82rem;
419
+ font-family: 'Inter', sans-serif;
420
+ cursor: pointer;
421
+ }
422
+ select:focus { outline: none; border-color: var(--accent-blue); }
423
+
424
+ /* ── Connection banner ──────────────────────────────────────── */
425
+ #conn-banner {
426
+ display: none;
427
+ position: fixed;
428
+ top: 64px; left: 0; right: 0;
429
+ z-index: 200;
430
+ background: rgba(248,81,73,0.15);
431
+ border-bottom: 1px solid rgba(248,81,73,0.3);
432
+ text-align: center;
433
+ padding: 10px;
434
+ font-size: 0.85rem;
435
+ color: var(--accent-red);
436
+ }
437
+ #conn-banner.show { display: block; }
438
+
439
+ /* ── Footer ─────────────────────────────────────────────────── */
440
+ footer {
441
+ position: relative; z-index: 1;
442
+ text-align: center;
443
+ padding: 1.5rem;
444
+ color: var(--text-dim);
445
+ font-size: 0.75rem;
446
+ border-top: 1px solid var(--border);
447
+ }
448
+ </style>
449
+ </head>
450
+ <body>
451
+
452
+ <!-- Connection error banner -->
453
+ <div id="conn-banner">
454
+ ⚡ Environment server unreachable — retrying...
455
+ </div>
456
+
457
+ <!-- ── Header ─── -->
458
+ <header>
459
+ <div class="logo">
460
+ <div class="logo-icon">⚡</div>
461
+ <div class="logo-text">Grid<span>Mind</span>-RL</div>
462
+ </div>
463
+ <div class="header-status">
464
+ <span id="task-badge" class="task-badge">Task 1 — Cost Minimization</span>
465
+ <div style="display:flex;align-items:center;gap:6px">
466
+ <div class="status-dot" id="status-dot"></div>
467
+ <span class="status-label" id="status-label">Live</span>
468
+ </div>
469
+ <span id="ep-step" style="font-family:var(--font-mono);font-size:0.8rem;color:var(--text-secondary)">ep:— step:—</span>
470
+ </div>
471
+ </header>
472
+
473
+ <!-- ── KPI Bar ─── -->
474
+ <div class="kpi-bar">
475
+ <div class="kpi">
476
+ <span class="kpi-label">Current Price</span>
477
+ <span class="kpi-value" id="kpi-price">—</span>
478
+ <span class="kpi-delta">$/kWh</span>
479
+ </div>
480
+ <div class="kpi">
481
+ <span class="kpi-label">Indoor Temp</span>
482
+ <span class="kpi-value" id="kpi-temp">—</span>
483
+ <span class="kpi-delta">°C (target 21°C)</span>
484
+ </div>
485
+ <div class="kpi">
486
+ <span class="kpi-label">Grid Stress</span>
487
+ <span class="kpi-value" id="kpi-stress">—</span>
488
+ <span class="kpi-delta">0=normal 1=critical</span>
489
+ </div>
490
+ <div class="kpi">
491
+ <span class="kpi-label">Cumulative Cost</span>
492
+ <span class="kpi-value" id="kpi-cost">—</span>
493
+ <span class="kpi-delta">vs baseline: <span id="kpi-baseline">—</span></span>
494
+ </div>
495
+ <div class="kpi">
496
+ <span class="kpi-label">Carbon Intensity</span>
497
+ <span class="kpi-value" id="kpi-carbon">—</span>
498
+ <span class="kpi-delta">gCO₂/kWh</span>
499
+ </div>
500
+ <div class="kpi">
501
+ <span class="kpi-label">Process Demand</span>
502
+ <span class="kpi-value" id="kpi-demand">—</span>
503
+ <span class="kpi-delta">kW</span>
504
+ </div>
505
+ <div class="kpi">
506
+ <span class="kpi-label">Thermal Storage</span>
507
+ <span class="kpi-value" id="kpi-storage">—</span>
508
+ <span class="kpi-delta">% capacity</span>
509
+ </div>
510
+ </div>
511
+
512
+ <!-- ── Main Content ─── -->
513
+ <main>
514
+
515
+ <!-- Row 1: Price curve + Temperature + Controls -->
516
+ <div class="card col-8">
517
+ <div class="card-title"><span class="icon">💰</span> 24h Electricity Price Curve</div>
518
+ <div class="chart-wrap">
519
+ <canvas id="chart-price"></canvas>
520
+ </div>
521
+ </div>
522
+
523
+ <div class="card col-4" id="card-stress">
524
+ <div class="card-title"><span class="icon">⚠️</span> Grid Stress Signal</div>
525
+ <div class="big-num xl" id="stress-big">0.00</div>
526
+ <div class="sub-label">Demand-response urgency</div>
527
+ <div class="stress-meter" id="stress-meter"></div>
528
+ <div style="margin-top:0.75rem;">
529
+ <div class="chart-wrap short">
530
+ <canvas id="chart-stress"></canvas>
531
+ </div>
532
+ </div>
533
+ </div>
534
+
535
+ <!-- Row 2: Temperature + Storage + HVAC -->
536
+ <div class="card col-6">
537
+ <div class="card-title"><span class="icon">🌡️</span> Temperature Timeline</div>
538
+ <div class="chart-wrap tall">
539
+ <canvas id="chart-temp"></canvas>
540
+ </div>
541
+ </div>
542
+
543
+ <div class="card col-3">
544
+ <div class="card-title"><span class="icon">🔋</span> Thermal Storage Level</div>
545
+ <div class="storage-label"><span id="storage-pct">—</span><span>%</span></div>
546
+ <div class="storage-bar-wrap">
547
+ <div class="storage-bar-fill" id="storage-fill" style="width:0%"></div>
548
+ </div>
549
+ <div style="margin-top:1rem">
550
+ <div class="chart-wrap short">
551
+ <canvas id="chart-storage"></canvas>
552
+ </div>
553
+ </div>
554
+ </div>
555
+
556
+ <div class="card col-3">
557
+ <div class="card-title"><span class="icon">❄️</span> HVAC + Load Shed</div>
558
+ <div class="chart-wrap tall">
559
+ <canvas id="chart-hvac"></canvas>
560
+ </div>
561
+ </div>
562
+
563
+ <!-- Row 3: Cost comparison + Reward breakdown -->
564
+ <div class="card col-8">
565
+ <div class="card-title"><span class="icon">📊</span> Cumulative Cost vs Baseline</div>
566
+ <div class="chart-wrap tall">
567
+ <canvas id="chart-cost"></canvas>
568
+ </div>
569
+ </div>
570
+
571
+ <div class="card col-4">
572
+ <div class="card-title"><span class="icon">🏆</span> Reward Breakdown</div>
573
+ <div id="reward-rows" style="margin-top:0.5rem"></div>
574
+ <div style="margin-top:1rem">
575
+ <div class="chart-wrap short">
576
+ <canvas id="chart-reward"></canvas>
577
+ </div>
578
+ </div>
579
+ </div>
580
+
581
+ <!-- Row 4: Batch Gantt + Carbon -->
582
+ <div class="card col-6">
583
+ <div class="card-title"><span class="icon">⚙️</span> Batch Job Timeline</div>
584
+ <div class="gantt-wrap" id="gantt-wrap">
585
+ <div style="color:var(--text-dim);font-size:0.8rem">No batch jobs queued.</div>
586
+ </div>
587
+ </div>
588
+
589
+ <div class="card col-6">
590
+ <div class="card-title"><span class="icon">🌍</span> Carbon Intensity Curve (24h)</div>
591
+ <div class="chart-wrap">
592
+ <canvas id="chart-carbon"></canvas>
593
+ </div>
594
+ </div>
595
+
596
+ <!-- Row 5: Controls -->
597
+ <div class="card col-12">
598
+ <div class="card-title"><span class="icon">🎮</span> Episode Controls</div>
599
+ <div class="ctrl-row">
600
+ <select id="task-select" onchange="onTaskChange()">
601
+ <option value="1">Task 1 — Cost Minimization (Easy)</option>
602
+ <option value="2">Task 2 — Temperature Management (Medium)</option>
603
+ <option value="3">Task 3 — Full Demand Response (Hard)</option>
604
+ </select>
605
+ <select id="building-select" onchange="onBuildingChange()">
606
+ <option value="0">Building 1 (Primary)</option>
607
+ <option value="1">Building 2</option>
608
+ <option value="2">Building 3</option>
609
+ </select>
610
+ <button id="btn-reset" class="btn primary" onclick="doReset()">↺ New Episode</button>
611
+ <button class="btn" onclick="doGrade()">📋 Grade Episode</button>
612
+ <button class="btn" onclick="window.open('/api/replay')">📥 Export Replay</button>
613
+ <span id="grade-result" style="font-family:var(--font-mono);font-size:0.9rem;color:var(--accent-green)"></span>
614
+ </div>
615
+ </div>
616
+
617
+ </main>
618
+
619
+ <footer>
620
+ GridMind-RL &nbsp;·&nbsp; OpenEnv-compliant RL environment for industrial demand response &nbsp;·&nbsp;
621
+ <a href="/api/health" target="_blank" style="color:var(--accent-blue);text-decoration:none">API Health</a> &nbsp;·&nbsp;
622
+ <a href="/api/metrics" target="_blank" style="color:var(--accent-blue);text-decoration:none">Metrics</a>
623
+ </footer>
624
+
625
+ <script src="/static/dashboard.js"></script>
626
+ </body>
627
+ </html>
data/generate_prices.py ADDED
@@ -0,0 +1,60 @@
+ """Generate 30 days of synthetic ISO New England-style hourly price data."""
+ import json
+ import random
+ from pathlib import Path
+
+ random.seed(12345)  # fixed seed so the generated file is reproducible
+
+
+ def generate_day(day_type: str = "weekday") -> list[float]:
+     """Generate 24 hourly prices (USD/kWh) for one day."""
+     prices = []
+     for hour in range(24):
+         # Base time-of-use price
+         if hour < 6:  # overnight off-peak
+             base = 0.035 + random.gauss(0, 0.005)
+         elif hour < 8:  # morning ramp
+             base = 0.07 + random.gauss(0, 0.008)
+         elif hour < 12:  # morning peak
+             base = 0.16 + random.gauss(0, 0.020) + (0.05 if day_type == "weekday" else 0)
+         elif hour < 14:  # midday moderate
+             base = 0.10 + random.gauss(0, 0.012)
+         elif hour < 17:  # afternoon
+             base = 0.12 + random.gauss(0, 0.015)
+         elif hour < 21:  # evening super-peak
+             base = 0.22 + random.gauss(0, 0.025) + (0.08 if day_type == "weekday" else 0.02)
+         elif hour < 23:  # evening wind-down
+             base = 0.09 + random.gauss(0, 0.010)
+         else:  # late night
+             base = 0.04 + random.gauss(0, 0.006)
+
+         # Seasonal multiplier: placeholder, 1.0 applies no seasonal adjustment
+         season_mult = 1.0
+         prices.append(round(max(0.02, base * season_mult), 4))
+     return prices
+
+
+ # Generate 30 days: 22 weekdays + 8 weekend days
+ days = []
+ day_types = (["weekday"] * 5 + ["weekend"] * 2) * 5  # 5 weeks → 35 days → take 30
+ for i, dt in enumerate(day_types[:30]):
+     day_data = {
+         "day": i + 1,
+         "day_type": dt,
+         "prices_usd_per_kwh": generate_day(dt),
+         "source": "synthetic_iso_new_england_style",
+         "region": "ISONE",
+         "currency": "USD",
+     }
+     days.append(day_data)
+
+ output = {
+     "description": "30 days of synthetic ISO New England-style hourly electricity price data",
+     "units": "USD/kWh",
+     "hours_per_day": 24,
+     "days": days,
+ }
+
+ # Write next to this script (data/) so the result does not depend on the working directory
+ output_path = Path(__file__).resolve().parent / "price_curves.json"
+ with output_path.open("w") as f:
+     json.dump(output, f, indent=2)
+
+ print(f"Generated {len(days)} days of price data -> {output_path}")
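The generated curves are meant to drive load-shifting decisions such as when to run a deferrable batch job. As a rough illustration of how one day's `prices_usd_per_kwh` array could be consumed, the sketch below finds the cheapest contiguous block of hours. The helper name `cheapest_window` is hypothetical and not part of the repository; the prices are the Day 1 weekday values from `data/price_curves.json`, and the sketch assumes a job that draws constant power for a whole number of hours.

```python
# Day 1 (weekday) hourly prices in USD/kWh, copied from data/price_curves.json.
DAY1_PRICES = [
    0.0344, 0.0354, 0.0369, 0.0313, 0.0328, 0.0374, 0.0657, 0.0681,
    0.2251, 0.225, 0.1985, 0.2053, 0.0894, 0.0963, 0.1263, 0.1183,
    0.1353, 0.2966, 0.2858, 0.2996, 0.2969, 0.0836, 0.0888, 0.0411,
]


def cheapest_window(prices: list[float], duration: int) -> tuple[int, float]:
    """Return (start_hour, summed_price) of the cheapest contiguous window.

    The summed price is the cost per kW of constant load over the window.
    Ties resolve to the earliest start hour.
    """
    if duration <= 0 or duration > len(prices):
        raise ValueError("duration must be between 1 and len(prices)")
    best_start = 0
    best_total = sum(prices[:duration])
    for start in range(1, len(prices) - duration + 1):
        total = sum(prices[start:start + duration])
        if total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


if __name__ == "__main__":
    start, total = cheapest_window(DAY1_PRICES, 3)
    print(f"Cheapest 3-hour window starts at hour {start} ({total:.4f} USD per kW)")
```

A brute-force scan is adequate for 24 hourly values; a real scheduler would also have to respect job deadlines and the other reward terms, which this sketch ignores.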
data/price_curves.json ADDED
@@ -0,0 +1,997 @@
1
+ {
2
+ "description": "30 days of synthetic ISO New England-style hourly electricity price data",
3
+ "units": "USD/kWh",
4
+ "hours_per_day": 24,
5
+ "days": [
6
+ {
7
+ "day": 1,
8
+ "day_type": "weekday",
9
+ "prices_usd_per_kwh": [
10
+ 0.0344,
11
+ 0.0354,
12
+ 0.0369,
13
+ 0.0313,
14
+ 0.0328,
15
+ 0.0374,
16
+ 0.0657,
17
+ 0.0681,
18
+ 0.2251,
19
+ 0.225,
20
+ 0.1985,
21
+ 0.2053,
22
+ 0.0894,
23
+ 0.0963,
24
+ 0.1263,
25
+ 0.1183,
26
+ 0.1353,
27
+ 0.2966,
28
+ 0.2858,
29
+ 0.2996,
30
+ 0.2969,
31
+ 0.0836,
32
+ 0.0888,
33
+ 0.0411
34
+ ],
35
+ "source": "synthetic_iso_new_england_style",
36
+ "region": "ISONE",
37
+ "currency": "USD"
38
+ },
39
+ {
40
+ "day": 2,
41
+ "day_type": "weekday",
42
+ "prices_usd_per_kwh": [
43
+ 0.028,
44
+ 0.0461,
45
+ 0.0432,
46
+ 0.0339,
47
+ 0.0469,
48
+ 0.0353,
49
+ 0.0794,
50
+ 0.06,
51
+ 0.214,
52
+ 0.2183,
53
+ 0.1742,
54
+ 0.2313,
55
+ 0.0827,
56
+ 0.0908,
57
+ 0.122,
58
+ 0.1382,
59
+ 0.1129,
60
+ 0.2548,
61
+ 0.3064,
62
+ 0.283,
63
+ 0.3122,
64
+ 0.085,
65
+ 0.0941,
66
+ 0.0328
67
+ ],
68
+ "source": "synthetic_iso_new_england_style",
69
+ "region": "ISONE",
70
+ "currency": "USD"
71
+ },
72
+ {
73
+ "day": 3,
74
+ "day_type": "weekday",
75
+ "prices_usd_per_kwh": [
76
+ 0.0376,
77
+ 0.0411,
78
+ 0.0225,
79
+ 0.0336,
80
+ 0.0379,
81
+ 0.0358,
82
+ 0.085,
83
+ 0.0684,
84
+ 0.2185,
85
+ 0.2217,
86
+ 0.1987,
87
+ 0.2072,
88
+ 0.0983,
89
+ 0.1034,
90
+ 0.1186,
91
+ 0.1141,
92
+ 0.1537,
93
+ 0.2981,
94
+ 0.3104,
95
+ 0.2583,
96
+ 0.2746,
97
+ 0.1011,
98
+ 0.0908,
99
+ 0.0331
100
+ ],
101
+ "source": "synthetic_iso_new_england_style",
102
+ "region": "ISONE",
103
+ "currency": "USD"
104
+ },
105
+ {
106
+ "day": 4,
107
+ "day_type": "weekday",
108
+ "prices_usd_per_kwh": [
109
+ 0.0415,
110
+ 0.0376,
111
+ 0.0319,
112
+ 0.0411,
113
+ 0.0242,
114
+ 0.0409,
115
+ 0.0741,
116
+ 0.0831,
117
+ 0.2019,
118
+ 0.1858,
119
+ 0.2205,
120
+ 0.2158,
121
+ 0.1085,
122
+ 0.0982,
123
+ 0.1264,
124
+ 0.1306,
125
+ 0.1023,
126
+ 0.293,
127
+ 0.302,
128
+ 0.3,
129
+ 0.284,
130
+ 0.0902,
131
+ 0.076,
132
+ 0.0333
133
+ ],
134
+ "source": "synthetic_iso_new_england_style",
135
+ "region": "ISONE",
136
+ "currency": "USD"
137
+ },
138
+ {
139
+ "day": 5,
140
+ "day_type": "weekday",
141
+ "prices_usd_per_kwh": [
142
+ 0.0399,
143
+ 0.0384,
144
+ 0.0277,
145
+ 0.0399,
146
+ 0.04,
147
+ 0.0335,
148
+ 0.0707,
149
+ 0.0864,
150
+ 0.2277,
151
+ 0.2131,
152
+ 0.2001,
153
+ 0.2177,
154
+ 0.0938,
155
+ 0.0869,
156
+ 0.1201,
157
+ 0.1355,
158
+ 0.1153,
159
+ 0.2793,
160
+ 0.3253,
161
+ 0.3115,
162
+ 0.3118,
163
+ 0.1031,
164
+ 0.0916,
165
+ 0.0431
166
+ ],
167
+ "source": "synthetic_iso_new_england_style",
168
+ "region": "ISONE",
169
+ "currency": "USD"
170
+ },
171
+ {
172
+ "day": 6,
173
+ "day_type": "weekend",
174
+ "prices_usd_per_kwh": [
175
+ 0.0385,
176
+ 0.0322,
177
+ 0.0334,
178
+ 0.041,
179
+ 0.0324,
180
+ 0.028,
181
+ 0.0719,
182
+ 0.0641,
183
+ 0.1429,
184
+ 0.1684,
185
+ 0.134,
186
+ 0.1573,
187
+ 0.0969,
188
+ 0.0967,
189
+ 0.1037,
190
+ 0.137,
191
+ 0.1442,
192
+ 0.2491,
193
+ 0.2301,
194
+ 0.2167,
195
+ 0.2214,
196
+ 0.0925,
197
+ 0.0891,
198
+ 0.0332
199
+ ],
200
+ "source": "synthetic_iso_new_england_style",
201
+ "region": "ISONE",
202
+ "currency": "USD"
203
+ },
204
+ {
205
+ "day": 7,
206
+ "day_type": "weekend",
207
+ "prices_usd_per_kwh": [
208
+ 0.0349,
209
+ 0.0362,
210
+ 0.0287,
211
+ 0.0443,
212
+ 0.0319,
213
+ 0.0287,
214
+ 0.0718,
215
+ 0.0692,
216
+ 0.1497,
217
+ 0.1523,
218
+ 0.1499,
219
+ 0.1676,
220
+ 0.0849,
221
+ 0.121,
222
+ 0.1155,
223
+ 0.0913,
224
+ 0.1313,
225
+ 0.2497,
226
+ 0.2415,
227
+ 0.233,
228
+ 0.2438,
229
+ 0.0938,
230
+ 0.093,
231
+ 0.036
232
+ ],
233
+ "source": "synthetic_iso_new_england_style",
234
+ "region": "ISONE",
235
+ "currency": "USD"
236
+ },
237
+ {
238
+ "day": 8,
239
+ "day_type": "weekday",
240
+ "prices_usd_per_kwh": [
241
+ 0.0343,
242
+ 0.0354,
243
+ 0.0395,
244
+ 0.0389,
245
+ 0.0362,
246
+ 0.0306,
247
+ 0.069,
248
+ 0.0721,
249
+ 0.2192,
250
+ 0.2166,
251
+ 0.2171,
252
+ 0.235,
253
+ 0.0791,
254
+ 0.1257,
255
+ 0.1137,
256
+ 0.1489,
257
+ 0.1354,
258
+ 0.3167,
259
+ 0.2652,
260
+ 0.2888,
261
+ 0.2743,
262
+ 0.0965,
263
+ 0.091,
264
+ 0.0366
265
+ ],
266
+ "source": "synthetic_iso_new_england_style",
267
+ "region": "ISONE",
268
+ "currency": "USD"
269
+ },
270
+ {
271
+ "day": 9,
272
+ "day_type": "weekday",
273
+ "prices_usd_per_kwh": [
274
+ 0.0285,
275
+ 0.0337,
276
+ 0.0356,
277
+ 0.0321,
278
+ 0.034,
279
+ 0.0356,
280
+ 0.0616,
281
+ 0.0563,
282
+ 0.1877,
283
+ 0.2393,
284
+ 0.208,
285
+ 0.2069,
286
+ 0.0964,
287
+ 0.1007,
288
+ 0.1096,
289
+ 0.1448,
290
+ 0.1337,
291
+ 0.3229,
292
+ 0.3113,
293
+ 0.3042,
294
+ 0.3108,
295
+ 0.0887,
296
+ 0.1009,
297
+ 0.0526
298
+ ],
299
+ "source": "synthetic_iso_new_england_style",
300
+ "region": "ISONE",
301
+ "currency": "USD"
302
+ },
303
+ {
304
+ "day": 10,
305
+ "day_type": "weekday",
306
+ "prices_usd_per_kwh": [
307
+ 0.0337,
308
+ 0.0383,
309
+ 0.0398,
310
+ 0.0383,
311
+ 0.0383,
312
+ 0.0363,
313
+ 0.0723,
314
+ 0.083,
315
+ 0.224,
316
+ 0.2327,
317
+ 0.2133,
318
+ 0.2737,
319
+ 0.1272,
320
+ 0.0984,
321
+ 0.1354,
322
+ 0.1088,
323
+ 0.1394,
324
+ 0.285,
325
+ 0.3219,
326
+ 0.3342,
327
+ 0.3025,
328
+ 0.0886,
329
+ 0.0959,
330
+ 0.0499
331
+ ],
332
+ "source": "synthetic_iso_new_england_style",
333
+ "region": "ISONE",
334
+ "currency": "USD"
335
+ },
336
+ {
337
+ "day": 11,
338
+ "day_type": "weekday",
339
+ "prices_usd_per_kwh": [
340
+ 0.0357,
341
+ 0.0286,
342
+ 0.0378,
343
+ 0.0377,
344
+ 0.0363,
345
+ 0.049,
346
+ 0.0712,
347
+ 0.0776,
348
+ 0.199,
349
+ 0.1826,
350
+ 0.1832,
351
+ 0.199,
352
+ 0.1176,
353
+ 0.0801,
354
+ 0.1229,
355
+ 0.1223,
356
+ 0.1482,
357
+ 0.2716,
358
+ 0.2829,
359
+ 0.3186,
360
+ 0.2896,
361
+ 0.0804,
362
+ 0.0958,
363
+ 0.0429
364
+ ],
365
+ "source": "synthetic_iso_new_england_style",
366
+ "region": "ISONE",
367
+ "currency": "USD"
368
+ },
369
+ {
370
+ "day": 12,
371
+ "day_type": "weekday",
372
+ "prices_usd_per_kwh": [
373
+ 0.0329,
374
+ 0.0411,
375
+ 0.0388,
376
+ 0.0421,
377
+ 0.0398,
378
+ 0.0341,
379
+ 0.0597,
380
+ 0.0651,
381
+ 0.2184,
382
+ 0.1831,
383
+ 0.1782,
384
+ 0.2067,
385
+ 0.1281,
386
+ 0.1042,
387
+ 0.1386,
388
+ 0.1063,
389
+ 0.1263,
390
+ 0.3334,
391
+ 0.3248,
392
+ 0.3044,
393
+ 0.3069,
394
+ 0.1025,
395
+ 0.09,
396
+ 0.0426
397
+ ],
398
+ "source": "synthetic_iso_new_england_style",
399
+ "region": "ISONE",
400
+ "currency": "USD"
401
+ },
402
+ {
403
+ "day": 13,
404
+ "day_type": "weekend",
405
+ "prices_usd_per_kwh": [
406
+ 0.036,
407
+ 0.0313,
408
+ 0.0342,
409
+ 0.0347,
410
+ 0.0269,
411
+ 0.0316,
412
+ 0.0716,
413
+ 0.063,
414
+ 0.179,
415
+ 0.1798,
416
+ 0.1543,
417
+ 0.1754,
418
+ 0.103,
419
+ 0.1116,
420
+ 0.1137,
421
+ 0.1062,
422
+ 0.1053,
423
+ 0.2079,
424
+ 0.252,
425
+ 0.2098,
426
+ 0.2695,
427
+ 0.0893,
428
+ 0.0757,
429
+ 0.038
430
+ ],
431
+ "source": "synthetic_iso_new_england_style",
432
+ "region": "ISONE",
433
+ "currency": "USD"
434
+ },
435
+ {
436
+ "day": 14,
437
+ "day_type": "weekend",
438
+ "prices_usd_per_kwh": [
439
+ 0.0427,
440
+ 0.0336,
441
+ 0.0375,
442
+ 0.0335,
443
+ 0.0281,
444
+ 0.0387,
445
+ 0.0684,
446
+ 0.055,
447
+ 0.1582,
448
+ 0.1773,
449
+ 0.143,
450
+ 0.1704,
451
+ 0.11,
452
+ 0.1066,
453
+ 0.1046,
454
+ 0.1008,
455
+ 0.1226,
456
+ 0.2745,
457
+ 0.2401,
458
+ 0.2255,
459
+ 0.2547,
460
+ 0.0905,
461
+ 0.0804,
462
+ 0.028
463
+ ],
464
+ "source": "synthetic_iso_new_england_style",
465
+ "region": "ISONE",
466
+ "currency": "USD"
467
+ },
468
+ {
469
+ "day": 15,
470
+ "day_type": "weekday",
471
+ "prices_usd_per_kwh": [
472
+ 0.0317,
473
+ 0.0378,
474
+ 0.032,
475
+ 0.0319,
476
+ 0.0386,
477
+ 0.0359,
478
+ 0.0629,
479
+ 0.0648,
480
+ 0.1986,
481
+ 0.2264,
482
+ 0.2322,
483
+ 0.2359,
484
+ 0.0871,
485
+ 0.1131,
486
+ 0.0907,
487
+ 0.1067,
488
+ 0.1223,
489
+ 0.2872,
490
+ 0.2964,
491
+ 0.3011,
492
+ 0.3006,
493
+ 0.0828,
494
+ 0.0971,
495
+ 0.0394
496
+ ],
497
+ "source": "synthetic_iso_new_england_style",
498
+ "region": "ISONE",
499
+ "currency": "USD"
500
+ },
501
+ {
502
+ "day": 16,
503
+ "day_type": "weekday",
504
+ "prices_usd_per_kwh": [
505
+ 0.0292,
506
+ 0.0347,
507
+ 0.0381,
508
+ 0.0291,
509
+ 0.0382,
510
+ 0.032,
511
+ 0.0742,
512
+ 0.0729,
513
+ 0.24,
514
+ 0.219,
515
+ 0.1969,
516
+ 0.1791,
517
+ 0.1012,
518
+ 0.0939,
519
+ 0.1402,
520
+ 0.1286,
521
+ 0.1336,
522
+ 0.3002,
523
+ 0.269,
524
+ 0.281,
525
+ 0.2543,
526
+ 0.0738,
527
+ 0.0985,
528
+ 0.0392
529
+ ],
530
+ "source": "synthetic_iso_new_england_style",
531
+ "region": "ISONE",
532
+ "currency": "USD"
533
+ },
534
+ {
535
+ "day": 17,
536
+ "day_type": "weekday",
537
+ "prices_usd_per_kwh": [
538
+ 0.0346,
539
+ 0.036,
540
+ 0.0389,
541
+ 0.0387,
542
+ 0.0327,
543
+ 0.0322,
544
+ 0.0799,
545
+ 0.0771,
546
+ 0.1989,
547
+ 0.1892,
548
+ 0.2184,
549
+ 0.2203,
550
+ 0.0973,
551
+ 0.1233,
552
+ 0.1242,
553
+ 0.1179,
554
+ 0.1505,
555
+ 0.2786,
556
+ 0.3048,
557
+ 0.2824,
558
+ 0.3445,
559
+ 0.0863,
560
+ 0.0917,
561
+ 0.0412
562
+ ],
563
+ "source": "synthetic_iso_new_england_style",
564
+ "region": "ISONE",
565
+ "currency": "USD"
566
+ },
567
+ {
568
+ "day": 18,
569
+ "day_type": "weekday",
570
+ "prices_usd_per_kwh": [
571
+ 0.0303,
572
+ 0.0396,
573
+ 0.0473,
574
+ 0.0369,
575
+ 0.0322,
576
+ 0.0297,
577
+ 0.0662,
578
+ 0.0702,
579
+ 0.2095,
580
+ 0.1771,
581
+ 0.2459,
582
+ 0.2151,
583
+ 0.083,
584
+ 0.1221,
585
+ 0.1143,
586
+ 0.1287,
587
+ 0.1245,
588
+ 0.3212,
589
+ 0.3134,
590
+ 0.3064,
591
+ 0.2688,
592
+ 0.0922,
593
+ 0.085,
594
+ 0.0389
595
+ ],
596
+ "source": "synthetic_iso_new_england_style",
597
+ "region": "ISONE",
598
+ "currency": "USD"
599
+ },
600
+ {
601
+ "day": 19,
602
+ "day_type": "weekday",
603
+ "prices_usd_per_kwh": [
604
+ 0.0378,
605
+ 0.0354,
606
+ 0.0372,
607
+ 0.0334,
608
+ 0.0324,
609
+ 0.0292,
610
+ 0.0548,
611
+ 0.0589,
612
+ 0.2089,
613
+ 0.2411,
614
+ 0.1909,
615
+ 0.2079,
616
+ 0.0934,
617
+ 0.1047,
618
+ 0.1473,
619
+ 0.14,
620
+ 0.1142,
621
+ 0.2882,
622
+ 0.3052,
623
+ 0.2849,
624
+ 0.2826,
625
+ 0.0787,
626
+ 0.093,
627
+ 0.0341
628
+ ],
629
+ "source": "synthetic_iso_new_england_style",
630
+ "region": "ISONE",
631
+ "currency": "USD"
632
+ },
633
+ {
634
+ "day": 20,
635
+ "day_type": "weekend",
636
+ "prices_usd_per_kwh": [
637
+ 0.0427,
638
+ 0.0328,
639
+ 0.0326,
640
+ 0.0358,
641
+ 0.0349,
642
+ 0.0377,
643
+ 0.0556,
644
+ 0.0681,
645
+ 0.1586,
646
+ 0.1316,
647
+ 0.1455,
648
+ 0.1672,
649
+ 0.1059,
650
+ 0.0831,
651
+ 0.1101,
652
+ 0.1036,
653
+ 0.1251,
654
+ 0.2077,
655
+ 0.238,
656
+ 0.2641,
657
+ 0.2324,
658
+ 0.0977,
659
+ 0.1027,
660
+ 0.0344
661
+ ],
662
+ "source": "synthetic_iso_new_england_style",
663
+ "region": "ISONE",
664
+ "currency": "USD"
665
+ },
666
+ {
667
+ "day": 21,
668
+ "day_type": "weekend",
669
+ "prices_usd_per_kwh": [
670
+ 0.0309,
671
+ 0.0314,
672
+ 0.0327,
673
+ 0.0352,
674
+ 0.0284,
675
+ 0.0361,
676
+ 0.0644,
677
+ 0.0777,
678
+ 0.2042,
679
+ 0.1598,
680
+ 0.2028,
681
+ 0.1825,
682
+ 0.1061,
683
+ 0.0789,
684
+ 0.1235,
685
+ 0.1103,
686
+ 0.1345,
687
+ 0.2664,
688
+ 0.2455,
689
+ 0.2298,
690
+ 0.3003,
691
+ 0.0922,
692
+ 0.0966,
693
+ 0.0375
694
+ ],
695
+ "source": "synthetic_iso_new_england_style",
696
+ "region": "ISONE",
697
+ "currency": "USD"
698
+ },
699
+ {
700
+ "day": 22,
701
+ "day_type": "weekday",
702
+ "prices_usd_per_kwh": [
703
+ 0.037,
704
+ 0.0313,
705
+ 0.0354,
706
+ 0.0343,
707
+ 0.0322,
708
+ 0.0328,
709
+ 0.0688,
710
+ 0.0686,
711
+ 0.2045,
712
+ 0.2027,
713
+ 0.2029,
714
+ 0.2126,
715
+ 0.0948,
716
+ 0.1047,
717
+ 0.1121,
718
+ 0.1349,
719
+ 0.1085,
720
+ 0.2702,
721
+ 0.2516,
722
+ 0.2928,
723
+ 0.2882,
724
+ 0.093,
725
+ 0.0863,
726
+ 0.0348
727
+ ],
728
+ "source": "synthetic_iso_new_england_style",
729
+ "region": "ISONE",
730
+ "currency": "USD"
731
+ },
732
+ {
733
+ "day": 23,
734
+ "day_type": "weekday",
735
+ "prices_usd_per_kwh": [
736
+ 0.0396,
737
+ 0.033,
738
+ 0.0396,
739
+ 0.0354,
740
+ 0.0405,
741
+ 0.0336,
742
+ 0.0734,
743
+ 0.0631,
744
+ 0.2038,
745
+ 0.205,
746
+ 0.2128,
747
+ 0.219,
748
+ 0.0913,
749
+ 0.1061,
750
+ 0.0997,
751
+ 0.1123,
752
+ 0.1207,
753
+ 0.3197,
754
+ 0.3126,
755
+ 0.302,
756
+ 0.2606,
757
+ 0.1045,
758
+ 0.0939,
759
+ 0.0457
760
+ ],
761
+ "source": "synthetic_iso_new_england_style",
762
+ "region": "ISONE",
763
+ "currency": "USD"
764
+ },
765
+ {
766
+ "day": 24,
767
+ "day_type": "weekday",
768
+ "prices_usd_per_kwh": [
769
+ 0.0416,
770
+ 0.0355,
771
+ 0.0311,
772
+ 0.0371,
773
+ 0.0319,
774
+ 0.0375,
775
+ 0.0657,
776
+ 0.0728,
777
+ 0.1992,
778
+ 0.1648,
779
+ 0.2259,
780
+ 0.1994,
781
+ 0.1205,
782
+ 0.1015,
783
+ 0.1017,
784
+ 0.1245,
785
+ 0.1047,
786
+ 0.2621,
787
+ 0.2735,
788
+ 0.297,
789
+ 0.305,
790
+ 0.0922,
791
+ 0.0929,
792
+ 0.043
793
+ ],
794
+ "source": "synthetic_iso_new_england_style",
795
+ "region": "ISONE",
796
+ "currency": "USD"
797
+ },
798
+ {
799
+ "day": 25,
800
+ "day_type": "weekday",
801
+ "prices_usd_per_kwh": [
802
+ 0.041,
803
+ 0.0326,
804
+ 0.0393,
805
+ 0.0358,
806
+ 0.0269,
807
+ 0.04,
808
+ 0.0614,
809
+ 0.0752,
810
+ 0.2084,
811
+ 0.2276,
812
+ 0.2407,
813
+ 0.2268,
814
+ 0.0806,
815
+ 0.1013,
816
+ 0.1142,
817
+ 0.1172,
818
+ 0.1184,
819
+ 0.3218,
820
+ 0.3048,
821
+ 0.2774,
822
+ 0.3436,
823
+ 0.0862,
824
+ 0.0855,
825
+ 0.0331
826
+ ],
827
+ "source": "synthetic_iso_new_england_style",
828
+ "region": "ISONE",
829
+ "currency": "USD"
830
+ },
831
+ {
832
+ "day": 26,
833
+ "day_type": "weekday",
834
+ "prices_usd_per_kwh": [
835
+ 0.0334,
836
+ 0.0307,
837
+ 0.0429,
838
+ 0.0409,
839
+ 0.0228,
840
+ 0.0382,
841
+ 0.0688,
842
+ 0.0596,
843
+ 0.1827,
844
+ 0.2235,
845
+ 0.2109,
846
+ 0.2286,
847
+ 0.0906,
848
+ 0.1201,
849
+ 0.1264,
850
+ 0.1179,
851
+ 0.1338,
852
+ 0.3036,
853
+ 0.289,
854
+ 0.2755,
855
+ 0.3041,
856
+ 0.0638,
857
+ 0.0826,
858
+ 0.0323
859
+ ],
860
+ "source": "synthetic_iso_new_england_style",
861
+ "region": "ISONE",
862
+ "currency": "USD"
863
+ },
864
+ {
865
+ "day": 27,
866
+ "day_type": "weekend",
867
+ "prices_usd_per_kwh": [
868
+ 0.0391,
869
+ 0.0402,
870
+ 0.0327,
871
+ 0.0326,
872
+ 0.0424,
873
+ 0.0362,
874
+ 0.0815,
875
+ 0.0695,
876
+ 0.1723,
877
+ 0.1985,
878
+ 0.1525,
879
+ 0.1767,
880
+ 0.1083,
881
+ 0.0803,
882
+ 0.1483,
883
+ 0.1069,
884
+ 0.1308,
885
+ 0.2403,
886
+ 0.2243,
887
+ 0.231,
888
+ 0.2642,
889
+ 0.0769,
890
+ 0.0909,
891
+ 0.0256
892
+ ],
893
+ "source": "synthetic_iso_new_england_style",
894
+ "region": "ISONE",
895
+ "currency": "USD"
896
+ },
897
+ {
898
+ "day": 28,
899
+ "day_type": "weekend",
900
+ "prices_usd_per_kwh": [
901
+ 0.0316,
902
+ 0.0226,
903
+ 0.0324,
904
+ 0.0356,
905
+ 0.0229,
906
+ 0.0337,
907
+ 0.0647,
908
+ 0.0654,
909
+ 0.1687,
910
+ 0.1662,
911
+ 0.1826,
912
+ 0.1819,
913
+ 0.0939,
914
+ 0.1113,
915
+ 0.1333,
916
+ 0.1179,
917
+ 0.1121,
918
+ 0.221,
919
+ 0.2202,
920
+ 0.2574,
921
+ 0.2283,
922
+ 0.0917,
923
+ 0.082,
924
+ 0.0341
925
+ ],
926
+ "source": "synthetic_iso_new_england_style",
927
+ "region": "ISONE",
928
+ "currency": "USD"
929
+ },
930
+ {
931
+ "day": 29,
932
+ "day_type": "weekday",
933
+ "prices_usd_per_kwh": [
934
+ 0.0453,
935
+ 0.0437,
936
+ 0.039,
937
+ 0.0347,
938
+ 0.0303,
939
+ 0.0423,
940
+ 0.0597,
941
+ 0.0696,
942
+ 0.213,
943
+ 0.1768,
944
+ 0.2078,
945
+ 0.233,
946
+ 0.1092,
947
+ 0.0769,
948
+ 0.1379,
949
+ 0.0999,
950
+ 0.1181,
951
+ 0.2952,
952
+ 0.2739,
953
+ 0.3293,
954
+ 0.3004,
955
+ 0.0936,
956
+ 0.0758,
957
+ 0.0343
958
+ ],
959
+ "source": "synthetic_iso_new_england_style",
960
+ "region": "ISONE",
961
+ "currency": "USD"
962
+ },
963
+ {
964
+ "day": 30,
965
+ "day_type": "weekday",
966
+ "prices_usd_per_kwh": [
967
+ 0.0431,
968
+ 0.0348,
969
+ 0.0338,
970
+ 0.0354,
971
+ 0.0305,
972
+ 0.0254,
973
+ 0.0598,
974
+ 0.0688,
975
+ 0.2247,
976
+ 0.2196,
977
+ 0.1798,
978
+ 0.1903,
979
+ 0.0898,
980
+ 0.1202,
981
+ 0.1038,
982
+ 0.11,
983
+ 0.127,
984
+ 0.3367,
985
+ 0.317,
986
+ 0.3198,
987
+ 0.338,
988
+ 0.0768,
989
+ 0.0962,
990
+ 0.0387
991
+ ],
992
+ "source": "synthetic_iso_new_england_style",
993
+ "region": "ISONE",
994
+ "currency": "USD"
995
+ }
996
+ ]
997
+ }
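The day records above each carry 24 hourly prices, while the simulator in `env/environment.go` works in 96 fifteen-minute steps and maps a step to its hour with `s / 4`. The following is a minimal sketch, not code from this commit, of how one such record could be decoded and expanded to step resolution. The names `DayPrices`, `ParseDay`, and `ExpandToSteps` are hypothetical, and holding each hourly price constant across its four steps is an assumption; this section does not show whether or how the Go server reads this dataset.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stepsPerHour mirrors the 15-minute resolution of the simulator (96 steps per day).
const stepsPerHour = 4

// DayPrices mirrors the fields of one day record in the dataset above.
// The type name is hypothetical; only the JSON keys come from the data.
type DayPrices struct {
	Day      int       `json:"day"`
	DayType  string    `json:"day_type"`
	Prices   []float64 `json:"prices_usd_per_kwh"`
	Source   string    `json:"source"`
	Region   string    `json:"region"`
	Currency string    `json:"currency"`
}

// ParseDay decodes a single day record and rejects it unless it holds
// exactly 24 hourly prices.
func ParseDay(raw []byte) (DayPrices, error) {
	var d DayPrices
	if err := json.Unmarshal(raw, &d); err != nil {
		return DayPrices{}, fmt.Errorf("decode day record: %w", err)
	}
	if len(d.Prices) != 24 {
		return DayPrices{}, fmt.Errorf("day %d: want 24 hourly prices, got %d", d.Day, len(d.Prices))
	}
	return d, nil
}

// ExpandToSteps repeats each hourly price for every 15-minute step in that
// hour (assumed step-hold upsampling), so 24 prices become 96 step prices.
func ExpandToSteps(hourly []float64) []float64 {
	steps := make([]float64, 0, len(hourly)*stepsPerHour)
	for _, p := range hourly {
		for i := 0; i < stepsPerHour; i++ {
			steps = append(steps, p)
		}
	}
	return steps
}
```

Calling `ParseDay` on one record and passing its `Prices` to `ExpandToSteps` yields a slice whose index lines up with the simulator's step counter, so `steps[s]` and `hourly[s/4]` refer to the same price.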
env/environment.go ADDED
@@ -0,0 +1,679 @@
1
+ // Package env implements the GridMind-RL simulation core.
2
+ // It models a multi-building industrial/commercial energy management system
3
+ // with stochastic electricity prices, thermal dynamics, and batch job scheduling.
4
+ package env
5
+
6
+ import (
7
+ "math"
8
+ "math/rand"
9
+ "sync"
10
+ "time"
11
+ )
12
+
13
+ const (
14
+ EpisodeSteps = 96 // 24 hours × 15-min intervals
15
+ StepDurationHrs = 0.25 // each step = 15 minutes = 0.25 h
16
+ MaxBuildings = 3
17
+ DefaultSetpoint = 21.0 // °C comfortable indoor temp
18
+ TMinDefault = 19.0 // °C lower bound
19
+ TMaxDefault = 23.0 // °C upper bound
20
+ MaxHVACPowerKW = 50.0 // kW per building
21
+ MaxStorageKWh = 100.0 // kWh thermal storage capacity
22
+ StorageLossRate = 0.005 // fraction lost per step (thermal dissipation)
23
+ MaxBatchJobs = 5 // max concurrent batch jobs per building
24
+ )
25
+
26
+ // Environment is the thread-safe top-level simulation manager.
27
+ type Environment struct {
28
+ mu sync.RWMutex
29
+ rng *rand.Rand
30
+ seed int64
31
+ episode int
32
+ step int
33
+ done bool
34
+ taskID int
35
+ difficulty string
36
+ numBuildings int
37
+
38
+ Buildings []*BuildingState
39
+ PriceCurve [EpisodeSteps]float64 // $/kWh for each step
40
+ CarbonCurve [EpisodeSteps]float64 // gCO2/kWh for each step
41
+ Replay []ReplayEntry
42
+ LastActions []ActionModel
43
+
44
+ // History for dashboard rendering (per building)
45
+ TempHistory [][]float64
46
+ CostHistory [][]float64
47
+ HVACHistory [][]float64
48
+ LoadShedHistory [][]float64
49
+ RewardHistory [][]RewardComponents
50
+
51
+ // Exploit detection counters
52
+ totalShedSteps []int // steps where load_shed > 0.4
53
+ thermalCycleCounts []int // rapid thermal storage reversals
54
+ prevChargeRates []float64
55
+ }
56
+
57
+ // NewEnvironment creates an initialised (but not reset) environment.
58
+ func NewEnvironment() *Environment {
59
+ seed := time.Now().UnixNano()
60
+ return &Environment{
61
+ rng: rand.New(rand.NewSource(seed)),
62
+ seed: seed,
63
+ taskID: 1,
64
+ difficulty: "easy",
65
+ numBuildings: 1,
66
+ }
67
+ }
68
+
69
+ // Reset initializes a new episode. Thread-safe.
70
+ func (e *Environment) Reset(req ResetRequest) ResetResponse {
71
+ e.mu.Lock()
72
+ defer e.mu.Unlock()
73
+
74
+ // Apply seed
75
+ if req.Seed != nil {
76
+ e.seed = *req.Seed
77
+ } else {
78
+ e.seed = time.Now().UnixNano()
79
+ }
80
+ e.rng = rand.New(rand.NewSource(e.seed))
81
+
82
+ // Apply task and difficulty
83
+ e.taskID = req.TaskID
84
+ if e.taskID < 1 || e.taskID > 3 {
85
+ e.taskID = 1
86
+ }
87
+ e.difficulty = req.Difficulty
88
+ if e.difficulty == "" {
89
+ switch e.taskID {
90
+ case 1:
91
+ e.difficulty = "easy"
92
+ case 2:
93
+ e.difficulty = "medium"
94
+ case 3:
95
+ e.difficulty = "hard"
96
+ }
97
+ }
98
+
99
+ // Number of buildings (federation)
100
+ e.numBuildings = req.NumBuildings
101
+ if e.numBuildings < 1 {
102
+ e.numBuildings = 1
103
+ }
104
+ if e.numBuildings > MaxBuildings {
105
+ e.numBuildings = MaxBuildings
106
+ }
107
+
108
+ e.episode++
109
+ e.step = 0
110
+ e.done = false
111
+ e.Replay = make([]ReplayEntry, 0, EpisodeSteps)
112
+ e.LastActions = make([]ActionModel, e.numBuildings)
113
+
114
+ // Generate price and carbon curves for this episode
115
+ e.generatePriceCurve()
116
+ e.generateCarbonCurve()
117
+
118
+ // Initialise buildings
119
+ e.Buildings = make([]*BuildingState, e.numBuildings)
120
+ e.TempHistory = make([][]float64, e.numBuildings)
121
+ e.CostHistory = make([][]float64, e.numBuildings)
122
+ e.HVACHistory = make([][]float64, e.numBuildings)
123
+ e.LoadShedHistory = make([][]float64, e.numBuildings)
124
+ e.RewardHistory = make([][]RewardComponents, e.numBuildings)
125
+ e.totalShedSteps = make([]int, e.numBuildings)
126
+ e.thermalCycleCounts = make([]int, e.numBuildings)
127
+ e.prevChargeRates = make([]float64, e.numBuildings)
128
+
129
+ for i := 0; i < e.numBuildings; i++ {
130
+ e.Buildings[i] = e.newBuildingState(i)
131
+ e.TempHistory[i] = make([]float64, 0, EpisodeSteps)
132
+ e.CostHistory[i] = make([]float64, 0, EpisodeSteps)
133
+ e.HVACHistory[i] = make([]float64, 0, EpisodeSteps)
134
+ e.LoadShedHistory[i] = make([]float64, 0, EpisodeSteps)
135
+ e.RewardHistory[i] = make([]RewardComponents, 0, EpisodeSteps)
136
+ }
137
+
138
+ obs := make([]ObservationModel, e.numBuildings)
139
+ for i, b := range e.Buildings {
140
+ obs[i] = e.buildObservation(b)
141
+ }
142
+
143
+ return ResetResponse{
144
+ Observations: obs,
145
+ Episode: e.episode,
146
+ TaskID: e.taskID,
147
+ Seed: e.seed,
148
+ }
149
+ }
150
+
151
+ // Step advances the simulation by one timestep for all buildings. Thread-safe.
152
+ func (e *Environment) Step(actions []ActionModel) ([]StepResponse, bool) {
153
+ e.mu.Lock()
154
+ defer e.mu.Unlock()
155
+
156
+ if e.done {
157
+ return nil, true
158
+ }
159
+
160
+ // Validate and clamp actions
161
+ for i := range actions {
162
+ e.clampAction(&actions[i])
163
+ if i < e.numBuildings {
164
+ e.LastActions[i] = actions[i]
165
+ }
166
+ }
167
+
168
+ responses := make([]StepResponse, e.numBuildings)
169
+ for i, b := range e.Buildings {
170
+ var act ActionModel
171
+ // Find action for this building (by building_id or by index)
172
+ act = e.findAction(actions, i)
173
+ responses[i] = e.stepBuilding(b, act, i)
174
+ }
175
+
176
+ e.step++
177
+ if e.step >= EpisodeSteps {
178
+ e.done = true
179
+ }
180
+
181
+ // Record replay entry (first building only; other buildings are not recorded)
182
+ if len(responses) > 0 {
183
+ entry := ReplayEntry{
184
+ Step: e.step - 1,
185
+ Observation: responses[0].Observation,
186
+ Action: e.LastActions[0],
187
+ Reward: responses[0].Reward,
188
+ Components: responses[0].Info.RewardComponents,
189
+ Done: e.done,
190
+ }
191
+ e.Replay = append(e.Replay, entry)
192
+ }
193
+
194
+ return responses, e.done
195
+ }
196
+
197
+ // GetState returns a full snapshot of environment state. Thread-safe (read lock).
198
+ func (e *Environment) GetState() StateResponse {
199
+ e.mu.RLock()
200
+ defer e.mu.RUnlock()
201
+
202
+ buildings := make([]BuildingStatePublic, e.numBuildings)
203
+ for i, b := range e.Buildings {
204
+ pub := BuildingStatePublic{
205
+ ObservationModel: e.buildObservation(b),
206
+ OutdoorTemperature: b.OutdoorTemperature,
207
+ SetpointTemperature: b.SetpointTemperature,
208
+ BaselineCost: b.BaselineCost,
209
+ CumulativeCarbon: b.CumulativeCarbon,
210
+ Jobs: b.Jobs,
211
+ }
212
+ if i < len(e.TempHistory) {
213
+ pub.TempHistory = e.TempHistory[i]
214
+ pub.CostHistory = e.CostHistory[i]
215
+ pub.HVACHistory = e.HVACHistory[i]
216
+ pub.LoadShedHistory = e.LoadShedHistory[i]
217
+ pub.RewardHistory = e.RewardHistory[i]
218
+ }
219
+ buildings[i] = pub
220
+ }
221
+
222
+ priceCurve := make([]float64, 24)
223
+ carbonCurve := make([]float64, 24)
224
+ for h := 0; h < 24; h++ {
225
+ stepIdx := h * 4
226
+ if stepIdx < EpisodeSteps {
227
+ priceCurve[h] = e.PriceCurve[stepIdx]
228
+ carbonCurve[h] = e.CarbonCurve[stepIdx]
229
+ }
230
+ }
231
+
232
+ return StateResponse{
233
+ Buildings: buildings,
234
+ PriceCurve: priceCurve,
235
+ CarbonCurve: carbonCurve,
236
+ Episode: e.episode,
237
+ Step: e.step,
238
+ TaskID: e.taskID,
239
+ Done: e.done,
240
+ Seed: e.seed,
241
+ }
242
+ }
243
+
244
+ // GetReplay returns the full episode replay. Thread-safe.
245
+ func (e *Environment) GetReplay() []ReplayEntry {
246
+ e.mu.RLock()
247
+ defer e.mu.RUnlock()
248
+ result := make([]ReplayEntry, len(e.Replay))
249
+ copy(result, e.Replay)
250
+ return result
251
+ }
252
+
253
+ // ──────────────────────────────────────────────
254
+ // Internal helpers
255
+ // ──────────────────────────────────────────────
256
+
257
+ func (e *Environment) newBuildingState(id int) *BuildingState {
258
+ // Randomise initial conditions slightly
259
+ initTemp := DefaultSetpoint + (e.rng.Float64()-0.5)*2.0
260
+ storageLevel := 0.3 + e.rng.Float64()*0.4 // start 30–70% full
261
+ outdoorTemp := 15.0 + e.rng.Float64()*15.0 // 15–30 °C
262
+
263
+ b := &BuildingState{
264
+ BuildingID: id,
265
+ IndoorTemperature: initTemp,
266
+ ThermalStorageLevel: storageLevel,
267
+ ProcessDemand: 10.0 + e.rng.Float64()*20.0,
268
+ CurrentPrice: e.PriceCurve[0],
269
+ GridStressSignal: 0.0,
270
+ CarbonIntensity: e.CarbonCurve[0],
271
+ HourOfDay: 0,
272
+ Step: 0,
273
+ BatchQueue: []int{},
274
+ CumulativeCost: 0.0,
275
+ CumulativeCarbon: 0.0,
276
+ OutdoorTemperature: outdoorTemp,
277
+ PrevHVACLevel: 0.5,
278
+ BaselineCost: 0.0,
279
+ SetpointTemperature: DefaultSetpoint,
280
+ MaxHVACPower: MaxHVACPowerKW,
281
+ MaxStorageCapacity: MaxStorageKWh,
282
+ ThermalLossRate: StorageLossRate,
283
+ }
284
+
285
+ // Spawn batch jobs based on difficulty
286
+ b.Jobs = e.generateBatchJobs()
287
+ b.BatchQueue = pendingDeadlines(b.Jobs)
288
+ return b
289
+ }
290
+
291
+ func (e *Environment) generateBatchJobs() []BatchJob {
292
+ numJobs := 3
293
+ switch e.difficulty {
294
+ case "medium":
295
+ numJobs = 4
296
+ case "hard":
297
+ numJobs = 5
298
+ }
299
+
300
+ jobs := make([]BatchJob, numJobs)
301
+ for i := range jobs {
302
+ // Deadline spread across episode, ensuring feasibility
303
+ deadline := 20 + e.rng.Intn(60)
304
+ jobs[i] = BatchJob{
305
+ ID: i + 1,
306
+ DeadlineSlot: deadline,
307
+ Duration: 1 + e.rng.Intn(3),
308
+ PowerDraw: 5.0 + e.rng.Float64()*15.0,
309
+ Scheduled: false,
310
+ ScheduledAt: -1,
311
+ Completed: false,
312
+ MissedDeadline: false,
313
+ }
314
+ }
315
+ return jobs
316
+ }
317
+
318
+ // generatePriceCurve creates a stochastic Time-of-Use price curve for the episode.
319
+ func (e *Environment) generatePriceCurve() {
320
+ // Base ToU: low overnight, moderate morning, high peak (8-12, 17-21), low night
321
+ volatility := 0.1
322
+ switch e.difficulty {
323
+ case "medium":
324
+ volatility = 0.2
325
+ case "hard":
326
+ volatility = 0.35
327
+ }
328
+
329
+ // Random peak window shift (±2 hours) for stochasticity
330
+ morningPeakShift := e.rng.Intn(5) - 2
331
+ eveningPeakShift := e.rng.Intn(5) - 2
332
+
333
+ for s := 0; s < EpisodeSteps; s++ {
334
+ hour := (s / 4)
335
+ base := touPrice(hour, morningPeakShift, eveningPeakShift)
336
+ noise := (e.rng.Float64()*2 - 1) * volatility * base
337
+ price := math.Max(0.02, base+noise)
338
+ e.PriceCurve[s] = price
339
+ }
340
+ }
341
+
342
+ // touPrice returns the base time-of-use price for a given hour.
343
+ func touPrice(hour, morningShift, eveningShift int) float64 {
344
+ // Overnight: 0.04, shoulder: 0.08-0.10, morning peak: 0.18, evening peak: 0.22 (all in $/kWh)
345
+ morningPeakStart := 8 + morningShift
346
+ morningPeakEnd := 12 + morningShift
347
+ eveningPeakStart := 17 + eveningShift
348
+ eveningPeakEnd := 21 + eveningShift
349
+
350
+ switch {
351
+ case hour >= morningPeakStart && hour < morningPeakEnd:
352
+ return 0.18
353
+ case hour >= eveningPeakStart && hour <= eveningPeakEnd:
354
+ return 0.22
355
+ case (hour >= 9 && hour < morningPeakStart) || (hour >= morningPeakEnd && hour < eveningPeakStart):
356
+ return 0.10
357
+ case hour >= 23 || hour < 6:
358
+ return 0.04
359
+ default:
360
+ return 0.08
361
+ }
362
+ }
363
+
364
+ // generateCarbonCurve creates a realistic carbon intensity curve (gCO2/kWh).
365
+ // Correlates roughly with price: higher price = more peaker plants = higher carbon.
366
+ func (e *Environment) generateCarbonCurve() {
367
+ for s := 0; s < EpisodeSteps; s++ {
368
+ price := e.PriceCurve[s]
369
+ // Map price range [0.04, 0.32] → carbon [150, 600] gCO2/kWh
370
+ carbon := 150.0 + (price-0.04)/(0.32-0.04)*(600.0-150.0)
371
+ noise := (e.rng.Float64()*2 - 1) * 30.0
372
+ e.CarbonCurve[s] = math.Max(100.0, carbon+noise)
373
+ }
374
+ }
375
+
376
+ // stepBuilding advances a single building by one timestep.
377
+ func (e *Environment) stepBuilding(b *BuildingState, act ActionModel, idx int) StepResponse {
378
+ s := e.step
379
+
380
+ // Update environmental signals from curves
381
+ b.CurrentPrice = e.PriceCurve[s]
382
+ b.CarbonIntensity = e.CarbonCurve[s]
383
+ b.HourOfDay = (s / 4) % 24
384
+
385
+ // Stochastic grid stress events (more frequent in hard mode)
386
+ b.GridStressSignal = e.updateGridStress(b, s)
387
+
388
+ // Weather perturbation: outdoor temp drifts sinusoidally + noise
389
+ b.OutdoorTemperature = e.updateOutdoorTemp(b, s)
390
+
391
+ // Process demand fluctuation
392
+ b.ProcessDemand = e.updateProcessDemand(b, s)
393
+
394
+ // ----- Apply actions -----
395
+
396
+ // 1. HVAC: heats/cools building toward setpoint
397
+ hvacPower := act.HVACPowerLevel * b.MaxHVACPower // kW
398
+
399
+ // 2. Thermal storage: charge or discharge
400
+ chargeKW := act.ThermalChargeRate * b.MaxHVACPower * 0.3 // max 30% of HVAC for storage
401
+ newStorageEnergy := b.ThermalStorageLevel*b.MaxStorageCapacity + chargeKW*StepDurationHrs
402
+ // Apply thermal losses
403
+ newStorageEnergy *= (1.0 - b.ThermalLossRate)
404
+ newStorageEnergy = math.Max(0, math.Min(b.MaxStorageCapacity, newStorageEnergy))
405
+ b.ThermalStorageLevel = newStorageEnergy / b.MaxStorageCapacity
406
+
407
+ // 3. Load shedding
408
+ clampedShed := math.Max(0, math.Min(0.5, act.LoadShedFraction))
409
+ shedKW := clampedShed * b.ProcessDemand
410
+
411
+ // 4. Batch job scheduling
412
+ batchCompleted, batchMissed := e.updateBatchJobs(b, act.BatchJobSlot, s)
413
+
414
+ // ----- Thermal dynamics -----
415
+ // Simple first-order thermal model:
416
+ // ΔT per step = (HVAC effect + outdoor infiltration + storage discharge effect - process demand)
417
+ hvacEffect := (act.HVACPowerLevel - 0.5) * 2.0 * 1.5 // ±1.5°C max swing per step for a power level in [0, 1]
418
+ infiltration := (b.OutdoorTemperature - b.IndoorTemperature) * 0.03
419
+ storageEffect := 0.0
420
+ if act.ThermalChargeRate < 0 { // discharging storage = provides cooling/heating
421
+ storageEffect = math.Abs(act.ThermalChargeRate) * 0.5
422
+ }
423
+ processHeat := b.ProcessDemand * 0.002 // kW→°C rough factor
424
+ deltaT := hvacEffect + infiltration + storageEffect - processHeat
425
+ b.IndoorTemperature += deltaT
426
+
427
+ // ----- Energy & cost accounting -----
428
+ batchPowerDraw := e.batchRunningPower(b)
429
+ totalKW := hvacPower + math.Max(0, chargeKW) + batchPowerDraw - shedKW
430
+ totalKW = math.Max(0, totalKW)
431
+ energyKWh := totalKW * StepDurationHrs
432
+ stepCost := energyKWh * b.CurrentPrice
433
+ stepCarbon := energyKWh * b.CarbonIntensity
434
+
435
+ b.CumulativeCost += stepCost
436
+ b.CumulativeCarbon += stepCarbon
437
+
438
+ // Baseline (always-on at 70% HVAC, no storage/shedding)
439
+ baselineKW := 0.7*b.MaxHVACPower + b.ProcessDemand
440
+ baselineEnergy := baselineKW * StepDurationHrs
441
+ b.BaselineCost += baselineEnergy * b.CurrentPrice
442
+
443
+ // ----- Reward computation -----
444
+ rc := ComputeReward(ComputeRewardInput{
445
+ B: b,
446
+ Act: act,
447
+ StepCost: stepCost,
448
+ EnergyKWh: energyKWh,
449
+ TMin: TMinDefault,
450
+ TMax: TMaxDefault,
451
+ StepCarbon: stepCarbon,
452
+ BatchMissed: len(batchMissed),
453
+ GridStress: b.GridStressSignal,
454
+ ShedFraction: clampedShed,
455
+ TaskID: e.taskID,
456
+ PrevHVACLevel: b.PrevHVACLevel,
457
+ ChargeRate: act.ThermalChargeRate,
458
+ PrevChargeRate: e.prevChargeRates[idx],
459
+ StorageDelta: act.ThermalChargeRate,
460
+ PriceCurve: e.PriceCurve[:],
461
+ CurrentStep: s,
462
+ })
463
+ b.PrevHVACLevel = act.HVACPowerLevel
464
+ // Update batch queue
465
+ b.BatchQueue = pendingDeadlines(b.Jobs)
466
+
467
+ // Exploit detection (runs before prevChargeRates is overwritten, so reversals stay visible)
468
+ if clampedShed > 0.4 {
469
+ e.totalShedSteps[idx]++
470
+ }
471
+ if len(e.thermalCycleCounts) > idx {
472
+ if len(e.Replay) > 0 {
473
+ prev := e.prevChargeRates[idx]
474
+ if (prev > 0.3 && act.ThermalChargeRate < -0.3) || (prev < -0.3 && act.ThermalChargeRate > 0.3) {
475
+ e.thermalCycleCounts[idx]++
476
+ }
477
+ }
478
+ }
479
+ // Remember this step's charge rate for the next step's reversal check
480
+ e.prevChargeRates[idx] = act.ThermalChargeRate
481
+
482
+ // Record history
483
+ if idx < len(e.TempHistory) {
484
+ e.TempHistory[idx] = append(e.TempHistory[idx], b.IndoorTemperature)
485
+ e.CostHistory[idx] = append(e.CostHistory[idx], b.CumulativeCost)
486
+ e.HVACHistory[idx] = append(e.HVACHistory[idx], act.HVACPowerLevel)
487
+ e.LoadShedHistory[idx] = append(e.LoadShedHistory[idx], clampedShed)
488
+ e.RewardHistory[idx] = append(e.RewardHistory[idx], rc)
489
+ }
490
+
491
+ obs := e.buildObservation(b)
492
+
493
+ return StepResponse{
494
+ Observation: obs,
495
+ Reward: rc.Total,
496
+ Done: e.done || s+1 >= EpisodeSteps,
497
+ Info: StepInfo{
498
+ RewardComponents: rc,
499
+ EnergyUsed: energyKWh,
500
+ CarbonEmitted: stepCarbon,
501
+ PriceSignal: b.CurrentPrice,
502
+ GridStress: b.GridStressSignal,
503
+ BatchCompleted: batchCompleted,
504
+ BatchMissed: batchMissed,
505
+ Episode: e.episode,
506
+ Step: s,
507
+ },
508
+ }
509
+ }
510
+
511
+ func (e *Environment) updateGridStress(b *BuildingState, s int) float64 {
512
+ // Grid stress is elevated during price peaks and stochastic demand spikes
513
+ price := e.PriceCurve[s]
514
+ priceNorm := (price - 0.04) / (0.32 - 0.04)
515
+
516
+ // Random stress events
517
+ stressProb := 0.05
518
+ switch e.difficulty {
519
+ case "medium":
520
+ stressProb = 0.1
521
+ case "hard":
522
+ stressProb = 0.2
523
+ }
524
+ spike := 0.0
525
+ if e.rng.Float64() < stressProb {
526
+ spike = 0.3 + e.rng.Float64()*0.5
527
+ }
528
+ stress := math.Min(1.0, priceNorm*0.6+spike)
529
+ return math.Max(0, stress)
530
+ }
531
+
532
+ func (e *Environment) updateOutdoorTemp(b *BuildingState, s int) float64 {
533
+ // Sinusoidal daily temperature cycle + noise
534
+ hour := float64(s) / 4.0
535
+ baseTemp := 15.0 + 8.0*math.Sin(2*math.Pi*(hour-6)/24.0)
536
+ noise := (e.rng.Float64()*2 - 1) * 1.5
537
+ return baseTemp + noise
538
+ }
539
+
540
+ func (e *Environment) updateProcessDemand(b *BuildingState, s int) float64 {
541
+ // Process demand shifts with business hours
542
+ hour := s / 4
543
+ base := 10.0
544
+ if hour >= 8 && hour <= 18 {
545
+ base = 20.0 + 10.0*math.Sin(math.Pi*float64(hour-8)/10.0)
546
+ }
547
+ noise := (e.rng.Float64()*2 - 1) * 3.0
548
+ return math.Max(0, base+noise)
549
+ }
550
+
551
+ func (e *Environment) updateBatchJobs(b *BuildingState, slot int, step int) (completed []int, missed []int) {
552
+ completed = []int{}
553
+ missed = []int{}
554
+
555
+ // Schedule the first pending job into the chosen slot
556
+ for i := range b.Jobs {
557
+ job := &b.Jobs[i]
558
+ if !job.Scheduled && !job.Completed && !job.MissedDeadline {
559
+ schedAt := step + slot
560
+ job.Scheduled = true
561
+ job.ScheduledAt = schedAt
562
+ break // only schedule one job per step
563
+ }
564
+ }
565
+
566
+ // Advance pending and running jobs (completed or missed jobs are skipped)
567
+ for i := range b.Jobs {
568
+ job := &b.Jobs[i]
569
+ if job.Completed || job.MissedDeadline {
570
+ continue
571
+ }
572
+ // Check deadline miss
573
+ if step >= job.DeadlineSlot && !job.Completed {
574
+ job.MissedDeadline = true
575
+ missed = append(missed, job.ID)
576
+ continue
577
+ }
578
+ // Mark as completed once a scheduled job has run for its full duration
579
+ if job.Scheduled && step >= job.ScheduledAt {
580
+ if step >= job.ScheduledAt+job.Duration-1 {
581
+ job.Completed = true
582
+ completed = append(completed, job.ID)
583
+ }
584
+ }
585
+ }
586
+ return
587
+ }
588
+
589
+ func (e *Environment) batchRunningPower(b *BuildingState) float64 {
590
+ total := 0.0
591
+ for _, job := range b.Jobs {
592
+ if job.Scheduled && !job.Completed && !job.MissedDeadline {
593
+ if e.step >= job.ScheduledAt && e.step < job.ScheduledAt+job.Duration {
594
+ total += job.PowerDraw
595
+ }
596
+ }
597
+ }
598
+ return total
599
+ }
600
+
601
+ func (e *Environment) buildObservation(b *BuildingState) ObservationModel {
602
+ return ObservationModel{
603
+ IndoorTemperature: math.Round(b.IndoorTemperature*100) / 100,
604
+ ThermalStorageLevel: math.Round(b.ThermalStorageLevel*1000) / 1000,
605
+ ProcessDemand: math.Round(b.ProcessDemand*100) / 100,
606
+ CurrentPrice: math.Round(b.CurrentPrice*10000) / 10000,
607
+ GridStressSignal: math.Round(b.GridStressSignal*1000) / 1000,
608
+ CarbonIntensity: math.Round(b.CarbonIntensity*10) / 10,
609
+ HourOfDay: b.HourOfDay,
610
+ BatchQueue: pendingDeadlines(b.Jobs),
611
+ CumulativeCost: math.Round(b.CumulativeCost*10000) / 10000,
612
+ Step: b.Step,
613
+ BuildingID: b.BuildingID,
614
+ }
615
+ }
616
+
617
+ func (e *Environment) clampAction(a *ActionModel) {
618
+ a.HVACPowerLevel = math.Max(0, math.Min(1.0, a.HVACPowerLevel))
619
+ a.ThermalChargeRate = math.Max(-1.0, math.Min(1.0, a.ThermalChargeRate))
620
+ a.BatchJobSlot = max(0, min(4, a.BatchJobSlot))
621
+ a.LoadShedFraction = math.Max(0, math.Min(0.5, a.LoadShedFraction))
622
+ }
623
+
624
+ func (e *Environment) findAction(actions []ActionModel, buildingIdx int) ActionModel {
625
+ // Try to find an action with matching building_id, else use positional
626
+ for _, a := range actions {
627
+ if a.BuildingID == buildingIdx {
628
+ return a
629
+ }
630
+ }
631
+ if buildingIdx < len(actions) {
632
+ return actions[buildingIdx]
633
+ }
634
+ // Default: neutral action (HVAC at 50%, no storage charging, no load shedding)
635
+ return ActionModel{HVACPowerLevel: 0.5, ThermalChargeRate: 0.0, BatchJobSlot: 0, LoadShedFraction: 0.0}
636
+ }
637
+
638
+ // pendingDeadlines returns the deadline slots of all jobs that are neither completed nor past their deadline (scheduled or not).
639
+ func pendingDeadlines(jobs []BatchJob) []int {
640
+ result := []int{}
641
+ for _, j := range jobs {
642
+ if !j.Completed && !j.MissedDeadline {
643
+ result = append(result, j.DeadlineSlot)
644
+ }
645
+ }
646
+ return result
647
+ }
648
+
649
+ func max(a, b int) int {
650
+ if a > b {
651
+ return a
652
+ }
653
+ return b
654
+ }
655
+
656
+ func min(a, b int) int {
657
+ if a < b {
658
+ return a
659
+ }
660
+ return b
661
+ }
662
+
663
+ // ExploitDetected returns whether the current episode shows signs of degenerate strategies.
664
+ func (e *Environment) ExploitDetected(buildingIdx int) (bool, float64) {
665
+ e.mu.RLock()
666
+ defer e.mu.RUnlock()
667
+ if buildingIdx < 0 || buildingIdx >= len(e.totalShedSteps) || buildingIdx >= len(e.thermalCycleCounts) {
668
+ return false, 0.0
669
+ }
670
+ // Flag if the agent sheds > 40% of load on more than 70% of steps, or reverses thermal charge direction on more than 40% of steps
671
+ shedRatio := float64(e.totalShedSteps[buildingIdx]) / float64(e.step+1)
672
+ cycleRatio := float64(e.thermalCycleCounts[buildingIdx]) / float64(e.step+1)
673
+ exploited := shedRatio > 0.7 || cycleRatio > 0.4
674
+ penalty := 0.0
675
+ if exploited {
676
+ penalty = math.Max(shedRatio-0.7, 0)*0.5 + math.Max(cycleRatio-0.4, 0)*0.3
677
+ }
678
+ return exploited, penalty
679
+ }
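A minimal sketch of the ratio-and-threshold arithmetic in ExploitDetected, pulled out into a standalone function so the numbers can be checked by hand. The function name `exploitPenalty` and its three integer parameters are hypothetical; they stand in for `e.totalShedSteps[idx]`, `e.thermalCycleCounts[idx]`, and `e.step+1`. The guard on a non-positive step count is an addition of this sketch, not something the source does.

```go
package main

import (
	"fmt"
	"math"
)

// exploitPenalty mirrors the thresholds used by ExploitDetected:
// flag when more than 70% of steps shed heavily, or more than 40% of
// steps reversed the thermal charge direction.
// All three parameters are hypothetical stand-ins for Environment fields.
func exploitPenalty(shedSteps, cycleSteps, stepsElapsed int) (bool, float64) {
	if stepsElapsed <= 0 {
		// Guard added for the sketch; avoids dividing by zero.
		return false, 0
	}
	shedRatio := float64(shedSteps) / float64(stepsElapsed)
	cycleRatio := float64(cycleSteps) / float64(stepsElapsed)
	if shedRatio <= 0.7 && cycleRatio <= 0.4 {
		return false, 0
	}
	// Only the amount above each threshold contributes to the penalty.
	penalty := math.Max(shedRatio-0.7, 0)*0.5 + math.Max(cycleRatio-0.4, 0)*0.3
	return true, penalty
}

func main() {
	flagged, p := exploitPenalty(90, 0, 100)
	fmt.Printf("flagged=%v penalty=%.4f\n", flagged, p)
}
```

With 90 heavy-shed steps out of 100, the shed ratio exceeds the threshold by 0.2, so the penalty works out to roughly 0.2 × 0.5 = 0.1 before the grader applies its own cap.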
env/models.go ADDED
@@ -0,0 +1,170 @@
1
+ // Package env defines all data models for the GridMind-RL environment.
2
+ package env
3
+
4
+ // BatchJob represents a pending industrial/commercial batch process job.
5
+ type BatchJob struct {
6
+ ID int `json:"id"` // Unique job identifier
7
+ DeadlineSlot int `json:"deadline_slot"` // Latest timestep by which the job must run
8
+ Duration int `json:"duration"` // Steps the job takes to complete
9
+ PowerDraw float64 `json:"power_draw"` // kW drawn when job is running
10
+ Scheduled bool `json:"scheduled"` // Whether a time slot has been assigned
11
+ ScheduledAt int `json:"scheduled_at"` // Timestep at which job was scheduled (-1 if not yet)
12
+ Completed bool `json:"completed"` // Whether the job finished execution
13
+ MissedDeadline bool `json:"missed_deadline"` // True if deadline was exceeded
14
+ }
15
+
16
+ // BuildingState holds the full simulation state for a single building.
17
+ type BuildingState struct {
18
+ // Core physical state
19
+ IndoorTemperature float64 `json:"indoor_temperature"` // °C
20
+ ThermalStorageLevel float64 `json:"thermal_storage_level"` // 0.0–1.0 normalized
21
+ ProcessDemand float64 `json:"process_demand"` // kW current process need
22
+
23
+ // Market & grid signals
24
+ CurrentPrice float64 `json:"current_price"` // $/kWh
25
+ GridStressSignal float64 `json:"grid_stress_signal"` // 0.0–1.0 (DR urgency)
26
+ CarbonIntensity float64 `json:"carbon_intensity"` // gCO2/kWh
27
+
28
+ // Temporal
29
+ HourOfDay int `json:"hour_of_day"` // 0–23
30
+ Step int `json:"step"` // 0–95 within episode
31
+
32
+ // Batch job queue: pending deadlines (raw slots)
33
+ BatchQueue []int `json:"batch_queue"` // deadline slots of pending jobs
34
+
35
+ // Running cost tracker
36
+ CumulativeCost float64 `json:"cumulative_cost"` // $ total this episode
37
+ CumulativeCarbon float64 `json:"cumulative_carbon"` // gCO2 total this episode
38
+
39
+ // Internal tracking (not exposed in observation)
40
+ Jobs []BatchJob `json:"-"`
41
+ OutdoorTemperature float64 `json:"-"` // °C for weather perturbation
42
+ PrevHVACLevel float64 `json:"-"` // for stability penalty
43
+ BaselineCost float64 `json:"-"` // always-on policy running cost
44
+ SetpointTemperature float64 `json:"-"` // target indoor temp (°C)
45
+ MaxHVACPower float64 `json:"-"` // kW
46
+ MaxStorageCapacity float64 `json:"-"` // kWh
47
+ ThermalLossRate float64 `json:"-"` // fraction lost per step
48
+ BuildingID int `json:"-"` // which building in federation
49
+ }
50
+
51
+ // ObservationModel is the JSON-serializable observation returned on each step/state.
52
+ type ObservationModel struct {
53
+ IndoorTemperature float64 `json:"indoor_temperature"`
54
+ ThermalStorageLevel float64 `json:"thermal_storage_level"`
55
+ ProcessDemand float64 `json:"process_demand"`
56
+ CurrentPrice float64 `json:"current_price"`
57
+ GridStressSignal float64 `json:"grid_stress_signal"`
58
+ CarbonIntensity float64 `json:"carbon_intensity"`
59
+ HourOfDay int `json:"hour_of_day"`
60
+ BatchQueue []int `json:"batch_queue"`
61
+ CumulativeCost float64 `json:"cumulative_cost"`
62
+ Step int `json:"step"`
63
+ BuildingID int `json:"building_id"`
64
+ }
65
+
66
+ // ActionModel is the parsed agent action for a single step.
67
+ type ActionModel struct {
68
+ HVACPowerLevel float64 `json:"hvac_power_level"` // 0.0–1.0
69
+ ThermalChargeRate float64 `json:"thermal_charge_rate"` // -1.0 to 1.0
70
+ BatchJobSlot int `json:"batch_job_slot"` // 0–4 (0=now, 1–4=defer)
71
+ LoadShedFraction float64 `json:"load_shed_fraction"` // 0.0–0.5
72
+ BuildingID int `json:"building_id"` // which building to act on
73
+ }
74
+
75
+ // RewardComponents holds the individual components of the dense reward signal.
76
+ type RewardComponents struct {
77
+ CostSavings float64 `json:"cost_savings"` // negative = expensive
78
+ TempConstraint float64 `json:"temp_constraint"` // positive = within bounds
79
+ GridResponse float64 `json:"grid_response"` // bonus for DR compliance
80
+ DeadlinePenalty float64 `json:"deadline_penalty"` // negative for missed jobs
81
+ EfficiencyBonus float64 `json:"efficiency_bonus"` // storage arbitrage
82
+ StabilityPenalty float64 `json:"stability_penalty"` // HVAC oscillation penalty
83
+ CarbonReward float64 `json:"carbon_reward"` // low-carbon bonus
84
+ Total float64 `json:"total"`
85
+ }
86
+
87
+ // StepResponse is the full HTTP body returned from POST /step.
88
+ type StepResponse struct {
89
+ Observation ObservationModel `json:"observation"`
90
+ Reward float64 `json:"reward"`
91
+ Done bool `json:"done"`
92
+ Info StepInfo `json:"info"`
93
+ }
94
+
95
+ // StepInfo carries auxiliary information per step.
96
+ type StepInfo struct {
97
+ RewardComponents RewardComponents `json:"reward_components"`
98
+ EnergyUsed float64 `json:"energy_used_kwh"`
99
+ CarbonEmitted float64 `json:"carbon_emitted_gco2"`
100
+ PriceSignal float64 `json:"price_signal"`
101
+ GridStress float64 `json:"grid_stress"`
102
+ BatchCompleted []int `json:"batch_completed"` // IDs completed this step
103
+ BatchMissed []int `json:"batch_missed"` // IDs that missed deadline
104
+ Episode int `json:"episode"`
105
+ Step int `json:"step"`
106
+ }
107
+
108
+ // ResetRequest is the JSON body for POST /reset.
109
+ type ResetRequest struct {
110
+ Seed *int64 `json:"seed,omitempty"` // optional random seed
111
+ TaskID int `json:"task_id"` // 1, 2, or 3
112
+ Difficulty string `json:"difficulty,omitempty"` // "easy", "medium", "hard" or "" (auto)
113
+ NumBuildings int `json:"num_buildings,omitempty"` // 1–3 for federation
114
+ }
115
+
116
+ // ResetResponse is returned from POST /reset.
117
+ type ResetResponse struct {
118
+ Observations []ObservationModel `json:"observations"` // one per building
119
+ Episode int `json:"episode"`
120
+ TaskID int `json:"task_id"`
121
+ Seed int64 `json:"seed"`
122
+ }
123
+
124
+ // StateResponse is returned from GET /state.
125
+ type StateResponse struct {
126
+ Buildings []BuildingStatePublic `json:"buildings"`
127
+ PriceCurve []float64 `json:"price_curve_24h"` // full 24h ToU prices
128
+ CarbonCurve []float64 `json:"carbon_curve_24h"` // full 24h carbon intensities
129
+ Episode int `json:"episode"`
130
+ Step int `json:"step"`
131
+ TaskID int `json:"task_id"`
132
+ Done bool `json:"done"`
133
+ Seed int64 `json:"seed"`
134
+ }
135
+
136
+ // BuildingStatePublic is the dashboard-friendly full state per building.
137
+ type BuildingStatePublic struct {
138
+ ObservationModel
139
+ OutdoorTemperature float64 `json:"outdoor_temperature"`
140
+ SetpointTemperature float64 `json:"setpoint_temperature"`
141
+ BaselineCost float64 `json:"baseline_cost"`
142
+ CumulativeCarbon float64 `json:"cumulative_carbon"`
143
+ Jobs []BatchJob `json:"jobs"`
144
+ // History for chart rendering
145
+ TempHistory []float64 `json:"temp_history"`
146
+ CostHistory []float64 `json:"cost_history"`
147
+ HVACHistory []float64 `json:"hvac_history"`
148
+ LoadShedHistory []float64 `json:"load_shed_history"`
149
+ RewardHistory []RewardComponents `json:"reward_history"`
150
+ }
151
+
152
+ // ReplayEntry records a single timestep for episode replay export.
153
+ type ReplayEntry struct {
154
+ Step int `json:"step"`
155
+ Observation ObservationModel `json:"observation"`
156
+ Action ActionModel `json:"action"`
157
+ Reward float64 `json:"reward"`
158
+ Components RewardComponents `json:"components"`
159
+ Done bool `json:"done"`
160
+ }
161
+
162
+ // EpisodeGrade is the final grade returned for a completed episode.
163
+ type EpisodeGrade struct {
164
+ TaskID int `json:"task_id"`
165
+ Score float64 `json:"score"` // 0.0–1.0
166
+ SubScores map[string]float64 `json:"sub_scores"`
167
+ ExploitDetected bool `json:"exploit_detected"`
168
+ PenaltyApplied float64 `json:"penalty_applied"`
169
+ Details map[string]interface{} `json:"details"`
170
+ }
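The struct tags above determine the JSON field names an agent has to send. The sketch below decodes one action object into a local copy of ActionModel. The helper `parseAction` and the sample payload are hypothetical, and the exact request body that POST /step expects (single object versus array) is an assumption here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ActionModel is copied from env/models.go so the sketch compiles on its own.
type ActionModel struct {
	HVACPowerLevel    float64 `json:"hvac_power_level"`
	ThermalChargeRate float64 `json:"thermal_charge_rate"`
	BatchJobSlot      int     `json:"batch_job_slot"`
	LoadShedFraction  float64 `json:"load_shed_fraction"`
	BuildingID        int     `json:"building_id"`
}

// parseAction is a hypothetical helper that decodes a single JSON action.
// Fields missing from the payload keep their Go zero values.
func parseAction(payload string) (ActionModel, error) {
	var a ActionModel
	err := json.Unmarshal([]byte(payload), &a)
	return a, err
}

func main() {
	a, err := parseAction(`{"hvac_power_level":0.6,"thermal_charge_rate":-0.25,"batch_job_slot":2}`)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%+v\n", a)
}
```

Because omitted fields decode to zero, an action without `building_id` is indistinguishable from one that targets building 0, which is worth keeping in mind when reading findAction.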
env/rewards.go ADDED
@@ -0,0 +1,143 @@
1
+ // Package env implements the multi-component dense reward function for GridMind-RL.
2
+ package env
3
+
4
+ import "math"
5
+
6
+ // ComputeRewardInput bundles all inputs needed to compute the reward for one step.
7
+ type ComputeRewardInput struct {
8
+ B *BuildingState
9
+ Act ActionModel
10
+ StepCost float64 // $ cost incurred this step
11
+ EnergyKWh float64 // kWh consumed this step
12
+ TMin float64 // lower temperature bound (°C)
13
+ TMax float64 // upper temperature bound (°C)
14
+ StepCarbon float64 // gCO2 emitted this step
15
+ BatchMissed int // number of batch jobs that missed deadline this step
16
+ GridStress float64 // 0.0–1.0 grid stress signal
17
+ ShedFraction float64 // clamped load shed fraction
18
+ TaskID int // 1, 2, or 3
19
+ PrevHVACLevel float64 // previous step's HVAC power level (for stability)
20
+ ChargeRate float64 // current thermal charge rate
21
+ PrevChargeRate float64 // previous step's thermal charge rate
22
+ StorageDelta float64 // change in storage level (+ = charging)
23
+ PriceCurve []float64 // full episode price curve for arbitrage calc
24
+ CurrentStep int // current step index
25
+ }
26
+
27
+ // ComputeReward returns a dense RewardComponents struct from the current step inputs.
28
+ // The reward is task-aware: cost, deadline, arbitrage, and stability terms apply to every task;
29
+ // task 2 adds the temperature term, and task 3 adds grid response and carbon.
30
+ func ComputeReward(inp ComputeRewardInput) RewardComponents {
31
+ rc := RewardComponents{}
32
+
33
+ // ── 1. Cost Savings ─────────────────────────────────────────────────────
34
+ // Negative reward proportional to energy cost. Normalised by typical step cost.
35
+ // Typical step cost at full load, peak price: 50kW * 0.25h * 0.32 = $4.00.
36
+ typicalCost := 4.0
37
+ rc.CostSavings = -(inp.StepCost / typicalCost) * 2.0
38
+
39
+ // ── 2. Temperature Constraint ────────────────────────────────────────────
40
+ // Only active for task 2 and 3.
41
+ if inp.TaskID >= 2 {
42
+ temp := inp.B.IndoorTemperature
43
+ rc.TempConstraint = computeTempReward(temp, inp.B.SetpointTemperature, inp.TMin, inp.TMax)
44
+ }
45
+
46
+ // ── 3. Grid Stress Response ──────────────────────────────────────────────
47
+ // Only active for task 3.
48
+ if inp.TaskID >= 3 {
49
+ rc.GridResponse = computeGridResponse(inp.GridStress, inp.ShedFraction)
50
+ }
51
+
52
+ // ── 4. Deadline Penalty ──────────────────────────────────────────────────
53
+ if inp.BatchMissed > 0 {
54
+ rc.DeadlinePenalty = -float64(inp.BatchMissed) * 1.5
55
+ }
56
+
57
+ // ── 5. Efficiency Bonus (thermal storage arbitrage) ───────────────────────
58
+ // Reward for charging storage during cheap periods and discharging during expensive ones.
59
+ if len(inp.PriceCurve) > inp.CurrentStep {
60
+ rc.EfficiencyBonus = computeArbitrageBonus(
61
+ inp.ChargeRate,
62
+ inp.PriceCurve[inp.CurrentStep],
63
+ inp.PriceCurve,
64
+ inp.CurrentStep,
65
+ )
66
+ }
67
+
68
+ // ── 6. Stability Penalty ─────────────────────────────────────────────────
69
+ // Penalise rapid oscillation in HVAC power level and thermal charge rate.
70
+ hvacDelta := math.Abs(inp.Act.HVACPowerLevel - inp.PrevHVACLevel)
71
+ chargeDelta := math.Abs(inp.ChargeRate - inp.PrevChargeRate)
72
+ oscillation := hvacDelta*0.5 + chargeDelta*0.3
73
+ if oscillation > 0.3 {
74
+ rc.StabilityPenalty = -(oscillation - 0.3) * 0.8
75
+ }
76
+
77
+ // ── 7. Carbon Reward ─────────────────────────────────────────────────────
78
+ // Carbon term: active for task 3 only.
79
+ if inp.TaskID >= 3 {
80
+ // Normalise carbon: iso-ne range roughly 100–700 gCO2/kWh
81
+ carbonNorm := (inp.B.CarbonIntensity - 100.0) / 600.0
82
+ // Penalty grows with the energy used while carbon intensity is high
83
+ rc.CarbonReward = -inp.EnergyKWh * carbonNorm * 0.3
84
+ }
85
+
86
+ // ── Aggregate ────────────────────────────────────────────────────────────
87
+ rc.Total = rc.CostSavings + rc.TempConstraint + rc.GridResponse +
88
+ rc.DeadlinePenalty + rc.EfficiencyBonus + rc.StabilityPenalty + rc.CarbonReward
89
+
90
+ return rc
91
+ }
92
+
93
+ // computeTempReward returns a reward based on how close the indoor temperature
94
+ // is to the setpoint, with a penalty proportional to the excess outside [TMin, TMax].
95
+ func computeTempReward(temp, setpoint, tMin, tMax float64) float64 {
96
+ if temp >= tMin && temp <= tMax {
97
+ // Gaussian-shaped bonus: maximum at setpoint, degrades toward bounds
98
+ deviation := math.Abs(temp - setpoint)
99
+ sigma := (tMax - tMin) / 4.0
100
+ return math.Exp(-0.5*(deviation/sigma)*(deviation/sigma)) * 0.5
101
+ }
102
+ // Outside bounds: proportional penalty
103
+ excess := math.Max(temp-tMax, tMin-temp)
104
+ return -excess * 0.4
105
+ }
106
+
107
+ // computeGridResponse returns a bonus for shedding load during high grid stress,
108
+ // and a mild penalty for shedding when the grid is fine.
109
+ func computeGridResponse(stress, shedFraction float64) float64 {
110
+ if stress > 0.7 {
111
+ // Bonus proportional to shed fraction
112
+ return shedFraction * stress * 1.5
113
+ }
114
+ // Mild penalty for unnecessary shedding (reduces productivity without benefit)
115
+ return -shedFraction * (0.7 - stress) * 0.3
116
+ }
117
+
118
+ // computeArbitrageBonus rewards charging storage during cheap periods and
119
+ // discharging during expensive periods.
120
+ func computeArbitrageBonus(chargeRate, currentPrice float64, curve []float64, step int) float64 {
121
+ // Compute rolling average of future prices (next 8 steps = 2 hours)
122
+ lookAhead := 8
123
+ futureSum := 0.0
124
+ count := 0
125
+ for i := step + 1; i <= step+lookAhead && i < len(curve); i++ {
126
+ futureSum += curve[i]
127
+ count++
128
+ }
129
+ if count == 0 {
130
+ return 0.0
131
+ }
132
+ futureAvg := futureSum / float64(count)
133
+
134
+ // If current price is lower than future avg → charging is smart → reward
135
+ if chargeRate > 0 && currentPrice < futureAvg {
136
+ return chargeRate * (futureAvg - currentPrice) * 2.0
137
+ }
138
+ // If current price is higher than future avg → discharging is smart → reward
139
+ if chargeRate < 0 && currentPrice > futureAvg {
140
+ return math.Abs(chargeRate) * (currentPrice - futureAvg) * 2.0
141
+ }
142
+ return 0.0
143
+ }
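To make the shape of the temperature term concrete, the sketch below reproduces computeTempReward under the name `tempReward` and evaluates it at a few temperatures. The setpoint of 21 °C and the band [19, 23] °C are assumed values for illustration; the actual TMin and TMax are set elsewhere in the environment.

```go
package main

import (
	"fmt"
	"math"
)

// tempReward reproduces computeTempReward from env/rewards.go.
func tempReward(temp, setpoint, tMin, tMax float64) float64 {
	if temp >= tMin && temp <= tMax {
		// Inside the band: Gaussian-shaped bonus, peaking at 0.5 on the setpoint.
		deviation := math.Abs(temp - setpoint)
		sigma := (tMax - tMin) / 4.0
		return math.Exp(-0.5*(deviation/sigma)*(deviation/sigma)) * 0.5
	}
	// Outside the band: penalty of 0.4 per degree beyond the nearest bound.
	excess := math.Max(temp-tMax, tMin-temp)
	return -excess * 0.4
}

func main() {
	// Assumed values for illustration only: setpoint 21, band [19, 23].
	for _, t := range []float64{21, 22, 23, 24, 17.5} {
		fmt.Printf("temp=%.1f reward=%.4f\n", t, tempReward(t, 21, 19, 23))
	}
}
```

With these assumed bounds sigma is 1 °C, so the bonus is 0.5 at the setpoint, stays small but positive at the band edge, and turns negative as soon as the temperature leaves the band.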
env/tasks.go ADDED
@@ -0,0 +1,203 @@
1
+ // Package env defines the three GridMind-RL tasks and their deterministic graders.
2
+ package env
3
+
4
+ import "math"
5
+
6
+ // TaskConfig describes a single task.
7
+ type TaskConfig struct {
8
+ ID int `json:"id"`
9
+ Name string `json:"name"`
10
+ Description string `json:"description"`
11
+ Difficulty string `json:"difficulty"`
12
+ Weights map[string]float64 `json:"weights"`
13
+ }
14
+
15
+ // AllTasks returns the ordered list of task configurations.
16
+ func AllTasks() []TaskConfig {
17
+ return []TaskConfig{
18
+ {
19
+ ID: 1,
20
+ Name: "Cost Minimization",
21
+ Description: "Minimize total energy cost over a 24-hour episode with no process constraints. Beat the always-on flat policy baseline.",
22
+ Difficulty: "easy",
23
+ Weights: map[string]float64{"cost": 1.0},
24
+ },
25
+ {
26
+ ID: 2,
27
+ Name: "Constrained Temperature Management",
28
+ Description: "Minimize cost while keeping indoor temperature within ±2°C of setpoint at all times.",
29
+ Difficulty: "medium",
30
+ Weights: map[string]float64{"cost": 0.6, "temperature": 0.4},
31
+ },
32
+ {
33
+ ID: 3,
34
+ Name: "Full Demand-Response with Batch Scheduling",
35
+ Description: "Minimize cost, maintain temperature, respond to grid stress events, schedule all batch jobs before their deadlines, and minimize carbon emissions.",
36
+ Difficulty: "hard",
37
+ Weights: map[string]float64{"cost": 0.35, "temperature": 0.25, "grid_response": 0.25, "batch_deadline": 0.15},
38
+ },
39
+ }
40
+ }
41
+
42
+ // GradeEpisodeInput collects all data needed to score a completed episode.
43
+ type GradeEpisodeInput struct {
44
+ TaskID int
45
+ Buildings []*BuildingState
46
+ Replay []ReplayEntry
47
+ TempHistory [][]float64 // per building, per step
48
+ TMin float64
49
+ TMax float64
50
+ ExploitPenalties []float64
51
+ }
52
+
53
+ // GradeEpisode computes a deterministic 0.0–1.0 score for a completed episode.
54
+ // Given a fixed random seed, this function is fully deterministic.
55
+ func GradeEpisode(inp GradeEpisodeInput) EpisodeGrade {
56
+ grade := EpisodeGrade{
57
+ TaskID: inp.TaskID,
58
+ SubScores: map[string]float64{},
59
+ Details: map[string]interface{}{},
60
+ }
61
+
62
+ switch inp.TaskID {
63
+ case 1:
64
+ grade = gradeTask1(inp, grade)
65
+ case 2:
66
+ grade = gradeTask2(inp, grade)
67
+ case 3:
68
+ grade = gradeTask3(inp, grade)
69
+ default:
70
+ grade = gradeTask1(inp, grade)
71
+ }
72
+
73
+ // Exploit detection: reduce score by penalty
74
+ totalPenalty := 0.0
75
+ for i := range inp.Buildings {
77
+ if i < len(inp.ExploitPenalties) {
78
+ totalPenalty += inp.ExploitPenalties[i]
79
+ }
80
+ }
81
+ if totalPenalty > 0 {
82
+ grade.ExploitDetected = true
83
+ grade.PenaltyApplied = math.Min(totalPenalty, 0.3) // max 30% penalty
84
+ grade.Score = math.Max(0, grade.Score-grade.PenaltyApplied)
85
+ }
86
+
87
+ grade.Score = math.Round(grade.Score*10000) / 10000 // 4 decimal places
88
+ return grade
89
+ }
90
+
91
+ // ── Task 1: Cost Minimization ───────────────────────────────────────────────
92
+
93
+ func gradeTask1(inp GradeEpisodeInput, grade EpisodeGrade) EpisodeGrade {
94
+ agentCost := 0.0
95
+ baselineCost := 0.0
96
+ for _, b := range inp.Buildings {
97
+ agentCost += b.CumulativeCost
98
+ baselineCost += b.BaselineCost
99
+ }
100
+
101
+ var costScore float64
102
+ if baselineCost > 0 {
103
+ // score = max(0, 1 - agent_cost / baseline_cost)
104
+ // 0.0 if agent costs same or more, 1.0 if agent costs nothing
105
+ ratio := agentCost / baselineCost
106
+ costScore = math.Max(0, 1.0-ratio)
107
+ }
108
+
109
+ grade.SubScores["cost"] = math.Min(1.0, costScore)
110
+ grade.Score = grade.SubScores["cost"]
111
+ grade.Details["agent_cost"] = agentCost
112
+ grade.Details["baseline_cost"] = baselineCost
113
+ grade.Details["cost_ratio"] = agentCost / math.Max(baselineCost, 0.01)
114
+ return grade
115
+ }
116
+
117
+ // ── Task 2: Constrained Temperature Management ──────────────────────────────
118
+
119
+ func gradeTask2(inp GradeEpisodeInput, grade EpisodeGrade) EpisodeGrade {
120
+ // Cost sub-score (same as task 1)
121
+ grade = gradeTask1(inp, grade)
122
+ costScore := grade.SubScores["cost"]
123
+
124
+ // Temperature constraint sub-score
125
+ totalSteps := 0
126
+ withinBounds := 0
127
+ for _, history := range inp.TempHistory {
129
+ for _, temp := range history {
130
+ totalSteps++
131
+ if temp >= inp.TMin && temp <= inp.TMax {
132
+ withinBounds++
133
+ }
134
+ }
135
+ }
136
+ constraintScore := 0.0
137
+ if totalSteps > 0 {
138
+ constraintScore = float64(withinBounds) / float64(totalSteps)
139
+ }
140
+
141
+ grade.SubScores["cost"] = costScore
142
+ grade.SubScores["temperature"] = constraintScore
143
+ grade.Score = costScore*0.6 + constraintScore*0.4
144
+ grade.Details["within_bounds_steps"] = withinBounds
145
+ grade.Details["total_steps"] = totalSteps
146
+ return grade
147
+ }
148
+
149
+ // ── Task 3: Full Demand-Response with Batch Scheduling ──────────────────────
150
+
151
+ func gradeTask3(inp GradeEpisodeInput, grade EpisodeGrade) EpisodeGrade {
152
+ // Reuse task 2 for cost + temperature scores
153
+ grade = gradeTask2(inp, grade)
154
+ costScore := grade.SubScores["cost"]
155
+ tempScore := grade.SubScores["temperature"]
156
+
157
+ // Grid response sub-score
158
+ // Count steps where stress > 0.7 and shed_fraction > 0.15
159
+ gridStressSteps := 0
160
+ gridResponseSteps := 0
161
+ for _, entry := range inp.Replay {
162
+ if entry.Observation.GridStressSignal > 0.7 {
163
+ gridStressSteps++
164
+ if entry.Action.LoadShedFraction > 0.15 {
165
+ gridResponseSteps++
166
+ }
167
+ }
168
+ }
169
+ gridScore := 0.5 // default neutral if no stress events
170
+ if gridStressSteps > 0 {
171
+ gridScore = float64(gridResponseSteps) / float64(gridStressSteps)
172
+ }
173
+
174
+ // Batch deadline sub-score
175
+ totalJobs := 0
176
+ completedOnTime := 0
177
+ for _, b := range inp.Buildings {
178
+ for _, job := range b.Jobs {
179
+ totalJobs++
180
+ if job.Completed && !job.MissedDeadline {
181
+ completedOnTime++
182
+ }
183
+ }
184
+ }
185
+ batchScore := 0.0
186
+ if totalJobs > 0 {
187
+ batchScore = float64(completedOnTime) / float64(totalJobs)
188
+ }
189
+
190
+ grade.SubScores["cost"] = costScore
191
+ grade.SubScores["temperature"] = tempScore
192
+ grade.SubScores["grid_response"] = gridScore
193
+ grade.SubScores["batch_deadline"] = batchScore
194
+
195
+ // Weighted composite score
196
+ grade.Score = costScore*0.35 + tempScore*0.25 + gridScore*0.25 + batchScore*0.15
197
+
198
+ grade.Details["grid_stress_steps"] = gridStressSteps
199
+ grade.Details["grid_response_steps"] = gridResponseSteps
200
+ grade.Details["total_jobs"] = totalJobs
201
+ grade.Details["completed_on_time"] = completedOnTime
202
+ return grade
203
+ }
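The task 3 grade is a weighted sum of four sub-scores, after which GradeEpisode subtracts an exploit penalty capped at 0.3 and rounds to four decimals. The sketch below combines those two steps in one hypothetical function, `task3Score`; the sub-score values passed in `main` are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// task3Score mirrors the weights in gradeTask3 and the penalty handling in
// GradeEpisode. The function itself is a hypothetical condensation of both.
func task3Score(cost, temp, grid, batch, exploitPenalty float64) float64 {
	score := cost*0.35 + temp*0.25 + grid*0.25 + batch*0.15
	if exploitPenalty > 0 {
		// The penalty is capped at 0.3 and the score never drops below zero.
		score = math.Max(0, score-math.Min(exploitPenalty, 0.3))
	}
	// Round to four decimal places, as GradeEpisode does.
	return math.Round(score*10000) / 10000
}

func main() {
	// Illustrative sub-scores: modest cost savings, good comfort,
	// neutral grid response, all batch jobs on time.
	fmt.Println(task3Score(0.2, 0.9, 0.5, 1.0, 0))
	fmt.Println(task3Score(0.2, 0.9, 0.5, 1.0, 0.5))
}
```

Note that the cost sub-score is `1 - agentCost/baselineCost`, so an agent that merely matches the baseline contributes nothing from the 0.35 cost weight and must earn its grade from the other three terms.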
go.mod ADDED
@@ -0,0 +1,3 @@
1
+ module gridmind-rl
2
+
3
+ go 1.21
main.go ADDED
@@ -0,0 +1,376 @@
1
+ // main.go — GridMind-RL HTTP server (OpenEnv-compliant)
2
+ // Exposes: POST /step, POST /reset, GET /state, GET /health, GET /replay, GET /grade, GET /metrics, and /tasks
3
+ // Port: 7860 (Hugging Face Spaces compatible)
4
+ package main
5
+
6
+ import (
7
+ "encoding/json"
8
+ "fmt"
9
+ "log"
10
+ "math"
11
+ "net/http"
12
+ "os"
13
+ "strconv"
14
+ "sync"
15
+ "sync/atomic"
16
+ "time"
17
+
18
+ "gridmind-rl/env"
19
+ )
20
+
21
+ // ──────────────────────────────────────────────
22
+ // Prometheus-style metrics (text exposition format)
23
+ // ──────────────────────────────────────────────
24
+
25
+ type Metrics struct {
26
+ mu sync.Mutex
27
+ stepCount int64
28
+ stepLatencySum float64
29
+ stepLatencyCount int64
30
+ rewardSum float64
31
+ rewardCount int64
32
+ rewardMin float64
33
+ rewardMax float64
34
+ // Histograms
35
+ actionBuckets map[string]int64 // hvac bucket counts
36
+ errorCount int64
37
+ }
38
+
39
+ var metrics = &Metrics{
40
+ rewardMin: math.MaxFloat64,
41
+ rewardMax: -math.MaxFloat64,
42
+ actionBuckets: map[string]int64{"low": 0, "mid": 0, "high": 0},
43
+ }
44
+
45
+ func (m *Metrics) recordStep(latencyMs float64, reward float64) {
46
+ atomic.AddInt64(&m.stepCount, 1)
47
+ m.mu.Lock()
48
+ defer m.mu.Unlock()
49
+ m.stepLatencySum += latencyMs
50
+ m.stepLatencyCount++
51
+ m.rewardSum += reward
52
+ m.rewardCount++
53
+ if reward < m.rewardMin {
54
+ m.rewardMin = reward
55
+ }
56
+ if reward > m.rewardMax {
57
+ m.rewardMax = reward
58
+ }
59
+ }
60
+
61
+ func (m *Metrics) recordAction(hvac float64) {
62
+ m.mu.Lock()
63
+ defer m.mu.Unlock()
64
+ switch {
65
+ case hvac < 0.33:
66
+ m.actionBuckets["low"]++
67
+ case hvac < 0.66:
68
+ m.actionBuckets["mid"]++
69
+ default:
70
+ m.actionBuckets["high"]++
71
+ }
72
+ }
73
+
74
+ func (m *Metrics) prometheus() string {
75
+ m.mu.Lock()
76
+ defer m.mu.Unlock()
77
+ avgLatency := 0.0
78
+ if m.stepLatencyCount > 0 {
79
+ avgLatency = m.stepLatencySum / float64(m.stepLatencyCount)
80
+ }
81
+ avgReward := 0.0
82
+ if m.rewardCount > 0 {
83
+ avgReward = m.rewardSum / float64(m.rewardCount)
84
+ }
85
+ return fmt.Sprintf(`# HELP gridmind_steps_total Total environment steps taken
86
+ # TYPE gridmind_steps_total counter
87
+ gridmind_steps_total %d
88
+
89
+ # HELP gridmind_step_latency_ms_avg Average step latency (ms)
90
+ # TYPE gridmind_step_latency_ms_avg gauge
91
+ gridmind_step_latency_ms_avg %.4f
92
+
93
+ # HELP gridmind_reward_avg Average reward per step
94
+ # TYPE gridmind_reward_avg gauge
95
+ gridmind_reward_avg %.4f
96
+
97
+ # HELP gridmind_reward_min Minimum reward seen
98
+ # TYPE gridmind_reward_min gauge
99
+ gridmind_reward_min %.4f
100
+
101
+ # HELP gridmind_reward_max Maximum reward seen
102
+ # TYPE gridmind_reward_max gauge
103
+ gridmind_reward_max %.4f
104
+
105
+ # HELP gridmind_action_hvac_bucket HVAC power level distribution
106
+ # TYPE gridmind_action_hvac_bucket counter
107
+ gridmind_action_hvac_bucket{bin="low"} %d
108
+ gridmind_action_hvac_bucket{bin="mid"} %d
109
+ gridmind_action_hvac_bucket{bin="high"} %d
110
+
111
+ # HELP gridmind_errors_total Total request errors
112
+ # TYPE gridmind_errors_total counter
113
+ gridmind_errors_total %d
114
+ `,
115
+ atomic.LoadInt64(&m.stepCount),
116
+ avgLatency, avgReward,
117
+ m.rewardMin, m.rewardMax,
118
+ m.actionBuckets["low"], m.actionBuckets["mid"], m.actionBuckets["high"],
119
+ atomic.LoadInt64(&m.errorCount),
120
+ )
121
+ }
122
+
123
+ // ──────────────────────────────────────────────
124
+ // Server
125
+ // ──────────────────────────────────────────────
126
+
127
+ type Server struct {
128
+ envMgr *env.Environment
129
+ }
130
+
131
+ func newServer() *Server {
132
+ return &Server{envMgr: env.NewEnvironment()}
133
+ }
134
+
135
+ func (s *Server) routes() *http.ServeMux {
136
+ mux := http.NewServeMux()
137
+ mux.HandleFunc("/health", s.handleHealth)
138
+ mux.HandleFunc("/reset", s.handleReset)
139
+ mux.HandleFunc("/step", s.handleStep)
140
+ mux.HandleFunc("/state", s.handleState)
141
+ mux.HandleFunc("/replay", s.handleReplay)
142
+ mux.HandleFunc("/grade", s.handleGrade)
143
+ mux.HandleFunc("/tasks", s.handleTasks)
144
+ mux.HandleFunc("/metrics", s.handleMetrics)
145
+ return mux
146
+ }
147
+
148
+ // ── /health ──────────────────────────────────────────────────────────────────
149
+
150
+ func (s *Server) handleHealth(w http.ResponseWriter, r *http.Request) {
151
+ w.Header().Set("Content-Type", "application/json")
152
+ w.WriteHeader(http.StatusOK)
153
+ json.NewEncoder(w).Encode(map[string]string{"status": "ok", "version": "1.0.0"})
154
+ }
155
+
156
+ // ── /reset ───────────────────────────────────────────────────────────────────
157
+
158
+ func (s *Server) handleReset(w http.ResponseWriter, r *http.Request) {
159
+ if r.Method != http.MethodPost {
160
+ http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
161
+ return
162
+ }
163
+ var req env.ResetRequest
164
+ if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
165
+ // Empty or malformed body → fall back to defaults
166
+ req = env.ResetRequest{TaskID: 1}
167
+ }
168
+ if req.TaskID == 0 {
169
+ req.TaskID = 1
170
+ }
171
+ resp := s.envMgr.Reset(req)
172
+ w.Header().Set("Content-Type", "application/json")
173
+ json.NewEncoder(w).Encode(resp)
174
+ }
175
+
176
+ // ── /step ────────────────────────────────────────────────────────────────────
177
+
178
+ func (s *Server) handleStep(w http.ResponseWriter, r *http.Request) {
179
+ if r.Method != http.MethodPost {
180
+ http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
181
+ return
182
+ }
183
+
184
+ start := time.Now()
185
+
186
+ // Accept both single action and array of actions
187
+ var actions []env.ActionModel
188
+ var single env.ActionModel
189
+
190
+ body := make([]byte, 0, 512)
191
+ buf := make([]byte, 512)
192
+ for {
193
+ n, err := r.Body.Read(buf)
194
+ body = append(body, buf[:n]...)
195
+ if err != nil {
196
+ break
197
+ }
198
+ }
199
+
200
+ if len(body) > 0 && body[0] == '[' {
201
+ if err := json.Unmarshal(body, &actions); err != nil {
202
+ atomic.AddInt64(&metrics.errorCount, 1)
203
+ http.Error(w, "invalid action array: "+err.Error(), http.StatusBadRequest)
204
+ return
205
+ }
206
+ } else {
207
+ if err := json.Unmarshal(body, &single); err != nil {
208
+ atomic.AddInt64(&metrics.errorCount, 1)
209
+ http.Error(w, "invalid action: "+err.Error(), http.StatusBadRequest)
210
+ return
211
+ }
212
+ actions = []env.ActionModel{single}
213
+ }
214
+
215
+ responses, done := s.envMgr.Step(actions)
216
+
217
+ latency := float64(time.Since(start).Microseconds()) / 1000.0
218
+ for _, resp := range responses {
219
+ metrics.recordStep(latency, resp.Reward)
220
+ }
221
+ if len(actions) > 0 {
222
+ metrics.recordAction(actions[0].HVACPowerLevel)
223
+ }
224
+
225
+ w.Header().Set("Content-Type", "application/json")
226
+ if done && len(responses) == 1 {
227
+ responses[0].Done = true
228
+ }
229
+ // Return single response if single building, array otherwise
230
+ if len(responses) == 1 {
231
+ json.NewEncoder(w).Encode(responses[0])
232
+ } else {
233
+ json.NewEncoder(w).Encode(responses)
234
+ }
235
+ }
236
+
237
+ // ── /state ───────────────────────────────────────────────────────────────────
238
+
239
+ func (s *Server) handleState(w http.ResponseWriter, r *http.Request) {
240
+ if r.Method != http.MethodGet {
241
+ http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
242
+ return
243
+ }
244
+ state := s.envMgr.GetState()
245
+ // Add CORS for dashboard
246
+ w.Header().Set("Access-Control-Allow-Origin", "*")
247
+ w.Header().Set("Content-Type", "application/json")
248
+ json.NewEncoder(w).Encode(state)
249
+ }
250
+
251
+ // ── /replay ──────────────────────────────────────────────────────────────────
252
+
253
+ func (s *Server) handleReplay(w http.ResponseWriter, r *http.Request) {
254
+ if r.Method != http.MethodGet {
255
+ http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
256
+ return
257
+ }
258
+ replay := s.envMgr.GetReplay()
259
+ w.Header().Set("Content-Type", "application/json")
260
+ w.Header().Set("Access-Control-Allow-Origin", "*")
261
+ json.NewEncoder(w).Encode(map[string]interface{}{
262
+ "replay": replay,
263
+ "steps": len(replay),
264
+ })
265
+ }
266
+
267
+ // ── /grade ───────────────────────────────────────────────────────────────────
268
+
269
+ func (s *Server) handleGrade(w http.ResponseWriter, r *http.Request) {
270
+ state := s.envMgr.GetState()
271
+ replay := s.envMgr.GetReplay()
272
+
273
+ // Collect per-building exploit penalties
274
+ penalties := make([]float64, len(state.Buildings))
275
+ for i := range state.Buildings {
276
+ _, pen := s.envMgr.ExploitDetected(i)
277
+ penalties[i] = pen
278
+ }
279
+
280
+ // Build building states from public state
281
+ buildings := make([]*env.BuildingState, len(state.Buildings))
282
+ for i, pub := range state.Buildings {
283
+ buildings[i] = &env.BuildingState{
284
+ CumulativeCost: pub.CumulativeCost,
285
+ BaselineCost: pub.BaselineCost,
286
+ }
287
+ }
288
+
289
+ // Reconstruct temp history from public state
290
+ tempHistory := make([][]float64, len(state.Buildings))
291
+ for i, pub := range state.Buildings {
292
+ tempHistory[i] = pub.TempHistory
293
+ }
294
+
295
+ grade := env.GradeEpisode(env.GradeEpisodeInput{
296
+ TaskID: state.TaskID,
297
+ Buildings: buildings,
298
+ Replay: replay,
299
+ TempHistory: tempHistory,
300
+ TMin: env.TMinDefault,
301
+ TMax: env.TMaxDefault,
302
+ ExploitPenalties: penalties,
303
+ })
304
+
305
+ w.Header().Set("Content-Type", "application/json")
306
+ w.Header().Set("Access-Control-Allow-Origin", "*")
307
+ json.NewEncoder(w).Encode(grade)
308
+ }
309
+
310
+ // ── /tasks ───────────────────────────────────────────────────────────────────
311
+
312
+ func (s *Server) handleTasks(w http.ResponseWriter, r *http.Request) {
313
+ w.Header().Set("Content-Type", "application/json")
314
+ json.NewEncoder(w).Encode(env.AllTasks())
315
+ }
316
+
317
+ // ── /metrics ─────────────────────────────────────────────────────────────────
318
+
319
+ func (s *Server) handleMetrics(w http.ResponseWriter, r *http.Request) {
320
+ w.Header().Set("Content-Type", "text/plain; version=0.0.4")
321
+ fmt.Fprint(w, metrics.prometheus())
322
+ }
323
+
324
+ // ──────────────────────────────────────────────
325
+ // Entry point
326
+ // ──────────────────────────────────────────────
327
+
328
+ func main() {
329
+ port := os.Getenv("PORT")
330
+ if port == "" {
331
+ port = "7860"
332
+ }
333
+ // Validate port
334
+ if _, err := strconv.Atoi(port); err != nil {
335
+ log.Fatalf("invalid PORT: %s", port)
336
+ }
337
+
338
+ srv := newServer()
339
+
340
+ // Perform initial reset so /state is always valid
341
+ var seed int64 = 42
342
+ srv.envMgr.Reset(env.ResetRequest{Seed: &seed, TaskID: 1, NumBuildings: 1})
343
+
344
+ log.Printf("GridMind-RL environment server starting on :%s", port)
345
+ log.Printf("Endpoints: GET /health /state /replay /grade /tasks /metrics | POST /reset /step")
346
+
347
+ mux := withCORS(withLogging(srv.routes()))
348
+ if err := http.ListenAndServe(":"+port, mux); err != nil {
349
+ log.Fatalf("server error: %v", err)
350
+ }
351
+ }
352
+
353
+ // ──────────────────────────────────────────────
354
+ // Middleware
355
+ // ──────────────────────────────────────────────
356
+
357
+ func withLogging(next http.Handler) http.Handler {
358
+ return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
359
+ start := time.Now()
360
+ next.ServeHTTP(w, r)
361
+ log.Printf("%s %s %s", r.Method, r.URL.Path, time.Since(start))
362
+ })
363
+ }
364
+
365
+ func withCORS(next http.Handler) http.Handler {
366
+ return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
367
+ w.Header().Set("Access-Control-Allow-Origin", "*")
368
+ w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
369
+ w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization")
370
+ if r.Method == http.MethodOptions {
371
+ w.WriteHeader(http.StatusNoContent)
372
+ return
373
+ }
374
+ next.ServeHTTP(w, r)
375
+ })
376
+ }
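The `/step` handler above accepts either a single action object or an array of actions, choosing the branch from the first byte of the request body. A minimal Python sketch of the same dispatch follows, for clients that want to normalize payloads before sending them. `normalize_actions` is a hypothetical helper, not part of this commit, and unlike the Go handler it skips leading whitespace before looking for `[`.

```python
import json


def normalize_actions(body: bytes) -> list:
    """Return a list of action dicts from a single-object or array JSON body.

    Hypothetical helper mirroring the branch in handleStep. Assumption:
    leading whitespace is stripped first, whereas the Go code inspects
    body[0] directly.
    """
    text = body.decode("utf-8").lstrip()
    if text.startswith("["):
        # Array form: one action per building.
        actions = json.loads(text)
    else:
        # Single-object form; an empty or invalid body raises
        # json.JSONDecodeError, which is a subclass of ValueError.
        actions = [json.loads(text)]
    if not all(isinstance(a, dict) for a in actions):
        raise ValueError("each action must be a JSON object")
    return actions
```

Either form yields a list, so downstream code can always iterate over actions regardless of how the client encoded them.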
openenv.yaml ADDED
@@ -0,0 +1,137 @@
1
+ name: gridmind-rl
2
+ version: 1.0.0
3
+ description: |
4
+ GridMind-RL: Industrial Load-Shaping and Demand-Response Environment.
5
+ An RL environment simulating a real-world building energy management system.
6
+ Control HVAC, thermal storage, and schedule batch jobs in response to
7
+ stochastic time-of-use prices and grid stress events.
8
+
9
+ author: GridMind Team
10
+ tags:
11
+ - reinforcement-learning
12
+ - energy
13
+ - demand-response
14
+ - continuous-control
15
+ - discrete-control
16
+
17
+ server:
18
+ host: "0.0.0.0"
19
+ port: 7860
20
+ protocol: http
21
+
22
+ schemas:
23
+ observation:
24
+ type: object
25
+ properties:
26
+ indoor_temperature:
27
+ type: number
28
+ description: Current building/process temperature (°C)
29
+ thermal_storage_level:
30
+ type: number
31
+ minimum: 0.0
32
+ maximum: 1.0
33
+ description: Thermal storage tank level (0.0=empty, 1.0=full)
34
+ process_demand:
35
+ type: number
36
+ description: Current industrial process power demand (kW)
37
+ current_price:
38
+ type: number
39
+ description: Real-time electricity price ($/kWh)
40
+ grid_stress_signal:
41
+ type: number
42
+ minimum: 0.0
43
+ maximum: 1.0
44
+ description: Utility signal for DR urgency (0=normal, 1=critical)
45
+ carbon_intensity:
46
+ type: number
47
+ description: Grid carbon intensity (gCO2/kWh)
48
+ hour_of_day:
49
+ type: integer
50
+ minimum: 0
51
+ maximum: 23
52
+ description: Hour of the day for time-aware scheduling
53
+ batch_queue:
54
+ type: array
55
+ items:
56
+ type: integer
57
+ description: List of pending batch jobs (values are deadline slot indices)
58
+ cumulative_cost:
59
+ type: number
60
+ description: Total energy cost incurred so far this episode ($)
61
+ step:
62
+ type: integer
63
+ minimum: 0
64
+ maximum: 95
65
+ description: Current episode step (15-min intervals)
66
+ building_id:
67
+ type: integer
68
+ description: Building identifier for multi-building federation
69
+
70
+ action:
71
+ type: object
72
+ required: ["hvac_power_level", "thermal_charge_rate", "batch_job_slot", "load_shed_fraction"]
73
+ properties:
74
+ hvac_power_level:
75
+ type: number
76
+ minimum: 0.0
77
+ maximum: 1.0
78
+ description: Fraction of max HVAC power to apply (0.0-1.0)
79
+ thermal_charge_rate:
80
+ type: number
81
+ minimum: -1.0
82
+ maximum: 1.0
83
+ description: Thermal storage charge (+) or discharge (-) rate
84
+ batch_job_slot:
85
+ type: integer
86
+ minimum: 0
87
+ maximum: 4
88
+ description: Which time slot to schedule the next batch job (0=now, 1-4=defer)
89
+ load_shed_fraction:
90
+ type: number
91
+ minimum: 0.0
92
+ maximum: 0.5
93
+ description: Fraction of non-critical load to shed during grid stress (0.0-0.5)
94
+ building_id:
95
+ type: integer
96
+ description: Building identifier for multi-building federation
97
+
98
+ reward:
99
+ type: number
100
+ description: Dense multi-component reward combining cost savings, temperature constraints, grid response, and deadlines.
101
+
102
+ tasks:
103
+ - id: 1
104
+ name: "Cost Minimization"
105
+ description: "Minimize total energy cost over a 24-hour episode with no process constraints."
106
+ difficulty: "easy"
107
+ - id: 2
108
+ name: "Constrained Temperature Management"
109
+ description: "Minimize cost while keeping indoor temperature within ±2°C of setpoint at all times."
110
+ difficulty: "medium"
111
+ - id: 3
112
+ name: "Full Demand-Response with Batch Scheduling"
113
+ description: "Minimize cost, maintain temperature, respond to grid stress events, schedule all batch jobs, and minimize carbon."
114
+ difficulty: "hard"
115
+
116
+ endpoints:
117
+ health:
118
+ path: /health
119
+ method: GET
120
+ reset:
121
+ path: /reset
122
+ method: POST
123
+ step:
124
+ path: /step
125
+ method: POST
126
+ state:
127
+ path: /state
128
+ method: GET
129
+ grade:
130
+ path: /grade
131
+ method: GET
132
+ replay:
133
+ path: /replay
134
+ method: GET
135
+ tasks:
136
+ path: /tasks
137
+ method: GET
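The `action` schema above fixes a numeric range for each required field. As a rough illustration, the sketch below checks a candidate action against those ranges and reports every violation instead of clamping. `ACTION_BOUNDS` and `action_violations` are hypothetical names introduced here; only the bounds themselves come from the schema.

```python
# Bounds copied from the `action` schema in openenv.yaml.
ACTION_BOUNDS = {
    "hvac_power_level": (0.0, 1.0),
    "thermal_charge_rate": (-1.0, 1.0),
    "batch_job_slot": (0, 4),
    "load_shed_fraction": (0.0, 0.5),
}


def action_violations(action: dict) -> list:
    """Return human-readable problems; an empty list means the action is valid."""
    problems = []
    for name, (low, high) in ACTION_BOUNDS.items():
        if name not in action:
            problems.append(f"{name}: missing required field")
            continue
        value = action[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: expected a number")
        elif name == "batch_job_slot" and not isinstance(value, int):
            problems.append(f"{name}: expected an integer")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} outside [{low}, {high}]")
    return problems
```

Reporting violations rather than clamping silently can make it easier to spot an agent that routinely emits out-of-range values.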
python/inference.py ADDED
@@ -0,0 +1,357 @@
1
+ """
2
+ GridMind-RL Baseline Inference Script
3
+ --------------------------------------
4
+ Runs an LLM agent against all 3 tasks for N episodes each.
5
+ Uses an OpenAI-compatible API configured via the API_BASE_URL, MODEL_NAME, and HF_TOKEN environment variables.
6
+
7
+ Usage:
8
+ export API_BASE_URL=https://api-inference.huggingface.co/v1
9
+ export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct
10
+ export HF_TOKEN=hf_xxxx
11
+ python python/inference.py [--episodes 3] [--env-url http://localhost:7860]
12
+ """
13
+
14
+ import argparse
15
+ import json
16
+ import os
17
+ import random
18
+ import re
19
+ import sys
20
+ import time
21
+ from typing import Any
22
+
23
+ import requests
24
+ from openai import OpenAI
25
+
26
+ # ── Constants ──────────────────────────────────────────────────────────────
27
+
28
+ ENV_URL = os.getenv("ENV_URL", "http://localhost:7860")
29
+ API_BASE_URL = os.getenv("API_BASE_URL", "https://api-inference.huggingface.co/v1")
30
+ MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.1-8B-Instruct")
31
+ HF_TOKEN = os.getenv("HF_TOKEN", "")
32
+ DEFAULT_EPISODES = 3
33
+ DEFAULT_SEED_BASE = 1000 # seed = BASE + task_id * 100 + episode_idx, for reproducibility
34
+ MAX_RETRIES = 3
35
+
36
+ SYSPROMPT = """You are GridMind, an expert industrial energy management controller.
37
+ You control a building's HVAC, thermal storage, batch job scheduling, and load shedding.
38
+ Your goal is to minimize electricity costs while maintaining comfort and meeting grid demand-response signals.
39
+ Always respond with a single valid JSON object matching the action schema. No explanation needed."""
40
+
41
+ TASK_DESCRIPTIONS = {
42
+ 1: "Task 1 (Easy - Cost Minimization): Minimize total energy cost over 24 hours. No temperature constraints. Use cheap off-peak periods and thermal storage arbitrage.",
43
+ 2: "Task 2 (Medium - Temperature Management): Minimize cost AND keep indoor temperature within 19-23°C at all times. Balance comfort vs cost.",
44
+ 3: "Task 3 (Hard - Full Demand Response): Minimize cost, maintain temperature, respond to grid stress events by shedding load when grid_stress_signal > 0.7, AND schedule all batch jobs before their deadlines.",
45
+ }
46
+
47
+ ACTION_SCHEMA_STR = """{
48
+ "hvac_power_level": <float 0.0-1.0>,
49
+ "thermal_charge_rate": <float -1.0 to 1.0>,
50
+ "batch_job_slot": <int 0-4>,
51
+ "load_shed_fraction": <float 0.0-0.5>,
52
+ "building_id": 0
53
+ }"""
54
+
55
+
56
+ # ── Environment client ───────────────────────────────────────────────────────
57
+
58
+ class GridMindEnvClient:
59
+ """Simple HTTP client for the GridMind-RL Go environment server."""
60
+
61
+ def __init__(self, base_url: str = ENV_URL, timeout: int = 30):
62
+ self.base = base_url.rstrip("/")
63
+ self.timeout = timeout
64
+
65
+ def health(self) -> bool:
66
+ try:
67
+ r = requests.get(f"{self.base}/health", timeout=5)
68
+ return r.status_code == 200
69
+ except Exception:
70
+ return False
71
+
72
+ def reset(self, task_id: int = 1, seed: int = 42, num_buildings: int = 1) -> dict:
73
+ payload = {"task_id": task_id, "seed": seed, "num_buildings": num_buildings}
74
+ r = requests.post(f"{self.base}/reset", json=payload, timeout=self.timeout)
75
+ r.raise_for_status()
76
+ return r.json()
77
+
78
+ def step(self, action: dict) -> dict:
79
+ r = requests.post(f"{self.base}/step", json=action, timeout=self.timeout)
80
+ r.raise_for_status()
81
+ return r.json()
82
+
83
+ def grade(self) -> dict:
84
+ r = requests.get(f"{self.base}/grade", timeout=self.timeout)
85
+ r.raise_for_status()
86
+ return r.json()
87
+
88
+ def state(self) -> dict:
89
+ r = requests.get(f"{self.base}/state", timeout=self.timeout)
90
+ r.raise_for_status()
91
+ return r.json()
92
+
93
+
94
+ # ── LLM agent ───────────────────────────────────────────────────────────────
95
+
96
+ class LLMAgent:
97
+ """OpenAI-compatible LLM agent that chooses actions given observations."""
98
+
99
+ def __init__(self):
100
+ self.client = OpenAI(
101
+ base_url=API_BASE_URL,
102
+ api_key=HF_TOKEN if HF_TOKEN else "none",
103
+ )
104
+ self.model = MODEL_NAME
105
+
106
+ def choose_action(self, obs: dict, task_id: int) -> dict:
107
+ """Prompt the LLM with current observation, return parsed action dict."""
108
+ task_desc = TASK_DESCRIPTIONS.get(task_id, TASK_DESCRIPTIONS[1])
109
+
110
+ prompt = f"""{task_desc}
111
+
112
+ Current observation:
113
+ - Indoor temperature: {obs.get('indoor_temperature', 21):.1f}°C (target: 21°C, bounds: 19-23°C)
114
+ - Thermal storage level: {obs.get('thermal_storage_level', 0.5):.2f} (0=empty, 1=full)
115
+ - Process demand: {obs.get('process_demand', 15):.1f} kW
116
+ - Current electricity price: ${obs.get('current_price', 0.10):.4f}/kWh
117
+ - Grid stress signal: {obs.get('grid_stress_signal', 0):.3f} (>0.7 = critical, shed load!)
118
+ - Carbon intensity: {obs.get('carbon_intensity', 300):.0f} gCO2/kWh
119
+ - Hour of day: {obs.get('hour_of_day', 12)} (0=midnight, peak prices 8-12 and 17-21)
120
+ - Pending batch job deadlines: {obs.get('batch_queue', [])}
121
+ - Cumulative cost so far: ${obs.get('cumulative_cost', 0):.4f}
122
+ - Episode step: {obs.get('step', 0)}/95
123
+
124
+ Strategy hints:
125
+ - Charge thermal storage when price < $0.08/kWh, discharge when price > $0.15/kWh
126
+ - Set HVAC low during peak prices (0.3-0.4) and use storage for temperature control
127
+ - Shed 30-50% load if grid_stress_signal > 0.7
128
+ - Schedule batch jobs early if deadline is close (slot 0 or 1)
129
+
130
+ Respond with ONLY a JSON action:
131
+ {ACTION_SCHEMA_STR}"""
132
+
133
+ for attempt in range(MAX_RETRIES):
134
+ try:
135
+ completion = self.client.chat.completions.create(
136
+ model=self.model,
137
+ messages=[
138
+ {"role": "system", "content": SYSPROMPT},
139
+ {"role": "user", "content": prompt},
140
+ ],
141
+ max_tokens=128,
142
+ temperature=0.1,
143
+ )
144
+ content = completion.choices[0].message.content.strip()
145
+ return self._parse_action(content)
146
+ except Exception as e:
147
+ print(f" [LLM attempt {attempt+1}/{MAX_RETRIES}] error: {e}")
148
+ time.sleep(1)
149
+
150
+ # Fallback: rule-based heuristic
151
+ return self._heuristic_action(obs)
152
+
153
+ def _parse_action(self, content: str) -> dict:
154
+ """Extract and validate JSON action from LLM response."""
155
+ # Try direct JSON parse
156
+ try:
157
+ action = json.loads(content)
158
+ return self._clamp_action(action)
159
+ except json.JSONDecodeError:
160
+ pass
161
+ # Try to extract JSON block from text
162
+ match = re.search(r"\{[^}]+\}", content, re.DOTALL)
163
+ if match:
164
+ try:
165
+ action = json.loads(match.group())
166
+ return self._clamp_action(action)
167
+ except json.JSONDecodeError:
168
+ pass
169
+ # Fallback
170
+ print(f" [WARN] could not parse LLM response: {content[:100]}")
171
+ return self._default_action()
172
+
173
+ def _clamp_action(self, action: dict) -> dict:
174
+ return {
175
+ "hvac_power_level": max(0.0, min(1.0, float(action.get("hvac_power_level", 0.5)))),
176
+ "thermal_charge_rate": max(-1.0, min(1.0, float(action.get("thermal_charge_rate", 0.0)))),
177
+ "batch_job_slot": max(0, min(4, int(action.get("batch_job_slot", 0)))),
178
+ "load_shed_fraction": max(0.0, min(0.5, float(action.get("load_shed_fraction", 0.0)))),
179
+ "building_id": int(action.get("building_id", 0)),
180
+ }
181
+
182
+ def _heuristic_action(self, obs: dict) -> dict:
183
+ """Simple rule-based heuristic when LLM is unavailable."""
184
+ price = obs.get("current_price", 0.10)
185
+ stress = obs.get("grid_stress_signal", 0.0)
186
+ temp = obs.get("indoor_temperature", 21.0)
187
+ storage = obs.get("thermal_storage_level", 0.5)
188
+ queue = obs.get("batch_queue", [])
189
+
190
+ # HVAC: run harder when power is cheap, back off when it is expensive
191
+ hvac = 0.7 if price < 0.08 else (0.3 if price > 0.15 else 0.5)
192
+ # Adjust for temperature (treats HVAC as cooling: more power when hot, less when cold)
193
+ if temp > 23.0:
194
+ hvac = max(hvac, 0.8)
195
+ elif temp < 19.0:
196
+ hvac = min(hvac, 0.2)
197
+
198
+ # Storage arbitrage
199
+ charge = 0.0
200
+ if price < 0.07 and storage < 0.8:
201
+ charge = 0.5
202
+ elif price > 0.15 and storage > 0.3:
203
+ charge = -0.5
204
+
205
+ # Load shedding
206
+ shed = 0.0
207
+ if stress > 0.7:
208
+ shed = 0.4
209
+ elif stress > 0.5:
210
+ shed = 0.2
211
+
212
+ # Batch jobs: schedule soon if deadline approaching
213
+ slot = 2
214
+ if queue and min(queue) < 10:
215
+ slot = 0
216
+
217
+ return {
218
+ "hvac_power_level": hvac,
219
+ "thermal_charge_rate": charge,
220
+ "batch_job_slot": slot,
221
+ "load_shed_fraction": shed,
222
+ "building_id": 0,
223
+ }
224
+
225
+ def _default_action(self) -> dict:
226
+ return {"hvac_power_level": 0.5, "thermal_charge_rate": 0.0,
227
+ "batch_job_slot": 0, "load_shed_fraction": 0.0, "building_id": 0}
228
+
229
+
230
+ # ── Episode runner ───────────────────────────────────────────────────────────
231
+
232
+ def run_episode(env_client: GridMindEnvClient, agent: LLMAgent,
233
+ task_id: int, seed: int, verbose: bool = False) -> dict[str, Any]:
234
+ """Run a single episode and return grade + metadata."""
235
+ reset_resp = env_client.reset(task_id=task_id, seed=seed)
236
+ obs = reset_resp["observations"][0]
237
+
238
+ total_reward = 0.0
239
+ total_steps = 0
240
+ start_time = time.time()
241
+
242
+ for _step in range(96):
243
+ action = agent.choose_action(obs, task_id)
244
+ step_resp = env_client.step(action)
245
+
246
+ obs = step_resp["observation"]
247
+ total_reward += step_resp["reward"]
248
+ total_steps += 1
249
+
250
+ if verbose and _step % 16 == 0:
251
+ print(f" step={_step:02d} price=${obs['current_price']:.3f} "
252
+ f"temp={obs['indoor_temperature']:.1f}°C "
253
+ f"stress={obs['grid_stress_signal']:.2f} "
254
+ f"cost=${obs['cumulative_cost']:.2f} "
255
+ f"reward={step_resp['reward']:.3f}")
256
+
257
+ if step_resp.get("done", False):
258
+ break
259
+
260
+ elapsed = time.time() - start_time
261
+ grade = env_client.grade()
262
+
263
+ return {
264
+ "task_id": task_id,
265
+ "seed": seed,
266
+ "total_reward": total_reward,
267
+ "total_steps": total_steps,
268
+ "elapsed_sec": elapsed,
269
+ "score": grade.get("score", 0.0),
270
+ "sub_scores": grade.get("sub_scores", {}),
271
+ "exploit_detected": grade.get("exploit_detected", False),
272
+ }
273
+
274
+
275
+ # ── Main ─────────────────────────────────────────────────────────────────────
276
+
277
+ def main():
278
+ parser = argparse.ArgumentParser(description="GridMind-RL baseline inference")
279
+ parser.add_argument("--episodes", type=int, default=DEFAULT_EPISODES)
280
+ parser.add_argument("--env-url", type=str, default=ENV_URL)
281
+ parser.add_argument("--verbose", action="store_true")
282
+ parser.add_argument("--output", type=str, default="baseline_scores.json")
283
+ args = parser.parse_args()
284
+
285
+ print("=" * 60)
286
+ print("GridMind-RL Baseline Inference")
287
+ print(f" Model: {MODEL_NAME}")
288
+ print(f" API: {API_BASE_URL}")
289
+ print(f" Env: {args.env_url}")
290
+ print(f" Episodes per task: {args.episodes}")
291
+ print("=" * 60)
292
+
293
+ env_client = GridMindEnvClient(base_url=args.env_url)
294
+
295
+ # Wait for env server to be healthy
296
+ print("\nWaiting for environment server...")
297
+ for attempt in range(30):
298
+ if env_client.health():
299
+ print(" ✓ Environment server is healthy")
300
+ break
301
+ time.sleep(2)
302
+ if attempt == 29:
303
+ print(" ✗ Environment server not reachable. Exiting.")
304
+ sys.exit(1)
305
+
306
+ agent = LLMAgent()
307
+ all_results = []
308
+
309
+ for task_id in [1, 2, 3]:
310
+ print(f"\n── Task {task_id}: {TASK_DESCRIPTIONS[task_id][:60]}...")
311
+ task_scores = []
312
+ for ep in range(args.episodes):
313
+ seed = DEFAULT_SEED_BASE + task_id * 100 + ep
314
+ print(f" Episode {ep+1}/{args.episodes} (seed={seed})")
315
+ result = run_episode(env_client, agent, task_id=task_id, seed=seed, verbose=args.verbose)
316
+ task_scores.append(result["score"])
317
+ all_results.append(result)
318
+ print(f" → score={result['score']:.4f} | reward={result['total_reward']:.3f} | {result['elapsed_sec']:.1f}s")
319
+
320
+ avg_score = sum(task_scores) / len(task_scores)
321
+ print(f" Task {task_id} average score: {avg_score:.4f}")
322
+
323
+ # Score summary table
324
+ print("\n" + "=" * 60)
325
+ print("BASELINE SCORES SUMMARY")
326
+ print("=" * 60)
327
+ print(f"{'Task':<10} {'Model':<30} {'Score':<10} {'Episodes':<10}")
328
+ print("-" * 60)
329
+
330
+ task_avgs = {}
331
+ for task_id in [1, 2, 3]:
332
+ scores = [r["score"] for r in all_results if r["task_id"] == task_id]
333
+ avg = sum(scores) / len(scores) if scores else 0.0
334
+ task_avgs[task_id] = avg
335
+ print(f"Task {task_id:<6} {MODEL_NAME:<30} {avg:<10.4f} {len(scores)}")
336
+
337
+ print("-" * 60)
338
+ overall = sum(task_avgs.values()) / len(task_avgs)
339
+ print(f"{'Overall':<10} {'':<30} {overall:<10.4f}")
340
+
341
+ # Save results
342
+ output = {
343
+ "model": MODEL_NAME,
344
+ "api_base": API_BASE_URL,
345
+ "episodes_per_task": args.episodes,
346
+ "seed_base": DEFAULT_SEED_BASE,
347
+ "task_averages": {str(k): v for k, v in task_avgs.items()},
348
+ "overall_average": overall,
349
+ "all_results": all_results,
350
+ }
351
+ with open(args.output, "w") as f:
352
+ json.dump(output, f, indent=2)
353
+ print(f"\n✓ Results saved to {args.output}")
354
+
355
+
356
+ if __name__ == "__main__":
357
+ main()
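The script derives each episode's seed as `DEFAULT_SEED_BASE + task_id * 100 + ep`. A short sketch of that arithmetic shows when seeds stay distinct across tasks. `episode_seed` and `seeds_are_unique` are hypothetical helpers written for illustration; only the formula and the base value come from the script.

```python
DEFAULT_SEED_BASE = 1000  # same value as in inference.py


def episode_seed(task_id: int, episode_idx: int, base: int = DEFAULT_SEED_BASE) -> int:
    """Seed for one episode, following the formula used in main()."""
    return base + task_id * 100 + episode_idx


def seeds_are_unique(task_ids, episodes: int) -> bool:
    """True when no two (task, episode) pairs share a seed."""
    seeds = [episode_seed(t, ep) for t in task_ids for ep in range(episodes)]
    return len(seeds) == len(set(seeds))
```

Because consecutive tasks are spaced 100 seeds apart, the seeds collide once `--episodes` exceeds 100: episode index 100 of task 1 and episode index 0 of task 2 both map to 1200.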
python/models.py ADDED
@@ -0,0 +1,221 @@
1
+ """
2
+ GridMind-RL OpenEnv Pydantic models.
3
+ These types mirror the Go structs exactly for full schema compliance.
4
+ """
5
+ from __future__ import annotations
6
+ from typing import List, Optional, Dict, Any
7
+ from pydantic import BaseModel, Field, field_validator
8
+
9
+
10
+ class BatchJob(BaseModel):
11
+ id: int
12
+ deadline_slot: int
13
+ duration: int
14
+ power_draw: float
15
+ scheduled: bool
16
+ scheduled_at: int
17
+ completed: bool
18
+ missed_deadline: bool
19
+
20
+
21
+ class ObservationModel(BaseModel):
22
+ """Full observation returned on each step / GET /state."""
23
+ indoor_temperature: float = Field(..., description="Current building indoor temperature (°C)")
24
+ thermal_storage_level: float = Field(..., ge=0.0, le=1.0, description="Thermal storage fill level (0–1)")
25
+ process_demand: float = Field(..., ge=0.0, description="Current process power demand (kW)")
26
+ current_price: float = Field(..., gt=0.0, description="Real-time electricity price ($/kWh)")
27
+ grid_stress_signal: float = Field(..., ge=0.0, le=1.0, description="Utility demand-response urgency (0–1)")
28
+ carbon_intensity: float = Field(..., ge=0.0, description="Grid carbon intensity (gCO2/kWh)")
29
+ hour_of_day: int = Field(..., ge=0, le=23, description="Current hour of day (0–23)")
30
+ batch_queue: List[int] = Field(default_factory=list, description="Deadline slots of pending batch jobs")
31
+ cumulative_cost: float = Field(..., ge=0.0, description="Running energy cost this episode ($)")
32
+ step: int = Field(..., ge=0, description="Current timestep (0–95)")
33
+ building_id: int = Field(default=0, description="Building index in federation")
34
+
35
+
36
+ class ActionModel(BaseModel):
37
+ """Agent action for a single timestep."""
38
+ hvac_power_level: float = Field(..., ge=0.0, le=1.0, description="HVAC fraction of max power (0–1)")
39
+ thermal_charge_rate: float = Field(..., ge=-1.0, le=1.0, description="Storage charge (+) or discharge (-) rate")
40
+ batch_job_slot: int = Field(..., ge=0, le=4, description="Time slot offset for next batch job (0=now, 1–4=defer)")
41
+ load_shed_fraction: float = Field(..., ge=0.0, le=0.5, description="Fraction of non-critical load to shed (0–0.5)")
42
+ building_id: int = Field(default=0, description="Building index this action targets")
43
+
44
+ @field_validator("hvac_power_level", mode="before") # run before the ge/le checks so out-of-range input is clamped, not rejected
45
+ @classmethod
46
+ def clamp_hvac(cls, v: Any) -> float:
47
+ return max(0.0, min(1.0, float(v)))
48
+
49
+ @field_validator("thermal_charge_rate", mode="before")
50
+ @classmethod
51
+ def clamp_charge(cls, v: Any) -> float:
52
+ return max(-1.0, min(1.0, float(v)))
53
+
54
+ @field_validator("load_shed_fraction", mode="before")
55
+ @classmethod
56
+ def clamp_shed(cls, v: Any) -> float:
57
+ return max(0.0, min(0.5, float(v)))
58
+
59
+
60
+ class RewardComponents(BaseModel):
61
+ """Individual reward signal components."""
62
+ cost_savings: float = Field(..., description="Negative reward for energy cost")
63
+ temp_constraint: float = Field(..., description="Positive if temperature within bounds")
64
+ grid_response: float = Field(..., description="Bonus for shedding during high grid stress")
65
+ deadline_penalty: float = Field(..., description="Negative for missed batch deadlines")
66
+ efficiency_bonus: float = Field(..., description="Storage arbitrage bonus")
67
+ stability_penalty: float = Field(..., description="Penalty for rapid HVAC oscillation")
68
+ carbon_reward: float = Field(..., description="Low-carbon operation bonus")
69
+ total: float = Field(..., description="Weighted sum of all components")
70
+
71
+
72
+ class StepInfo(BaseModel):
73
+ """Auxiliary information returned at each step."""
74
+ reward_components: RewardComponents
75
+ energy_used_kwh: float
76
+ carbon_emitted_gco2: float
77
+ price_signal: float
78
+ grid_stress: float
79
+ batch_completed: List[int] = Field(default_factory=list)
80
+ batch_missed: List[int] = Field(default_factory=list)
81
+ episode: int
82
+ step: int
83
+
84
+
85
+ class StepResponse(BaseModel):
86
+ """Full response from POST /step."""
87
+ observation: ObservationModel
88
+ reward: float
89
+ done: bool
90
+ info: StepInfo
91
+
92
+
93
+ class ResetRequest(BaseModel):
94
+ """Request body for POST /reset."""
95
+ seed: Optional[int] = Field(None, description="Random seed for reproducibility")
96
+ task_id: int = Field(1, ge=1, le=3, description="Task to run (1=easy, 2=medium, 3=hard)")
97
+ difficulty: Optional[str] = Field(None, description="Override difficulty: easy/medium/hard")
98
+ num_buildings: int = Field(1, ge=1, le=3, description="Number of buildings in federation")
99
+
100
+
101
+ class ResetResponse(BaseModel):
102
+ """Response from POST /reset."""
103
+ observations: List[ObservationModel]
104
+ episode: int
105
+ task_id: int
106
+ seed: int
107
+
108
+
109
+ class BuildingStatePublic(BaseModel):
110
+ """Full building state including history for dashboard rendering."""
111
+ # ObservationModel fields (flattened)
112
+ indoor_temperature: float
113
+ thermal_storage_level: float
114
+ process_demand: float
115
+ current_price: float
116
+ grid_stress_signal: float
117
+ carbon_intensity: float
118
+ hour_of_day: int
119
+ batch_queue: List[int] = Field(default_factory=list)
120
+ cumulative_cost: float
121
+ step: int
122
+ building_id: int
123
+ # Extended state
124
+ outdoor_temperature: float
125
+ setpoint_temperature: float
126
+ baseline_cost: float
127
+ cumulative_carbon: float
128
+ jobs: List[BatchJob] = Field(default_factory=list)
129
+ # History arrays
130
+ temp_history: List[float] = Field(default_factory=list)
131
+ cost_history: List[float] = Field(default_factory=list)
132
+ hvac_history: List[float] = Field(default_factory=list)
133
+ load_shed_history: List[float] = Field(default_factory=list)
134
+ reward_history: List[RewardComponents] = Field(default_factory=list)
135
+
136
+
137
+ class StateResponse(BaseModel):
138
+ """Full environment state from GET /state."""
139
+ buildings: List[BuildingStatePublic]
140
+ price_curve_24h: List[float]
141
+ carbon_curve_24h: List[float]
142
+ episode: int
143
+ step: int
144
+ task_id: int
145
+ done: bool
146
+ seed: int
147
+
148
+
149
+ class TaskConfig(BaseModel):
150
+ """Task configuration."""
151
+ id: int
152
+ name: str
153
+ description: str
154
+ difficulty: str
155
+ weights: Dict[str, float]
156
+
157
+
158
+ class EpisodeGrade(BaseModel):
159
+ """Graded episode result."""
160
+ task_id: int
161
+ score: float = Field(..., ge=0.0, le=1.0)
162
+ sub_scores: Dict[str, float]
163
+ exploit_detected: bool
164
+ penalty_applied: float
165
+ details: Dict[str, Any]
166
+
167
+
168
+ # ── Action space schema (for LLM prompting) ────────────────────────────────
169
+ ACTION_SCHEMA = {
170
+ "type": "object",
171
+ "properties": {
172
+ "hvac_power_level": {
173
+ "type": "number",
174
+ "minimum": 0.0,
175
+ "maximum": 1.0,
176
+ "description": "Fraction of max HVAC power (0=off, 1=full power)"
177
+ },
178
+ "thermal_charge_rate": {
179
+ "type": "number",
180
+ "minimum": -1.0,
181
+ "maximum": 1.0,
182
+ "description": "Charge (+) or discharge (-) thermal storage at this fraction of max rate"
183
+ },
184
+ "batch_job_slot": {
185
+ "type": "integer",
186
+ "minimum": 0,
187
+ "maximum": 4,
188
+ "description": "Schedule next batch job: 0=run now, 1-4=defer by N 15-min intervals"
189
+ },
190
+ "load_shed_fraction": {
191
+ "type": "number",
192
+ "minimum": 0.0,
193
+ "maximum": 0.5,
194
+ "description": "Fraction of non-critical load to shed during this step (0=no shedding)"
195
+ },
196
+ "building_id": {
197
+ "type": "integer",
198
+ "minimum": 0,
199
+ "description": "Which building to apply this action to (0 for single-building mode)"
200
+ }
201
+ },
202
+ "required": ["hvac_power_level", "thermal_charge_rate", "batch_job_slot", "load_shed_fraction"]
203
+ }
204
+
205
+ # ── Observation space schema ───────────────────────────────────────────────
206
+ OBSERVATION_SCHEMA = {
207
+ "type": "object",
208
+ "properties": {
209
+ "indoor_temperature": {"type": "number", "description": "Indoor temperature °C"},
210
+ "thermal_storage_level": {"type": "number", "minimum": 0, "maximum": 1},
211
+ "process_demand": {"type": "number", "description": "Process power demand kW"},
212
+ "current_price": {"type": "number", "description": "Electricity price $/kWh"},
213
+ "grid_stress_signal": {"type": "number", "minimum": 0, "maximum": 1},
214
+ "carbon_intensity": {"type": "number", "description": "Grid carbon intensity gCO2/kWh"},
215
+ "hour_of_day": {"type": "integer", "minimum": 0, "maximum": 23},
216
+ "batch_queue": {"type": "array", "items": {"type": "integer"}},
217
+ "cumulative_cost": {"type": "number"},
218
+ "step": {"type": "integer"},
219
+ "building_id": {"type": "integer"}
220
+ }
221
+ }
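The ranges in `ACTION_SCHEMA` can also be enforced on the client before an action is posted to `/step`. The following is a minimal sketch, not part of the commit: `clamp_action`, `ACTION_BOUNDS`, and `BATCH_SLOT_RANGE` are hypothetical names, and defaulting a missing field to 0 is an assumption (the schema marks four of the fields as required).

```python
from typing import Any, Dict

# Bounds copied from ACTION_SCHEMA; the helper names below are hypothetical.
ACTION_BOUNDS = {
    "hvac_power_level": (0.0, 1.0),
    "thermal_charge_rate": (-1.0, 1.0),
    "load_shed_fraction": (0.0, 0.5),
}
BATCH_SLOT_RANGE = (0, 4)


def clamp_action(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `raw` with every field forced into its schema range."""
    action: Dict[str, Any] = {}
    for name, (low, high) in ACTION_BOUNDS.items():
        # Assumption: a missing numeric field is treated as 0.0.
        action[name] = max(low, min(high, float(raw.get(name, 0.0))))
    slot_low, slot_high = BATCH_SLOT_RANGE
    action["batch_job_slot"] = max(slot_low, min(slot_high, int(raw.get("batch_job_slot", 0))))
    # building_id is optional in the schema; 0 means single-building mode.
    action["building_id"] = max(0, int(raw.get("building_id", 0)))
    return action


if __name__ == "__main__":
    # An out-of-range action, similar to the one used in the Go clamping test.
    print(clamp_action({"hvac_power_level": 2.0, "thermal_charge_rate": -5.0,
                        "batch_job_slot": 9, "load_shed_fraction": 0.9}))
```

Clamping silently, rather than rejecting, matches the behaviour of the `field_validator` clamps in the models, so an out-of-range LLM output degrades to the nearest legal action instead of failing the step.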
python/requirements.txt ADDED
@@ -0,0 +1,7 @@
1
+ openai>=1.0.0
2
+ fastapi>=0.100.0
3
+ uvicorn>=0.23.0
4
+ pydantic>=2.0.0
5
+ requests>=2.31.0
6
+ httpx>=0.24.0
7
+ pytest>=7.0.0
python/validate.py ADDED
@@ -0,0 +1,258 @@
1
+ """
2
+ GridMind-RL Pre-Submission Validator
3
+ --------------------------------------
4
+ Validates the Go environment server against all OpenEnv spec requirements.
5
+ Run with: python python/validate.py [--env-url http://localhost:7860]
6
+ """
7
+
8
+ import argparse
9
+ import json
10
+ import sys
11
+ import time
12
+ import traceback
13
+ from typing import Any
14
+
15
+ import requests
16
+
17
+ ENV_URL = "http://localhost:7860"
18
+
19
+ PASS = "✓"
20
+ FAIL = "✗"
21
+ WARN = "⚠"
22
+
23
+
24
+ def check(label: str, condition: bool, detail: str = "") -> bool:
25
+ icon = PASS if condition else FAIL
26
+ line = f" {icon} {label}"
27
+ if detail:
28
+ line += f" — {detail}"
29
+ print(line)
30
+ return condition
31
+
32
+
33
+ def get(url: str, timeout: int = 10) -> requests.Response:
34
+ return requests.get(url, timeout=timeout)
35
+
36
+
37
+ def post(url: str, payload: Any = None, timeout: int = 10) -> requests.Response:
38
+ return requests.post(url, json=payload, timeout=timeout)
39
+
40
+
41
+ def validate(env_url: str) -> bool:
42
+ base = env_url.rstrip("/")
43
+ results = []
44
+
45
+ print("\n══════════════════════════════════════════")
46
+ print(" GridMind-RL OpenEnv Validation Report")
47
+ print("══════════════════════════════════════════\n")
48
+
49
+ # ── 1. Health endpoint ──────────────────────────────────────────────────
50
+ print("1. Health Endpoint")
51
+ try:
52
+ r = get(f"{base}/health")
53
+ results.append(check("GET /health returns 200", r.status_code == 200, f"got {r.status_code}"))
54
+ data = r.json()
55
+ results.append(check("Response has 'status' field", "status" in data))
56
+ except Exception as e:
57
+ results.append(check("GET /health reachable", False, str(e)))
58
+ print(f"\n ✗ Cannot reach server at {base}. Is it running?\n")
59
+ return False
60
+
61
+ # ── 2. Reset endpoint ───────────────────────────────────────────────────
62
+ print("\n2. Reset Endpoint")
63
+ reset_resp = None
64
+ try:
65
+ r = post(f"{base}/reset", {"task_id": 1, "seed": 42, "num_buildings": 1})
66
+ results.append(check("POST /reset returns 200", r.status_code == 200, f"got {r.status_code}"))
67
+ reset_resp = r.json()
68
+ results.append(check("Response has 'observations'", "observations" in reset_resp))
69
+ results.append(check("Response has 'episode'", "episode" in reset_resp))
70
+ results.append(check("Response has 'seed'", "seed" in reset_resp))
71
+ results.append(check("Response has 'task_id'", "task_id" in reset_resp))
72
+
73
+ obs_list = reset_resp.get("observations", [])
74
+ results.append(check("observations is a list", isinstance(obs_list, list)))
75
+ results.append(check("At least 1 observation returned", len(obs_list) >= 1))
76
+
77
+ if obs_list:
78
+ obs = obs_list[0]
79
+ obs_fields = ["indoor_temperature", "thermal_storage_level", "process_demand",
80
+ "current_price", "grid_stress_signal", "carbon_intensity",
81
+ "hour_of_day", "batch_queue", "cumulative_cost", "step"]
82
+ for field in obs_fields:
83
+ results.append(check(f"obs has '{field}'", field in obs))
84
+
85
+ # Seed reproducibility
86
+ r2 = post(f"{base}/reset", {"task_id": 1, "seed": 42})
87
+ d2 = r2.json()
88
+ obs1 = reset_resp.get("observations", [{}])[0]
89
+ obs2 = d2.get("observations", [{}])[0]
90
+ same = (abs(obs1.get("indoor_temperature", 0) - obs2.get("indoor_temperature", 0)) < 1e-6)
91
+ results.append(check("Same seed produces same initial obs", same))
92
+ except Exception as e:
93
+ results.append(check("POST /reset succeeds", False, str(e)))
94
+ traceback.print_exc()
95
+
96
+ # ── 3. Step endpoint ────────────────────────────────────────────────────
97
+ print("\n3. Step Endpoint")
98
+ try:
99
+ # Reset fresh
100
+ post(f"{base}/reset", {"task_id": 1, "seed": 100})
101
+ action = {
102
+ "hvac_power_level": 0.5,
103
+ "thermal_charge_rate": 0.1,
104
+ "batch_job_slot": 1,
105
+ "load_shed_fraction": 0.0,
106
+ "building_id": 0,
107
+ }
108
+ r = post(f"{base}/step", action)
109
+ results.append(check("POST /step returns 200", r.status_code == 200))
110
+ step_resp = r.json()
111
+
112
+ step_fields = ["observation", "reward", "done", "info"]
113
+ for f in step_fields:
114
+ results.append(check(f"step response has '{f}'", f in step_resp))
115
+
116
+ results.append(check("reward is numeric", isinstance(step_resp.get("reward"), (int, float))))
117
+ results.append(check("done is boolean", isinstance(step_resp.get("done"), bool)))
118
+
119
+ info = step_resp.get("info", {})
120
+ results.append(check("info has 'reward_components'", "reward_components" in info))
121
+ results.append(check("info has 'energy_used_kwh'", "energy_used_kwh" in info))
122
+
123
+ rc = info.get("reward_components", {})
124
+ rc_fields = ["cost_savings", "temp_constraint", "grid_response",
125
+ "deadline_penalty", "efficiency_bonus", "stability_penalty", "carbon_reward", "total"]
126
+ for f in rc_fields:
127
+ results.append(check(f"reward_components has '{f}'", f in rc))
128
+
129
+ # Test array action format
130
+ r2 = post(f"{base}/step", [action])
131
+ results.append(check("POST /step accepts array of actions", r2.status_code == 200))
132
+ except Exception as e:
133
+ results.append(check("POST /step succeeds", False, str(e)))
134
+ traceback.print_exc()
135
+
136
+ # ── 4. State endpoint ───────────────────────────────────────────────────
137
+ print("\n4. State Endpoint")
138
+ try:
139
+ r = get(f"{base}/state")
140
+ results.append(check("GET /state returns 200", r.status_code == 200))
141
+ state = r.json()
142
+ state_fields = ["buildings", "price_curve_24h", "carbon_curve_24h",
143
+ "episode", "step", "task_id", "done", "seed"]
144
+ for f in state_fields:
145
+ results.append(check(f"state has '{f}'", f in state))
146
+ results.append(check("price_curve_24h has 24 entries",
147
+ len(state.get("price_curve_24h", [])) == 24))
148
+ results.append(check("carbon_curve_24h has 24 entries",
149
+ len(state.get("carbon_curve_24h", [])) == 24))
150
+ except Exception as e:
151
+ results.append(check("GET /state succeeds", False, str(e)))
152
+
153
+ # ── 5. Replay endpoint ──────────────────────────────────────────────────
154
+ print("\n5. Replay Endpoint")
155
+ try:
156
+ r = get(f"{base}/replay")
157
+ results.append(check("GET /replay returns 200", r.status_code == 200))
158
+ replay = r.json()
159
+ results.append(check("response has 'replay' list", "replay" in replay))
160
+ results.append(check("response has 'steps' count", "steps" in replay))
161
+ except Exception as e:
162
+ results.append(check("GET /replay succeeds", False, str(e)))
163
+
164
+ # ── 6. Grade endpoint ───────────────────────────────────────────────────
165
+ print("\n6. Grade Endpoint")
166
+ try:
167
+ # Run quick 10-step episode
168
+ post(f"{base}/reset", {"task_id": 1, "seed": 777})
169
+ action = {"hvac_power_level": 0.3, "thermal_charge_rate": 0.0,
170
+ "batch_job_slot": 0, "load_shed_fraction": 0.0}
171
+ for _ in range(10):
172
+ post(f"{base}/step", action)
173
+ r = get(f"{base}/grade")
174
+ results.append(check("GET /grade returns 200", r.status_code == 200))
175
+ grade = r.json()
176
+ grade_fields = ["task_id", "score", "sub_scores", "exploit_detected"]
177
+ for f in grade_fields:
178
+ results.append(check(f"grade has '{f}'", f in grade))
179
+ score = grade.get("score", -1)
180
+ results.append(check("score in [0.0, 1.0]", 0.0 <= score <= 1.0, f"score={score:.4f}"))
181
+ except Exception as e:
182
+ results.append(check("GET /grade succeeds", False, str(e)))
183
+
184
+ # ── 7. Tasks endpoint ───────────────────────────────────────────────────
185
+ print("\n7. Tasks Endpoint")
186
+ try:
187
+ r = get(f"{base}/tasks")
188
+ results.append(check("GET /tasks returns 200", r.status_code == 200))
189
+ tasks = r.json()
190
+ results.append(check("returns list of 3 tasks", len(tasks) == 3))
191
+ task_fields = ["id", "name", "description", "difficulty", "weights"]
192
+ for f in task_fields:
193
+ results.append(check(f"task has '{f}'", f in tasks[0]))
194
+ except Exception as e:
195
+ results.append(check("GET /tasks succeeds", False, str(e)))
196
+
197
+ # ── 8. Metrics endpoint ─────────────────────────────────────────────────
198
+ print("\n8. Metrics Endpoint (Prometheus)")
199
+ try:
200
+ r = get(f"{base}/metrics")
201
+ results.append(check("GET /metrics returns 200", r.status_code == 200))
202
+ content = r.text
203
+ results.append(check("metrics contain step counter",
204
+ "gridmind_steps_total" in content))
205
+ results.append(check("metrics contain latency gauge",
206
+ "gridmind_step_latency_ms_avg" in content))
207
+ except Exception as e:
208
+ results.append(check("GET /metrics succeeds", False, str(e)))
209
+
210
+ # ── 9. Grader score variation ───────────────────────────────────────────
211
+ print("\n9. Grader Score Variation (non-trivial scores)")
212
+ scores_nonzero = []
213
+ scores_nonone = []
214
+ for seed in [10, 20, 30]:
215
+ try:
216
+ post(f"{base}/reset", {"task_id": 1, "seed": seed})
217
+ # Two different policies
218
+ for a in [0.1, 0.9]:
219
+ post(f"{base}/reset", {"task_id": 1, "seed": seed})
220
+ for _ in range(96):
221
+ post(f"{base}/step", {"hvac_power_level": a, "thermal_charge_rate": 0,
222
+ "batch_job_slot": 0, "load_shed_fraction": 0})
223
+ g = get(f"{base}/grade").json()
224
+ sc = g.get("score", 0)
225
+ scores_nonzero.append(sc > 0.01)
226
+ scores_nonone.append(sc < 0.999)
227
+ except Exception as e:
227
+ print(f" {WARN} seed {seed}: score-variation run failed: {e}")
229
+ results.append(check("Scores are not always 0.0", any(scores_nonzero)))
230
+ results.append(check("Scores are not always 1.0", any(scores_nonone)))
231
+
232
+ # ── Summary ─────────────────────────────────────────────────────────────
233
+ passed = sum(results)
234
+ total = len(results)
235
+ pct = 100 * passed // total if total > 0 else 0
236
+
237
+ print("\n══════════════════════════════════════════")
238
+ print(f" Result: {passed}/{total} checks passed ({pct}%)")
239
+ if passed == total:
240
+ print(" 🎉 ALL CHECKS PASSED — Ready for submission!")
241
+ else:
242
+ print(f" ⚠ {total - passed} checks failed. Fix errors above.")
243
+ print("══════════════════════════════════════════\n")
244
+
245
+ return passed == total
246
+
247
+
248
+ def main():
249
+ parser = argparse.ArgumentParser()
250
+ parser.add_argument("--env-url", type=str, default=ENV_URL)
251
+ args = parser.parse_args()
252
+
253
+ ok = validate(args.env_url)
254
+ sys.exit(0 if ok else 1)
255
+
256
+
257
+ if __name__ == "__main__":
258
+ main()
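Check 9 above can be separated from the HTTP calls so that the variation logic is testable without a running server. This is a sketch under assumptions: `score_variation` and its `run_episode(seed, hvac)` callable are hypothetical names, while the thresholds 0.01 and 0.999 are taken from the validator.

```python
from typing import Callable, Iterable, List, Tuple


def score_variation(
    run_episode: Callable[[int, float], float],
    seeds: Iterable[int],
    hvac_levels: Iterable[float],
    low: float = 0.01,
    high: float = 0.999,
) -> Tuple[bool, bool, List[str]]:
    """Run every (seed, hvac) pair and report whether scores are non-trivial.

    Returns (not_always_zero, not_always_one, errors). Failures are collected
    in `errors` instead of being silently dropped.
    """
    levels = list(hvac_levels)  # allow a generator to be reused per seed
    scores: List[float] = []
    errors: List[str] = []
    for seed in seeds:
        for hvac in levels:
            try:
                scores.append(float(run_episode(seed, hvac)))
            except Exception as exc:  # keep going, but remember what failed
                errors.append(f"seed={seed} hvac={hvac}: {exc}")
    not_always_zero = any(s > low for s in scores)
    not_always_one = any(s < high for s in scores)
    return not_always_zero, not_always_one, errors


if __name__ == "__main__":
    # Stand-in for "reset, run 96 steps, GET /grade"; purely illustrative.
    fake_episode = lambda seed, hvac: 0.2 + 0.5 * hvac
    print(score_variation(fake_episode, [10, 20, 30], [0.1, 0.9]))
```

With no successful runs both flags are `False`, so a server that fails every episode is reported as a failed check rather than passing by accident.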
tests/environment_test.go ADDED
@@ -0,0 +1,191 @@
1
+ // tests/environment_test.go: unit tests for GridMind-RL environment
2
+ package tests
3
+
4
+ import (
5
+ "math"
6
+ "testing"
7
+
8
+ "gridmind-rl/env"
9
+ )
10
+
11
+ // TestResetProducesValidObservation checks that reset returns sane initial observations.
12
+ func TestResetProducesValidObservation(t *testing.T) {
13
+ e := env.NewEnvironment()
14
+ var seed int64 = 42
15
+ resp := e.Reset(env.ResetRequest{Seed: &seed, TaskID: 1, NumBuildings: 1})
16
+
17
+ if len(resp.Observations) != 1 {
18
+ t.Fatalf("expected 1 observation, got %d", len(resp.Observations))
19
+ }
20
+ obs := resp.Observations[0]
21
+
22
+ if obs.IndoorTemperature < 10 || obs.IndoorTemperature > 40 {
23
+ t.Errorf("indoor_temperature out of range: %.2f", obs.IndoorTemperature)
24
+ }
25
+ if obs.ThermalStorageLevel < 0 || obs.ThermalStorageLevel > 1 {
26
+ t.Errorf("thermal_storage_level out of [0,1]: %.3f", obs.ThermalStorageLevel)
27
+ }
28
+ if obs.CurrentPrice <= 0 {
29
+ t.Errorf("current_price must be positive, got %.4f", obs.CurrentPrice)
30
+ }
31
+ if obs.HourOfDay < 0 || obs.HourOfDay > 23 {
32
+ t.Errorf("hour_of_day out of [0,23]: %d", obs.HourOfDay)
33
+ }
34
+ if obs.GridStressSignal < 0 || obs.GridStressSignal > 1 {
35
+ t.Errorf("grid_stress_signal out of [0,1]: %.3f", obs.GridStressSignal)
36
+ }
37
+ }
38
+
39
+ // TestStepAdvancesState verifies that step increments the step counter.
40
+ func TestStepAdvancesState(t *testing.T) {
41
+ e := env.NewEnvironment()
42
+ var seed int64 = 1
43
+ e.Reset(env.ResetRequest{Seed: &seed, TaskID: 1, NumBuildings: 1})
44
+
45
+ action := []env.ActionModel{{HVACPowerLevel: 0.5, ThermalChargeRate: 0.0, BatchJobSlot: 0}}
46
+ resps, done := e.Step(action)
47
+
48
+ if done {
49
+ t.Error("episode should not be done after first step")
50
+ }
51
+ if len(resps) != 1 {
52
+ t.Fatalf("expected 1 step response, got %d", len(resps))
53
+ }
54
+ state := e.GetState()
55
+ if state.Step != 1 {
56
+ t.Errorf("expected step=1 after one step, got %d", state.Step)
57
+ }
58
+ }
59
+
60
+ // TestEpisodeLengthIs96 verifies the episode terminates at step 96.
61
+ func TestEpisodeLengthIs96(t *testing.T) {
62
+ e := env.NewEnvironment()
63
+ var seed int64 = 99
64
+ e.Reset(env.ResetRequest{Seed: &seed, TaskID: 1, NumBuildings: 1})
65
+
66
+ action := []env.ActionModel{{HVACPowerLevel: 0.5}}
67
+ var lastDone bool
68
+ for i := 0; i < 96; i++ {
69
+ _, lastDone = e.Step(action)
70
+ }
71
+ if !lastDone {
72
+ t.Errorf("episode should be done after 96 steps")
73
+ }
74
+ }
75
+
76
+ // TestDeterministicWithSeed verifies that two runs with the same seed produce identical rewards.
77
+ func TestDeterministicWithSeed(t *testing.T) {
78
+ action := []env.ActionModel{{HVACPowerLevel: 0.4, ThermalChargeRate: 0.1, BatchJobSlot: 1}}
79
+ var seed int64 = 1337
80
+
81
+ run := func() float64 {
82
+ e := env.NewEnvironment()
83
+ e.Reset(env.ResetRequest{Seed: &seed, TaskID: 2, NumBuildings: 1})
84
+ resps, _ := e.Step(action)
85
+ return resps[0].Reward
86
+ }
87
+
88
+ r1 := run()
89
+ r2 := run()
90
+ if math.Abs(r1-r2) > 1e-9 {
91
+ t.Errorf("non-deterministic rewards with same seed: %.6f vs %.6f", r1, r2)
92
+ }
93
+ }
94
+
95
+ // TestActionClamping verifies out-of-range actions are clamped.
96
+ func TestActionClamping(t *testing.T) {
97
+ e := env.NewEnvironment()
98
+ var seed int64 = 7
99
+ e.Reset(env.ResetRequest{Seed: &seed, TaskID: 1})
100
+
101
+ // Over-range action
102
+ action := []env.ActionModel{{HVACPowerLevel: 2.0, ThermalChargeRate: -5.0, LoadShedFraction: 0.9}}
103
+ resps, _ := e.Step(action)
104
+ if len(resps) == 0 {
105
+ t.Fatal("no responses returned")
106
+ }
107
+ // After step, state should still be valid
108
+ state := e.GetState()
109
+ if len(state.Buildings) == 0 {
110
+ t.Fatal("no buildings in state")
111
+ }
112
+ b := state.Buildings[0]
113
+ if b.ThermalStorageLevel < 0 || b.ThermalStorageLevel > 1 {
114
+ t.Errorf("thermal storage out of bounds: %.3f", b.ThermalStorageLevel)
115
+ }
116
+ }
117
+
118
+ // TestMultiBuildingFederation checks that 3-building reset + step works.
119
+ func TestMultiBuildingFederation(t *testing.T) {
120
+ e := env.NewEnvironment()
121
+ var seed int64 = 5
122
+ resp := e.Reset(env.ResetRequest{Seed: &seed, TaskID: 3, NumBuildings: 3})
123
+
124
+ if len(resp.Observations) != 3 {
125
+ t.Fatalf("expected 3 observations for 3 buildings, got %d", len(resp.Observations))
126
+ }
127
+
128
+ actions := []env.ActionModel{
129
+ {HVACPowerLevel: 0.3, BuildingID: 0},
130
+ {HVACPowerLevel: 0.5, BuildingID: 1},
131
+ {HVACPowerLevel: 0.7, BuildingID: 2},
132
+ }
133
+ resps, _ := e.Step(actions)
134
+ if len(resps) != 3 {
135
+ t.Fatalf("expected 3 step responses, got %d", len(resps))
136
+ }
137
+ }
138
+
139
+ // TestRewardComponentsAreFinite verifies no NaN/Inf in rewards.
140
+ func TestRewardComponentsAreFinite(t *testing.T) {
141
+ e := env.NewEnvironment()
142
+ var seed int64 = 42
143
+ e.Reset(env.ResetRequest{Seed: &seed, TaskID: 3})
144
+
145
+ action := []env.ActionModel{{HVACPowerLevel: 0.5, ThermalChargeRate: 0.2, BatchJobSlot: 2, LoadShedFraction: 0.3}}
146
+ resps, _ := e.Step(action)
147
+
148
+ rc := resps[0].Info.RewardComponents
149
+ vals := []float64{rc.CostSavings, rc.TempConstraint, rc.GridResponse,
150
+ rc.DeadlinePenalty, rc.EfficiencyBonus, rc.StabilityPenalty, rc.CarbonReward, rc.Total}
151
+ for i, v := range vals {
152
+ if math.IsNaN(v) || math.IsInf(v, 0) {
153
+ t.Errorf("reward component %d is not finite: %v", i, v)
154
+ }
155
+ }
156
+ }
157
+
158
+ // TestGraderTask1ScoreRange verifies Task 1 score is always in [0, 1].
159
+ func TestGraderTask1ScoreRange(t *testing.T) {
160
+ e := env.NewEnvironment()
161
+ var seed int64 = 101
162
+ e.Reset(env.ResetRequest{Seed: &seed, TaskID: 1})
163
+
164
+ action := []env.ActionModel{{HVACPowerLevel: 0.3}}
165
+ for i := 0; i < 96; i++ {
166
+ e.Step(action)
167
+ }
168
+
169
+ state := e.GetState()
170
+ replay := e.GetReplay()
171
+
172
+ buildings := make([]*env.BuildingState, len(state.Buildings))
173
+ for i, pub := range state.Buildings {
174
+ buildings[i] = &env.BuildingState{
175
+ CumulativeCost: pub.CumulativeCost,
176
+ BaselineCost: pub.BaselineCost,
177
+ }
178
+ }
179
+
180
+ grade := env.GradeEpisode(env.GradeEpisodeInput{
181
+ TaskID: 1,
182
+ Buildings: buildings,
183
+ Replay: replay,
184
+ TMin: env.TMinDefault,
185
+ TMax: env.TMaxDefault,
186
+ })
187
+
188
+ if grade.Score < 0 || grade.Score > 1 {
189
+ t.Errorf("Task 1 score out of [0,1]: %.4f", grade.Score)
190
+ }
191
+ }
tests/test_graders.py ADDED
@@ -0,0 +1,173 @@
1
+ """
2
+ GridMind-RL Grader Validation Tests (pytest)
3
+ Run with: pytest tests/test_graders.py -v
4
+ """
5
+
6
+ import json
7
+ import time
8
+ import pytest
9
+ import requests
10
+
11
+ ENV_URL = "http://localhost:7860"
12
+ BASE = ENV_URL
13
+
14
+
15
+ def wait_for_server(url: str, timeout: int = 15):
16
+ for _ in range(timeout):
17
+ try:
18
+ r = requests.get(f"{url}/health", timeout=2)
19
+ if r.status_code == 200:
20
+ return True
21
+ except Exception:
22
+ pass
23
+ time.sleep(1)
24
+ return False
25
+
26
+
27
+ @pytest.fixture(scope="session", autouse=True)
28
+ def server_running():
29
+ if not wait_for_server(ENV_URL):
30
+ pytest.skip("GridMind-RL server not running at " + ENV_URL)
31
+
32
+
33
+ def reset(task_id=1, seed=42):
34
+ r = requests.post(f"{BASE}/reset", json={"task_id": task_id, "seed": seed, "num_buildings": 1}, timeout=10)
35
+ r.raise_for_status()
36
+ return r.json()
37
+
38
+
39
+ def step(action: dict) -> dict:
40
+ r = requests.post(f"{BASE}/step", json=action, timeout=10)
41
+ r.raise_for_status()
42
+ return r.json()
43
+
44
+
45
+ def grade() -> dict:
46
+ r = requests.get(f"{BASE}/grade", timeout=10)
47
+ r.raise_for_status()
48
+ return r.json()
49
+
50
+
51
+ def run_full_episode(task_id: int, seed: int, hvac: float = 0.5) -> dict:
52
+ reset(task_id=task_id, seed=seed)
53
+ action = {"hvac_power_level": hvac, "thermal_charge_rate": 0, "batch_job_slot": 0, "load_shed_fraction": 0}
54
+ for _ in range(96):
55
+ resp = step(action)
56
+ if resp.get("done"):
57
+ break
58
+ return grade()
59
+
60
+
61
+ # ── Task 1 ──────────────────────────────────────────────────────────────────
62
+
63
+ class TestTask1:
64
+ def test_score_in_range(self):
65
+ g = run_full_episode(task_id=1, seed=1)
66
+ assert 0.0 <= g["score"] <= 1.0, f"Score {g['score']} out of [0,1]"
67
+
68
+ def test_score_not_always_zero(self):
69
+ g = run_full_episode(task_id=1, seed=2, hvac=0.2)
70
+ assert g["score"] > 0.01, "Low HVAC policy should score above 0.01"
71
+
72
+ def test_score_not_always_one(self):
73
+ g = run_full_episode(task_id=1, seed=3, hvac=1.0)
74
+ assert g["score"] < 0.999, "Always-on policy should not score 1.0"
75
+
76
+ def test_deterministic(self):
77
+ g1 = run_full_episode(task_id=1, seed=42)
78
+ g2 = run_full_episode(task_id=1, seed=42)
79
+ assert abs(g1["score"] - g2["score"]) < 1e-6, "Grader not deterministic with same seed"
80
+
81
+ def test_sub_scores_present(self):
82
+ g = run_full_episode(task_id=1, seed=5)
83
+ assert "cost" in g["sub_scores"], "Task 1 grade missing 'cost' sub-score"
84
+
85
+ def test_exploit_shedding_penalized(self):
86
+ """Always shedding 50% should be detected and penalized."""
87
+ reset(task_id=1, seed=10)
88
+ action = {"hvac_power_level": 0.5, "thermal_charge_rate": 0, "batch_job_slot": 0, "load_shed_fraction": 0.5}
89
+ for _ in range(96):
90
+ step(action)
91
+ g = grade()
92
+ # Score should be reduced OR exploit flagged
93
+ assert g["exploit_detected"] or g["score"] < 0.9
94
+
95
+
96
+ # ── Task 2 ──────────────────────────────────────────────────────────────────
97
+
98
+ class TestTask2:
99
+ def test_score_in_range(self):
100
+ g = run_full_episode(task_id=2, seed=20)
101
+ assert 0.0 <= g["score"] <= 1.0
102
+
103
+ def test_has_temp_sub_score(self):
104
+ g = run_full_episode(task_id=2, seed=21)
105
+ assert "temperature" in g["sub_scores"]
106
+
107
+ def test_temp_score_range(self):
108
+ g = run_full_episode(task_id=2, seed=22)
109
+ ts = g["sub_scores"].get("temperature", -1)
110
+ assert 0.0 <= ts <= 1.0, f"Temperature sub-score {ts} out of [0,1]"
111
+
112
+ def test_weights_sum_correct(self):
113
+ """Task 2 score = 0.6*cost + 0.4*temp."""
114
+ g = run_full_episode(task_id=2, seed=23)
115
+ expected = g["sub_scores"]["cost"] * 0.6 + g["sub_scores"]["temperature"] * 0.4
116
+ assert abs(g["score"] - expected) < 0.01 or g["exploit_detected"]
117
+
118
+ def test_score_varies_with_policy(self):
119
+ g_low = run_full_episode(task_id=2, seed=24, hvac=0.1)
120
+ g_high = run_full_episode(task_id=2, seed=24, hvac=0.9)
121
+ # Scores should differ (policy matters)
122
+ assert abs(g_low["score"] - g_high["score"]) > 0.001
123
+
124
+
125
+ # ── Task 3 ──────────────────────────────────────────────────────────────────
126
+
127
+ class TestTask3:
128
+ def test_score_in_range(self):
129
+ g = run_full_episode(task_id=3, seed=30)
130
+ assert 0.0 <= g["score"] <= 1.0
131
+
132
+ def test_has_all_sub_scores(self):
133
+ g = run_full_episode(task_id=3, seed=31)
134
+ for key in ["cost", "temperature", "grid_response", "batch_deadline"]:
135
+ assert key in g["sub_scores"], f"Missing sub-score: {key}"
136
+
137
+ def test_all_sub_scores_in_range(self):
138
+ g = run_full_episode(task_id=3, seed=32)
139
+ for key, val in g["sub_scores"].items():
140
+ assert 0.0 <= val <= 1.0, f"Sub-score '{key}' = {val} out of [0,1]"
141
+
142
+ def test_weights_sum_correct(self):
143
+ g = run_full_episode(task_id=3, seed=33)
144
+ ss = g["sub_scores"]
145
+ expected = ss["cost"]*0.35 + ss["temperature"]*0.25 + ss["grid_response"]*0.25 + ss["batch_deadline"]*0.15
146
+ assert abs(g["score"] - expected) < 0.01 or g["exploit_detected"]
147
+
148
+ def test_grid_response_sub_score(self):
149
+ g = run_full_episode(task_id=3, seed=34)
150
+ gs = g["sub_scores"].get("grid_response", -1)
151
+ assert 0.0 <= gs <= 1.0, f"grid_response={gs} out of [0,1]"
152
+
153
+ def test_batch_deadline_sub_score(self):
154
+ g = run_full_episode(task_id=3, seed=35)
155
+ bd = g["sub_scores"].get("batch_deadline", -1)
156
+ assert 0.0 <= bd <= 1.0
157
+
158
+
159
+ # ── Multi-building grading ────────────────────────────────────────────────────
160
+
161
+ class TestMultiBuilding:
162
+ def test_2_building_grade(self):
163
+ requests.post(f"{BASE}/reset", json={"task_id": 1, "seed": 50, "num_buildings": 2}).raise_for_status()
164
+ action = [
165
+ {"hvac_power_level": 0.4, "thermal_charge_rate": 0, "batch_job_slot": 0, "load_shed_fraction": 0, "building_id": 0},
166
+ {"hvac_power_level": 0.6, "thermal_charge_rate": 0, "batch_job_slot": 0, "load_shed_fraction": 0, "building_id": 1},
167
+ ]
168
+ for _ in range(96):
169
+ r = requests.post(f"{BASE}/step", json=action)
170
+ if r.json()[0].get("done"):
171
+ break
172
+ g = grade()
173
+ assert 0.0 <= g["score"] <= 1.0
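The two `test_weights_sum_correct` tests recompute the weighted score inline. The sketch below collects that arithmetic in one place. The weights are the ones asserted by the tests for tasks 2 and 3. `TASK_WEIGHTS`, `expected_score`, and `score_matches` are hypothetical names, and task 1 is left out because the tests do not state its weights.

```python
from typing import Any, Dict

# Weights as asserted by the tests for tasks 2 and 3.
# Task 1 is omitted: the tests only require a "cost" sub-score for it.
TASK_WEIGHTS: Dict[int, Dict[str, float]] = {
    2: {"cost": 0.6, "temperature": 0.4},
    3: {"cost": 0.35, "temperature": 0.25, "grid_response": 0.25, "batch_deadline": 0.15},
}


def expected_score(task_id: int, sub_scores: Dict[str, float]) -> float:
    """Weighted sum of sub-scores for a task listed in TASK_WEIGHTS."""
    weights = TASK_WEIGHTS[task_id]
    return sum(weight * sub_scores[name] for name, weight in weights.items())


def score_matches(grade: Dict[str, Any], tolerance: float = 0.01) -> bool:
    """Mirror the test assertion: the score matches the weighted sum,
    unless an exploit penalty was applied."""
    if grade.get("exploit_detected"):
        return True
    expected = expected_score(grade["task_id"], grade["sub_scores"])
    return abs(grade["score"] - expected) < tolerance


if __name__ == "__main__":
    sample = {"task_id": 2, "score": 0.7, "exploit_detected": False,
              "sub_scores": {"cost": 0.5, "temperature": 1.0}}
    print(expected_score(2, sample["sub_scores"]), score_matches(sample))
```

Keeping the weights in a single table means a change to the grader's weighting only has to be mirrored once on the test side.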