Spaces: saketh1201/quartermaster-env

saketh1201 committed on Apr 24

Commit a59ad4e (verified) · 1 Parent(s): bbc688c

Upload folder using huggingface_hub


Files changed (15)

Dockerfile +30 -0
README.md +242 -10
__init__.py +0 -0
client.py +68 -0
inference.py +330 -0
models.py +39 -0
openenv.yaml +6 -0
pyproject.toml +24 -0
scripts/validate-submission.sh +172 -0
server/__init__.py +0 -0
server/app.py +92 -0
server/constants.py +186 -0
server/grader.py +124 -0
server/inventory_env.py +264 -0
uv.lock +0 -0

Dockerfile ADDED

	@@ -0,0 +1,30 @@

+FROM ghcr.io/meta-pytorch/openenv-base:latest AS builder
+RUN apt-get update && apt-get install -y git curl && \
+    curl -LsSf https://astral.sh/uv/install.sh | sh
+ENV PATH="/root/.local/bin:$PATH"
+WORKDIR /app
+COPY pyproject.toml uv.lock* ./
+RUN uv sync --no-install-project --frozen || uv sync --no-install-project
+COPY . .
+RUN uv sync
+FROM ghcr.io/meta-pytorch/openenv-base:latest
+WORKDIR /app
+COPY --from=builder /app/.venv /app/.venv
+COPY --from=builder /app /app
+ENV PATH="/app/.venv/bin:$PATH"
+ENV PYTHONUNBUFFERED=1
+ENV PYTHONPATH="/app:$PYTHONPATH"
+EXPOSE 8000
+HEALTHCHECK --interval=30s --timeout=3s \
+    CMD curl -f http://localhost:8000/health || exit 1
+ENV ENABLE_WEB_INTERFACE=true
+CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]

README.md CHANGED

@@ -1,10 +1,242 @@
----
-title: Quartermaster Env
-emoji: 🐢
-colorFrom: gray
-colorTo: red
-sdk: docker
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: Inventory Optimization Environment
+emoji: 📦
+colorFrom: blue
+colorTo: green
+sdk: docker
+app_port: 8000
+tags:
+  - openenv
+base_path: /web
+---
+# Retail Inventory Optimization Environment
+An OpenEnv reinforcement learning environment that simulates day-by-day retail inventory management across 5 product categories. An AI agent must balance purchasing, pricing, shipping, and liquidation decisions to maximize profit over a 30-day episode.
+## Why Inventory Management?
+Retail inventory optimization is a real-world task performed daily by store managers, warehouse operators, and supply chain planners. The agent faces the same challenges as a human manager: uncertain demand, perishable goods, shipping delays, seasonal events, and limited cash flow. Poor decisions lead to stockouts (lost sales), waste (expired goods), or cash tied up in unsold inventory.
+## Environment Description
+You manage a retail store selling 5 products with different characteristics:
+| Product | Sell Price | Cost Price | Profit Margin | Shelf Life |
+|---------|-----------|------------|---------------|------------|
+| Electronics | $150 | $100 | $50 | No expiry |
+| Clothing | $40 | $25 | $15 | No expiry |
+| Groceries | $10 | $5 | $5 | 5 days |
+| Furniture | $200 | $130 | $70 | No expiry |
+| Toys | $25 | $12 | $13 | No expiry |
+Each day the agent receives the current store state (cash, inventory with batch expiry, pending deliveries, upcoming events) and must decide:
+- **What to buy** and how much of each product
+- **How to ship** — slow (cheap but unreliable), medium, or fast (expensive but guaranteed)
+- **What to liquidate** — dispose of expiring or excess stock
+- **How to price** — set per-product price multipliers that affect demand via elasticity
+Customer demand is generated each day from per-product base ranges, a weekend boost (1.2x when `day % 7` is 5 or 6), and seasonal event multipliers (up to 3x during Black Friday, Christmas, etc.). The agent cannot see future demand; it only receives the previous day's demand as feedback.
+The episode runs for 30 days. The goal is to maximize total profit.
+## Environment Design Highlights
+### Batch-Tracked Inventory with FIFO
+Inventory is tracked per batch with individual expiry dates. Groceries expire after 5 days. Selling and liquidation follow FIFO (First In, First Out) — oldest batches are consumed first, mimicking real warehouse operations.
+```json
+{"groceries": [[20, 3], [15, 5], [10, 1]]}
+```
+Three batches: 20 units (3 days left), 15 units (5 days left), 10 units (1 day left — liquidate or lose them).
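The FIFO consumption described above can be sketched as follows. This is a minimal illustration, not the code in `server/inventory_env.py`: the helper name `consume_fifo` is hypothetical, and it assumes that list order is arrival order, so the front of the list is consumed first.

```python
from typing import List, Optional, Tuple

# A batch is [quantity, days_left]; days_left is None for non-perishable goods.
Batch = List[Optional[int]]


def consume_fifo(batches: List[Batch], requested: int) -> Tuple[List[Batch], int, int]:
    """Take `requested` units from the front of the batch list (hypothetical helper).

    Returns (remaining_batches, units_taken, units_missed).
    """
    remaining: List[Batch] = []
    left = requested
    for qty, days_left in batches:
        take = min(qty, left)      # take as much as this batch can supply
        left -= take
        if qty - take > 0:         # keep whatever is left of the batch
            remaining.append([qty - take, days_left])
    return remaining, requested - left, left


batches = [[20, 3], [15, 5], [10, 1]]
remaining, taken, missed = consume_fifo(batches, 25)
print(remaining, taken, missed)
```

The same helper would serve both selling and liquidation, since the README states that both follow FIFO. Units that cannot be supplied come back as `units_missed`.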
+### Dynamic Pricing with Price Elasticity
+The agent can set per-product price multipliers (0.5x to 1.5x) each day. Demand responds to pricing via realistic elasticity values — groceries are inelastic (people buy regardless), while clothing and toys are highly elastic (price-sensitive customers).
+| Product | Elasticity | Effect of 1.3x price |
+|---------|-----------|----------------------|
+| Electronics | 1.2 | Demand drops ~24% |
+| Clothing | 1.5 | Demand drops ~38% |
+| Groceries | 0.4 | Demand drops only ~11% |
+| Furniture | 0.8 | Demand drops ~22% |
+| Toys | 1.3 | Demand drops ~33% |
+### Delivery Jitter
+Shipping isn't perfectly reliable. Slow delivery has +/-2 day variance, medium has +/-1 day. Only fast delivery (at 5x the cost) is guaranteed next-day. The agent must account for uncertainty when planning restocks before events.
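As a rough sketch of delivery jitter, the snippet below samples an arrival day from the per-method windows given in the action table (slow 3-7 days, medium 2-4 days, fast exactly 1 day). The uniform distribution, the `DELIVERY_WINDOWS` name, and the `sample_arrival_day` helper are assumptions for illustration. The environment's actual sampling may differ.

```python
import random

# (min_days, max_days) per shipping method, taken from the README's action table.
DELIVERY_WINDOWS = {"slow": (3, 7), "medium": (2, 4), "fast": (1, 1)}


def sample_arrival_day(current_day: int, method: str, rng: random.Random) -> int:
    """Return the day an order placed on `current_day` arrives (hypothetical helper).

    Assumes each day in the window is equally likely.
    """
    low, high = DELIVERY_WINDOWS[method]
    return current_day + rng.randint(low, high)


rng = random.Random(0)  # seeded so repeated runs give the same schedule
for method in ("slow", "medium", "fast"):
    print(method, sample_arrival_day(10, method, rng))
```

Under these assumptions, an order that must be in stock by a given day needs to be placed at least 7 days ahead with slow shipping, 4 with medium, or 1 with fast.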
+### Seasonal Events with Demand Spikes
+Five events are spread across the 30-day episode. Each event triggers a 2-day demand multiplier — Black Friday triples electronics demand, Christmas triples toys, etc. A "new competitor" event actually reduces demand. The agent sees countdowns and must stock up in advance.
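One way to read an event countdown is sketched below. It mirrors the branching used by `format_observation` in `inference.py`. The value `EVENT_DURATION = 2` is an assumption taken from the "2-day" description above (the real constant lives in `server/constants.py`), and `event_status` is a hypothetical helper name.

```python
EVENT_DURATION = 2  # assumed from the README's "2-day demand multiplier"


def event_status(countdown: int, duration: int = EVENT_DURATION) -> str:
    """Classify an event countdown as upcoming, active, or ended."""
    if countdown > 0:
        return f"in {countdown} days"
    if -duration < countdown <= 0:
        # Countdown 0 is the first active day; it keeps ticking into negatives.
        return "active"
    return "ended"


for countdown in (3, 0, -1, -2):
    print(countdown, event_status(countdown))
```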
+### Decomposed Per-Step Reward
+The reward function provides granular feedback every step, not just end-of-episode:
+| Signal | Formula | Purpose |
+|--------|---------|---------|
+| Successful sales | `+sold * sell_price * 0.001` | Reward revenue proportional to product value |
+| Missed sales | `-missed * sell_price * 0.001` | Penalize stockouts, weighted by product value |
+| Expired groceries | `-0.05 * expired_count` | Penalize waste from overbuying perishables |
+| Failed purchases | `-0.5 per rejected order` | Penalize ordering beyond cash budget |
+| Liquidation loss | `-disposed_value * 0.001` | Penalize disposal proportional to cost |
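The table above can be combined into a single per-step value roughly as follows. This is a sketch under stated assumptions: `step_reward` is a hypothetical helper, the signals are assumed to be summed, and base sell prices are used (the source does not say whether the price multiplier enters the reward).

```python
from typing import Dict

# Base sell prices from the product table in this README.
SELL_PRICES = {"electronics": 150, "clothing": 40, "groceries": 10, "furniture": 200, "toys": 25}


def step_reward(
    sold: Dict[str, int],
    missed: Dict[str, int],
    expired_groceries: int,
    rejected_orders: int,
    disposed_value: float,
) -> float:
    """Sum the five reward signals for one step (hypothetical helper)."""
    reward = 0.0
    for product, units in sold.items():
        reward += units * SELL_PRICES[product] * 0.001   # successful sales
    for product, units in missed.items():
        reward -= units * SELL_PRICES[product] * 0.001   # missed sales
    reward -= 0.05 * expired_groceries                   # expired groceries
    reward -= 0.5 * rejected_orders                      # failed purchases
    reward -= disposed_value * 0.001                     # liquidation loss
    return reward


print(round(step_reward({"electronics": 2}, {"toys": 4}, 3, 1, 50.0), 3))
```

In this example the single rejected order (-0.5) outweighs the revenue from two electronics sales (+0.3), which suggests why ordering beyond the cash budget is costly.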
+### Conversation History for LLM Agents
+The inference script maintains a rolling 7-day conversation history. The LLM sees its past observations and decisions, enabling it to spot demand trends, learn from mistakes, and adjust strategy across the episode.
+## Action Space
+```python
+class InventoryAction(Action):
+    buy_quantities: Dict[str, int] = {}
+    delivery_method: Literal["slow", "medium", "fast"] = "slow"
+    liquidate: Dict[str, int] = {}
+    price_multipliers: Dict[str, float] = {}
+```
+| Field | Description |
+|-------|-------------|
+| `buy_quantities` | Products and amounts to order. Empty `{}` to skip buying. |
+| `delivery_method` | `"slow"` ($2/unit, 3-7 days), `"medium"` ($5/unit, 2-4 days), `"fast"` ($10/unit, 1 day guaranteed) |
+| `liquidate` | Products and amounts to dispose of (no revenue). Use for expiring groceries or freeing warehouse space. |
+| `price_multipliers` | Per-product selling price multiplier (0.5-1.5). Affects demand via elasticity. Default 1.0 if omitted. |
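To judge whether an action fits the cash budget, the cost of an order can be estimated as below. This is a sketch, assuming shipping is charged per unit on top of the cost price and that the whole order shares one delivery method. The dictionaries copy values from this README, and `order_cost` and `can_afford` are hypothetical helpers rather than functions from the repo.

```python
from typing import Dict

# Values copied from the README tables.
COST_PRICES = {"electronics": 100, "clothing": 25, "groceries": 5, "furniture": 130, "toys": 12}
SHIPPING_COST = {"slow": 2, "medium": 5, "fast": 10}


def order_cost(buy_quantities: Dict[str, int], delivery_method: str) -> int:
    """Total cash needed for an order: units * (cost price + per-unit shipping)."""
    shipping = SHIPPING_COST[delivery_method]
    return sum(qty * (COST_PRICES[product] + shipping) for product, qty in buy_quantities.items())


def can_afford(cash: float, buy_quantities: Dict[str, int], delivery_method: str) -> bool:
    """True if the order fits the budget (how the env rejects orders is not shown here)."""
    return order_cost(buy_quantities, delivery_method) <= cash


order = {"groceries": 10, "toys": 5}
print(order_cost(order, "slow"), order_cost(order, "fast"))
print(can_afford(200.0, order, "fast"))
```

For low-priced items, shipping can dominate: under these assumptions, fast shipping triples the landed cost of groceries from $5 to $15 per unit.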
+## Observation Space
+```python
+class InventoryObservation(Observation):
+    current_day: int
+    total_cash: float
+    day_profit: float
+    total_profit: float
+    demand_today: Dict[str, int]           # yesterday's demand (feedback)
+    updated_inventory: Dict[str, List]     # [[qty, days_left], ...] per batch
+    remaining_capacity: Dict[str, int]     # warehouse space left per product
+    updated_events: Dict[str, int]         # event countdowns (<= 0 = active or ended)
+    updated_deliveries: List[Dict]         # in-transit shipments
+```
+## Tasks (Easy / Medium / Hard)
+### Easy — "Steady State"
+- Low starting stock, low steady demand, no events
+- Starting cash: $1,000 | Full warehouse capacity
+- Agent needs to restock regularly but demand is predictable
+- No demand spikes: pure supply chain management
+### Medium — "Seasonal Rush"
+- Default stock/cash, all 5 events spread across 30 days
+- Events: Black Friday (day 6), Christmas (day 12), Back to School (day 18), Summer Clearance (day 24), New Competitor (day 28)
+- Agent must anticipate demand spikes and restock before events hit
+### Hard — "Chaos Mode"
+- Half starting cash ($500), low stock, events packed close together (days 4, 8, 12, 16, 20)
+- Higher base demand, smaller warehouse capacity
+- Agent must balance tight budget, overlapping event spikes, perishable goods, and limited storage
+## Grading (0.0 - 1.0)
+Each task is scored by comparing agent profit against two deterministic baselines:
+- **Floor**: Passive agent that never buys (sells initial stock until depleted)
+- **Ceiling**: Theoretical max profit assuming perfect demand knowledge and cheapest shipping
+```
+score = clamp((agent_profit - floor) / (ceiling - floor), 0.0, 1.0)
+```
+Both baselines are deterministic (seeded RNG) and computed fresh each run to ensure reproducibility.
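The scoring formula above can be written out as a small function. This is a minimal sketch: `grade_profit` is a hypothetical name (the repo's own entry point is `grade` in `server/grader.py`), and returning 0.0 when the ceiling does not exceed the floor is an assumption. The sample numbers come from the `/grader` example in this README.

```python
def grade_profit(agent_profit: float, floor: float, ceiling: float) -> float:
    """Normalize profit between the floor and ceiling baselines, clamped to [0, 1]."""
    if ceiling <= floor:
        return 0.0  # assumption: degenerate baselines score zero
    raw = (agent_profit - floor) / (ceiling - floor)
    return max(0.0, min(1.0, raw))


# floor=2200, ceiling=10011, agent_profit=5000, as in the README's /grader example
print(round(grade_profit(5000.0, 2200.0, 10011.0), 3))
```

An agent that earns less than the passive floor scores 0.0, and one that beats the ceiling is capped at 1.0.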
+## Setup
+```bash
+# Install dependencies
+pip install openenv-core[core] fastapi uvicorn pydantic openai numpy python-dotenv
+# Run grader baselines
+python -c "from server.grader import compute_baselines; [print(f'{t}: floor={f:.2f}, ceiling={c:.2f}') for t in ['easy','medium','hard'] for f,c in [compute_baselines(t)]]"
+# Start server locally
+uvicorn server.app:app --host 0.0.0.0 --port 8000
+# Test endpoints
+curl http://localhost:8000/health
+curl -X POST http://localhost:8000/reset
+```
+## Running Inference
+```bash
+# Using HuggingFace Router
+export API_BASE_URL="https://router.huggingface.co/v1"
+export MODEL_NAME="Qwen/Qwen3-32B"
+export HF_TOKEN="your-token"
+python inference.py
+# Using OpenAI
+export API_BASE_URL="https://api.openai.com/v1"
+export MODEL_NAME="gpt-4o"
+export API_KEY="sk-your-key"
+python inference.py
+```
+## Docker
+```bash
+docker build -t inventory-env .
+docker run -p 8000:8000 inventory-env
+```
+## API Endpoints
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/health` | GET | Health check — returns 200 if server is running |
+| `/reset` | POST | Reset environment, returns initial observation |
+| `/step` | POST | Submit an action (JSON body), returns next observation with reward |
+| `/state` | GET | Get current episode state (day, cash, inventory) |
+| `/tasks` | GET | List all 3 tasks with full config (stock, capacity, demand ranges, events) |
+| `/grader` | POST | Score an episode given task name and agent profit |
+| `/baseline` | GET | Run LLM inference on a task and return the score |
+### Example Queries
+```bash
+# List all tasks with full schemas
+curl http://localhost:8000/tasks
+# Grade a specific profit
+curl -X POST "http://localhost:8000/grader?task_name=easy&agent_profit=5000"
+# → {"task_name":"easy","agent_profit":5000.0,"floor":2200.0,"ceiling":10011.0,"score":0.358}
+# Run baseline inference (requires API keys in container env)
+curl "http://localhost:8000/baseline"
+# → {"task_name":"easy","score":0.822}
+curl "http://localhost:8000/baseline?task_name=hard"
+```
+## Step Execution Order
+Each `step()` call processes in this order:
+1. Tick event countdowns (into negatives to track active duration)
+2. Remove expired groceries (shelf life = 0)
+3. Receive arriving deliveries (add to inventory with fresh shelf life)
+4. Process purchase orders (deduct cash, schedule deliveries with jitter)
+5. Generate demand (base + weekend boost + event multipliers + price elasticity)
+6. Sell products FIFO (oldest batches first, track missed sales)
+7. Liquidate requested stock FIFO (no revenue)
+8. Compute profit, reward, update state, return observation
+## Project Structure
+```
+├── models.py              # InventoryAction, InventoryObservation, InventoryState (Pydantic)
+├── client.py              # EnvClient for remote WebSocket connections
+├── inference.py           # LLM inference script with conversation history (runs all 3 tasks)
+├── openenv.yaml           # OpenEnv spec manifest
+├── pyproject.toml         # Python dependencies
+├── Dockerfile             # Multi-stage container build from openenv-base
+├── server/
+│   ├── app.py             # FastAPI server (create_app + uvicorn entry point)
+│   ├── inventory_env.py   # Environment (reset, step, state, demand generation)
+│   ├── constants.py       # All configs: prices, stock, events, tasks, elasticity
+│   └── grader.py          # Floor/ceiling baselines and 0.0-1.0 scoring
+└── scripts/
+    └── validate-submission.sh  # Pre-submission validator
+```

__init__.py ADDED

File without changes

client.py ADDED

	@@ -0,0 +1,68 @@

+from __future__ import annotations
+from typing import Any, Dict
+from openenv.core.client_types import StepResult
+from openenv.core.env_client import EnvClient
+from models import InventoryAction, InventoryObservation, InventoryState
+class InventoryEnv(EnvClient[InventoryAction, InventoryObservation, InventoryState]):
+    def _step_payload(self, action : InventoryAction) -> Dict[str, Any]:
+        payload: Dict[str, Any] = {}
+        if action.buy_quantities is not None:
+            payload["buy_quantities"] = action.buy_quantities
+        if action.delivery_method is not None:
+            payload["delivery_method"] = action.delivery_method
+        if action.liquidate is not None:
+            payload["liquidate"] = action.liquidate
+        if action.price_multipliers is not None:
+            payload["price_multipliers"] = action.price_multipliers
+        return payload
+    def _parse_result(self, payload: Dict) -> StepResult[InventoryObservation]:
+        obs_data = payload.get("observation", {})
+        observation = InventoryObservation(
+            current_day = obs_data.get("current_day", 0),
+            total_cash = obs_data.get("total_cash", 0),
+            day_profit = obs_data.get("day_profit", 0),
+            total_profit = obs_data.get("total_profit", 0),
+            demand_today = obs_data.get("demand_today", {}),
+            updated_inventory = obs_data.get("updated_inventory", {}),
+            remaining_capacity = obs_data.get("remaining_capacity", {}),
+            updated_events = obs_data.get("updated_events", {}),
+            updated_deliveries = obs_data.get("updated_deliveries", []),
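+            # OpenEnv servers typically put done/reward at the top level of the payload;
+            # fall back to the observation dict if they are nested there instead.
+            done = payload.get("done", obs_data.get("done", False)),
+            reward = payload.get("reward", obs_data.get("reward", 0.0)),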
+            metadata=obs_data.get("metadata", {}),
+        )
+        return StepResult(
+            observation = observation,
+            reward = observation.reward,
+            done = observation.done,
+        )
+    def _parse_state(self, payload: Dict[str, Any]) -> InventoryState:
+        return InventoryState(
+            episode_id = payload.get("episode_id", ""),
+            current_day = payload.get("current_day", 0),
+            cash = payload.get("cash", 0.0),
+            inventory = payload.get("inventory", {}),
+        )

inference.py ADDED

	@@ -0,0 +1,330 @@

+"""
+Inference Script - Inventory Optimization Environment
+=====================================================
+Required env vars:
+    API_BASE_URL   The API endpoint for the LLM.
+    MODEL_NAME     The model identifier to use for inference.
+    HF_TOKEN       Hugging Face token (preferred for HF Router).
+Supported key env vars (first non-empty wins): API_KEY, HF_TOKEN, OPENAI_API_KEY.
+For non-OpenAI endpoints, a dummy key is used when no key is provided because
+the OpenAI Python SDK requires a non-empty api_key argument.
+"""
+import os
+import json
+import textwrap
+from dotenv import load_dotenv
+load_dotenv()
+from openai import OpenAI
+from server.inventory_env import InventoryEnvironment
+from server.constants import EXTRA_INVENTORY_COST, EVENT_DURATION, TASKS, COST_PRICES, SHIPPING_COST, BASE_PRICES
+from models import InventoryAction
+API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
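+# Fall back to a dummy key: the OpenAI SDK rejects an empty api_key even for non-OpenAI endpoints.
+API_KEY = os.getenv("API_KEY") or os.getenv("HF_TOKEN") or os.getenv("OPENAI_API_KEY") or "dummy-key"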
+MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen3-32B"
+TASK_NAME = os.getenv("TASK_NAME") or "easy"
+MAX_DAYS = 30
+SYSTEM_PROMPT = textwrap.dedent("""
+    You are an inventory management AI agent. Each day you receive the current state
+    of a retail store with 5 products: electronics, clothing, groceries, furniture, toys.
+    You will be shown your decision history from recent days so you can learn from
+    past outcomes. Use this history to spot demand trends, identify what worked vs.
+    what didn't, and adjust your strategy accordingly.
+    Groceries are perishable (5-day shelf life). Other products don't expire.
+    Product selling prices: electronics=$150, clothing=$40, groceries=$10, furniture=$200, toys=$25
+    Product cost prices: electronics=$100, clothing=$25, groceries=$5, furniture=$130, toys=$12
+    Profit margins: electronics=$50, clothing=$15, groceries=$5, furniture=$70, toys=$13
+    Shipping costs per unit: slow=$2 (3-7 days), medium=$5 (2-4 days), fast=$10 (1 day, always reliable)
+    Warehouse capacity: electronics=100, clothing=200, groceries=500, furniture=50, toys=300
+    Events (like black_friday, christmas) boost demand when their countdown hits 0 and last for 2 days.
+    Weekends (day%7 == 5 or 6) have 1.2x demand.
+    CRITICAL STRATEGY:
+    - Review your history: if reward was negative, identify why and change approach.
+    - Track demand trends across days.
+    - You MUST restock products when inventory is low. Missed sales = lost revenue = negative reward.
+    - Do NOT overbuy when demand is low — unsold stock ties up cash and perishables expire.
+    - Stock up BEFORE events hit (check event countdowns — order 3-5 days ahead).
+    - When no events are approaching, slow shipping is often sufficient and saves significant cost.
+    - Near end of episode (last 2 days), stop buying — focus on selling remaining stock.
+    DYNAMIC PRICING:
+    You can set a price multiplier (0.5 to 1.5) per product each day. Default is 1.0.
+    - Lower price (e.g. 0.7) = more demand but less revenue per unit. Good for clearing excess stock.
+    - Higher price (e.g. 1.3) = less demand but more revenue per unit. Good when stock is low.
+    - Price elasticity varies across different products.
+    - Elasticity values: electronics=1.2, clothing=1.5, groceries=0.4, furniture=0.8, toys=1.3
+    Each day you must respond with a JSON action:
+    {
+        "buy_quantities": {"product_name": quantity, ...},
+        "delivery_method": "slow" | "medium" | "fast",
+        "liquidate": {"product_name": quantity, ...},
+        "price_multipliers": {"product_name": multiplier, ...}
+    }
+    - buy_quantities: products and amounts to order.
+    - delivery_method: shipping speed for this order
+    - liquidate: products and amounts to dispose of (no revenue, empty {} to skip)
+      Use liquidate to free up warehouse space before a restock.
+    - price_multipliers: set selling price multiplier per product (0.5-1.5, default 1.0 if omitted)
+    LEARNING FROM HISTORY:
+    - Compare your past buy quantities to the demand that followed — were you over or under?
+    - If you see repeated stockouts for a product, increase orders for it.
+    - If groceries expired, you overbought — reduce grocery orders or use faster shipping.
+    - A negative reward means your last action was bad — adjust immediately.
+    Before responding with JSON, briefly reason (2-3 lines max):
+    1. What did I learn from recent history? What went wrong/right?
+    2. What products need restocking vs. are overstocked?
+    3. Are any events approaching?
+    Then output ONLY the final JSON action on the last line.
+""").strip()
+def format_observation(obs):
+    """Convert observation into a readable prompt for the LLM."""
+    # format inventory with batch detail, remaining capacity, and extra cost
+    inv_lines = []
+    for product, batches in obs.updated_inventory.items():
+        total = sum(b[0] for b in batches)
+        remaining = obs.remaining_capacity.get(product, 0)
+        extra_cost = EXTRA_INVENTORY_COST.get(product, 0)
+        batch_detail = ", ".join(
+            f"{b[0]} units" + (f" ({b[1]}d left)" if b[1] is not None else "")
+            for b in batches
+        )
+        inv_lines.append(f"  {product}: {total} total [{batch_detail}] | space left: {remaining} (extra space: ${extra_cost}/unit)")
+    inv_text = "\n".join(inv_lines)
+    # format events
+    event_lines = []
+    for event, days in obs.updated_events.items():
+        if days > 0:
+            event_lines.append(f"  {event}: in {days} days")
+        elif -EVENT_DURATION < days <= 0:
+            event_lines.append(f"  {event}: ACTIVE NOW")
+        else:
+            event_lines.append(f"  {event}: ended")
+    events_text = "\n".join(event_lines) if event_lines else "  None"
+    # format deliveries
+    delivery_lines = []
+    for delivery in obs.updated_deliveries:
+        for product, shipment in delivery.items():
+            qty, arrival_day = shipment
+            days_away = arrival_day - obs.current_day
+            delivery_lines.append(f"  {product}: {qty} units arriving in {days_away} days")
+    deliveries_text = "\n".join(delivery_lines) if delivery_lines else "  None"
+    # format demand (yesterday's demand — feedback, not prediction)
+    demand_lines = []
+    for product, units in obs.demand_today.items():
+        demand_lines.append(f"  {product}: {units} units")
+    demand_text = "\n".join(demand_lines) if demand_lines else "  No demand data yet"
+    prompt = f"""Day: {obs.current_day}/{MAX_DAYS}
+Cash: ${obs.total_cash:.2f}
+Day Profit: ${obs.day_profit:.2f}
+Total Profit: ${obs.total_profit:.2f}
+Last Step Reward: {obs.reward:.3f}
+Inventory:
+{inv_text}
+Yesterday's Demand:
+{demand_text}
+Upcoming Events:
+{events_text}
+Pending Deliveries:
+{deliveries_text}
+Respond with your action as JSON."""
+    return prompt
+def parse_action(response_text):
+    """Parse LLM response into InventoryAction. Extracts JSON even if surrounded by text."""
+    try:
+        text = response_text.strip()
+        # strip markdown code fences
+        if "```" in text:
+            parts = text.split("```")
+            for part in parts:
+                part = part.strip()
+                if part.startswith("json"):
+                    part = part[4:].strip()
+                if part.startswith("{"):
+                    text = part
+                    break
+        # find the first { and last } to extract JSON
+        start = text.find("{")
+        end = text.rfind("}")
+        if start != -1 and end != -1 and end > start:
+            text = text[start:end + 1]
+        data = json.loads(text)
+        # only keep valid fields
+        clean = {}
+        if "buy_quantities" in data:
+            clean["buy_quantities"] = data["buy_quantities"]
+        if "delivery_method" in data:
+            clean["delivery_method"] = data["delivery_method"]
+        if "liquidate" in data:
+            clean["liquidate"] = data["liquidate"]
+        if "price_multipliers" in data:
+            clean["price_multipliers"] = data["price_multipliers"]
+        return InventoryAction(**clean)
+    except Exception as e:
+        print(f"  [DEBUG] Parse FAILED: {e}")
+        print(f"  [DEBUG] Raw LLM response: {response_text[:500]}")
+        return InventoryAction(
+            buy_quantities={},
+            delivery_method="slow",
+            liquidate={},
+            price_multipliers={},
+        )
+HISTORY_WINDOW = 7  # rolling window of past days to include in context
+def run_task(client, task_name):
+    """Run a single task and return total profit."""
+    env = InventoryEnvironment(task_name)
+    obs = env.reset()
+    rewards = []
+    steps_taken = 0
+    success = False
+    score = 0.0  # set up front so the finally block can print it even if a step raises
+    print(f"[START] task={task_name} env=inventory_env model={MODEL_NAME}", flush=True)
+    # Rolling history of (user_observation, assistant_response) pairs
+    history = []
+    try:
+        for day in range(1, env.max_days + 1):
+            if obs.done:
+                break
+            user_prompt = format_observation(obs)
+            # Build messages: system + history context + current observation
+            messages = [{"role": "system", "content": SYSTEM_PROMPT}]
+            recent = history[-HISTORY_WINDOW:]
+            if recent:
+                messages.append({
+                    "role": "user",
+                    "content": f"Here is your decision history from the last {len(recent)} day(s). "
+                               "Use this to identify demand trends, adjust restocking, and avoid repeating mistakes.",
+                })
+                messages.append({
+                    "role": "assistant",
+                    "content": "Understood. I'll review my past decisions and their outcomes to make better choices today.",
+                })
+                for past_user, past_assistant in recent:
+                    messages.append({"role": "user", "content": past_user})
+                    messages.append({"role": "assistant", "content": past_assistant})
+            messages.append({"role": "user", "content": user_prompt})
+            error = None
+            try:
+                completion = client.chat.completions.create(
+                    model=MODEL_NAME,
+                    messages=messages,
+                    temperature=0.0,
+                    max_completion_tokens=500,
+                    stream=False,
+                )
+                response_text = completion.choices[0].message.content or ""
+            except Exception as exc:
+                error = str(exc)
+                response_text = "{}"
+            # Save this turn to rolling history
+            history.append((user_prompt, response_text))
+            action = parse_action(response_text)
+            action_str = json.dumps({"buy": action.buy_quantities, "deliver": action.delivery_method, "liquidate": action.liquidate, "prices": action.price_multipliers})
+            obs = env.step(action)
+            reward = obs.reward
+            done = obs.done
+            rewards.append(reward)
+            steps_taken = day
+            print(f"[STEP] step={day} action={action_str} reward={reward:.2f} done={str(done).lower()} error={error if error else 'null'}", flush=True)
+            if done:
+                break
+        # compute score
+        from server.grader import grade
+        score = grade(task_name, obs.total_profit)
+        success = score >= 0.1
+    finally:
+        rewards_str = ",".join(f"{r:.2f}" for r in rewards)
+        print(f"[END] success={str(success).lower()} steps={steps_taken} score={score:.3f} rewards={rewards_str}", flush=True)
+    return obs.total_profit
+def main():
+    from server.grader import grade, compute_baselines
+    if not MODEL_NAME:
+        raise RuntimeError("MODEL_NAME is not set. Please export MODEL_NAME before running inference.")
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    tasks = ["easy", "medium", "hard"]
+    # print baselines
+    print(f"\n{'=' * 50}")
+    print("BASELINES")
+    print(f"{'=' * 50}")
+    for task_name in tasks:
+        floor, ceiling = compute_baselines(task_name)
+        print(f"  {task_name}: floor=${floor:.2f} (passive) | ceiling=${ceiling:.2f} (heuristic)")
+    results = {}
+    for task_name in tasks:
+        profit = run_task(client, task_name)
+        results[task_name] = profit
+    print(f"\n{'=' * 50}")
+    print("FINAL SCORES")
+    print(f"{'=' * 50}")
+    for task_name in tasks:
+        floor, ceiling = compute_baselines(task_name)
+        score = grade(task_name, results[task_name])
+        print(f"  {task_name}: {score:.3f} (profit: ${results[task_name]:.2f} | floor: ${floor:.2f} | ceiling: ${ceiling:.2f})")
+if __name__ == "__main__":
+    main()

models.py ADDED

	@@ -0,0 +1,39 @@

+from __future__ import annotations
+import json
+from openenv.core.env_server import Action, Observation, State
+from typing import Literal, Dict, List, Optional
+from pydantic import field_validator
+class InventoryAction(Action):
+    buy_quantities : Dict[str, int] = {}
+    delivery_method : Literal["slow", "medium", "fast"] = "slow"
+    liquidate : Dict[str, int] = {}
+    price_multipliers : Dict[str, float] = {}  # product -> 0.5 to 1.5 (default 1.0)
+    @field_validator("buy_quantities", "liquidate", "price_multipliers", mode="before")
+    @classmethod
+    def parse_dict_strings(cls, v):
+        if isinstance(v, str):
+            return json.loads(v)
+        return v
+class InventoryObservation(Observation):
+    current_day : int
+    total_cash : float
+    day_profit : float
+    total_profit : float
+    demand_today : Dict[str, int]  # product -> units demanded today
+    updated_inventory : Dict[str, List[List[Optional[int]]]]  # product -> [[qty, days_left], ...] per batch
+    remaining_capacity : Dict[str, int]  # product -> remaining warehouse space
+    updated_events : Dict[str, int]
+    updated_deliveries : List[Dict[str, List[int]]]  # product -> [quantity, arrival_day]
+class InventoryState(State):
+    episode_id : str
+    current_day : int
+    cash : float
+    inventory : Dict[str, int]

openenv.yaml ADDED

	@@ -0,0 +1,6 @@

+spec_version: 1
+name: inventory_env
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000

pyproject.toml ADDED

	@@ -0,0 +1,24 @@

+[project]
+name = "inventory-env"
+version = "0.1.0"
+description = "Retail Inventory Optimization RL Environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    "openenv-core[core]>=0.2.0",
+    "fastapi>=0.115.0",
+    "uvicorn>=0.24.0",
+    "pydantic>=2.0.0",
+    "numpy>=1.24.0",
+    "openai>=1.0.0",
+    "python-dotenv>=1.0.0",
+]
+[build-system]
+requires = ["setuptools>=61.0"]
+build-backend = "setuptools.build_meta"
+[tool.setuptools.packages.find]
+where = ["."]
+include = ["server*"]
+[project.scripts]
+server = "server.app:main"

scripts/validate-submission.sh ADDED

	@@ -0,0 +1,172 @@

+#!/usr/bin/env bash
+#
+# validate-submission.sh — OpenEnv Submission Validator
+#
+# Checks that your HF Space is live, Docker image builds, and openenv validate passes.
+#
+# Run:
+#   ./scripts/validate-submission.sh <ping_url> [repo_dir]
+#
+# Arguments:
+#   ping_url   Your HuggingFace Space URL (e.g. https://your-space.hf.space)
+#   repo_dir   Path to your repo (default: current directory)
+#
+set -uo pipefail
+DOCKER_BUILD_TIMEOUT=600
+if [ -t 1 ]; then
+  RED='\033[0;31m'
+  GREEN='\033[0;32m'
+  YELLOW='\033[1;33m'
+  BOLD='\033[1m'
+  NC='\033[0m'
+else
+  RED='' GREEN='' YELLOW='' BOLD='' NC=''
+fi
+run_with_timeout() {
+  local secs="$1"; shift
+  if command -v timeout &>/dev/null; then
+    timeout "$secs" "$@"
+  elif command -v gtimeout &>/dev/null; then
+    gtimeout "$secs" "$@"
+  else
+    "$@" &
+    local pid=$!
+    ( sleep "$secs" && kill "$pid" 2>/dev/null ) &
+    local watcher=$!
+    wait "$pid" 2>/dev/null
+    local rc=$?
+    kill "$watcher" 2>/dev/null
+    wait "$watcher" 2>/dev/null
+    return $rc
+  fi
+}
+portable_mktemp() {
+  local prefix="${1:-validate}"
+  mktemp "${TMPDIR:-/tmp}/${prefix}-XXXXXX" 2>/dev/null || mktemp
+}
+CLEANUP_FILES=()
+cleanup() { rm -f "${CLEANUP_FILES[@]+"${CLEANUP_FILES[@]}"}"; }
+trap cleanup EXIT
+PING_URL="${1:-}"
+REPO_DIR="${2:-.}"
+if [ -z "$PING_URL" ]; then
+  printf "Usage: %s <ping_url> [repo_dir]\n" "$0"
+  printf "\n"
+  printf "  ping_url   Your HuggingFace Space URL (e.g. https://your-space.hf.space)\n"
+  printf "  repo_dir   Path to your repo (default: current directory)\n"
+  exit 1
+fi
+if ! REPO_DIR="$(cd "$REPO_DIR" 2>/dev/null && pwd)"; then
+  printf "Error: directory '%s' not found\n" "${2:-.}"
+  exit 1
+fi
+PING_URL="${PING_URL%/}"
+export PING_URL
+PASS=0
+log()  { printf "[%s] %b\n" "$(date -u +%H:%M:%S)" "$*"; }
+pass() { log "${GREEN}PASSED${NC} -- $1"; PASS=$((PASS + 1)); }
+fail() { log "${RED}FAILED${NC} -- $1"; }
+hint() { printf "  ${YELLOW}Hint:${NC} %b\n" "$1"; }
+stop_at() {
+  printf "\n"
+  printf "${RED}${BOLD}Validation stopped at %s.${NC} Fix the above before continuing.\n" "$1"
+  exit 1
+}
+printf "\n"
+printf "${BOLD}========================================${NC}\n"
+printf "${BOLD}  OpenEnv Submission Validator${NC}\n"
+printf "${BOLD}========================================${NC}\n"
+log "Repo:     $REPO_DIR"
+log "Ping URL: $PING_URL"
+printf "\n"
+log "${BOLD}Step 1/3: Pinging HF Space${NC} ($PING_URL/reset) ..."
+CURL_OUTPUT=$(portable_mktemp "validate-curl")
+CLEANUP_FILES+=("$CURL_OUTPUT")
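+# curl itself prints 000 for %{http_code} when no response arrives, so the
+# fallback is assigned outside the substitution rather than appended to it.
+HTTP_CODE=$(curl -s -o "$CURL_OUTPUT" -w "%{http_code}" -X POST \
+  -H "Content-Type: application/json" -d '{}' \
+  "$PING_URL/reset" --max-time 30 2>/dev/null) || HTTP_CODE="000"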
+if [ "$HTTP_CODE" = "200" ]; then
+  pass "HF Space is live and responds to /reset"
+elif [ "$HTTP_CODE" = "000" ]; then
+  fail "HF Space not reachable (connection failed or timed out)"
+  hint "Check your network connection and that the Space is running."
+  hint "Try: curl -s -o /dev/null -w '%{http_code}' -X POST $PING_URL/reset"
+  stop_at "Step 1"
+else
+  fail "HF Space /reset returned HTTP $HTTP_CODE (expected 200)"
+  hint "Make sure your Space is running and the URL is correct."
+  hint "Try opening $PING_URL in your browser first."
+  stop_at "Step 1"
+fi
+log "${BOLD}Step 2/3: Running docker build${NC} ..."
+if ! command -v docker &>/dev/null; then
+  fail "docker command not found"
+  hint "Install Docker: https://docs.docker.com/get-docker/"
+  stop_at "Step 2"
+fi
+if [ -f "$REPO_DIR/Dockerfile" ]; then
+  DOCKER_CONTEXT="$REPO_DIR"
+elif [ -f "$REPO_DIR/server/Dockerfile" ]; then
+  DOCKER_CONTEXT="$REPO_DIR/server"
+else
+  fail "No Dockerfile found in repo root or server/ directory"
+  stop_at "Step 2"
+fi
+log "  Found Dockerfile in $DOCKER_CONTEXT"
+BUILD_OK=false
+BUILD_OUTPUT=$(run_with_timeout "$DOCKER_BUILD_TIMEOUT" docker build "$DOCKER_CONTEXT" 2>&1) && BUILD_OK=true
+if [ "$BUILD_OK" = true ]; then
+  pass "Docker build succeeded"
+else
+  fail "Docker build failed (timeout=${DOCKER_BUILD_TIMEOUT}s)"
+  printf "%s\n" "$BUILD_OUTPUT" | tail -20
+  stop_at "Step 2"
+fi
+log "${BOLD}Step 3/3: Running openenv validate${NC} ..."
+if ! command -v openenv &>/dev/null; then
+  fail "openenv command not found"
+  hint "Install it: pip install openenv-core"
+  stop_at "Step 3"
+fi
+VALIDATE_OK=false
+VALIDATE_OUTPUT=$(cd "$REPO_DIR" && openenv validate 2>&1) && VALIDATE_OK=true
+if [ "$VALIDATE_OK" = true ]; then
+  pass "openenv validate passed"
+  [ -n "$VALIDATE_OUTPUT" ] && log "  $VALIDATE_OUTPUT"
+else
+  fail "openenv validate failed"
+  printf "%s\n" "$VALIDATE_OUTPUT"
+  stop_at "Step 3"
+fi
+printf "\n"
+printf "${BOLD}========================================${NC}\n"
+printf "${GREEN}${BOLD}  All 3/3 checks passed!${NC}\n"
+printf "${GREEN}${BOLD}  Your submission is ready to submit.${NC}\n"
+printf "${BOLD}========================================${NC}\n"
+printf "\n"
+exit 0

server/__init__.py ADDED Viewed

File without changes

server/app.py ADDED Viewed

	@@ -0,0 +1,92 @@

+from openenv.core.env_server import create_app
+from server.inventory_env import InventoryEnvironment
+from server.grader import grade, compute_baselines
+from server.constants import TASKS
+from models import InventoryAction, InventoryObservation
+app = create_app(InventoryEnvironment, InventoryAction, InventoryObservation, env_name="inventory_env")
+@app.get("/tasks")
+def list_tasks():
+    """List available tasks with full schemas."""
+    task_list = []
+    for name, config in TASKS.items():
+        demand = {p: list(v) for p, v in config["base_demand"].items()}
+        task_list.append({
+            "task_name": name,
+            "seed": config["seed"],
+            "max_days": config["max_days"],
+            "initial_cash": config["initial_cash"],
+            "initial_stock": config["initial_stock"],
+            "inventory_capacity": config["inventory_capacity"],
+            "base_demand": demand,
+            "events": config["events"],
+        })
+    return {"tasks": task_list}
+@app.post("/grader")
+def grader_endpoint(task_name: str, agent_profit: float):
+    """Return the evaluation score for an episode."""
+    if task_name not in TASKS:
+        return {"error": f"Unknown task: {task_name}. Available: {list(TASKS.keys())}"}
+    floor, ceiling = compute_baselines(task_name)
+    score = grade(task_name, agent_profit)
+    return {
+        "task_name": task_name,
+        "agent_profit": agent_profit,
+        "floor": floor,
+        "ceiling": ceiling,
+        "score": score,
+    }
+@app.get("/baseline")
+def baseline_endpoint(task_name: str = "easy"):
+    """Run baseline inference on a task and return score."""
+    import subprocess
+    import os
+    import re
+    if task_name not in TASKS:
+        return {"error": f"Unknown task: {task_name}. Available: {list(TASKS.keys())}"}
+    env = os.environ.copy()
+    env["TASK_NAME"] = task_name
+    try:
+        result = subprocess.run(
+            ["python", "inference.py"],
+            capture_output=True,
+            text=True,
+            timeout=1200,
+            env=env,
+        )
+        output = result.stdout
+        # parse score from output
+        score = None
+        for line in output.splitlines():
+            if task_name + ":" in line and "profit" in line:
+                score_match = re.search(r"(\d+\.\d+)\s*\(profit", line)
+                if score_match:
+                    score = float(score_match.group(1))
+        return {
+            "task_name": task_name,
+            "score": score,
+        }
+    except subprocess.TimeoutExpired:
+        return {"error": "Inference timed out (20 min limit)"}
+    except Exception as e:
+        return {"error": str(e)}
+def main():
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=8000)
+if __name__ == "__main__":
+    main()
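
The `/baseline` endpoint above recovers the score by scanning the subprocess output with a regular expression. The sketch below isolates that parsing step so it can be exercised without running `inference.py`. The helper name `parse_score` and the sample output lines are hypothetical; the real output format of `inference.py` is not shown in this part of the diff, so the sample is only an assumption about what a matching line might look like.

```python
import re

# Same pattern as in baseline_endpoint: a decimal number followed by "(profit".
SCORE_PATTERN = re.compile(r"(\d+\.\d+)\s*\(profit")


def parse_score(stdout, task_name):
    """Return the last score found for task_name, or None if no line matches.

    Hypothetical helper mirroring the loop inside baseline_endpoint.
    """
    score = None
    for line in stdout.splitlines():
        if task_name + ":" in line and "profit" in line:
            match = SCORE_PATTERN.search(line)
            if match:
                score = float(match.group(1))
    return score


# Hypothetical output; the actual inference.py format may differ.
sample = "starting run\neasy: 0.412 (profit=1234.50)\n"
print(parse_score(sample, "easy"))
print(parse_score(sample, "hard"))
# A score printed without a decimal point does not match the pattern.
print(parse_score("easy: 1 (profit=5)\n", "easy"))
```

One consequence of the pattern is that a score printed as a bare integer is silently reported as `None`, which may be worth keeping in mind if the output format of the inference script changes.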

server/constants.py ADDED Viewed

	@@ -0,0 +1,186 @@

+INITIAL_CASH = 1000.0
+# Product name -> base price (selling price before multiplier)
+BASE_PRICES = {
+    "electronics": 150.0,
+    "clothing": 40.0,
+    "groceries": 10.0,
+    "furniture": 200.0,
+    "toys": 25.0,
+}
+# Product name -> cost price (what you pay to buy stock)
+COST_PRICES = {
+    "electronics": 100.0,
+    "clothing": 25.0,
+    "groceries": 5.0,
+    "furniture": 130.0,
+    "toys": 12.0,
+}
+# Product name -> shelf life in days (None = no expiry)
+SHELF_LIFE = {
+    "electronics": None,
+    "clothing": None,
+    "groceries": 5,
+    "furniture": None,
+    "toys": None,
+}
+# Product name -> starting stock quantity
+INITIAL_STOCK = {
+    "electronics": 10,
+    "clothing": 20,
+    "groceries": 50,
+    "furniture": 5,
+    "toys": 30,
+}
+# Delivery method -> cost per unit
+SHIPPING_COST = {
+    "slow": 2.0,
+    "medium": 5.0,
+    "fast": 10.0,
+}
+# Delivery method -> days to arrive
+SHIPPING_DAYS = {
+    "slow": 5,
+    "medium": 3,
+    "fast": 1,
+}
+# Event name -> days until event (spread across 30 days)
+EVENTS = {
+    "black_friday": 6,
+    "christmas": 12,
+    "back_to_school": 18,
+    "summer_clearance": 24,
+    "new_competitor": 28,
+}
+# Product name -> max inventory space (units)
+INVENTORY_CAPACITY = {
+    "electronics": 100,
+    "clothing": 200,
+    "groceries": 500,
+    "furniture": 50,
+    "toys": 300,
+}
+# Product name -> additional cost per unit for extra inventory beyond capacity
+EXTRA_INVENTORY_COST = {
+    "electronics": 20.0,
+    "clothing": 5.0,
+    "groceries": 2.0,
+    "furniture": 30.0,
+    "toys": 4.0,
+}
+# Product name -> (min_demand, max_demand) per day
+BASE_DEMAND = {
+    "electronics": (3, 8),
+    "clothing": (5, 15),
+    "groceries": (20, 40),
+    "furniture": (1, 3),
+    "toys": (5, 12),
+}
+WEEKEND_MULTIPLIER = 1.2
+# Event name -> {product: demand_multiplier} when event triggers
+EVENT_EFFECTS = {
+    "black_friday": {"electronics": 3.0, "clothing": 2.5, "toys": 2.0, "furniture": 1.5, "groceries": 1.0},
+    "christmas": {"toys": 3.0, "electronics": 2.0, "clothing": 1.5, "furniture": 1.0, "groceries": 1.5},
+    "back_to_school": {"clothing": 2.5, "electronics": 1.5, "toys": 1.5, "furniture": 1.0, "groceries": 1.0},
+    "summer_clearance": {"clothing": 2.0, "toys": 1.5, "electronics": 1.0, "furniture": 1.5, "groceries": 1.0},
+    "new_competitor": {"electronics": 0.6, "clothing": 0.7, "toys": 0.7, "furniture": 0.8, "groceries": 0.9},
+}
+EVENT_DURATION = 2
+MAX_DAYS = 30
+UPGRADE_DELIVERY_COST = 50.0
+# Task configs for easy/medium/hard
+TASKS = {
+    # Easy: Low demand, no events, full warehouse capacity, reduced starting stock.
+    # Agent just needs to restock steadily and sell. Minimal challenge.
+    "easy": {
+        "seed": 100,
+        "max_days": 30,
+        "initial_cash": 1000.0,
+        "events": {},  # no events
+        "initial_stock": {
+            "electronics": 5,
+            "clothing": 10,
+            "groceries": 20,
+            "furniture": 3,
+            "toys": 10,
+        },
+        "inventory_capacity": INVENTORY_CAPACITY,
+        "base_demand": {
+            "electronics": (2, 5),
+            "clothing": (3, 10),
+            "groceries": (15, 30),
+            "furniture": (1, 2),
+            "toys": (3, 8),
+        },
+    },
+    # Medium: Default stock/cash, all 5 events spread across 30 days, normal demand.
+    # Agent must anticipate demand spikes from events and restock accordingly.
+    "medium": {
+        "seed": 200,
+        "max_days": 30,
+        "initial_cash": 1000.0,
+        "events": EVENTS,
+        "initial_stock": INITIAL_STOCK,
+        "inventory_capacity": INVENTORY_CAPACITY,
+        "base_demand": BASE_DEMAND,
+    },
+    # Hard: Half starting cash ($500), low stock, events packed close together
+    # (every 4 days), higher demand, smaller warehouse. Agent must balance a tight
+    # budget, closely spaced event spikes, and fast-expiring groceries.
+    "hard": {
+        "seed": 300,
+        "max_days": 30,
+        "initial_cash": 500.0,
+        "events": {
+            "black_friday": 4,
+            "christmas": 8,
+            "back_to_school": 12,
+            "summer_clearance": 16,
+            "new_competitor": 20,
+        },
+        "initial_stock": {
+            "electronics": 5,
+            "clothing": 10,
+            "groceries": 30,
+            "furniture": 3,
+            "toys": 15,
+        },
+        "inventory_capacity": {
+            "electronics": 50,
+            "clothing": 100,
+            "groceries": 250,
+            "furniture": 25,
+            "toys": 150,
+        },
+        "base_demand": {
+            "electronics": (5, 12),
+            "clothing": (8, 20),
+            "groceries": (30, 60),
+            "furniture": (2, 5),
+            "toys": (8, 18),
+        },
+    },
+}
+# Product name -> price elasticity (demand scales with price_multiplier ** -elasticity)
+PRICE_ELASTICITY = {
+    "electronics": 1.2,
+    "clothing":    1.5,
+    "groceries":   0.4,
+    "furniture":   0.8,
+    "toys":        1.3,
+}
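
As a rough reading aid for the constants above, the sketch below works out the per-unit margin for each product and delivery method. The helper `unit_margin` is hypothetical and not part of the commit; it copies the price tables from `server/constants.py` and ignores capacity overage fees, expiry, and lost sales.

```python
# Values copied from server/constants.py in this commit.
BASE_PRICES = {"electronics": 150.0, "clothing": 40.0, "groceries": 10.0,
               "furniture": 200.0, "toys": 25.0}
COST_PRICES = {"electronics": 100.0, "clothing": 25.0, "groceries": 5.0,
               "furniture": 130.0, "toys": 12.0}
SHIPPING_COST = {"slow": 2.0, "medium": 5.0, "fast": 10.0}


def unit_margin(product, method, price_multiplier=1.0):
    """Hypothetical helper: selling price minus purchase and shipping cost."""
    sell_price = BASE_PRICES[product] * price_multiplier
    return sell_price - COST_PRICES[product] - SHIPPING_COST[method]


# Margin table: {delivery_method: {product: margin_per_unit}}
margins = {m: {p: unit_margin(p, m) for p in BASE_PRICES} for m in SHIPPING_COST}

for method, row in margins.items():
    print(method, row)
```

At base price, groceries break even with medium shipping and lose 5.0 per unit with fast shipping, so under these numbers only slow delivery restocks groceries at a profit.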

server/grader.py ADDED Viewed

	@@ -0,0 +1,124 @@

+"""
+Grader for inventory optimization tasks.
+Scores agent performance on a 0.0-1.0 scale using a floor/ceiling approach.
+  - floor: passive agent (no buys, just sells initial stock until empty)
+  - ceiling: theoretical max profit with perfect demand knowledge
+"""
+from server.inventory_env import InventoryEnvironment
+from models import InventoryAction
+from server.constants import (
+    TASKS, BASE_PRICES, COST_PRICES, SHIPPING_COST, EVENT_EFFECTS,
+    WEEKEND_MULTIPLIER, EVENT_DURATION,
+)
+import random
+def _run_passive(task_name):
+    """Floor baseline: do nothing, just sell whatever initial stock covers."""
+    env = InventoryEnvironment(task_name)
+    obs = env.reset()
+    while not obs.done:
+        action = InventoryAction(
+            buy_quantities={},
+            delivery_method="slow",
+            liquidate={},
+        )
+        obs = env.step(action)
+    return obs.total_profit
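+def _run_heuristic(task_name):
+    """Ceiling baseline: replay the seeded demand and assume every unit is sold
+    at base price, restocked at cost price plus slow shipping."""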
+    task = TASKS[task_name]
+    events = dict(task["events"])
+    total_demand = {p: 0 for p in task["base_demand"]}
+    for day in range(1, task["max_days"] + 1):
+        # tick events
+        for event_name in events:
+            events[event_name] -= 1
+        rng = random.Random(task["seed"] * 1000 + day)
+        for product, (lo, hi) in task["base_demand"].items():
+            demand = rng.randint(lo, hi)
+            # weekend boost
+            if day % 7 == 5 or day % 7 == 6:
+                demand = int(WEEKEND_MULTIPLIER * demand)
+            # event multipliers
+            for event_name, days_left in events.items():
+                if -EVENT_DURATION < days_left <= 0 and event_name in EVENT_EFFECTS:
+                    mult = EVENT_EFFECTS[event_name].get(product, 1.0)
+                    demand = int(demand * mult)
+            total_demand[product] += demand
+    total_profit = 0.0
+    # sell the initial stock first
+    initial_stock = task["initial_stock"]
+    for product in task["base_demand"]:
+        total_profit += min(initial_stock.get(product, 0), total_demand[product]) * BASE_PRICES[product]
+        total_demand[product] = max(0, total_demand[product] - initial_stock.get(product, 0))
+        # cost price and shipping cost apply to demand beyond the initial stock
+        total_profit += total_demand[product] * (BASE_PRICES[product] - COST_PRICES[product] - SHIPPING_COST["slow"])
+    return total_profit
+def compute_baselines(task_name):
+    """Pre-compute floor and ceiling for a task."""
+    floor = _run_passive(task_name)
+    ceiling = _run_heuristic(task_name)
+    return floor, ceiling
+def grade(task_name, agent_profit):
+    """
+    Grade agent performance on 0.0-1.0 scale.
+    Args:
+        task_name: "easy", "medium", or "hard"
+        agent_profit: total profit achieved by the agent
+    Returns:
+        float score clamped to [0.002, 0.998]; exactly 0.0 or 1.0 only in the
+        degenerate case where ceiling <= floor
+    """
+    floor, ceiling = compute_baselines(task_name)
+    if ceiling <= floor:
+        return 1.0 if agent_profit >= ceiling else 0.0
+    score = (agent_profit - floor) / (ceiling - floor)
+    return max(0.002, min(0.998, score))
+def grade_all(results):
+    """
+    Grade all 3 tasks.
+    Args:
+        results: dict of {task_name: agent_profit}
+    Returns:
+        dict of {task_name: score}
+    """
+    scores = {}
+    for task_name, agent_profit in results.items():
+        scores[task_name] = grade(task_name, agent_profit)
+    return scores
+if __name__ == "__main__":
+    print("Computing baselines for all tasks...")
+    for task_name in ["easy", "medium", "hard"]:
+        floor, ceiling = compute_baselines(task_name)
+        print(f"  {task_name}: floor={floor:.2f}, ceiling={ceiling:.2f}")
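
The scoring rule in `grade` is a linear rescaling between the two baselines followed by a clamp. The sketch below restates it as a standalone function so the arithmetic can be checked by hand. The function name `normalized_score` and the floor and ceiling values are hypothetical, chosen only to make the numbers easy to follow; the real baselines come from `compute_baselines`.

```python
def normalized_score(agent_profit, floor, ceiling, lo=0.002, hi=0.998):
    """Hypothetical standalone version of the rule used in grade()."""
    if ceiling <= floor:
        # Degenerate baselines: fall back to a pass/fail result.
        return 1.0 if agent_profit >= ceiling else 0.0
    raw = (agent_profit - floor) / (ceiling - floor)
    return max(lo, min(hi, raw))


# Hypothetical baselines, not values produced by this repository.
FLOOR, CEILING = 1000.0, 5000.0

print(normalized_score(3000.0, FLOOR, CEILING))  # halfway between the baselines
print(normalized_score(500.0, FLOOR, CEILING))   # below the floor, clamped up
print(normalized_score(9000.0, FLOOR, CEILING))  # above the ceiling, clamped down
```

Because of the clamp, an agent that does worse than the passive baseline and one that merely matches it receive the same score of 0.002.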

server/inventory_env.py ADDED Viewed

	@@ -0,0 +1,264 @@

+from openenv.core.env_server.interfaces import Environment
+import copy
+import random
+from uuid import uuid4
+from models import InventoryAction, InventoryObservation, InventoryState
+from .constants import (
+    INITIAL_CASH, BASE_PRICES, COST_PRICES, SHELF_LIFE, INITIAL_STOCK,
+    EVENTS, SHIPPING_COST, SHIPPING_DAYS, INVENTORY_CAPACITY,
+    EXTRA_INVENTORY_COST, BASE_DEMAND, WEEKEND_MULTIPLIER, EVENT_EFFECTS,
+    EVENT_DURATION, MAX_DAYS, UPGRADE_DELIVERY_COST, TASKS, PRICE_ELASTICITY
+)
+def _build_inventory(stock):
+    """Convert stock dict to batch format: {product: [[qty, days_left], ...]}"""
+    inv = {}
+    for product, qty in stock.items():
+        shelf = SHELF_LIFE[product]
+        inv[product] = [[qty, shelf]]
+    return inv
+class InventoryEnvironment(Environment):
+    def __init__(self, task_name="medium"):
+        self.task_name = task_name
+        self.task = TASKS[task_name]
+        self.cash = self.task["initial_cash"]
+        self.inventory = _build_inventory(self.task["initial_stock"])
+        self.events = copy.deepcopy(self.task["events"])
+        self.deliveries = []
+        self.current_day = 0
+        self.total_profit = 0.0
+        self.seed = self.task["seed"]
+        self.reward = 0.0
+        self.max_days = self.task["max_days"]
+        self.inventory_capacity = self.task["inventory_capacity"]
+        self.base_demand = self.task["base_demand"]
+        self.reset()
+    def reset(self, seed: int = None) -> InventoryObservation:
+        if seed is not None:
+            self.seed = seed
+        else:
+            self.seed = self.task["seed"]
+        self.cash = self.task["initial_cash"]
+        self.inventory = _build_inventory(self.task["initial_stock"])
+        self.events = copy.deepcopy(self.task["events"])
+        self.deliveries = []
+        self.current_day = 0
+        self.total_profit = 0.0
+        self.reward = 0.0
+        self._state = InventoryState(
+            episode_id = str(uuid4()),
+            current_day = 0,
+            cash = self.task["initial_cash"],
+            inventory = dict(self.task["initial_stock"])
+        )
+        return InventoryObservation(
+            current_day = 0,
+            total_cash = self.cash,
+            day_profit = 0.0,
+            total_profit = 0.0,
+            demand_today = {},
+            updated_inventory = copy.deepcopy(self.inventory),
+            remaining_capacity = {p: max(0, self.inventory_capacity[p] - sum(b[0] for b in self.inventory[p])) for p in self.inventory},
+            updated_events = copy.deepcopy(self.events),
+            updated_deliveries = [],
+            reward = 0.0,
+            done = False,
+        )
+    def step(self, action: InventoryAction) -> InventoryObservation:
+        self.current_day += 1
+        self.reward = 0.0  # reset reward each step
+        day_cost = 0.0
+        day_revenue = 0.0
+        # 1. tick event countdowns (keep ticking into negative to track active duration)
+        for event_name in self.events:
+            self.events[event_name] -= 1
+        # 2. remove expired groceries
+        new_batches = []
+        expired_groceries_count = 0
+        for batch in self.inventory["groceries"]:
+            if batch[1] == 0:
+                expired_groceries_count += batch[0]
+                continue
+            else:
+                new_batches.append([batch[0], batch[1] - 1])
+        self.inventory["groceries"] = new_batches
+        self.reward -= 0.05 * expired_groceries_count
+        # 3. Handle incoming deliveries
+        remaining_deliveries = []
+        for delivery in self.deliveries:
+            for product, shipment in delivery.items():
+                qty, arrival_day = shipment
+                if arrival_day <= self.current_day:
+                    self.inventory[product].append([qty, SHELF_LIFE[product]])
+                else:
+                    remaining_deliveries.append(delivery)
+        self.deliveries = remaining_deliveries
+        # 4. process purchases
+        for product, qty in action.buy_quantities.items():
+            if qty <= 0:
+                continue  # a negative order would otherwise credit cash and profit
+            unit_cost = COST_PRICES[product] + SHIPPING_COST[action.delivery_method]
+            total_cost = qty * unit_cost
+            # capacity overage cost
+            current_qty = sum(b[0] for b in self.inventory[product])
+            overage = max(0, (current_qty + qty) - self.inventory_capacity[product])
+            extra_cost = overage * EXTRA_INVENTORY_COST[product]
+            total_cost += extra_cost
+            if total_cost > self.cash:
+                self.reward -= 0.5  # penalize for ordering what you can't afford
+                continue
+            self.cash -= total_cost
+            day_cost += total_cost
+            arrival_day = self.current_day + SHIPPING_DAYS[action.delivery_method]
+            # add jitter: slow ±2 days, medium ±1 day, fast is reliable
+            # seed with a string: hash(str) is salted per process, so it is not reproducible
+            jitter_rng = random.Random(f"{self.seed}:{self.current_day}:{product}")
+            if action.delivery_method == "slow":
+                arrival_day += jitter_rng.randint(-2, 2)
+            elif action.delivery_method == "medium":
+                arrival_day += jitter_rng.randint(-1, 1)
+            # ensure arrival is at least next day
+            arrival_day = max(self.current_day + 1, arrival_day)
+            self.deliveries.append({product: [qty, arrival_day]})
+        # 5. generate demand
+        demand = self._generate_demand()
+        # apply price elasticity: demand scales with price^(-elasticity)
+        price_mults = {}
+        for product in demand:
+            pm = max(0.5, min(1.5, action.price_multipliers.get(product, 1.0)))
+            price_mults[product] = pm
+            e = PRICE_ELASTICITY[product]
+            demand[product] = max(0, int(demand[product] * pm ** -e))
+        # 6. sell products (fifo)
+        for product, demand_today in demand.items():
+            sell_price = BASE_PRICES[product] * price_mults[product]
+            product_availability = sum(batch[0] for batch in self.inventory[product])
+            if demand_today > product_availability:
+                missed_sales = demand_today - product_availability
+                sold = product_availability
+                day_revenue += sold * sell_price
+                self.inventory[product] = []
+                self.reward -= missed_sales * sell_price * 0.001
+                self.reward += sold * sell_price * 0.001
+            else:
+                day_revenue += demand_today * sell_price
+                self.reward += demand_today * sell_price * 0.001
+                new_batches = []
+                for batch in self.inventory[product]:
+                    if batch[0] < demand_today:
+                        demand_today = demand_today - batch[0]
+                    elif demand_today == 0:
+                        new_batches.append(batch)
+                    else:
+                        remaining = batch[0] - demand_today
+                        if remaining > 0:
+                            new_batches.append([remaining, batch[1]])
+                        demand_today = 0
+                self.inventory[product] = new_batches
+        # 7. Liquidate some stock (FIFO, no revenue)
+        total_liquidation_loss = 0.0
+        for product, count in action.liquidate.items():
+            if product not in self.inventory or count <= 0:
+                continue
+            actually_removed = min(count, sum(b[0] for b in self.inventory[product]))
+            total_liquidation_loss += actually_removed * COST_PRICES[product]
+            remaining = count
+            new_batches = []
+            for batch in self.inventory[product]:
+                if remaining <= 0:
+                    new_batches.append(batch)
+                elif batch[0] <= remaining:
+                    remaining -= batch[0]
+                else:
+                    new_batches.append([batch[0] - remaining, batch[1]])
+                    remaining = 0
+            self.inventory[product] = new_batches
+        self.reward -= total_liquidation_loss * 0.001
+        # compute day profit
+        day_profit = day_revenue - day_cost
+        self.cash += day_revenue
+        self.total_profit += day_profit
+        # check done
+        done = self.current_day >= self.max_days
+        # update state
+        self._state = InventoryState(
+            episode_id = self._state.episode_id,
+            current_day = self.current_day,
+            cash = self.cash,
+            inventory = {p: sum(b[0] for b in self.inventory[p]) for p in self.inventory},
+        )
+        return InventoryObservation(
+            current_day = self.current_day,
+            total_cash = self.cash,
+            day_profit = day_profit,
+            total_profit = self.total_profit,
+            demand_today = demand,
+            updated_inventory = copy.deepcopy(self.inventory),
+            remaining_capacity = {p: max(0, self.inventory_capacity[p] - sum(b[0] for b in self.inventory[p])) for p in self.inventory},
+            updated_events = copy.deepcopy(self.events),
+            updated_deliveries = copy.deepcopy(self.deliveries),
+            reward = self.reward,
+            done = done,
+        )
+    def _generate_demand(self):
+        rng = random.Random(self.seed * 1000 + self.current_day)
+        demand = {}
+        for product, (lo, hi) in self.base_demand.items():
+            demand[product] = rng.randint(lo, hi)
+        # weekend boost
+        if self.current_day % 7 in (5, 6):
+            for product in demand:
+                demand[product] = int(demand[product] * WEEKEND_MULTIPLIER)
+        # active event multipliers (only for EVENT_DURATION days after triggering)
+        for event_name, days in self.events.items():
+            if -EVENT_DURATION < days <= 0 and event_name in EVENT_EFFECTS:
+                for product, mult in EVENT_EFFECTS[event_name].items():
+                    demand[product] = int(demand[product] * mult)
+        return demand
+    @property
+    def state(self) -> InventoryState:
+        return self._state
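
Step 5 of `step` scales demand by the price multiplier raised to the negative elasticity, after clamping the multiplier to the range 0.5 to 1.5. The sketch below isolates that calculation. The function name `adjusted_demand` is hypothetical; the elasticity table is copied from `server/constants.py`.

```python
# Values copied from server/constants.py in this commit.
PRICE_ELASTICITY = {"electronics": 1.2, "clothing": 1.5, "groceries": 0.4,
                    "furniture": 0.8, "toys": 1.3}


def adjusted_demand(base_demand, product, price_multiplier):
    """Hypothetical helper mirroring the elasticity step in InventoryEnvironment.step."""
    pm = max(0.5, min(1.5, price_multiplier))  # same clamp as the environment
    return max(0, int(base_demand * pm ** -PRICE_ELASTICITY[product]))


for pm in (0.5, 1.0, 1.5):
    print(pm, {p: adjusted_demand(100, p, pm) for p in PRICE_ELASTICITY})
```

With a base demand of 100, clothing (elasticity 1.5) drops to 54 units at a 1.5 multiplier, while groceries (elasticity 0.4) react far less to the same price change. Multipliers outside the clamp range have the same effect as the nearest bound.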

uv.lock ADDED Viewed

The diff for this file is too large to render.