text-adventure-agent

NA committed on Feb 22

Commit

83513fc

1 Parent(s): fb1ba55

Update submission_template agent and README

Files changed (4)

submission_template/.gitignore +1 -0
submission_template/README.md +23 -4
submission_template/agent.py +442 -171
submission_template/mcp_server.py +142 -155

submission_template/.gitignore CHANGED Viewed

@@ -1,3 +1,4 @@
 # Python
 __pycache__/
 *.py[cod]

submission_template/README.md CHANGED Viewed

@@ -18,11 +18,30 @@ This is my submission for the Text Adventure Agent assignment. My agent uses the
 ## Approach
-<!-- Describe your approach here -->
-- What strategy does your agent use?
-- What tools did you implement in your MCP server?
-- Any interesting techniques or optimizations?
 ## Files

 ## Approach
+For the MCP server, I kept the provided `example_submission` implementation: it exposes the game through a small set of MCP tools (`play_action`, `memory`, `get_map`, `inventory`) and maintains lightweight server-side state (recent history and explored connections) so that tool outputs are informative for the agent. My main contributions are therefore on the **agent side** (`agent.py`).
+My agent still follows the same ReAct contract (THOUGHT / TOOL / ARGS), but I added **game-agnostic mechanisms** to reduce wasted steps and improve exploration robustness across different games:
+1) **More robust parsing of THOUGHT/TOOL/ARGS**
+   The agent parses the LLM output with regex and JSON fallbacks (e.g., salvage `"action": "..."` if JSON is slightly malformed), and defaults to `play_action({"action": "look"})` when needed.
+2) **Action normalization**
+   I normalize actions (e.g., strip quotes, remove “go/walk/move …”, trim punctuation, collapse whitespace) to reduce invalid commands caused by natural language phrasing.
+3) **Local “do-not-repeat-failures” memory**
+   The agent maintains a small set of `(location, action)` pairs that produced clear failure signals (e.g., “you can’t”, “not allowed”, “unknown word”, “securely anchored”). If the same action is proposed again in the same location, the agent deterministically substitutes a direction that has not failed there.
+4) **Light loop prevention**
+   The agent tracks recent executed actions. If it detects immediate repetition, it forces a different action (typically another direction). This prevents common failure modes like repeatedly opening an already-open container or bouncing between two states.
+5) **Best-effort map (agent-side)**
+   Independently of the server map, I also keep a minimal map by extracting room names from the first line when it looks like a location title. When a movement action changes the room name, I store `room --direction--> room` plus inferred reverse links (north↔south, etc.). This map is included in the prompt to guide exploration.
+Overall, these additions mainly reduce wasted steps (bad commands, repeated failures) and make exploration more stable across games. The agent still struggles when object names are ambiguous or repeated, and with longer action chains that require multi-step planning.
 ## Files
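
The parsing fallback described in item 1 of the README can be illustrated with a minimal standalone sketch. It is not the code from `agent.py`: the function name `parse_tool_call` is hypothetical, and it returns only the tool and arguments, assuming the THOUGHT / TOOL / ARGS format from the system prompt.

```python
import json
import re
from typing import Dict, Tuple


def parse_tool_call(response: str) -> Tuple[str, Dict]:
    """Return (tool, args); fall back to play_action/look on malformed output."""
    tool_match = re.search(r"TOOL:\s*([A-Za-z_][A-Za-z0-9_]*)", response, re.IGNORECASE)
    if not tool_match:
        # No recognizable tool line: use the safe default.
        return "play_action", {"action": "look"}
    tool = tool_match.group(1)

    args: Dict = {}
    args_match = re.search(r"ARGS:\s*(\{.*\})", response, re.DOTALL | re.IGNORECASE)
    if args_match:
        raw = args_match.group(1)
        try:
            args = json.loads(raw)
        except json.JSONDecodeError:
            # Salvage the action string from slightly malformed JSON.
            salvaged = re.search(r'"action"\s*:\s*"([^"]+)"', raw)
            if salvaged:
                args = {"action": salvaged.group(1)}

    if tool == "play_action" and "action" not in args:
        args = {"action": "look"}
    return tool, args
```

A trailing comma such as `{"action": "open mailbox",}` is rejected by `json.loads`, so the regex salvage path is what recovers the action in that case.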

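The agent-side map from item 5 of the README boils down to recording a forward edge plus an inferred reverse edge whenever a movement command changes the room name. The sketch below is a simplified illustration under that assumption: `record_move` and the reduced `REVERSE` table are hypothetical names, and only six directions are covered.

```python
from typing import Dict, Optional

# Reduced direction table for illustration; the agent uses a larger one.
REVERSE = {
    "north": "south", "south": "north",
    "east": "west", "west": "east",
    "up": "down", "down": "up",
}


def record_move(
    topo: Dict[str, Dict[str, str]],
    src: Optional[str],
    direction: str,
    dst: str,
) -> None:
    """Register dst, and link src <-> dst if a known direction changed the room."""
    topo.setdefault(dst, {})
    if src is None or src == dst or direction not in REVERSE:
        return
    topo.setdefault(src, {})[direction] = dst
    # The reverse link is a guess: some game maps are not symmetric.
    topo[dst][REVERSE[direction]] = src
```

Because the reverse link is inferred rather than observed, it is best treated as a hint for exploration and overwritten once the agent actually walks that edge.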
submission_template/agent.py CHANGED Viewed

@@ -27,7 +27,7 @@ import json
 import os
 import re
 from dataclasses import dataclass, field
-from typing import Optional
 from dotenv import load_dotenv
 from huggingface_hub import InferenceClient
@@ -38,57 +38,27 @@ load_dotenv()
 # =============================================================================
 # LLM Configuration - DO NOT MODIFY
 # =============================================================================
-# Model to use (fixed for fair evaluation)
 LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
-# Initialize the LLM client (uses HF_TOKEN from environment)
-_hf_token = os.getenv("HF_TOKEN")
-if not _hf_token:
-    raise ValueError("HF_TOKEN not found. Set it in your .env file.")
-LLM_CLIENT = InferenceClient(token=_hf_token)
 def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
-    """
-    Call the LLM with the given prompt. Use this function in your agent.
-    Args:
-        prompt: The user prompt (current game state, history, etc.)
-        system_prompt: The system prompt (instructions for the agent)
-        seed: Random seed for reproducibility
-        max_tokens: Maximum tokens in response (default: 300)
-    Returns:
-        The LLM's response text
-    Example:
-        response = call_llm(
-            prompt="You are in a forest. What do you do?",
-            system_prompt=SYSTEM_PROMPT,
-            seed=42,
-        )
-    """
-    messages = [
-        {"role": "system", "content": system_prompt},
-        {"role": "user", "content": prompt},
-    ]
     response = LLM_CLIENT.chat.completions.create(
         model=LLM_MODEL,
         messages=messages,
-        temperature=0.0,  # Deterministic for reproducibility
         max_tokens=max_tokens,
         seed=seed,
     )
     return response.choices[0].message.content
 @dataclass
 class RunResult:
-    """Result of running the agent. Do not modify this class."""
     final_score: int
     max_score: int
     moves: int
@@ -99,181 +69,482 @@ class RunResult:
 # =============================================================================
-# System Prompt - Customize this for your agent
 # =============================================================================
 SYSTEM_PROMPT = """You are playing a classic text adventure game.
 GOAL: Explore the world, solve puzzles, and maximize your score.
 AVAILABLE TOOLS (use via MCP):
 - play_action: Execute a game command (north, take lamp, open mailbox, etc.)
-- memory: Get current game state and history (if implemented)
-- inventory: Check what you're carrying (if implemented)
 VALID GAME COMMANDS for play_action:
-- Movement: north, south, east, west, up, down, enter, exit
-- Objects: take <item>, drop <item>, open <thing>, close <thing>, examine <thing>
-- Other: look, inventory, read <thing>, turn on lamp
 RESPOND IN THIS EXACT FORMAT (no markdown):
 THOUGHT: <your reasoning about what to do next>
 TOOL: <tool_name>
 ARGS: <JSON arguments, e.g., {"action": "look"}>
-Example:
-THOUGHT: I should look around to see where I am.
-TOOL: play_action
-ARGS: {"action": "look"}
 """
 # =============================================================================
-# Student Agent - IMPLEMENT THIS CLASS
 # =============================================================================
-class StudentAgent:
-    """
-    Your ReAct agent implementation.
-    TODO:
-    1. Implement the run() method with the ReAct loop
-    2. Parse LLM responses to extract tool calls
-    3. Track state and avoid loops
-    Use the provided call_llm() function to interact with the LLM.
-    """
     def __init__(self):
-        """Initialize your agent here."""
-        # TODO: Initialize any state tracking you need
-        # self.history = []
-        # self.visited_locations = set()
-        pass
-    async def run(
-        self,
-        client,  # FastMCP Client connected to your MCP server
-        game: str,
-        max_steps: int,
-        seed: int,
-        verbose: bool = False,
-    ) -> RunResult:
-        """
-        Run the agent for a game session.
-        Args:
-            client: FastMCP Client connected to your MCP server
-            game: Name of the game being played (e.g., "zork1")
-            max_steps: Maximum number of steps to take
-            seed: Random seed for reproducibility (use for LLM calls)
-            verbose: Whether to print detailed output
-        Returns:
-            RunResult with final score and statistics
-        """
-        # TODO: Implement your ReAct loop here
-        #
-        # Basic structure:
-        # 1. Get initial observation (call play_action with "look")
-        # 2. Loop for max_steps:
-        #    a. Build prompt with current observation and history
-        #    b. Call LLM to get thought and action
-        #    c. Parse the response to extract tool and args
-        #    d. Call the tool via client.call_tool(tool_name, args)
-        #    e. Update history and state
-        #    f. Check for game over
-        # 3. Return RunResult with final statistics
-        # Example of calling a tool:
-        # result = await client.call_tool("play_action", {"action": "look"})
-        # observation = result[0].text if result else "No response"
-        # Example of calling the LLM:
-        # response = call_llm(
-        #     prompt="Current observation: " + observation,
-        #     system_prompt=SYSTEM_PROMPT,
-        #     seed=seed,
-        # )
-        # Placeholder implementation - replace with your code
-        locations_visited = set()
-        history = []
-        final_score = 0
         moves = 0
-        # TODO: Your implementation here
-        # ...
         return RunResult(
-            final_score=final_score,
-            max_score=350,  # Zork1 max score, adjust if needed
             moves=moves,
-            locations_visited=locations_visited,
-            game_completed=False,
-            history=history,
         )
-    def _build_prompt(self, observation: str, history: list) -> str:
-        """
-        Build the prompt for the LLM.
-        TODO: Implement this to create effective prompts
-        """
-        # TODO: Combine system prompt, history, and current observation
-        pass
-    def _parse_response(self, response: str) -> tuple[str, str, dict]:
-        """
-        Parse LLM response to extract thought, tool name, and arguments.
-        TODO: Implement robust parsing
-        Returns:
-            Tuple of (thought, tool_name, args_dict)
-        """
-        # TODO: Parse the response format:
-        # THOUGHT: ...
-        # TOOL: ...
-        # ARGS: {...}
-        pass
-    def _call_llm(self, prompt: str, system_prompt: str, seed: int) -> str:
-        """
-        Call the LLM with the given prompt.
-        This is a convenience wrapper - you can also use call_llm() directly.
-        """
-        return call_llm(prompt, system_prompt, seed)
 # =============================================================================
-# For local testing
 # =============================================================================
 async def test_agent():
-    """Test the agent locally."""
     from fastmcp import Client
-    # Path to your MCP server
     server_path = "mcp_server.py"
     agent = StudentAgent()
     async with Client(server_path) as client:
-        result = await agent.run(
-            client=client,
-            game="zork1",
-            max_steps=10,
-            seed=42,
-            verbose=True,
-        )
-        print(f"\nFinal Score: {result.final_score}")
-        print(f"Moves: {result.moves}")
-        print(f"Locations: {result.locations_visited}")
 if __name__ == "__main__":
     import asyncio
-    asyncio.run(test_agent())

 import os
 import re
 from dataclasses import dataclass, field
+from typing import Optional, Dict, Tuple, Set, List
 from dotenv import load_dotenv
 from huggingface_hub import InferenceClient
 # =============================================================================
 # LLM Configuration - DO NOT MODIFY
 # =============================================================================
 LLM_MODEL = "Qwen/Qwen2.5-72B-Instruct"
+HF_TOKEN = os.getenv("HF_TOKEN")
+LLM_CLIENT = InferenceClient(token=HF_TOKEN) if HF_TOKEN else None
 def call_llm(prompt: str, system_prompt: str, seed: int, max_tokens: int = 300) -> str:
+    if LLM_CLIENT is None:
+        raise RuntimeError("HF_TOKEN missing (set it as env var / HF secret).")
+    messages = [{"role": "system", "content": system_prompt}, {"role": "user", "content": prompt}]
     response = LLM_CLIENT.chat.completions.create(
         model=LLM_MODEL,
         messages=messages,
+        temperature=0.0,
         max_tokens=max_tokens,
         seed=seed,
     )
     return response.choices[0].message.content
 @dataclass
 class RunResult:
     final_score: int
     max_score: int
     moves: int
 # =============================================================================
+# System Prompt
 # =============================================================================
 SYSTEM_PROMPT = """You are playing a classic text adventure game.
 GOAL: Explore the world, solve puzzles, and maximize your score.
 AVAILABLE TOOLS (use via MCP):
 - play_action: Execute a game command (north, take lamp, open mailbox, etc.)
+- memory: Get current game state and history
+- inventory: Check what you're carrying
+- get_map: Get a map of explored locations and their connections
+- get_valid_actions: (if available) get likely valid actions
+CRITICAL RULES TO AVOID LOOPS:
+1. Avoid repeating the same action as the immediately previous action.
+2. If an action fails in a location, do NOT retry that same action in that same location.
+3. Prioritize unexplored directions and new interactions over repeating old ones.
 VALID GAME COMMANDS for play_action:
+- Movement: north, south, east, west, up, down, enter, exit, in, out, northeast, northwest, southeast, southwest
+- Objects: take <item>, drop <item>, open <thing>, close <thing>, examine <thing>, read <thing>
+- Other: look, inventory, wait, turn on <item>, turn off <item>, push <thing>, pull <thing>, move <thing>, climb <thing>
 RESPOND IN THIS EXACT FORMAT (no markdown):
 THOUGHT: <your reasoning about what to do next>
 TOOL: <tool_name>
 ARGS: <JSON arguments, e.g., {"action": "look"}>
 """
 # =============================================================================
+# Parsing / heuristics
 # =============================================================================
+SCORE_MOVES_RE = re.compile(r"\[Score:\s*(\d+)\s*\|\s*Moves:\s*(\d+)\s*\]", re.IGNORECASE)
+MOVE_WORDS = {
+    "north", "south", "east", "west",
+    "up", "down", "enter", "exit", "in", "out",
+    "northeast", "northwest", "southeast", "southwest",
+    "n", "s", "e", "w", "ne", "nw", "se", "sw", "u", "d",
+}
+REVERSE_DIR = {
+    "north": "south", "south": "north",
+    "east": "west", "west": "east",
+    "up": "down", "down": "up",
+    "enter": "exit", "exit": "enter",
+    "in": "out", "out": "in",
+    "northeast": "southwest", "southwest": "northeast",
+    "northwest": "southeast", "southeast": "northwest",
+    "n": "s", "s": "n", "e": "w", "w": "e",
+    "ne": "sw", "sw": "ne", "nw": "se", "se": "nw",
+    "u": "d", "d": "u",
+}
+ROOM_SKIP_PREFIXES = (
+    "Taken", "Opening", "You can't", "I don't", "WELCOME",
+    "There's nothing", "Score:", "[Score:", "It is already",
+    "The door is", "You are carrying", "Dropped",
+)
+HARD_FAIL_MARKERS = (
+    "you can't", "you cannot", "not allowed", "no way",
+    "i don't understand", "unknown word",
+    "can't see any", "not here",
+    "blocked", "impenetrable",
+    "securely anchored", "too heavy", "fixed", "attached",
+    "you already have", "it is already open",
+)
+USELESS_MARKERS = (
+    "it is already open", "you already have that", "you have that already", "nothing happens",
+)
+def strip_code_fences(s: str) -> str:
+    s = s.strip()
+    s = re.sub(r"^```json\s*", "", s, flags=re.IGNORECASE).strip()
+    s = re.sub(r"^```\s*", "", s).strip()
+    s = re.sub(r"\s*```$", "", s).strip()
+    return s
+def parse_score(observation: str) -> Optional[int]:
+    m = re.search(r"[Ss]core[:\s]+(\d+)", observation or "")
+    return int(m.group(1)) if m else None
+def is_game_over(observation: str) -> bool:
+    patterns = [
+        r"\*\*\*\s*You have died\s*\*\*\*",
+        r"\*\*\*\s*You have won\s*\*\*\*",
+        r"game over",
+        r"The End",
+        r"RESTART, RESTORE, or QUIT",
+    ]
+    return any(re.search(p, observation or "", re.IGNORECASE) for p in patterns)
+def sanitize_play_action(action: str) -> str:
+    """Normalize common phrasing to reduce invalid commands."""
+    if not action:
+        return "look"
+    a = action.strip()
+    if (a.startswith('"') and a.endswith('"')) or (a.startswith("'") and a.endswith("'")):
+        a = a[1:-1].strip()
+    al = a.lower().strip()
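+    for pref in ("go ", "walk ", "move ", "head ", "run "):
+        # Strip the verb only when a bare direction follows, so "move rug" is kept intact.
+        if al.startswith(pref) and al[len(pref):].strip(" .!?") in MOVE_WORDS:
+            a = a[len(pref):].strip()
+            al = a.lower().strip()
+            break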
+    a = a.rstrip(".!?").strip()
+    a = re.sub(r"\s+", " ", a).strip()
+    return a if a else "look"
+def extract_location_name(observation: str) -> Optional[str]:
+    """Best-effort: return the first line if it looks like a room title, else None."""
+    if not observation:
+        return None
+    first_line = observation.strip().splitlines()[0].strip() if observation.strip().splitlines() else ""
+    if not first_line:
+        return None
+    for p in ROOM_SKIP_PREFIXES:
+        if p in first_line:
+            return None
+    if len(first_line) < 50 and first_line[0].isupper():
+        return first_line
+    return None
+def looks_failed(observation: str) -> bool:
+    txt = (observation or "").lower()
+    return any(m in txt for m in HARD_FAIL_MARKERS)
+def looks_useless(observation: str) -> bool:
+    txt = (observation or "").lower()
+    return any(m in txt for m in USELESS_MARKERS)
+# =============================================================================
+# Core agent state
+# =============================================================================
+@dataclass
+class AgentState:
+    # Interaction history: (thought, action, observation) tuples
+    trace: List[Tuple[str, str, str]] = field(default_factory=list)
+    # Location tracking
+    current_location: Optional[str] = None
+    previous_location: Optional[str] = None
+    visited_locations: Set[str] = field(default_factory=set)
+    topo_map: Dict[str, Dict[str, str]] = field(default_factory=dict)
+    # Current score
+    score: int = 0
+    # Tool outputs captured separately
+    last_tool_info: str = ""
+    # Loop guards
+    last_executed_action: Optional[str] = None
+    recent_actions: List[str] = field(default_factory=list)
+    # Local failure memory: set of (location, normalized_action) pairs
+    banned_actions_by_location: Set[Tuple[str, str]] = field(default_factory=set)
+    last_memory_step: int = -999
+    last_map_step: int = -999
+    last_inventory_step: int = -999
+# =============================================================================
+# Agent
+# =============================================================================
+class StudentAgent:
     def __init__(self):
+        self.state = AgentState()
+    def _parse_llm_toolcall(self, response: str) -> Tuple[str, str, Dict]:
+        """Parse THOUGHT/TOOL/ARGS from the LLM response; default to `look` on malformed output."""
+        thought = ""
+        tool = ""
+        args: Dict = {}
+        thought_match = re.search(r"THOUGHT:\s*(.+?)(?=TOOL:|$)", response, re.DOTALL | re.IGNORECASE)
+        if thought_match:
+            thought = thought_match.group(1).strip()
+        tool_match = re.search(r"TOOL:\s*([A-Za-z_][A-Za-z0-9_]*)", response, re.IGNORECASE)
+        if tool_match:
+            tool = tool_match.group(1).strip()
+        args_match = re.search(r"ARGS:\s*(\{.*\})", response, re.DOTALL | re.IGNORECASE)
+        if args_match:
+            raw = args_match.group(1).strip()
+            try:
+                args = json.loads(raw)
+            except json.JSONDecodeError:
+                # fallback
+                action_match = re.search(r'"action"\s*:\s*"([^"]+)"', raw)
+                if action_match:
+                    args = {"action": action_match.group(1)}
+        # fallback default
+        if not tool:
+            tool = "play_action"
+            args = {"action": "look"}
+        elif tool == "play_action" and (not args or "action" not in args):
+            args = {"action": "look"}
+        return thought, tool, args
+    def _format_recent_history(self, k: int = 5) -> str:
+        if not self.state.trace:
+            return "(no history)"
+        recent = self.state.trace[-k:]
+        out = []
+        for th, act, obs in recent:
+            short_obs = obs[:200] + "..." if len(obs) > 200 else obs
+            out.append(f"- Action: {act}\n  Result: {short_obs}")
+        return "\n".join(out)
+    def _format_map(self) -> str:
+        if not self.state.topo_map:
+            return "Empty map."
+        lines = []
+        for room, edges in self.state.topo_map.items():
+            marker = " <=== YOU ARE HERE" if room == self.state.current_location else ""
+            if edges:
+                conn = ", ".join([f"[{d} -> {dst}]" for d, dst in edges.items()])
+            else:
+                conn = "No explored exits"
+            lines.append(f"- {room}{marker}: {conn}")
+        return "\n".join(lines)
+    def _build_llm_prompt(self, observation: str, loop_warning: str) -> str:
+        loc = self.state.current_location or "Unknown"
+        tool_info = self.state.last_tool_info.strip() if self.state.last_tool_info else "(none)"
+        prompt = f"""{loop_warning}
+CURRENT LOCATION: {loc}
+CURRENT OBSERVATION:
+{observation}
+LAST TOOL INFO (memory/get_map/inventory):
+{tool_info}
+KNOWN MAP (Spatial Graph):
+{self._format_map()}
+RECENT HISTORY:
+{self._format_recent_history(5)}
+CURRENT SCORE: {self.state.score}
+Decide your next step and output THOUGHT/TOOL/ARGS.
+"""
+        return prompt
+    def _update_score(self, observation: str) -> None:
+        s = parse_score(observation)
+        if s is not None:
+            self.state.score = s
+    def _update_location_and_map(self, observation: str, action_taken: Optional[str]) -> None:
+        new_loc = extract_location_name(observation)
+        if not new_loc:
+            return
+        if new_loc not in self.state.topo_map:
+            self.state.topo_map[new_loc] = {}
+        if self.state.current_location and action_taken:
+            move = action_taken.lower().strip()
+            if move in MOVE_WORDS and self.state.current_location != new_loc:
+                # forward
+                self.state.topo_map.setdefault(self.state.current_location, {})
+                self.state.topo_map[self.state.current_location][move] = new_loc
+                # reverse
+                rev = REVERSE_DIR.get(move)
+                if rev:
+                    self.state.topo_map[new_loc][rev] = self.state.current_location
+        self.state.previous_location = self.state.current_location
+        self.state.current_location = new_loc
+        self.state.visited_locations.add(new_loc)
+    def _loop_warning_text(self) -> str:
+        recent = self.state.recent_actions[-4:]
+        if len(recent) >= 4 and recent[-1] in recent[:-1]:
+            return "\n!!! WARNING: LOOP DETECTED !!!\nChoose a different action than your last few actions.\n"
+        return ""
+    def _is_banned_here(self, action_norm: str) -> bool:
+        loc = self.state.current_location or "Unknown"
+        return (loc, action_norm) in self.state.banned_actions_by_location
+    def _ban_here(self, action_norm: str) -> None:
+        loc = self.state.current_location or "Unknown"
+        self.state.banned_actions_by_location.add((loc, action_norm))
+    def _force_alternative_move(self, avoid: str) -> Optional[str]:
+        """Pick a reasonable alternative direction."""
+        loc = self.state.current_location or "Unknown"
+        for d in ["north", "east", "south", "west", "up", "down", "enter", "exit"]:
+            if d == avoid:
+                continue
+            if (loc, d) not in self.state.banned_actions_by_location:
+                return d
+        return None
+    def _maybe_guard_play_action(self, thought: str, args: Dict) -> Tuple[str, Dict]:
+        """Apply deterministic guards: avoid immediate repeats and actions that already failed here."""
+        action_raw = str(args.get("action", "look"))
+        action_norm = sanitize_play_action(action_raw).lower()
+        if self.state.last_executed_action and action_norm == self.state.last_executed_action:
+            alt = self._force_alternative_move(avoid=action_norm)
+            if alt:
+                return f"(Guard) Avoid repeating '{action_norm}'. Trying '{alt}'.", {"action": alt}
+        if self._is_banned_here(action_norm):
+            alt = self._force_alternative_move(avoid=action_norm)
+            if alt:
+                return f"(Guard) '{action_norm}' failed here before. Trying '{alt}'.", {"action": alt}
+        return thought, {"action": action_norm}
+    # Tool calling
+    async def _call_tool(self, client, tool: str, args: Dict) -> str:
+        result = await client.call_tool(tool, args or {})
+        return result.content[0].text if result and result.content else "No response"
+    async def run(self, client, game: str, max_steps: int, seed: int, verbose: bool = True) -> RunResult:
+        self.state = AgentState()
         moves = 0
+        game_completed = False
+        error: Optional[str] = None
+        try:
+            observation = await self._call_tool(client, "play_action", {"action": "look"})
+            self._update_score(observation)
+            self._update_location_and_map(observation, action_taken=None)
+            if verbose:
+                print(f"=== Initial Observation ===\n{observation}\n")
+            for step in range(max_steps):
+                moves = step + 1
+                if step - self.state.last_memory_step >= 6:
+                    try:
+                        self.state.last_tool_info = await self._call_tool(client, "memory", {})
+                        self.state.last_memory_step = step
+                    except Exception:
+                        pass
+                elif step - self.state.last_map_step >= 10:
+                    try:
+                        self.state.last_tool_info = await self._call_tool(client, "get_map", {})
+                        self.state.last_map_step = step
+                    except Exception:
+                        pass
+                loop_warning = self._loop_warning_text()
+                prompt = self._build_llm_prompt(observation, loop_warning)
+                llm_response = call_llm(prompt, SYSTEM_PROMPT, seed)
+                thought, tool, args = self._parse_llm_toolcall(llm_response)
+                if verbose:
+                    print(f"=== Step {step + 1} ===")
+                    print(f"LLM Response:\n{llm_response}\n")
+                if tool == "play_action":
+                    thought, args = self._maybe_guard_play_action(thought, args)
+                tool_output = ""
+                executed_action_label = tool
+                try:
+                    tool_output = await self._call_tool(client, tool, args)
+                    if tool == "play_action":
+                        observation = tool_output
+                        executed_action_label = args.get("action", "look")
+                        self.state.last_executed_action = str(executed_action_label).lower().strip()
+                        self.state.recent_actions.append(self.state.last_executed_action)
+                        self.state.recent_actions = self.state.recent_actions[-8:]  # cap
+                    else:
+                        self.state.last_tool_info = f"[Tool: {tool}]\n{tool_output}"
+                except Exception as e:
+                    if tool == "play_action":
+                        observation = f"Error executing tool: {e}"
+                        executed_action_label = f"{tool} (error)"
+                    else:
+                        self.state.last_tool_info = f"[Tool: {tool} ERROR]\n{e}"
+                if tool == "play_action":
+                    self._update_score(observation)
+                    self._update_location_and_map(observation, action_taken=str(executed_action_label))
+                    act_norm = str(executed_action_label).lower().strip()
+                    if looks_failed(observation) or looks_useless(observation):
+                        self._ban_here(act_norm)
+                self.state.trace.append((thought, str(executed_action_label), observation))
+                if verbose:
+                    print(f"Thought: {thought}")
+                    print(f"Tool: {tool}")
+                    if tool == "play_action":
+                        print(f"Action: {executed_action_label}")
+                        print(f"Observation:\n{observation[:350]}\n")
+                    else:
+                        print(f"Tool output (stored):\n{tool_output[:250]}\n")
+                if tool == "play_action" and is_game_over(observation):
+                    game_completed = True
+                    break
+        except Exception as e:
+            error = str(e)
+            if verbose:
+                print(f"Error: {error}")
+        if verbose:
+            print("=== Final Stats ===")
+            print(f"Score: {self.state.score}")
+            print(f"Moves: {moves}")
+            print(f"Locations visited: {len(self.state.visited_locations)}")
         return RunResult(
+            final_score=self.state.score,
+            max_score=350,
             moves=moves,
+            locations_visited=set(self.state.visited_locations),
+            game_completed=game_completed,
+            error=error,
+            history=list(self.state.trace),
         )
 # =============================================================================
+# Local testing
 # =============================================================================
 async def test_agent():
     from fastmcp import Client
     server_path = "mcp_server.py"
     agent = StudentAgent()
     async with Client(server_path) as client:
+        res = await agent.run(client=client, game="zork1", max_steps=10, seed=42, verbose=True)
+        print(f"\nFinal Score: {res.final_score}")
+        print(f"Moves: {res.moves}")
+        print(f"Locations: {res.locations_visited}")
 if __name__ == "__main__":
     import asyncio
+    asyncio.run(test_agent())
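
The guard logic in `agent.py` (repeat check plus per-location failure memory) can be condensed into the following standalone sketch. The names `pick_alternative` and `guard` are hypothetical, and the sketch assumes actions are already normalized to lowercase.

```python
from typing import Optional, Set, Tuple

DIRECTIONS = ["north", "east", "south", "west", "up", "down", "enter", "exit"]


def pick_alternative(banned: Set[Tuple[str, str]], location: str, avoid: str) -> Optional[str]:
    """First direction that is not `avoid` and has not failed at `location`."""
    for d in DIRECTIONS:
        if d != avoid and (location, d) not in banned:
            return d
    return None


def guard(
    action: str,
    last_action: Optional[str],
    banned: Set[Tuple[str, str]],
    location: str,
) -> str:
    """Replace an immediate repeat or a known-failed action with a direction."""
    if action == last_action or (location, action) in banned:
        alt = pick_alternative(banned, location, avoid=action)
        if alt is not None:
            return alt
    return action
```

One consequence of this design is that a legitimate repeat, such as walking `north` twice along a corridor, is also rewritten to another direction. Exempting movement commands from the repeat check would be one way to relax that, at the cost of weaker loop protection.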

submission_template/mcp_server.py CHANGED Viewed

@@ -1,27 +1,8 @@
 """
-Student MCP Server for Text Adventure Games
-This is your MCP server submission. Implement the tools that your agent
-will use to play text adventure games.
-Required tool:
-    play_action(action: str) -> str
-        Execute a game command and return the result.
-Recommended tools:
-    memory() -> str
-        Return current game state, score, and recent history.
-    inventory() -> str
-        Return the player's current inventory.
-    get_map() -> str
-        Return a map of explored locations.
-Test your server with:
-    fastmcp dev submission_template/mcp_server.py
-Then open the MCP Inspector in your browser to test the tools interactively.
 """
 import sys
@@ -31,179 +12,185 @@ import os
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 from fastmcp import FastMCP
-from games.zork_env import TextAdventureEnv
-# =============================================================================
-# Create the MCP Server
-# =============================================================================
-mcp = FastMCP("Student Text Adventure Server")
-# =============================================================================
-# Game State Management
-# =============================================================================
-class GameManager:
-    """
-    Manages the text adventure game state.
-    TODO: Extend this class to track:
-    - Action history (for memory tool)
-    - Explored locations (for mapping)
-    - Current score and moves
-    """
-    def __init__(self):
-        self.env: TextAdventureEnv = None
-        self.state = None
-        self.game_name: str = ""
-        # TODO: Add more state tracking
-        # self.history: list[tuple[str, str]] = []
-        # self.explored_locations: dict[str, set[str]] = {}
-        # self.current_location: str = ""
-    def initialize(self, game: str = "zork1"):
-        """Initialize or reset the game."""
         self.game_name = game
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
-        # TODO: Reset your state tracking here
-        return self.state.observation
-    def step(self, action: str) -> str:
-        """Execute an action and return the result."""
-        if self.env is None:
-            self.initialize()
         self.state = self.env.step(action)
-        # TODO: Update your state tracking here
-        # self.history.append((action, self.state.observation))
-        # Update location tracking, etc.
-        return self.state.observation
-    def get_score(self) -> int:
-        """Get current score."""
-        return self.state.score if self.state else 0
-    def get_moves(self) -> int:
-        """Get number of moves taken."""
-        return self.state.moves if self.state else 0
-# Global game manager
-_game = GameManager()
-def get_game() -> GameManager:
-    """Get or initialize the game manager."""
-    global _game
-    if _game.env is None:
-        # Get game from environment variable (set by evaluator)
-        game = os.environ.get("GAME", "zork1")
-        _game.initialize(game)
-    return _game
 # =============================================================================
-# MCP Tools - IMPLEMENT THESE
 # =============================================================================
 @mcp.tool()
 def play_action(action: str) -> str:
     """
-    Execute a game command and return the result.
-    This is the main tool for interacting with the game.
     Args:
-        action: The command to execute (e.g., "north", "take lamp", "open mailbox")
     Returns:
-        The game's response to the action
-    Valid commands include:
-        - Movement: north, south, east, west, up, down, enter, exit
-        - Objects: take <item>, drop <item>, open <thing>, examine <thing>
-        - Other: look, inventory, read <thing>, turn on lamp
     """
     game = get_game()
-    # TODO: You might want to add action validation here
-    # TODO: You might want to include score changes in the response
-    result = game.step(action)
-    # Optional: Append score info
-    # result += f"\n[Score: {game.get_score()} | Moves: {game.get_moves()}]"
-    return result
-# TODO: Implement additional tools to help your agent
-# @mcp.tool()
-# def memory() -> str:
-#     """
-#     Get the current game state summary.
-#
-#     Returns:
-#         A summary including current location, score, moves, and recent history
-#     """
-#     game = get_game()
-#     # TODO: Return useful state information
-#     pass
-# @mcp.tool()
-# def inventory() -> str:
-#     """
-#     Check what the player is carrying.
-#
-#     Returns:
-#         List of items in the player's inventory
-#     """
-#     game = get_game()
-#     result = game.step("inventory")
-#     return result
-# @mcp.tool()
-# def get_map() -> str:
-#     """
-#     Get a map of explored locations.
-#
-#     Returns:
-#         A text representation of explored locations and connections
-#     """
-#     game = get_game()
-#     # TODO: Return map of explored locations
-#     pass
-# @mcp.tool()
-# def get_valid_actions() -> str:
-#     """
-#     Get a list of likely valid actions from the current location.
-#
-#     Returns:
-#         List of actions that might work here
-#     """
-#     # This is a hint: Jericho provides get_valid_actions()
-#     game = get_game()
-#     if game.env and game.env.env:
-#         valid = game.env.env.get_valid_actions()
-#         return "Valid actions: " + ", ".join(valid[:20])
-#     return "Could not determine valid actions"
 # =============================================================================
-# Run the server
 # =============================================================================
 if __name__ == "__main__":
-    # This runs the server with stdio transport (for MCP clients)
     mcp.run()

 """
+Example: MCP Server for Text Adventures
+A complete MCP server that exposes text adventure games via tools.
+This demonstrates a full-featured server with memory, mapping, and inventory.
 """
 import sys
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
 from fastmcp import FastMCP
+from games.zork_env import TextAdventureEnv, list_available_games
+# Get game from environment variable (default: zork1)
+INITIAL_GAME = os.environ.get("GAME", "zork1")
+# Create the MCP server
+mcp = FastMCP("Text Adventure Server")
+class GameState:
+    """Manages the text adventure game state and exploration data."""
+    def __init__(self, game: str = "zork1"):
         self.game_name = game
         self.env = TextAdventureEnv(game)
         self.state = self.env.reset()
+        self.history: list[tuple[str, str]] = []
+        self.explored_locations: dict[str, set[str]] = {}
+        self.current_location: str = self._extract_location(self.state.observation)
+    def _extract_location(self, observation: str) -> str:
+        """Extract location name from observation (usually first line)."""
+        lines = observation.strip().split('\n')
+        # str.split() never returns an empty list, so test the first line itself
+        return lines[0] or "Unknown"
+    def take_action(self, action: str) -> str:
+        """Execute a game action and return the result."""
         self.state = self.env.step(action)
+        result = self.state.observation
+        # Track history
+        self.history.append((action, result))
+        if len(self.history) > 50:
+            self.history = self.history[-50:]
+        # Update map
+        new_location = self._extract_location(result)
+        if action in ["north", "south", "east", "west", "up", "down",
+                      "enter", "exit", "n", "s", "e", "w", "u", "d"]:
+            if self.current_location not in self.explored_locations:
+                self.explored_locations[self.current_location] = set()
+            if new_location != self.current_location:
+                self.explored_locations[self.current_location].add(f"{action} -> {new_location}")
+        self.current_location = new_location
+        return result
+    def get_memory(self) -> str:
+        """Get a summary of current game state."""
+        recent = self.history[-5:] if self.history else []
+        recent_str = "\n".join([f"  > {a} -> {r[:60]}..." for a, r in recent]) if recent else "  (none yet)"
+        return f"""Current State:
+- Location: {self.current_location}
+- Score: {self.state.score} points
+- Moves: {self.state.moves}
+- Game: {self.game_name}
+Recent Actions:
+{recent_str}
+Current Observation:
+{self.state.observation}"""
+    def get_map(self) -> str:
+        """Get a map of explored locations."""
+        if not self.explored_locations:
+            return "Map: No locations explored yet. Try moving around!"
+        lines = ["Explored Locations and Exits:"]
+        for loc, exits in sorted(self.explored_locations.items()):
+            lines.append(f"\n* {loc}")
+            for exit_info in sorted(exits):
+                lines.append(f"    -> {exit_info}")
+        lines.append(f"\n[Current] {self.current_location}")
+        return "\n".join(lines)
+    def get_inventory(self) -> str:
+        """Get current inventory."""
+        items = self.state.inventory if hasattr(self.state, 'inventory') and self.state.inventory else []
+        if not items:
+            return "Inventory: You are empty-handed."
+        item_names = []
+        for item in items:
+            item_str = str(item)
+            item_lower = item_str.lower()
+            if "parent" in item_lower:
+                idx = item_lower.index("parent")
+                name = item_str[:idx].strip()
+                if ":" in name:
+                    name = name.split(":", 1)[1].strip()
+                item_names.append(name)
+            elif ":" in item_str:
+                name = item_str.split(":", 1)[1].strip()
+                item_names.append(name)
+            else:
+                item_names.append(item_str)
+        return f"Inventory: {', '.join(item_names)}"
+# Global game state
+_game_state: GameState | None = None
+def get_game() -> GameState:
+    """Get or initialize the game state."""
+    global _game_state
+    if _game_state is None:
+        _game_state = GameState(INITIAL_GAME)
+    return _game_state
 # =============================================================================
+# MCP Tools
 # =============================================================================
 @mcp.tool()
 def play_action(action: str) -> str:
     """
+    Execute a game action in the text adventure.
     Args:
+        action: The command to execute (e.g., 'north', 'take lamp', 'open mailbox')
     Returns:
+        The game's response to your action
     """
     game = get_game()
+    result = game.take_action(action)
+    # Add score info
+    score_info = f"\n\n[Score: {game.state.score} | Moves: {game.state.moves}]"
+    if game.state.reward > 0:
+        score_info = f"\n\n+{game.state.reward} points! (Total: {game.state.score})"
+    done_info = ""
+    if game.state.done:
+        done_info = "\n\nGAME OVER"
+    return result + score_info + done_info
+@mcp.tool()
+def memory() -> str:
+    """
+    Get a summary of the current game state.
+    Returns location, score, moves, recent actions, and current observation.
+    """
+    return get_game().get_memory()
+@mcp.tool()
+def get_map() -> str:
+    """
+    Get a map showing explored locations and connections.
+    Useful for navigation and avoiding getting lost.
+    """
+    return get_game().get_map()
+@mcp.tool()
+def inventory() -> str:
+    """
+    Check what items you are currently carrying.
+    """
+    return get_game().get_inventory()
 # =============================================================================
+# Main
 # =============================================================================
 if __name__ == "__main__":
     mcp.run()
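
The map bookkeeping in `GameState.take_action` and `get_map` can be tried out without a game engine. The following is a minimal sketch, assuming observations whose first line is the room name. `MapTracker`, `first_line`, and `MOVES` are hypothetical names introduced only for illustration and are not part of this commit, and the sample observations are hand-written stand-ins for real game output.

```python
# Movement commands that may create an edge in the map (same list as the server).
MOVES = {"north", "south", "east", "west", "up", "down",
         "enter", "exit", "n", "s", "e", "w", "u", "d"}


def first_line(observation: str) -> str:
    """Treat the first line of an observation as the location name."""
    lines = observation.strip().split("\n")
    # split() always yields at least one element, so check its content.
    return lines[0] or "Unknown"


class MapTracker:
    """Hypothetical stand-alone version of the server's map tracking."""

    def __init__(self, start_observation: str):
        self.current_location = first_line(start_observation)
        self.explored: dict[str, set[str]] = {}

    def record(self, action: str, observation: str) -> None:
        """Store an exit if a movement command changed the location."""
        new_location = first_line(observation)
        if action in MOVES:
            exits = self.explored.setdefault(self.current_location, set())
            if new_location != self.current_location:
                exits.add(f"{action} -> {new_location}")
        # As in the server, the location is updated after every action.
        self.current_location = new_location

    def render(self) -> str:
        """Return a text map in the same spirit as get_map()."""
        if not self.explored:
            return "Map: No locations explored yet."
        lines = ["Explored Locations and Exits:"]
        for loc, exits in sorted(self.explored.items()):
            lines.append(f"* {loc}")
            lines.extend(f"    -> {e}" for e in sorted(exits))
        lines.append(f"[Current] {self.current_location}")
        return "\n".join(lines)


tracker = MapTracker("West of House\nYou are standing in an open field.")
tracker.record("north", "North of House\nYou are facing the north side of a house.")
tracker.record("east", "Behind House\nYou are behind the white house.")
print(tracker.render())
```

Because the first line of every response is treated as a location, a reply such as "Taken." would also become the current location in this sketch. A stricter variant might update the location only after movement commands, or read it from the game engine if the environment exposes it.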