# vendors.md — Mock Vendor API Subsystem

**Module root:** `driftcall/vendors/` (6 files: `base.py`, `airline.py`, `cab.py`, `restaurant.py`, `hotel.py`, `payment.py`)
**Owner:** Person A (Environment)
**Implements:** DESIGN.md §5 (all subsections §5.1–§5.5), §4.3 (tool dispatch in `step()`), §6 (as mutation target via `drift_injector`), §7.1 R1 / R3 (constraint checking inputs)
**Status:** Design spec — pre-critic-gate

---

## 1. Purpose

The vendor subsystem consists of the five pure-Python **mock consumer APIs** that DriftCall's agent interacts with every time it emits a `TOOL_CALL` action. They are the *world* that drifts: their schemas, policies, T&Cs, pricing, and (transversally) auth tokens mutate mid-episode under the drift injector's control, and every behavioral signal the agent learns from flows out of their `ToolResult` returns.

The subsystem touches four neighbouring modules; the first three consume it and the fourth is deliberately independent of it:

1. **`env.step`** (DESIGN.md §4.3) — dispatches each `TOOL_CALL` to a vendor tool handler and receives a `(ToolResult, VendorState)` pair (from `models.md` §4.3), plus the post-commit `PaymentState` for primary-domain vendors (§2.1). The env installs the returned `VendorState` into `DriftCallState.vendor_states[domain]` whether or not it differs from the input — the returned-state contract is uniform.
2. **`drift_injector`** (drift_injector.md §3.4, §6) — writes to `DriftCallState.vendor_states[domain]` via the `apply_schema_mutation` helper; vendors read from the same dict using whichever `schema_versions[domain]` is currently installed.
3. **`rewards`** (DESIGN.md §7.1) — consumes `ToolResult` history embedded in the episode trail to compute R1 (task completion — was a booking created?), R3 (constraint adherence — did the booking satisfy budget/time/dietary?), and R4 (format compliance — was the `response` shape legal?).
4. **`audio/asr_whisper`** and **`audio/tts_kokoro`**: not consumers; they are *independent* of the vendor layer. Audio converts user utterances ↔ text at the env boundary (DESIGN.md §9); vendors never touch audio.
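Consumer 1's uniform install contract is sketched below. It assumes a trimmed-down, hypothetical `DriftCallState` that holds only `vendor_states`, and `install_vendor_state` is an illustrative name, not a function from the codebase. The real `env.step` also threads `PaymentState` and side-channel notices. This sketch copies the dict rather than assigning in place, which is one way to keep the outer state immutable.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DriftCallState:
    # Hypothetical, trimmed-down state: only the per-domain vendor states.
    vendor_states: dict[str, Any]


def install_vendor_state(state: DriftCallState, domain: str, returned: Any) -> DriftCallState:
    """Install whatever dispatch returned for `domain`, unconditionally.

    There is deliberately no `if returned is not old` branch. Read tools
    return the input object by identity, so re-installing it changes
    nothing, and the install path stays the same for reads and writes.
    """
    new_states = {**state.vendor_states, domain: returned}  # copy, never mutate
    return dataclasses.replace(state, vendor_states=new_states)
```

Because reads hand back the same object, `install_vendor_state(s, "cab", s.vendor_states["cab"])` leaves the installed cab state identical by `is`.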
**Mutation authority.** Two and only two code paths commit changes to a `VendorState`:

- **Drift path** — `drift_injector.apply_drift` → vendor-module `apply_schema_mutation(state, mutation) → state'`. Schemas, policies, T&Cs, pricing, and auth shift under this path.
- **Commit path** — `dispatch(...) → (ToolResult, state')`. Write tools (`*.book`, `*.order`, `*.cancel`, `payment.charge`, `payment.refund`, `payment.get_token`) commit new records into `state'.bookings / .orders / .rides / .charges / .accepted_token_version`. Non-write tools (`*.search`, `*.estimate`, `*.get_booking`, `*.track`) return the same state by identity (`returned_state is input_state`).

Both paths are pure functions: new frozen `VendorState` out, never in-place mutation. Any code that does `state.bookings[k] = v` or `state.charges[k] = v` (dict mutation on a field of a frozen `VendorState`) is a contract violation and is caught by the frozen-dataclass equality tests in `tests/test_vendors.py`.

Every tool handler is a **deterministic pure function** given `(vendor_state, tool_args, seed)`. A seeded `random.Random` (sourced from `DriftCallState.episode_id` hash) drives any synthetic listing (flight rosters, restaurant menus, hotel inventory) so reset replay is bit-identical.

Cites DESIGN.md §5.1 (airline), §5.2 (cab), §5.3 (restaurant), §5.4 (hotel), §5.5 (payment), and §6 drift patterns #1–#20 as enumerated in drift_injector.md §4.4.

---

## 2. Interface

Every signature below is the exact target. Additions require a DESIGN.md update first. All tool handlers return a `ToolResult` (from `models.py`, see models.md §4.3).

### 2.1 Common per-vendor module structure

Every vendor module (`airline.py`, `cab.py`, `restaurant.py`, `hotel.py`, `payment.py`) exports exactly this surface:

```python
from __future__ import annotations

from collections.abc import Mapping
from datetime import datetime
from typing import Any

# Public tool dispatch — called by env.step
#
# Primary-domain vendors (airline, cab, restaurant, hotel): 3-tuple signature.
# The third element is the post-commit PaymentState because book/order/cancel
# handlers invoke payment.charge transactionally and must return the updated
# PaymentState so env.step can thread it through the state graph. Returned by
# identity if the tool didn't touch payment.
def dispatch(
    tool_name: str,                # e.g. "airline.search" (fully-qualified)
    tool_args: dict[str, Any],
    vendor_state: VendorState,     # frozen dataclass, per-vendor
    schema_version: str,           # "v1" | "v2" | "v3"
    episode_seed: int,
    now_ist: datetime,             # env-owned clock, IST-tzaware (episode-constant, see §3.5)
    payment_state: PaymentState,   # threaded through for transactional charge/refund
) -> tuple[ToolResult, VendorState, PaymentState]: ...
# Returns (result, new_vendor_state, new_payment_state).
# For non-write ops (*.search, *.estimate, *.get_booking, *.track): new_vendor_state IS vendor_state
# and new_payment_state IS payment_state (identity equality on both).
# For write ops (*.book, *.order, *.cancel): new_vendor_state is a fresh frozen VendorState built via
# dataclasses.replace(old, records={**old.records, new_id: record}); new_payment_state reflects any
# charge/refund committed during the transaction (returned by identity if payment wasn't touched).
# In-place mutation of any dict/list field on vendor_state or payment_state is a contract violation.

# Payment dispatch: 2-tuple signature — there is no separate primary-domain state to return.
def dispatch(  # in driftcall/vendors/payment.py
    tool_name: str,
    tool_args: dict[str, Any],
    vendor_state: PaymentState,
    schema_version: str,
    episode_seed: int,
    now_ist: datetime,
) -> tuple[ToolResult, PaymentState]: ...
# Note: primary-domain dispatch returns a 3-tuple because book/order-style tools invoke
# payment.charge transactionally; env.step threads the updated PaymentState through the state graph.

# State bootstrap — called by env.reset() once per episode
def initial_state(
    episode_seed: int,
    goal: GoalSpec,  # only read for domain-local hints
) -> VendorState: ...

# Drift mutation helper — called ONLY by drift_injector.apply_drift
def apply_schema_mutation(
    vendor_state: VendorState,
    mutation: Mapping[str, Any],  # operator-keyed dict, per drift_injector §3.4
) -> VendorState: ...

# Introspection (for PROBE_SCHEMA action)
def describe_schema(
    vendor_state: VendorState,
    schema_version: str,
) -> dict[str, Any]: ...

# Side-channel notice emission — called ONCE per env.step BEFORE dispatch
def emit_side_channel_if_pending(
    vendor_state: VendorState,
) -> tuple[str | None, VendorState]: ...
# Returns (notice_string_or_None, new_state_with_cleared_channel).
# Purpose: consumed-on-read pattern for one-shot T&C / pricing / auth
# notices placed on vendor_state.side_channel_notice by drift_injector's
# side_channel_notice_append operator (drift_injector.md §3.4).
# Semantics: one-shot — if side_channel_notice is set, returns it and
# returns a new VendorState with side_channel_notice = None (the clear
# is VENDOR-INTERNAL; no drift-injector operator is involved). If nothing
# pending, returns (None, vendor_state) unchanged.
# env.step calls this once per step before dispatch and attaches the
# returned notice to the next ToolResult.response["_notice"] surface.

# Tool catalogue — static tuple, used to populate DriftCallObservation.available_tools
TOOLS: tuple[str, ...]
```

`VendorState` is a per-module frozen dataclass (§4 below). `dispatch` is a pure function: same inputs → same return tuple, no hidden randomness beyond the seeded RNG derived from `episode_seed + tool_name + tool_args`.
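The sketch below shows one way a handler could derive its private RNG from `episode_seed + tool_name + tool_args`. One deliberate deviation is labelled as an assumption. Python's built-in `hash()` on `str` is randomized per process unless `PYTHONHASHSEED` is set, so the sketch uses a `blake2b` digest instead; `stable_hash64`, `call_rng` and `sample_latency_ms` are hypothetical helper names.

```python
import hashlib
import json
import random
from typing import Any


def canonical_args_json(tool_args: dict[str, Any]) -> str:
    # Same canonicalisation as §3.1: sorted keys, no whitespace.
    return json.dumps(tool_args, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def stable_hash64(*parts: object) -> int:
    # Process-stable stand-in for hash(...): blake2b is not salted per run.
    payload = "\x1f".join(repr(p) for p in parts).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(payload, digest_size=8).digest(), "big")


def call_rng(episode_seed: int, tool_name: str, tool_args: dict[str, Any]) -> random.Random:
    # One private Random per dispatch; the global `random` module is never used.
    return random.Random(stable_hash64(episode_seed, tool_name, canonical_args_json(tool_args)))


def sample_latency_ms(rng: random.Random, timed_out: bool) -> int:
    # Ranges from §3.1: 50-400 ms for ok, 5000-7000 ms for timeout.
    return rng.randint(5000, 7000) if timed_out else rng.randint(50, 400)
```

Argument order does not matter, because the canonical JSON sorts keys. Reordering `tool_args` therefore yields the same RNG stream.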
**`dispatch`'s signature is fixed per vendor kind: primary-domain vendors (airline, cab, restaurant, hotel) return `tuple[ToolResult, VendorState, PaymentState]` (3-tuple) because their book/order/cancel tools invoke `payment.charge`/`payment.refund` transactionally and must return the updated PaymentState; payment returns `tuple[ToolResult, PaymentState]` (2-tuple) since there is no separate primary-domain state to return.** Commit-path transitions (a new booking/order/ride/charge record) flow through `dispatch`'s returned state; drift-path transitions flow through `apply_schema_mutation`; initial construction flows through `initial_state`; side-channel consumption flows through `emit_side_channel_if_pending`. All four are pure functions returning a frozen `VendorState`: a new one when something changed, the input by identity when nothing did.

**Mechanical rule for implementers.** Every dict-field update uses `{**old_dict, key: value}` to construct a new dict, and `dataclasses.replace(state, field=new_dict)` builds the new frozen state. For example, committing a booking:

```python
new_bookings = {**vendor_state.bookings, booking_id: record}
new_state = dataclasses.replace(vendor_state, bookings=new_bookings)
return ToolResult(...), new_state
```

In-place mutation of any state dict (`vendor_state.bookings[k] = v`) is a contract violation and will be caught by the frozen-equality test in `tests/test_vendors.py` (which asserts `id(returned_state.bookings) != id(input_state.bookings)` whenever a write occurred, and `returned_state is input_state` whenever a read occurred).

### 2.2 Airline (`driftcall/vendors/airline.py`)

Implements DESIGN.md §5.1.

```python
TOOLS: tuple[str, ...] = (
    "airline.search",
    "airline.book",
    "airline.cancel",
    "airline.get_booking",
)

def airline_search(
    vendor_state: AirlineState,
    schema_version: str,
    from_: str,
    to: str,
    date: str,
    max_price_inr: int | None = None,
    time_window: Literal["morning", "afternoon", "evening", "late_night"] | None = None,
    episode_seed: int = 0,
) -> ToolResult: ...

def airline_book(
    vendor_state: AirlineState,
    schema_version: str,
    flight_id: str,
    payment_token: str,
    passenger_count: int | None = None,  # required in v3
    passenger_name: str | None = None,
    episode_seed: int = 0,
) -> ToolResult: ...

def airline_cancel(
    vendor_state: AirlineState,
    schema_version: str,
    booking_id: str,
    episode_seed: int = 0,
) -> ToolResult: ...

def airline_get_booking(
    vendor_state: AirlineState,
    schema_version: str,
    booking_id: str,
) -> ToolResult: ...
```

### 2.3 Cab (`driftcall/vendors/cab.py`)

Implements DESIGN.md §5.2.

```python
TOOLS: tuple[str, ...] = (
    "cab.estimate",
    "cab.book",
    "cab.cancel",
)

def cab_estimate(
    vendor_state: CabState,
    schema_version: str,
    pickup: str,
    drop: str,
    vehicle_class: Literal["mini", "sedan", "suv", "infant_seat_sedan"],
    pickup_time_ist: str,
    episode_seed: int = 0,
) -> ToolResult: ...

def cab_book(
    vendor_state: CabState,
    schema_version: str,
    pickup: str,
    drop: str,
    vehicle_class: str,
    pickup_time_ist: str,
    payment_token: str,
    episode_seed: int = 0,
) -> ToolResult: ...

def cab_cancel(
    vendor_state: CabState,
    schema_version: str,
    ride_id: str,
    episode_seed: int = 0,
) -> ToolResult: ...
```

### 2.4 Restaurant (`driftcall/vendors/restaurant.py`)

Implements DESIGN.md §5.3.

```python
TOOLS: tuple[str, ...] = (
    "restaurant.search",
    "restaurant.order",
    "restaurant.track",
)

def restaurant_search(
    vendor_state: RestaurantState,
    schema_version: str,
    city: str,
    cuisine: str | None = None,
    veg_only: bool = False,
    max_price_inr: int | None = None,
    episode_seed: int = 0,
) -> ToolResult: ...

def restaurant_order(
    vendor_state: RestaurantState,
    schema_version: str,
    restaurant_id: str,
    items: list[dict[str, Any]],  # [{dish_id, qty, modifiers?: [...]}]
    payment_token: str,
    episode_seed: int = 0,
) -> ToolResult: ...

def restaurant_track(
    vendor_state: RestaurantState,
    schema_version: str,
    order_id: str,
) -> ToolResult: ...
```

### 2.5 Hotel (`driftcall/vendors/hotel.py`)

Implements DESIGN.md §5.4.
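The hotel arithmetic implied by the §4.4 tables is sketched below: 2 nights × 3500 + 18% GST = 8260. The sketch also covers the v3 `GST_REQUIRED` gate and uses the round-half-up rule from §3.4. Helper names are hypothetical. Applying GST to the room subtotal only, and not to `resort_fee_inr`, is an assumption, since the spec does not say.

```python
from __future__ import annotations

import math

GST_PERCENT = 18                 # 18% GST, as in the §4.4 v1 example
GST_NUMBER_THRESHOLD_INR = 7500  # v3: gst_number required above this total


def round_half_up(x: float) -> int:
    # §3.4 rule: floor(x + 0.5). Built-in round() would round halves to even.
    return math.floor(x + 0.5)


def hotel_total_with_tax(nightly_rate_inr: int, nights: int) -> int:
    # Integer INR throughout; only the GST share passes through a float.
    subtotal = nightly_rate_inr * nights
    return subtotal + round_half_up(subtotal * GST_PERCENT / 100)


def gst_policy_error(total_with_tax: int, gst_number: str | None, schema_version: str) -> dict | None:
    # v3 only: a missing gst_number on a large booking is a policy_error.
    if schema_version == "v3" and total_with_tax > GST_NUMBER_THRESHOLD_INR and not gst_number:
        return {"error_code": "GST_REQUIRED"}
    return None
```

With the §4.4 example, `hotel_total_with_tax(3500, 2)` gives 8260. That exceeds 7500, so a v3 `hotel.book` without `gst_number` is rejected.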
```python
TOOLS: tuple[str, ...] = (
    "hotel.search",
    "hotel.book",
    "hotel.cancel",
)

def hotel_search(
    vendor_state: HotelState,
    schema_version: str,
    city: str,
    checkin: str,
    checkout: str,
    max_nightly_rate_inr: int | None = None,
    episode_seed: int = 0,
) -> ToolResult: ...

def hotel_book(
    vendor_state: HotelState,
    schema_version: str,
    hotel_id: str,
    checkin: str,
    checkout: str,
    payment_token: str,
    gst_number: str | None = None,  # required in v3 if total > 7500
    episode_seed: int = 0,
) -> ToolResult: ...

def hotel_cancel(
    vendor_state: HotelState,
    schema_version: str,
    booking_id: str,
    episode_seed: int = 0,
) -> ToolResult: ...
```

### 2.6 Payment (`driftcall/vendors/payment.py`)

Implements DESIGN.md §5.5 — **transversal**; every primary domain's `*_book` / `*_order` handler calls `payment.charge` under the hood.

```python
TOOLS: tuple[str, ...] = (
    "payment.charge",
    "payment.refund",
    "payment.get_token",
)

def payment_charge(
    vendor_state: PaymentState,
    schema_version: str,
    amount_inr: int,               # integer INR only
    payment_token: str,
    mfa_code: str | None = None,   # required in v3 if amount > 5000
    episode_seed: int = 0,
) -> ToolResult: ...

def payment_refund(
    vendor_state: PaymentState,
    schema_version: str,
    charge_id: str,
    amount_inr: int,
    episode_seed: int = 0,
) -> ToolResult: ...

def payment_get_token(
    vendor_state: PaymentState,
    schema_version: str,
    requested_scope: str,  # "payments:write:v1" | "payments:write:v2"
    episode_seed: int = 0,
) -> ToolResult: ...
```

---

## 3. Behavior Spec

### 3.1 Determinism

Every vendor handler is a pure function of `(vendor_state, schema_version, tool_args, episode_seed, now_ist)`. The `episode_seed` deterministically produces:

- Flight roster at `airline.search` (seeded `random.Random(episode_seed ^ hash(("airline.search", from_, to, date)))` — 3 to 8 flights).
- Restaurant listings at `restaurant.search` (same pattern).
- Hotel inventory at `hotel.search` (same pattern).
- `cab.estimate` fare (deterministic function of `(pickup, drop, vehicle_class, episode_seed)` — no noise).
- `latency_ms` (seeded uniform sample 50–400 for `ok`, 5000–7000 for `timeout`).

**Timeout trigger is deterministic.** A dispatch returns `status="timeout"` iff:

```python
import json
from typing import Any

def _canonical_args_json(tool_args: dict[str, Any]) -> str:
    # Stable, sorted, whitespace-free JSON rep of tool_args.
    return json.dumps(tool_args, sort_keys=True, separators=(",", ":"), ensure_ascii=False)

is_timeout = (hash((episode_seed, tool_name, _canonical_args_json(tool_args))) & 0x7F) == 0
```

This gives a ~0.78% rate (1 in 128 dispatches), is bit-identical across replays, and uses the same formula in every vendor. No vendor ever instantiates `random.Random()` for timeout selection — timeouts flow solely from this hash-bit check. The seeded latency RNG is used only to sample the numeric `latency_ms` value after the timeout branch is chosen. Caveat: Python's built-in `hash()` is salted per process for `str` inputs. This check, and every other `hash(...)` in this spec, is therefore bit-identical across separate processes only when `PYTHONHASHSEED` is fixed or a stable digest replaces `hash()`.

Two `dispatch()` calls with identical inputs MUST return equal tuples. The `ToolResult` must be equal by `==`. Each returned state must be identical by `is` if no commit occurred, and structurally equal by `==` otherwise. No global RNG, no wall-clock reads (the env injects `now_ist`), no environment variables.

### 3.2 Schema versioning

Each vendor exposes exactly three schema versions — `v1`, `v2`, `v3`. Version transitions are **clean**: the drift injector mutates `vendor_state` via `apply_schema_mutation`, and the *next* tool call reads the mutated state through whichever schema-version branch applies. There is no rollback.

Every handler branches on `schema_version` at its top:

```python
if schema_version == "v1":
    return _serialize_v1(...)
elif schema_version == "v2":
    return _serialize_v2(...)
elif schema_version == "v3":
    return _serialize_v3(...)
else:
    raise UnknownSchemaVersionError(schema_version)
```

Transition semantics:

- **v1 → v2** (one drift pattern per domain) — field renames, required-field additions, enum expansions, policy bumps.
- **v2 → v3** (one drift pattern per domain, where applicable) — second-order changes; typically adds another required field or a semantic shift.
- Versions advance monotonically within an episode (drift injector never decrements).
- Stage-3 episodes can chain `v1 → v2 → v3` on the same domain within 16 turns (drift_injector §7 E7).

### 3.3 ToolResult construction contract

Every vendor handler returns a `ToolResult` with exactly these semantics:

| Field | Rule |
|---|---|
| `tool_name` | Echoes the fully-qualified tool name (e.g. `"airline.search"`). MUST match the dispatched name. |
| `status` | Exactly one of `"ok"`, `"schema_error"`, `"policy_error"`, `"auth_error"`, `"timeout"` — enumerated in §5. |
| `response` | JSON-roundtrip-safe dict. No `set`, no `bytes`, no `tuple`-as-value, no `datetime` objects (serialize to ISO-8601 strings), no non-primitive custom classes. On non-ok status, MUST include `error_code: str` key. |
| `schema_version` | Current `schema_versions[domain]` at call time. Echoes even on error. |
| `latency_ms` | Seeded sample; `≥ 0`. |

The JSON-roundtrip invariant is enforced at test time by `tests/test_vendors.py` running `json.loads(json.dumps(result.response))` on every return path (models.md §5 covers the surface).

### 3.4 Monetary semantics

**All amounts are integer INR.** No floats. `budget_inr`, `fare_inr`, `total_with_tax`, `amount_inr` — everywhere `int`. The cab v3 `fare_breakdown` sub-fields (`base`, `surge`, `tolls`, `gst`) are each integer INR and MUST sum to the `total_inr` top-level field. Rounding, when derived from deterministic ratios such as GST at 18%, is round-half-up via `math.floor(x + 0.5)`, not `int(round(x))`. Python's built-in `round()` rounds halves to even (banker's rounding), so `round(454.5) == 454`, whereas `math.floor(454.5 + 0.5) == 455`.

### 3.5 Clock & IST timezone

`now_ist` is **episode-constant**.
It is set exactly once at `env.reset()` from a deterministic formula and carried in `DriftCallState` for the life of the episode:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Pinned at env construction (process start): today's date in Asia/Kolkata, truncated to date.
BASE_DATE_IST: datetime = datetime.now(ZoneInfo("Asia/Kolkata")).replace(hour=0, minute=0, second=0, microsecond=0)

# At env.reset(seed=episode_seed):
offset_seconds = (episode_seed * 37) % 86400
now_ist = (BASE_DATE_IST + timedelta(seconds=offset_seconds)).replace(second=0, microsecond=0)
# → pinned, minute-truncated, IST-tzaware
```

Properties:

- **Deterministic per seed.** Two episodes with the same `episode_seed` produce the same `now_ist`, regardless of wall-clock replay time (within the same env-process lifetime; `BASE_DATE_IST` is pinned at process start).
- **Constant across the episode.** Every `dispatch` call in episode `N` receives the same `now_ist` — no advancement per turn, no policy flips from clock drift.
- **Minute-truncated.** Seconds and microseconds zeroed, so policy boundaries (14:00, 07:00, 09:00, 12:00) are unambiguous.

Vendors use `now_ist` for:

- `airline.booking_window_shrink` policy check (same-day after 14:00 IST).
- `cab.school_hours_mini_reject` policy (07:00–09:00 IST).
- `hotel.early_checkin_tnc` (check-in before 12:00 IST).

**Vendors MUST NOT call `datetime.now()`, `time.time()`, or any wall-clock reader.** The sole time source is the `now_ist` parameter threaded from `env.step`. This is enforced by an AST-grep test in `tests/test_vendors.py` that rejects any `datetime.now` / `time.time` / `date.today` name reference in `driftcall/vendors/*.py`.

**Interaction with timing drifts.** Drifts that mutate policy timing (e.g., `airline.booking_window_shrink` shrinking the booking-window threshold from 24h to 2h) work by `apply_schema_mutation` changing the threshold field inside `AirlinePolicy` — NOT by `now_ist` crossing a static threshold.
Since `now_ist` is constant within an episode, the only way a policy can flip mid-episode is via a drift mutation. This guarantees that a replay from the same seed produces bit-identical policy decisions.

### 3.6 Interaction with drift_injector

Per drift_injector.md §3.4, mutation operators (`rename`, `remove`, `require_new_field`, `change_type`, `numeric_bump`, `enum_expand`, `policy_flag_flip`, `time_window_shrink`, `tnc_text_swap`, `side_channel_notice_append`, `pricing_restructure`, `fee_append`, `auth_scope_bump`, `token_version_bump`) are applied via each vendor's `apply_schema_mutation(vendor_state, mutation) -> vendor_state'` helper. The helper is:

- A **pure function** (frozen `VendorState` in, new frozen `VendorState` out).
- Idempotent per-operator (applying the same mutation twice is a no-op beyond the first — but `drift_injector` itself guards via `DriftReapplicationError`).
- Domain-scoped: it only mutates the keys relevant to `domain` (airline mutation never touches hotel state).

For T&C / pricing / auth drifts whose operator is `side_channel_notice_append`, `apply_schema_mutation` sets `vendor_state.side_channel_notice` (Optional[str]). The drift injector's job ends there — it only ever **sets** the notice. The **clear** is a pure vendor-internal state transition: `env.step` calls `emit_side_channel_if_pending(vendor_state)` ONCE per step, BEFORE dispatch. That helper returns `(notice_or_None, new_vendor_state_with_cleared_channel)` — the consumed-on-read pattern. If a notice was pending, the env installs the returned new vendor state via `vendor_states[domain] = new_state` and attaches the notice string to the next `ToolResult.response["_notice"]` in that domain. This is the "one-shot notice" resolved in drift_injector.md §9 Q4.
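The consumed-on-read pattern is sketched below. `NoticeState` is a hypothetical stand-in that carries only the fields the pattern touches, and `attach_notice` is an illustrative helper for the `_notice` surface, not a codebase function. The clear happens only in the vendor helper; no drift operator is involved.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class NoticeState:
    # Hypothetical stand-in carrying only the fields this pattern touches.
    schema_version: str
    side_channel_notice: str | None = None


def emit_side_channel_if_pending(vs: NoticeState) -> tuple[str | None, NoticeState]:
    if vs.side_channel_notice is None:
        return None, vs  # nothing pending: hand back the same object
    # Consumed on read: return the notice plus a copy with the channel cleared.
    return vs.side_channel_notice, dataclasses.replace(vs, side_channel_notice=None)


def attach_notice(response: dict[str, Any], notice: str | None) -> dict[str, Any]:
    # Copy-on-write, so the dispatch's own response dict is never mutated.
    return response if notice is None else {**response, "_notice": notice}
```

A second call on the cleared state returns `(None, same_state)`, which is what makes the notice one-shot.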
**There is no `clear_side_channel` mutation operator.** The 14 operators defined in drift_injector.md §3.4 (`rename`, `remove`, `require_new_field`, `change_type`, `numeric_bump`, `enum_expand`, `policy_flag_flip`, `time_window_shrink`, `tnc_text_swap`, `side_channel_notice_append`, `pricing_restructure`, `fee_append`, `auth_scope_bump`, `token_version_bump`) are the complete set. Clearing the side-channel notice is not a drift — it is the vendor consuming a pending notice during normal read. The drift injector never calls any helper to clear it.

**`dispatch` is the commit path; `apply_schema_mutation` is the drift path.** Both are pure functions returning a frozen `VendorState`. `dispatch` takes `(tool_name, tool_args, vendor_state, schema_version, episode_seed, now_ist)` and returns `(ToolResult, VendorState')`. Primary-domain vendors additionally take and return a `PaymentState` (§2.1). For read tools (`*.search`, `*.estimate`, `*.get_booking`, `*.track`), the returned state is identical to the input by identity (`returned_state is vendor_state`) — the dispatch reads but does not commit. For write tools (`*.book`, `*.order`, `*.cancel`, `payment.charge`, `payment.refund`, `payment.get_token`), the returned state is a freshly constructed frozen `VendorState` with the commit delta applied via `dataclasses.replace(old, field={**old.field, key: record})`. `dispatch` never mutates `vendor_state` in place. `env.step` threads the returned state back into `DriftCallState.vendor_states[domain]` on every call — the install is uniform; non-write dispatches simply re-install the same object.

### 3.7 Auth & payment cascades

Every `*_book` / `*_order` handler that finalizes a transaction internally calls `payment.payment_charge(vendor_state=state.vendor_states["payment"], ...)`, which itself returns `(ToolResult, PaymentState')`.
If payment returns `auth_error` (token mismatch, MFA required), the calling handler surfaces that error **upward** — the airline/cab/hotel/restaurant `ToolResult` returns `status="auth_error"` with `response={"error_code": "PAYMENT_AUTH_FAILED", "required_scope": "payments:write:v2"}` (or `"mfa_required": true`), propagating the payment gateway's diagnostic. This is the intended cross-domain cascade (drift_injector.md §7 E5).

The caller handler MUST NOT partially commit: if payment fails, no booking is created, no order is placed, and no state transition is recorded in the domain-specific vendor state. Concretely, the caller returns its input `VendorState` by identity (no `dataclasses.replace`), AND it returns the `PaymentState` by identity (the payment handler itself did not commit a new charge on the failure path). `env.step` receives this `(ToolResult, VendorState, PaymentState)` triple and installs both states; each is identical to its pre-call object. A cross-domain cascade failure therefore leaves the full `DriftCallState.vendor_states` dict structurally unchanged (same object identities).

### 3.8 Uniqueness of booking/order/ride IDs

Generated IDs are deterministic strings: `f"{domain[:3].upper()}-{hash((episode_seed, op, key)) & 0xFFFF:04X}"` where `op` is `"book" | "order" | "ride" | "charge"` and `key` is tool-arg-derived. This gives 4-hex IDs like `AIR-3F2A`, stable per seed. Collisions within an episode are rare but not negligible. The 16-bit suffix has 65,536 values, so by the birthday bound even 16 records in one dict (one per turn of a 16-turn episode) collide with probability about (16·15/2) / 65,536 ≈ 0.18%. The vendor reads the current record dict (`vendor_state.bookings` / `.orders` / `.rides` / `.charges`) to detect whether the candidate ID is already present BEFORE constructing the commit state — the read is against the input `vendor_state`, and the commit (if any) is applied via `dataclasses.replace` on that same input.
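The ID scheme, including the `-R{retry}` collision suffix this section defines, can be sketched as below. Several names are assumptions: `stable_hash16` is a hypothetical process-stable replacement for `hash(...) & 0xFFFF`, `generate_id` is illustrative, and `key` is passed as an already-derived string. The candidate is checked against the *input* record dict, which mirrors the "read before commit" rule. In the real flow, the §3.9 idempotency guard runs first, so an identical request never reaches this function.

```python
import hashlib
from collections.abc import Mapping


def stable_hash16(*parts: object) -> int:
    # Hypothetical process-stable stand-in for hash((...)) & 0xFFFF.
    payload = "\x1f".join(repr(p) for p in parts).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(payload, digest_size=2).digest(), "big")


def generate_id(domain: str, episode_seed: int, op: str, key: str,
                records: Mapping[str, object]) -> str:
    prefix = f"{domain[:3].upper()}-{stable_hash16(episode_seed, op, key):04X}"
    if prefix not in records:
        return prefix
    # Hash collision: retry = 1 + number of stored IDs sharing the prefix.
    retry = 1 + sum(1 for existing_id in records if existing_id.startswith(prefix))
    return f"{prefix}-R{retry}"
```

The retry count is derived only from the stored records, so a replay that reconstructs the same prior state produces the same suffix. The sequence is `AIR-xxxx`, then `AIR-xxxx-R2`, then `AIR-xxxx-R3`.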
**`-R{retry}` is for HASH COLLISIONS ONLY.** The `-R{retry}` suffix is appended when two calls with *different* inputs collide on the same 4-hex prefix (a pure hash accident). It is NOT the mechanism for handling duplicate intent (same caller, same key, same parameters) — duplicate-intent calls are rejected by the idempotency guard in §3.9 with a `DUPLICATE_*` policy error BEFORE the ID generator runs. The two paths are disjoint: idempotency runs first (and may short-circuit with an error result and unchanged state), then ID generation runs (and may append `-R{retry}` if the prefix collides).

**Retry counter derivation (deterministic, replayable).** On hash collision — the vendor detects that the candidate 4-hex ID already exists in `vendor_state.bookings` / `.orders` / `.rides` / `.charges` — the vendor appends a `-R{retry}` suffix where `retry` is a per-episode, per-operation monotonic counter derived entirely from replay-stable inputs:

```
retry = 1 + sum(
    1 for existing_id in vendor_state.
    if existing_id.startswith(f"{domain[:3].upper()}-{hash((episode_seed, op, key)) & 0xFFFF:04X}")
)
```

Equivalently: `retry` counts how many already-stored records in the vendor's record dict share the same 4-hex prefix, plus one. Because `vendor_state.` is the deterministic result of all prior tool calls in this episode (each of which was itself a pure function of `(episode_seed, prior_state, tool_args)`), and because the collision-triggering call's `(episode_seed, op, key)` tuple is itself deterministic, `retry` is identical across two runs of the same `episode_seed`. No wall-clock, no global RNG, no process-local counter — the value is reconstructable from state alone.

Worked example: seed `1234`, op `"book"`, key derived from `"6E-2345"` hashes to `3F2A`. Prior state has `AIR-3F2A` already present (from a *different* flight whose key happened to hash to the same prefix).
Next call with the same hash prefix → scan finds one prefix match → `retry = 1 + 1 = 2` → ID becomes `AIR-3F2A-R2`. Replaying the episode from seed `1234` reconstructs the identical prior state at the identical turn, finds the same one prefix match, and computes the same `retry = 2`. The ID stream is bit-identical across runs.

Tests assert no first-level collisions for the curated seed set (so `-R{retry}` never fires on the canonical seeds); replay determinism for the `-R{retry}` path is covered by a dedicated stress seed that intentionally collides.

### 3.9 Duplicate-intent idempotency (write-tool guard)

Every write tool runs an **idempotency check** before constructing an ID or committing. If an existing record in `vendor_state.` has the same idempotency key as the incoming request, the tool returns a `policy_error` and **does not** commit — the returned `VendorState` is the input by identity.

Idempotency key per domain (all fields normalized: trimmed whitespace, lowercased where free-text, sorted where list-like):

| Tool | Idempotency key | Error code |
|---|---|---|
| `airline.book` | `(flight_id, passenger_name, depart_date)` | `DUPLICATE_BOOKING` |
| `hotel.book` | `(hotel_id, checkin, checkout, primary_guest)` | `DUPLICATE_BOOKING` |
| `cab.book` | `(pickup, drop, depart_time, vehicle_class)` | `DUPLICATE_RIDE` |
| `restaurant.order` | `(restaurant_id, normalized_items_sorted)` | `DUPLICATE_ORDER` |
| `payment.charge` | `(order_ref, amount_inr, token_scope)` | `DUPLICATE_CHARGE` |

Where `normalized_items_sorted = tuple(sorted((item["dish_id"], item["qty"], tuple(sorted(item.get("modifiers", [])))) for item in items))`.
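The `restaurant.order` key and duplicate lookup are sketched below. `normalized_items_sorted` follows the definition above verbatim. Two points are assumptions: storing the key on each order record under a hypothetical `idempotency_key` field, and only trimming (not lowercasing) `restaurant_id`, since it is an ID rather than free text.

```python
from __future__ import annotations

from typing import Any


def normalized_items_sorted(items: list[dict[str, Any]]) -> tuple:
    # As defined above: insensitive to item order and to modifier order.
    return tuple(sorted(
        (item["dish_id"], item["qty"], tuple(sorted(item.get("modifiers", []))))
        for item in items
    ))


def restaurant_order_key(restaurant_id: str, items: list[dict[str, Any]]) -> tuple:
    # IDs are trimmed only; free-text fields would also be lowercased.
    return (restaurant_id.strip(), normalized_items_sorted(items))


def find_duplicate(orders: dict[str, dict[str, Any]], key: tuple) -> str | None:
    # Returns the existing order ID with the same key (-> existing_id), else None.
    for order_id, record in orders.items():
        if record.get("idempotency_key") == key:
            return order_id
    return None
```

Treating an omitted `modifiers` as an empty list means a v2-shaped item and a v3 item with `"modifiers": []` map to the same key.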
**Error envelope on duplicate:** ```python ToolResult( tool_name=, status="policy_error", response={ "error_code": "DUPLICATE_BOOKING", # or DUPLICATE_RIDE / _ORDER / _CHARGE "existing_id": "AIR-3F2A", # the ID of the record that already satisfies this intent "original_ts": "2026-04-25T18:32:00+05:30", # now_ist at the time the original record was created, ISO-8601 "hint": "an identical booking already exists; cancel the existing one to rebook", }, schema_version=, latency_ms=, ) ``` **What the original_ts field tracks.** When a write tool commits a record, the record stores `"created_at_ist": now_ist.isoformat()` inside its dict. The idempotency guard reads this back as `original_ts` when rejecting a duplicate. Because `now_ist` is episode-constant (§3.5), `original_ts` equals the `now_ist` the episode was pinned to at `reset()` — not wall-clock time. **Order of checks in a write handler** (mandatory): 1. Schema validation (required fields, types) → `schema_error` on fail. 2. Policy validation (min order, booking window, enum, GST gating) → `policy_error` on fail. 3. **Idempotency check** (this section) → `policy_error` with `DUPLICATE_*` on fail. Returns `(result, input_state)` — no commit. 4. Auth cascade (payment subcall) → propagates `auth_error` upward on fail. Returns `(result, input_state)` — no commit on either domain. 5. ID generation with `-R{retry}` for hash collisions (§3.8). 6. Commit via `dataclasses.replace` — construct new state and return `(result, new_state)`. The idempotency guard and the `-R{retry}` mechanism are **disjoint**: duplicate-intent rejects before ID generation; `-R{retry}` only fires when *different* inputs collide on the same 4-hex prefix. --- ## 4. 
Data Structures

### 4.1 `AirlineState` (frozen dataclass)

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AirlineState:
    schema_version: str  # "v1" | "v2" | "v3"
    bookings: dict[str, dict[str, Any]]  # booking_id → booking dict (current-version shape)
    flight_roster_cache: dict[str, tuple[dict[str, Any], ...]]  # search_key → flights tuple
    policy: AirlinePolicy  # nested frozen policy (booking_window, etc.)
    tnc: AirlineTnC  # nested frozen T&C text
    pricing: AirlinePricing  # convenience_fee, etc.
    side_channel_notice: str | None  # set by drift_injector; attached once
```

#### Schema field tables

**v1 (baseline, DESIGN.md §5.1):**

| Field | Type | Example | Notes |
|---|---|---|---|
| `flight_id` | str | `"6E-2345"` | IndiGo-style code |
| `from` | str (IATA) | `"HYD"` | Origin |
| `to` | str (IATA) | `"BLR"` | Destination |
| `depart` | str (ISO-8601 IST) | `"2026-04-25T18:30:00+05:30"` | Timezone-aware |
| `price` | int | `7200` | Integer INR |
| `currency` | str | `"INR"` | Redundant, removed in v2 |
| `seats_left` | int | `14` | |

**v2 (after `airline.price_rename`, DESIGN.md §5.1):**

| Field | Type | Example | Notes |
|---|---|---|---|
| `flight_id` | str | `"6E-2345"` | unchanged |
| `from` / `to` | str | unchanged | |
| `depart` | str | unchanged | |
| `total_fare_inr` | int | `7200` | **was `price`** |
| `seats_left` | int | unchanged | |
| *(`currency` removed)* | — | — | |

**v3 (after `airline.pax_required`, DESIGN.md §5.1):**

| Field | Type | Example | Notes |
|---|---|---|---|
| *(all v2 fields)* | — | — | |
| `passenger_count` | int | `1` | **New required field on `airline.book`.** Search responses may include occupancy, but `airline.book` calls without it now return `schema_error` (`MISSING_PASSENGER_COUNT`). |

### 4.2 `CabState` (frozen)

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CabState:
    schema_version: str
    rides: dict[str, dict[str, Any]]
    policy: CabPolicy  # mini_reject_school_hours flag, vehicle_class_enum
    pricing: CabPricing  # base_per_km, surge_factor, toll_bundled
    tnc: CabTnC
    side_channel_notice: str | None
```

**v1:**

| Field | Type | Example |
|---|---|---|
| `pickup` | str | `"HYD airport T1"` |
| `drop` | str | `"Banjara Hills"` |
| `vehicle_class` | Literal | `"mini" \| "sedan"` |
| `fare_inr` | int | `320` |
| `eta_min` | int | `7` |

**v2 (after `cab.vehicle_class_expand` OR `cab.school_hours_mini_reject`):**

| Field | Type | Example |
|---|---|---|
| `vehicle_class` | Literal | `"mini" \| "sedan" \| "suv" \| "infant_seat_sedan"` |
| `fare_inr` | int | unchanged |
| policy | — | `mini` during 07:00–09:00 IST → `policy_error` |
| (other fields unchanged) | — | — |

**v3 (after `cab.fare_breakdown`):**

| Field | Type | Example | Notes |
|---|---|---|---|
| `pickup`, `drop`, `vehicle_class`, `eta_min` | — | unchanged | |
| ~~`fare_inr`~~ | — | **removed** | Replaced by breakdown |
| `fare_breakdown` | dict | `{"base": 240, "surge": 40, "tolls": 20, "gst": 20}` | Four required int sub-fields |
| `total_inr` | int | `320` | Sum invariant: `base + surge + tolls + gst == total_inr` |

### 4.3 `RestaurantState` (frozen)

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RestaurantState:
    schema_version: str
    orders: dict[str, dict[str, Any]]
    menu_cache: dict[str, tuple[dict[str, Any], ...]]  # city:cuisine → listings
    policy: RestaurantPolicy  # min_order_inr
    semantics: RestaurantSemantics  # veg_only_excludes_egg flag
    tnc: RestaurantTnC
    side_channel_notice: str | None
```

**v1:**

| Field | Type | Example | Notes |
|---|---|---|---|
| `restaurant_id` | str | `"BLR-BIR-0123"` | |
| `items` | list[dict] | `[{"dish_id": "BIR-001", "qty": 1, "price": 220}]` | |
| `total` | int | `220` | Sum of `qty * price` per item |
| `eta_min` | int | `35` | |
| `min_order_inr` | int | `199` | Server enforces `total` ≥ this |

**v2 (after `restaurant.min_order_bump`):**

| Field | Type | Change | Notes |
|---|---|---|---|
| `min_order_inr` | int | `199` → `299` | Enforced server-side; orders below → `policy_error` with `MIN_ORDER_NOT_MET` |
| (all others) | — | unchanged | |

**v3 (after `restaurant.veg_filter_semantic` AND/OR `restaurant.items_shape_bump`):**

| Field | Type | Example | Notes |
|---|---|---|---|
| `items` | list[dict] | `[{"dish_id": "BIR-001", "qty": 1, "modifiers": ["no-onion"]}]` | **`modifiers: list[str]` now required** on each item (empty list allowed) |
| `veg_only` (search arg semantics) | bool | — | `True` now *excludes* egg dishes (previously included). No field rename, behavior shift only. Declared via `side_channel_notice`. |

### 4.4 `HotelState` (frozen)

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class HotelState:
    schema_version: str
    bookings: dict[str, dict[str, Any]]
    inventory_cache: dict[str, tuple[dict[str, Any], ...]]
    policy: HotelPolicy  # cancel_window_hours, early_checkin_fee_pct
    pricing: HotelPricing  # resort_fee_inr (v2+)
    tnc: HotelTnC
    side_channel_notice: str | None
```

**v1:**

| Field | Type | Example |
|---|---|---|
| `hotel_id` | str | `"GOA-BEACH-007"` |
| `city` | str | `"Goa"` |
| `checkin` / `checkout` | str (ISO date) | `"2026-04-27"` / `"2026-04-29"` |
| `nightly_rate` | int | `3500` |
| `total_with_tax` | int | `8260` (2 nights × 3500 + 18% GST) |
| `cancel_window_hours` | int | `24` |

**v2 (after `hotel.cancel_window_shrink` OR `hotel.resort_fee_append`):**

| Field | Type | Change |
|---|---|---|
| `cancel_window_hours` | int | `24 → 6` (policy) |
| `resort_fee_inr` | int | `0 → 500/night` (pricing; surfaces only on `hotel.book`) |
| (all others) | — | unchanged |

**v3 (after `hotel.gst_field`):**

| Field | Type | Example | Notes |
|---|---|---|---|
| `gst_number` | str (optional until total > 7500) | `"29ABCDE1234F1Z5"` | **Required when `total_with_tax > 7500`**; missing → `policy_error` with `GST_REQUIRED` |

### 4.5 `PaymentState` (frozen)

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Literal


@dataclass(frozen=True)
class PaymentState:
    schema_version: str  # "v1" | "v2" | "v3"
    charges: dict[str, dict[str, Any]]  # charge_id → charge record
    accepted_token_version: Literal["v1", "v2"]  # "v1" until auth drift
    required_scope: str  # "payments:write:v1" | "payments:write:v2"
    mfa_threshold_inr: int  # 0 → disabled; 5000 after mfa_required drift
    side_channel_notice: str | None
```

**v1:** Accepts `payment_token="token_v1"` with `scope=payments:write:v1`. No MFA. All charges ok.

**v2 (after `payment.auth_scope_upgrade`, DESIGN.md §5.5):** Requires `payment_token="token_v2"` with `scope=payments:write:v2`. `token_v1` calls return `auth_error` with `{"error_code": "AUTH_SCOPE_INSUFFICIENT", "required_scope": "payments:write:v2"}`.

**v3 (after `payment.mfa_required`):** On top of the v2 scope requirement, any `amount_inr > 5000` demands an `mfa_code` arg. Missing → `auth_error` with `{"error_code": "MFA_REQUIRED", "mfa_threshold_inr": 5000}`.

---

## 5. Error Modes

Every vendor handler returns one of exactly five `status` values. No exceptions escape `dispatch()` — all errors are encoded in `ToolResult`.
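The deterministic timeout trigger in the §5.1 table can be sketched as a pure function of `(episode_seed, tool_name, tool_args)`. This is a minimal sketch, not the `dispatch()` implementation. `ToolResultSketch`, `canonical_args_json`, `is_simulated_timeout` and `timeout_result` are hypothetical names. The canonical form (sorted-key, compact JSON) is an assumed stand-in for `_canonical_args_json`, and the 5000–7000 ms latency reuses the placeholder range from Q3. One deliberate swap: the sketch hashes with `hashlib.sha256` instead of the built-in `hash()`. CPython salts `str` hashes per process unless `PYTHONHASHSEED` is fixed, so `hash()` would fire timeouts on different calls in different runs.

```python
from __future__ import annotations

import hashlib
import json
import random
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolResultSketch:
    # Hypothetical stand-in mirroring the ToolResult fields used in §8 (not models.py).
    tool_name: str
    status: str
    response: dict[str, Any]
    schema_version: str
    latency_ms: int


def canonical_args_json(tool_args: dict[str, Any]) -> str:
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    return json.dumps(tool_args, sort_keys=True, separators=(",", ":"))


def _stable_digest(episode_seed: int, tool_name: str, tool_args: dict[str, Any]) -> int:
    # sha256 is stable across processes, unlike the per-process-salted str hash().
    payload = f"{episode_seed}|{tool_name}|{canonical_args_json(tool_args)}"
    return int.from_bytes(hashlib.sha256(payload.encode("utf-8")).digest()[:8], "big")


def is_simulated_timeout(episode_seed: int, tool_name: str, tool_args: dict[str, Any]) -> bool:
    # Low 7 bits all zero: 1 call in 128 (~0.78%) on average.
    return (_stable_digest(episode_seed, tool_name, tool_args) & 0x7F) == 0


def timeout_result(
    episode_seed: int, tool_name: str, tool_args: dict[str, Any], schema_version: str
) -> ToolResultSketch:
    # Seed the latency draw from the same digest so replay is bit-identical.
    rng = random.Random(_stable_digest(episode_seed, tool_name, tool_args))
    return ToolResultSketch(
        tool_name=tool_name,
        status="timeout",
        response={"error_code": "TIMEOUT"},
        schema_version=schema_version,
        latency_ms=rng.randint(5000, 7000),
    )
```

Because key order is canonicalized, two argument dicts with the same contents but different insertion orders hit the same timeout decision.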
### 5.1 Status values & triggers

| `status` | Trigger condition | `error_code` (in `response`) | Drift types that produce it |
|---|---|---|---|
| `ok` | Successful call; `response` is domain-appropriate payload | *(absent)* | n/a |
| `schema_error` | Missing required field, wrong type, removed field referenced, or type mismatch the vendor cannot coerce | `MISSING_FIELD`, `UNKNOWN_FIELD`, `TYPE_MISMATCH`, `MISSING_PASSENGER_COUNT`, `MISSING_GST_NUMBER`, `INVALID_ITEMS_SHAPE` | schema |
| `policy_error` | Request violates a current business rule (min order, booking window, school-hours cab) | `MIN_ORDER_NOT_MET`, `BOOKING_WINDOW_CLOSED`, `SCHOOL_HOURS_MINI_REJECTED`, `VEHICLE_CLASS_UNAVAILABLE`, `CANCEL_WINDOW_EXPIRED`, `GST_REQUIRED` | policy, some T&C |
| `auth_error` | Payment token invalid, scope insufficient, MFA missing | `AUTH_SCOPE_INSUFFICIENT`, `MFA_REQUIRED`, `TOKEN_INVALID`, `PAYMENT_AUTH_FAILED` (propagated upward) | auth |
| `timeout` | Simulated network timeout, triggered deterministically when `(hash((episode_seed, tool_name, _canonical_args_json(tool_args))) & 0x7F) == 0` (~0.78%, 1 in 128; formula in §3.1). **Not** drift-triggered; stress-tests R2 false-positives. | `TIMEOUT` | n/a (noise) |

### 5.2 Error-code catalogue (machine-readable, stable)

Every `error_code` is an uppercase snake-case string. R2 detection hints in `data/drift_patterns/drifts.yaml` use these codes as substring tokens (drift_injector.md §6.3), so renames here require a drift-catalogue bump.

**Schema error codes:**

- `MISSING_FIELD` — generic; `response.field_name: str` names the field.
- `MISSING_PASSENGER_COUNT` — airline v3 specific.
- `MISSING_GST_NUMBER` — hotel v3 specific.
- `INVALID_ITEMS_SHAPE` — restaurant v3 items missing `modifiers`.
- `TYPE_MISMATCH` — generic; `response.expected: str`, `response.got: str`.
- `UNKNOWN_FIELD` — caller sent a field the current schema doesn't recognize (not strictly an error in permissive mode; v2+ reject strictly).
**Policy error codes:**

- `MIN_ORDER_NOT_MET` — `response.min_order_inr: int`, `response.got_total_inr: int`.
- `BOOKING_WINDOW_CLOSED` — airline v2, same-day booking after 14:00.
- `SCHOOL_HOURS_MINI_REJECTED` — cab v2, 07:00–09:00 IST with `vehicle_class=mini`.
- `VEHICLE_CLASS_UNAVAILABLE` — cab; caller used a `vehicle_class` outside the current enum set.
- `CANCEL_WINDOW_EXPIRED` — hotel v2; cancellation requested less than 6 h before check-in.
- `GST_REQUIRED` — hotel v3 for totals > 7500.

**Auth error codes:**

- `AUTH_SCOPE_INSUFFICIENT` — payment v2; `response.required_scope: str`.
- `MFA_REQUIRED` — payment v3; `response.mfa_threshold_inr: int`.
- `TOKEN_INVALID` — malformed `payment_token`.
- `PAYMENT_AUTH_FAILED` — propagated upward by `*.book` / `*.order` handlers whose internal `payment.charge` failed. `response` carries the original `required_scope` or `mfa_required`.

### 5.2.1 Error envelope canonical fields

When `ToolResult.status != "ok"`, the `response` dict conforms to the canonical envelope defined here. No other shapes are permitted.

**Required field (every non-ok response):**

- `error_code: str` — one of the codes pinned in §5.2 (enumerated in the table below).

**Optional fields (only those listed here may appear; no ad-hoc keys):**

- `hint: str` — user-friendly next-step guidance.
- `field_name: str` — names the offending field (schema errors).
- `required_scope: str` — the payment scope the caller needs.
- `min_order_inr: int` — the policy-enforced minimum order threshold.
- `got_total_inr: int` — the caller's actual order total.
- `computed_total_inr: int` — server-derived total (e.g. nights × rate + GST).
- `gst_threshold_inr: int` — hotel v3 GST-gating threshold.
- `mfa_threshold_inr: int` — payment v3 MFA-gating threshold.
- `mfa_required: bool` — propagated MFA flag on cross-domain cascades.
- `expected: Any` — expected type/shape (type mismatch).
- `got: Any` — observed type/shape (type mismatch).
- `available: list` — the current enum set (vehicle class unavailable).
- `existing_id: str` — ID of the prior record (`DUPLICATE_*` only).
- `original_ts: str` — ISO-8601 `now_ist` at the prior record's creation (`DUPLICATE_*` only).

**Per-error-code field pinning.** Every code introduced in §5.2 maps to a fixed field set. Implementers MUST NOT add fields outside these rows; callers MUST NOT assume any field outside this table.

| `error_code` | `status` | Required extra fields | Optional fields |
|---|---|---|---|
| `MISSING_FIELD` | `schema_error` | `field_name` | `hint` |
| `MISSING_PASSENGER_COUNT` | `schema_error` | *(none)* | `hint` |
| `MISSING_GST_NUMBER` | `schema_error` | `gst_threshold_inr`, `computed_total_inr` | `hint` |
| `INVALID_ITEMS_SHAPE` | `schema_error` | `field_name` | `hint` |
| `TYPE_MISMATCH` | `schema_error` | `field_name`, `expected`, `got` | `hint` |
| `UNKNOWN_FIELD` | `schema_error` | `field_name` | `hint` |
| `MIN_ORDER_NOT_MET` | `policy_error` | `min_order_inr`, `got_total_inr` | `hint` |
| `BOOKING_WINDOW_CLOSED` | `policy_error` | *(none)* | `hint` |
| `SCHOOL_HOURS_MINI_REJECTED` | `policy_error` | *(none)* | `hint`, `available` |
| `VEHICLE_CLASS_UNAVAILABLE` | `policy_error` | `available` | `hint` |
| `CANCEL_WINDOW_EXPIRED` | `policy_error` | *(none)* | `hint` |
| `GST_REQUIRED` | `policy_error` | `gst_threshold_inr`, `computed_total_inr` | `hint` |
| `DUPLICATE_BOOKING` | `policy_error` | `existing_id`, `original_ts` | `hint` |
| `DUPLICATE_RIDE` | `policy_error` | `existing_id`, `original_ts` | `hint` |
| `DUPLICATE_ORDER` | `policy_error` | `existing_id`, `original_ts` | `hint` |
| `DUPLICATE_CHARGE` | `policy_error` | `existing_id`, `original_ts` | `hint` |
| `AUTH_SCOPE_INSUFFICIENT` | `auth_error` | `required_scope` | `hint` |
| `MFA_REQUIRED` | `auth_error` | `mfa_threshold_inr` | `hint`, `mfa_required` |
| `TOKEN_INVALID` | `auth_error` | *(none)* | `hint` |
| `PAYMENT_AUTH_FAILED` | `auth_error` | *(none)* | `required_scope`, `mfa_required`, `hint` |
| `TIMEOUT` | `timeout` | *(none)* | `hint` |
| `INTERNAL_SUM_MISMATCH` | `schema_error` | *(none)* | `hint` |

**Contract invariant.** No `response` dict outside the schemas defined here is permitted. §8 examples and all §5.2 code descriptions MUST use only fields declared here. Any vendor handler returning a key not listed above is a contract violation and is caught by the envelope-shape test in `tests/test_vendors.py`.

### 5.3 Informational notice codes

These are **not errors**. They ride on `status="ok"` responses inside `response._notice` (via the side-channel one-shot surface from §3.6) or directly in the response body. They exist to signal semantic shifts that do not change response shape and thus cannot be expressed as `schema_error` / `policy_error`. They are not enumerated in the `status` field (which stays at the five values from §5.1) and they never appear as `error_code`.

- `VEG_ONLY_EXCLUDES_EGG` — restaurant v3. Attached to the `restaurant.search` response as `response._notice = "veg_only now excludes egg dishes"` (or the equivalent notice string installed by the `restaurant.veg_filter_semantic` drift's `side_channel_notice_append` operator). The call itself returns `status="ok"` with filtered results. R2 detection hints match on `veg_only | egg | exclude | notice`. See E2 for the full scenario.

Additional informational notices are carried the same way whenever a drift operator is `side_channel_notice_append` (T&C, pricing, auth). They are surfaced via `response._notice` on the next tool call in the affected domain, once, then cleared by the vendor's `emit_side_channel_if_pending` helper (§3.6).

### 5.4 What is NOT an error

- A successful `airline.search` returning zero matching flights → `status="ok"`, `response={"results": []}`. Empty is a legitimate answer.
- A `cab.estimate` in a schema that has since been mutated to v3 → returns the v3 `fare_breakdown`, not an error. Schema drift is seamless from the vendor's side; it's the *agent*'s job to notice the shape changed.

---

## 6. Dependencies

### 6.1 Consumes

- `driftcall/models.py` — `ToolResult`, `GoalSpec`, `DriftCallState` (read only).
- `datetime` (stdlib) + `zoneinfo` for IST — injected, never sourced from wall clock.
- `random.Random(episode_seed)` — local RNG per tool call.
- `types.MappingProxyType` — frozen sub-dict views.
- `json` — used only in `tests/test_vendors.py` for the round-trip guard; vendors themselves never JSON-serialize.

**No third-party imports.** This matches `models.py`'s zero-dependency posture (models.md §6.1) and ensures vendors import cleanly inside both the FastAPI process and the Unsloth training loop.

### 6.2 Produces

- `ToolResult` (to `env.step`).
- New `VendorState` (from `apply_schema_mutation`, returned to `drift_injector`).
- `dict[str, Any]` schema snapshot (from `describe_schema`, for the `PROBE_SCHEMA` action).

### 6.3 Consumed by

- **`env.step`** — dispatches `TOOL_CALL` to the correct vendor module by prefix (`tool_name.split(".")[0]`), receives the `(ToolResult, VendorState)` pair, appends the `ToolResult` to the episode trail, and installs the returned state.
- **`drift_injector`** — calls `apply_schema_mutation(vendor_state, pattern.mutation)` and installs the returned state via `DriftCallState.replace(...)`.
- **`rewards`** — consumes the `ToolResult` tuple in `episode.tool_results` and the `DriftCallState.vendor_states` end-state to compute R1 (was a booking/order/ride actually created in `vendor_state`), R3 (does the created artifact satisfy `goal.constraints`), and R4 (shape legality via a per-schema-version validator).
- **`app.py`** — does not touch vendors directly; all vendor access flows through `env.step`.

### 6.4 Does NOT depend on

- `audio/*` — the audio layer (DESIGN.md §9) sits strictly at the env boundary, converting user utterances to and from text. Vendors operate purely on text tool-args. `ToolResult.response` is never synthesized to audio (the agent's `SPEAK` action does that).
- The agent / model / training loop — vendors are environment code.
- `rewards.py` — the arrow points the other way; vendors have zero knowledge of rewards.
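The prefix routing in §6.3 and the uniform returned-state contract can be illustrated with a pared-down sketch. `CabStateSketch`, `cab_estimate`, `cab_book`, `REGISTRY`, `route` and the `CAB-0001` ID format are all hypothetical. The real handlers take `(vendor_state, tool_args, seed)` and return a full `ToolResult`, while this sketch drops the seed and returns a bare response dict. What it does show: a read-only tool hands back the input state by identity, a write tool builds a new frozen state with `dataclasses.replace` over a fresh `types.MappingProxyType`, and the caller installs whichever state comes back.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from types import MappingProxyType
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class CabStateSketch:
    # Hypothetical, pared-down stand-in for CabState (§4.2).
    schema_version: str
    rides: Mapping[str, dict[str, Any]] = field(
        default_factory=lambda: MappingProxyType({})
    )


def cab_estimate(state: CabStateSketch, args: dict[str, Any]) -> tuple[dict[str, Any], CabStateSketch]:
    # Read-only tool: the same state object comes back (identity, not a copy).
    return {"fare_inr": 320}, state


def cab_book(state: CabStateSketch, args: dict[str, Any]) -> tuple[dict[str, Any], CabStateSketch]:
    # Write tool: copy the rides mapping, add the record, wrap it read-only,
    # and return a brand-new frozen state. The input state is never modified.
    ride_id = f"CAB-{len(state.rides) + 1:04d}"  # hypothetical ID format
    rides = dict(state.rides)
    rides[ride_id] = {"pickup": args["pickup"], "drop": args["drop"]}
    return {"ride_id": ride_id}, replace(state, rides=MappingProxyType(rides))


REGISTRY: dict[str, dict[str, Callable[..., Any]]] = {
    "cab": {"cab.estimate": cab_estimate, "cab.book": cab_book},
}


def route(
    tool_name: str,
    tool_args: dict[str, Any],
    vendor_states: Mapping[str, Any],
) -> tuple[dict[str, Any], Mapping[str, Any]]:
    domain = tool_name.split(".")[0]  # prefix routing, as in §6.3
    handler = REGISTRY[domain][tool_name]  # unknown names raise KeyError in this sketch
    response, new_state = handler(vendor_states[domain], tool_args)
    # Install the returned state unconditionally, whether or not it changed.
    updated = dict(vendor_states)
    updated[domain] = new_state
    return response, MappingProxyType(updated)
```

Because the write path never touches the old mapping, anything still holding the pre-call state (for example, for replay) keeps seeing the original rides.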
---

## 7. Edge Cases

Numbered edge cases with expected behavior. Referenced by `docs/tests/vendors_tests.md`.

**E1 — Payment auth drift cascades into `airline.book`.** Scenario: `payment.auth_scope_upgrade` fires at turn 5 (drift pattern #19). Agent calls `airline.book` at turn 6 with `payment_token="token_v1"`. `airline.book` internally calls `payment.charge`, which returns `auth_error` with `{"error_code": "AUTH_SCOPE_INSUFFICIENT", "required_scope": "payments:write:v2"}`. `airline.book` propagates upward and returns `ToolResult(status="auth_error", response={"error_code": "PAYMENT_AUTH_FAILED", "required_scope": "payments:write:v2"}, schema_version="v1")`. No booking is created. R2 credits the agent if it references `auth|scope|token|payments:write` within 2 turns (detection hints). Cites drift_injector.md §7 E5 and drift pattern #19.

**E2 — Restaurant v3 `veg_only` semantic change.** Scenario: the `restaurant.veg_filter_semantic` drift fires (pattern #13). Agent calls `restaurant.search(city="Bengaluru", veg_only=True)`. Pre-drift, results include egg biryani. Post-drift, the same call excludes egg dishes but still returns `status="ok"`, with modified `response.results` AND a one-shot `response._notice: "veg_only now excludes egg dishes"`. The `veg_only` arg's signature didn't change; **the semantic did**. R2 credits a keyword match on `veg_only | egg | exclude | notice`. This is the sole drift that touches **semantic meaning** without a shape change, and critics must note the subtlety.

**E3 — Hotel v3 GST field conditional gating.** Scenario: the `hotel.gst_field` drift fires (pattern #5). Agent calls `hotel.book` for a stay whose computed `total_with_tax` is 9500, without `gst_number`. The call returns `schema_error` with `{"error_code": "MISSING_GST_NUMBER", "gst_threshold_inr": 7500, "computed_total_inr": 9500}`. The same call for a stay totalling 4200, again without `gst_number`, returns `status="ok"` (under threshold). The gating is **conditional on the computed total**, not a blanket requirement; tests cover both branches.
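The E3 gate reduces to a threshold check on the server-computed total. Below is a minimal sketch, assuming the 18% GST rate from the §4.4 v1 example (integer division, rounding down) and the strict `> 7500` comparison from the §4.4 v3 table. It ignores resort fees. `hotel_total_with_tax`, `check_gst_gate` and `GST_THRESHOLD_INR` are hypothetical names. The error envelope carries exactly the fields §5.2.1 pins for `MISSING_GST_NUMBER`.

```python
from __future__ import annotations

from typing import Any

GST_THRESHOLD_INR = 7500  # gating threshold from §4.4 v3 / E3


def hotel_total_with_tax(nights: int, nightly_rate: int, gst_pct: int = 18) -> int:
    # Server-side total: nights x rate plus GST (integer INR, rounded down).
    subtotal = nights * nightly_rate
    return subtotal + subtotal * gst_pct // 100


def check_gst_gate(
    computed_total_inr: int, tool_args: dict[str, Any]
) -> tuple[str, dict[str, Any]] | None:
    """Return (status, response) if the v3 GST gate blocks the call, else None."""
    if computed_total_inr <= GST_THRESHOLD_INR:
        return None  # under threshold: gst_number stays optional
    if tool_args.get("gst_number"):
        return None  # over threshold but a GSTIN was supplied
    return "schema_error", {
        "error_code": "MISSING_GST_NUMBER",
        "gst_threshold_inr": GST_THRESHOLD_INR,
        "computed_total_inr": computed_total_inr,
    }
```

For example, `hotel_total_with_tax(2, 3500)` reproduces the 8260 figure from the §4.4 v1 table, which is already above the threshold and therefore needs a `gst_number` once v3 is live.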
**E4 — Cab vehicle_class enum expansion strictly enforced.** Scenario: pre-drift (`cab.vehicle_class_expand`, pattern #10, not yet fired). Agent calls `cab.book(vehicle_class="suv")`. `CabState.policy.vehicle_class_enum == ("mini", "sedan")`, so `"suv"` is not in the enum. Returns `policy_error` with `{"error_code": "VEHICLE_CLASS_UNAVAILABLE", "available": ["mini", "sedan"]}`. Post-drift, the same call succeeds. Tests verify both temporal halves.

**E5 — `payment.auth_scope_upgrade` mid-booking.** Scenario: agent has an open `restaurant.order` in progress (2-call sequence: search → order). Between `search` (turn 3) and `order` (turn 5), the auth drift fires at turn 4. `restaurant.order` at turn 5 triggers `payment.charge`, which now rejects `token_v1`. The order is **not** placed (§3.7 no-partial-commit). Agent must call `payment.get_token(requested_scope="payments:write:v2")` → receives new `token_v2` → retries `restaurant.order` with the new token. R2 awards on a token-scope keyword match; R1 awards only if the order ultimately succeeds within budget.

**E6 — Airline v2 `price_rename` with zero matching flights.** Scenario: agent calls `airline.search(from="HYD", to="BLR", max_price_inr=500)` post-drift. No flights at that price exist. Returns `status="ok"`, `response={"results": []}`, `schema_version="v2"`. The absence of `price` fields is moot: there are no result objects to carry them. Agent must recognize empty results and not interpret them as a schema error. Distinction documented for R5 anti-hack: an agent claiming "drift detected" on empty results → R5 penalty (DESIGN.md §7.1 R5 −0.3).

**E7 — Side-channel notice lifecycle (one-shot).** Scenario: the `hotel.early_checkin_tnc` drift fires at turn 3 with `side_channel_notice="early check-in before 12:00 IST now incurs 50% of nightly rate"`. At turn 4, agent calls `hotel.search`. The response carries `response._notice = "early check-in before 12:00 IST now incurs 50% of nightly rate"`. At turn 5, agent calls `hotel.book`.
The response does **NOT** re-carry the notice (one-shot, per drift_injector.md §9 Q4 resolution). R2's 2-turn window (turns 3, 4, 5) must credit an agent who mentions the notice at turn 4 or turn 5 (the notice lives in the agent's conversation history).

**E8 — Payment MFA on payment + airline cascade.** Scenario: `payment.mfa_required` fires (pattern #20). Agent calls `airline.book` with `total_fare_inr=8500` (> 5000). The internal `payment.charge` returns `auth_error` with `MFA_REQUIRED`. Agent must either (a) emit SPEAK/CLARIFY asking the user for the MFA code, or (b) directly call `payment.charge(..., mfa_code="123456")` if the task brief included one in slots. Tests cover both paths. Note: task_generator surfaces `mfa_code` as a slot only in stage-3 compound-drift episodes; otherwise the agent must CLARIFY to obtain it.

**E9 — Cab v3 `fare_breakdown` sum invariant.** Scenario: after the `cab.fare_breakdown` drift, every `cab.estimate` / `cab.book` response MUST satisfy `base + surge + tolls + gst == total_inr`. This is enforced by the `_serialize_v3` helper with an internal sum check (not a bare `assert`, since no exception may escape `dispatch()`). If the check fails, which would indicate a vendor bug, the helper returns `schema_error` with `error_code="INTERNAL_SUM_MISMATCH"`. This should never fire in practice; it is a defensive self-check.

**E10 — `PROBE_SCHEMA` action returns the current snapshot for the named domain.** Scenario: agent emits `DriftCallAction(action_type=PROBE_SCHEMA, tool_name="airline")` (models.md §3.5: `tool_name` is the bare domain). Env dispatches to `airline.describe_schema(vendor_state, schema_version)`, which returns `{"version": "v2", "fields": {"flight_id": "str", "total_fare_inr": "int", ...}, "removed_from_prior": ["price", "currency"]}`. This snapshot is wrapped as `ToolResult(tool_name="airline.describe", status="ok", response=<snapshot>, schema_version="v2", latency_ms=5)`. A probe costs 1 turn, per DESIGN.md §4.3; R5 penalizes use 3+ times (DESIGN.md §7.1 R5 −0.5).

---

## 8. Examples

Three concrete traces `tool_call → ToolResult`.
### 8.1 `airline.search` v1 returning a flight list

**Inputs:**

```python
action = DriftCallAction(
    action_type=ActionType.TOOL_CALL,
    tool_name="airline.search",
    tool_args={
        "from": "HYD",
        "to": "BLR",
        "date": "2026-04-25",
        "max_price_inr": 8000,
        "time_window": "evening",
    },
    rationale="User wants evening flight under 8000",
)
vendor_state = AirlineState(
    schema_version="v1",
    bookings={},
    flight_roster_cache={},
    policy=AirlinePolicy(booking_window_hours=24),
    tnc=AirlineTnC(baggage_cabin_kg=7, reschedule_fee_pct=0),
    pricing=AirlinePricing(convenience_fee_inr=0),
    side_channel_notice=None,
)
schema_version = "v1"
episode_seed = 1234
```

**Output `ToolResult`:**

```python
ToolResult(
    tool_name="airline.search",
    status="ok",
    response={
        "results": [
            {
                "flight_id": "6E-2345",
                "from": "HYD",
                "to": "BLR",
                "depart": "2026-04-25T18:30:00+05:30",
                "price": 7200,
                "currency": "INR",
                "seats_left": 14,
            },
            {
                "flight_id": "AI-501",
                "from": "HYD",
                "to": "BLR",
                "depart": "2026-04-25T20:15:00+05:30",
                "price": 6800,
                "currency": "INR",
                "seats_left": 3,
            },
        ]
    },
    schema_version="v1",
    latency_ms=142,
)
```

### 8.2 `airline.book` v2 after `airline.price_rename` drift

**Inputs (after drift fired, so `schema_version="v2"`, `AirlineState` mutated):**

```python
action = DriftCallAction(
    action_type=ActionType.TOOL_CALL,
    tool_name="airline.book",
    tool_args={
        "flight_id": "6E-2345",
        "payment_token": "token_v1",
    },
    rationale="Booking cheapest evening flight",
)
# vendor_state.schema_version == "v2"; no `price`/`currency` keys
```

**Output `ToolResult`:**

```python
ToolResult(
    tool_name="airline.book",
    status="ok",
    response={
        "booking_id": "AIR-3F2A",
        "flight_id": "6E-2345",
        "total_fare_inr": 7200,  # NOTE: renamed from v1 "price"
        "depart": "2026-04-25T18:30:00+05:30",
        "seats_confirmed": 1,
        "payment_status": "captured",
    },
    schema_version="v2",
    latency_ms=287,
)
```

Drift-detection note: R2 credits the agent if, on the prior turn or this turn, SPEAK/CLARIFY text or `tool_args` JSON contains
`"total_fare_inr"`, `"price"`, or `"rename"` (detection hints from pattern `airline.price_rename`, drift_injector.md §4.3).

### 8.3 `hotel.book` v3 with GST required

**Inputs:**

```python
action = DriftCallAction(
    action_type=ActionType.TOOL_CALL,
    tool_name="hotel.book",
    tool_args={
        "hotel_id": "GOA-BEACH-007",
        "checkin": "2026-04-27",
        "checkout": "2026-04-29",
        "payment_token": "token_v2",
        # gst_number intentionally omitted
    },
    rationale="Booking sea-view hotel for weekend",
)
# vendor_state.schema_version == "v3"; `hotel.gst_field` drift has fired
# computed total_with_tax == 9500 (> 7500 threshold)
```

**Output `ToolResult`:**

```python
ToolResult(
    tool_name="hotel.book",
    status="schema_error",
    response={
        "error_code": "MISSING_GST_NUMBER",
        "gst_threshold_inr": 7500,
        "computed_total_inr": 9500,
        "hint": "provide gst_number (15-char GSTIN) for bookings above threshold",
    },
    schema_version="v3",
    latency_ms=89,
)
```

Follow-up: the agent calls `hotel.book` again with `"gst_number": "29ABCDE1234F1Z5"` → `status="ok"`, booking created.

Second follow-up (the under-threshold branch): if `computed_total_inr == 4200 < 7500`, the same call without `gst_number` succeeds, because the gating is conditional (E3).

---

## 9. Open Questions

**Q1 (deferred post-hackathon) — Cab v2 vs v3 co-existence.** Two drift patterns apply to cab: `cab.vehicle_class_expand` (policy, v1→v2) and `cab.fare_breakdown` (schema, v2→v3 OR v1→v3 depending on firing order). In stage-3 episodes where both fire, their relative order changes the effective version-bump chain. Current spec: drift_injector.md §7 E7 allows chaining; `CabState.schema_version` moves monotonically v1→v2→v3. If only `cab.fare_breakdown` fires (no prior `vehicle_class_expand`), the transition is v1→v3 directly, technically skipping v2's enum expansion. The fare_breakdown mutation still applies cleanly, but the enum stays at v1's `{mini, sedan}`. This is intentional (each drift is independent) but surprising.
Flagged for critic review; post-hackathon, consider an "effective schema version" derived from the set of applied mutations rather than a monotonic counter.

**Q2 (deferred) — Restaurant v2 + v3 compound: is the `modifiers` requirement retroactive?** If `restaurant.items_shape_bump` (pattern #4, v2→v3, requires `modifiers`) fires at turn 5 and the agent had placed an order at turn 3 (without `modifiers`), the earlier order stays valid (its record in `RestaurantState.orders` keeps whatever shape it had). But if the agent queries `restaurant.track(order_id=...)` post-drift, the response is serialized under v3: does it backfill `modifiers: []` on the historical record, or return the pre-drift shape? Decision for the hackathon: **backfill with `modifiers: []`** on read (cleaner for the agent), but do not rewrite `RestaurantState.orders` records in place (immutability). Tracker pattern: the v3 serializer reads the stored record and augments it with the default `[]` if absent. Documented in behavior spec §3.2 as part of version-transition semantics; kept open for post-hackathon because it has implications for audit replay.

**Q3 (deferred) — Latency distribution calibration.** Current spec: `ok` latency uniform 50–400 ms; `timeout` 5000–7000 ms. These are placeholder ranges. Judges may notice unrealistic uniformity if they inspect `ToolResult.latency_ms` in the audit trail. Post-hackathon: sample from a log-normal distribution calibrated to IRCTC/IndiGo/Uber real-world latency percentiles. No training impact; flagged for polish.

**Q4 (deferred) — Test coverage for cross-domain drift chains.** The spec covers E1 (payment→airline), E5 (payment→restaurant), and E8 (MFA→airline). What about payment→hotel (`hotel.book` calls `payment.charge`)? And payment→cab? These are structurally identical cascades but are not explicitly enumerated in `tests/test_vendors.py`. The test plan (`docs/tests/vendors_tests.md`, not yet written) should enumerate all 4 × 2 = 8 primary-domain × auth-drift combinations.
Flagged for Person B (tests owner) to pick up in Batch D4.

---

*End of vendors.md.*