---
title: DriftCall Env
emoji: 🧭
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
short_description: OpenEnv - Indic voice concierge under schema drift.
license: apache-2.0
---

# DriftCall

DriftCall is a voice-first Indic consumer-concierge RL environment where
the agent must book flights, schedule rides, and file complaints across
five mock vendor APIs while those APIs undergo deterministic mid-episode
**schema drift**, **pricing drift**, **T&C drift**, **policy drift**, and
**auth drift**. It is an OpenEnv-compliant REST environment plus an
in-process Python trainer; a trained LoRA adapter for Gemma 3n E2B is
published alongside.
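
To make "deterministic mid-episode schema drift" concrete, here is a minimal sketch of the idea. It is not the environment's implementation: `drift_step`, `vendor_response`, and the field names `price` and `fare` are hypothetical, and the real drift rules live in the design spec. The only property it illustrates is that the step at which the response shape changes is a pure function of the episode seed.

```python
import random


def drift_step(seed: int, max_steps: int) -> int:
    """Pick the step at which the schema changes, fixed by the seed (hypothetical rule)."""
    return random.Random(seed).randrange(1, max_steps)


def vendor_response(step: int, seed: int, max_steps: int = 10) -> dict:
    """Return a mock fare quote whose shape changes once the drift step is reached."""
    amount = 4200  # illustrative value only
    if step < drift_step(seed, max_steps):
        return {"price": amount}  # pre-drift schema
    return {"fare": {"amount": amount, "currency": "INR"}}  # post-drift schema
```

An agent that keeps reading `price` after the drift step fails, which is the behaviour the environment is meant to train against.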

- **OpenEnv manifest:** [`openenv.yaml`](./openenv.yaml)
- **Design spec:** [`DESIGN.md`](./DESIGN.md)
- **Phase-C implementation plan:** [`CLAUDE.md`](./CLAUDE.md)
- **Per-module specs:** [`docs/modules/`](./docs/modules)
- **Per-module test plans:** [`docs/tests/`](./docs/tests)

## Architecture at a glance

- **Env Space (this repo):** FastAPI + OpenEnv REST on CPU-basic. Kokoro-82M
  TTS + faster-whisper-small ASR are baked into the image; no outbound
  network at runtime.
- **Trainer:** in-process GRPO (TRL 0.23+, Unsloth 2026.4.5+) on a single
  V100. Text-in / text-out; audio is an env-boundary concern.
- **Demo Space:** Gradio 5 on ZeroGPU, base Gemma 3n E2B + trained LoRA
  adapter switchable via a toggle.

## Quickstart

```bash
# 1. Install the dev toolchain.
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'

# 2. Run the tests.
python3 -m pytest tests/ -v

# 3. Serve the env locally.
export DRIFTCALL_ENV_TOKEN=dev-local-token
uvicorn app:app --host 0.0.0.0 --port 7860

# 4. Validate against the OpenEnv schema.
openenv validate http://localhost:7860 --auth-bearer "$DRIFTCALL_ENV_TOKEN"
```
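
Once the server from step 3 is running, a client has to send the same bearer token on every request. The sketch below only builds such a request using the standard library. The `/reset` path and the empty JSON body are assumptions based on the usual OpenEnv REST layout, and `build_reset_request` is a hypothetical helper; check `openenv.yaml` for the routes this environment actually exposes.

```python
import json
import urllib.request


def build_reset_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to the assumed /reset route."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/reset",  # assumed route, not confirmed by this README
        data=json.dumps({}).encode("utf-8"),  # assumed empty JSON body
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; use the value of `DRIFTCALL_ENV_TOKEN` as `token`.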

## Notebook

`notebooks/train_driftcall.ipynb` is built from the numbered cells under
[`cells/`](./cells). Rebuild with:

```bash
python3 notebooks/build_notebook.py
```

## Weights & Biases (optional)

Training runs auto-log to wandb. The API key is resolved from the following
sources, highest priority first:

1. **Environment variables**: set on the host or in your shell:
   ```bash
   export WANDB_API_KEY=<your-key-from-wandb.ai/authorize>
   export WANDB_PROJECT=driftcall              # default
   export WANDB_ENTITY=<your-team>             # optional
   export WANDB_MODE=online                    # online | offline | disabled
   ```
2. **`cells/_secrets.py` hardcoded fallback**: used when env vars are unset.
   Edit the constant in that file to rotate the key (private repo).
3. **None**: `init_wandb()` raises at run time if `WANDB_MODE != "disabled"`
   and no API key is reachable.

Disable for local dev / CI:
```bash
export WANDB_MODE=disabled
```
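
The resolution order described in this section can be sketched as a small function. This is an illustration, not the code in `init_wandb()`: `resolve_wandb_key` and `FALLBACK_WANDB_API_KEY` are hypothetical names, and treating an unset `WANDB_MODE` as `online` is an assumption.

```python
import os
from typing import Mapping, Optional

# Hypothetical stand-in for the constant kept in cells/_secrets.py.
FALLBACK_WANDB_API_KEY: Optional[str] = None


def resolve_wandb_key(
    env: Mapping[str, str] = os.environ,
    fallback: Optional[str] = FALLBACK_WANDB_API_KEY,
) -> Optional[str]:
    """Return the API key to use, or None when logging is disabled."""
    # Assumption: an unset WANDB_MODE behaves like "online".
    if env.get("WANDB_MODE", "online") == "disabled":
        return None  # no key is needed when logging is off
    # The environment variable wins; the hardcoded fallback is used only if it is unset.
    key = env.get("WANDB_API_KEY") or fallback
    if not key:
        raise RuntimeError("WANDB_API_KEY is unset and no fallback key is configured")
    return key
```

With `WANDB_MODE=disabled` the function returns `None` without looking for a key, which is why CI needs no secret.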

Custom metrics logged each training step (training.md §3.3.3):
- `train/beta_adaptive`: current KL coefficient (mutated by `AdaptiveKLCallback`)
- `train/kl_measured`: measured KL between policy and reference
- `train/kl_target`: target KL (default = `BETA_KL` = 0.04)
- `train/beta_clamped_to_min`: 1 if β was floored at `beta_min` this step
- `train/beta_clamped_to_max`: 1 if β was ceilinged at `beta_max` this step
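
To show how these five metrics relate to one another, the sketch below applies one step of a clipped proportional controller to β and reports the result under the metric names above. The update rule, the `gain` value, and the `beta_min` / `beta_max` defaults are assumptions chosen for illustration; the rule that `AdaptiveKLCallback` actually uses is specified in training.md and may differ. `update_beta` is a hypothetical helper.

```python
def update_beta(
    beta: float,
    kl_measured: float,
    kl_target: float = 0.04,   # BETA_KL default from this README
    beta_min: float = 1e-3,    # assumed bound
    beta_max: float = 1.0,     # assumed bound
    gain: float = 0.1,         # assumed controller gain
) -> dict:
    """Apply one adaptive-KL step and return the values to log."""
    # Relative KL error, clipped so a single noisy step cannot move beta far.
    error = max(-0.2, min(0.2, kl_measured / kl_target - 1.0))
    proposed = beta * (1.0 + gain * error)
    clamped = min(beta_max, max(beta_min, proposed))
    return {
        "train/beta_adaptive": clamped,
        "train/kl_measured": kl_measured,
        "train/kl_target": kl_target,
        "train/beta_clamped_to_min": int(proposed < beta_min),
        "train/beta_clamped_to_max": int(proposed > beta_max),
    }
```

When the measured KL exceeds the target, β rises and the policy is pulled back toward the reference; when it falls below, β shrinks until it reaches `beta_min`.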

Run tags (set at `wandb.init`): `stage{N}`, `gemma-3n-e2b`, `bf16` or `fp16`,
`adaptive-kl` or `static-kl`, `seed{N}`.
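
As a rough sketch of how the tag list above could be assembled before it is handed to the `tags` argument of `wandb.init`, assuming the run configuration exposes a stage number, a seed, and two booleans (`build_run_tags` and its parameters are hypothetical):

```python
def build_run_tags(stage: int, seed: int, bf16: bool, adaptive_kl: bool) -> list[str]:
    """Assemble the run tags in the order listed in this README."""
    return [
        f"stage{stage}",
        "gemma-3n-e2b",
        "bf16" if bf16 else "fp16",
        "adaptive-kl" if adaptive_kl else "static-kl",
        f"seed{seed}",
    ]
```

For example, `build_run_tags(2, 42, bf16=False, adaptive_kl=True)` yields `stage2`, `gemma-3n-e2b`, `fp16`, `adaptive-kl`, and `seed42`.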

## License

Apache License 2.0. See [`LICENSE`](./LICENSE) (included at repo root when
this artifact is published as a Space).