saumilyajj committed
Commit 1b57884 · verified · 1 Parent(s): 14c1612

Upload folder using huggingface_hub

Files changed (1)
  1. BLOG.md +482 -0
BLOG.md ADDED
@@ -0,0 +1,482 @@
1
+ ---
2
+ title: "DriftCall — Teaching a 2B Model to Survive When APIs Break Mid-Conversation"
3
+ thumbnail: /blog/assets/driftcall/thumbnail.png
4
+ authors:
5
+ - user: krrishchoudhary109
6
+ - user: saumilyajj
7
+ tags:
8
+ - reinforcement-learning
9
+ - openenv
10
+ - voice
11
+ - indic
12
+ - grpo
13
+ - gemma
14
+ - unsloth
15
+ - trl
16
+ date: 2026-04-26
17
+ ---
18
+
19
+ <div align="center">
20
+
21
+ # DriftCall
22
+
23
+ ### Teaching a 2B model to survive when APIs break mid-conversation
24
+
25
+ *An OpenEnv RL environment for voice-first Indic concierge agents under real-world schema drift.*
26
+
27
+ <br/>
28
+
29
+ [![Hugging Face Space](https://img.shields.io/badge/%F0%9F%A4%97%20Space-saumilyajj%2Fdriftcall-ff7a17?style=for-the-badge)](https://huggingface.co/spaces/saumilyajj/driftcall)
30
+ [![LoRA Weights](https://img.shields.io/badge/%F0%9F%A4%97%20Weights-DGXAI%2Fgemma--3n--e2b--driftcall--lora-ff7a17?style=for-the-badge)](https://huggingface.co/DGXAI/gemma-3n-e2b-driftcall-lora)
31
+ [![GitHub](https://img.shields.io/badge/GitHub-openenv--DGXAI-0e0e12?style=for-the-badge&logo=github)](https://github.com/saumilyagupta/openenv-DGXAI)
32
+ [![License](https://img.shields.io/badge/License-Apache_2.0-0e0e12?style=for-the-badge)](https://www.apache.org/licenses/LICENSE-2.0)
33
+
34
+ </div>
35
+
36
+ ---
37
+
38
+ > **TL;DR.** Production agents silently break when vendor APIs change. We built DriftCall, an OpenEnv-compliant RL gym where a Gemma-3n E2B agent must complete real Indian concierge tasks (flights, cabs, food, hotels, payments) while the underlying APIs mutate mid-episode. Five deterministic rewards, zero LLM judges, five languages (Hindi, Tamil, Kannada, Hinglish, English), 20 hand-authored drift patterns. After 500 GRPO steps on a single V100, drift-detection recall rises from ~10% to 75% (**+65 pp**) and the model's stated confidence tracks its actual success rate far more closely (Brier score 0.28 → 0.08).
39
+
40
+ ---
41
+
42
+ ### What you'll find in this post
43
+
44
+ 1. **The 3 AM Realization** — why every production agent eventually breaks
45
+ 2. **What we built** — the env, the voice layer, the five rewards
46
+ 3. **How we trained** — Gemma-3n E2B + Unsloth 4-bit + TRL GRPO, three-stage curriculum
47
+ 4. **Results** — headline metrics, capability shift table, six demo episodes
48
+ 5. **Why this matters** — for the RL community, production builders, and Indic AI
49
+ 6. **Future work** — public safety, multilingual teaching, the platform thesis
50
+ 7. **Try it yourself** — live links + smoke test
51
+
52
+ ---
53
+
54
+ ## §1 · The 3 AM Realization
55
+
56
+ You've shipped a production agent. It books flights, hails cabs, orders dinner — all in Hinglish, Hindi, Tamil, Kannada. It works beautifully.
57
+
58
+ Then at 3 AM, the airline silently renames `price` to `total_fare_inr` in their API response. Your agent doesn't notice. It keeps reading the old field. It confidently tells a user in Chennai that their flight costs `null` rupees. Hundreds of bookings fail before anyone wakes up to fix it.
59
+
60
+ **This is schema drift**, and it's the silent killer of every production agent system. APIs change their field names. Business policies update their thresholds. T&Cs redefine what "vegetarian" means. Auth scopes get upgraded overnight. And your agent — trained on the old world — breaks without knowing it broke.
61
+
62
+ Every engineer in the LLM agent space has been burned by this. We decided to build an RL environment that *teaches* small models to survive it.
63
+
64
+ ---
65
+
66
+ ## §2 · What We Built
67
+
68
+ > **To our knowledge, DriftCall is the first OpenEnv environment that ships voice-first Indic agent training under deterministic schema drift.** It's an RL gym where the world keeps moving while the model is reading.
69
+
70
+ **DriftCall** is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compliant RL environment where an agent must complete real Indian consumer tasks — booking flights, scheduling cabs, ordering food, reserving hotels — while the vendor APIs **actively change underneath it**.
71
+
72
+ Here's the core loop:
73
+
74
+ > 🗣️ A user says (in Hinglish): *"Bhai Friday ko Bangalore jaana hai, 8000 rupees max, 6pm ke baad"* (English: "Bro, I need to go to Bangalore on Friday, 8,000 rupees max, after 6 pm")
75
+ >
76
+ > The agent searches for flights, finds one at ₹4,250. Great.
77
+ >
78
+ > Then, at turn 4, the airline API silently renames `price` → `total_fare_inr` and drops the `currency` field entirely.
79
+ >
80
+ > **What does the agent do?**
81
+
82
+ An untrained Gemma 3n E2B retries the same request 5 times, gets 422 errors, and eventually says *"Bhai search nahi ho pa raha hai."* (English: "Bro, the search isn't going through.") Episode over. Zero reward.
83
+
84
+ A DriftCall-trained agent detects the rename in 1 turn, switches to the v2 schema, and tells the user:
85
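+ > *"Bhai, IndiGo 6E-2341 mil gaya — total ₹4,250. Note: airline ne 'price' ko 'total_fare_inr' rename kiya hai (v1 → v2). Book kar dun?"* (English: "Bro, found IndiGo 6E-2341, total ₹4,250. Note: the airline has renamed 'price' to 'total_fare_inr' (v1 → v2). Shall I book it?")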
86
+
87
+ That's the entire thesis of this project in one example.
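To make the detection step concrete, here is a minimal sketch of how a rename such as `price` → `total_fare_inr` could be spotted by comparing the last good vendor response with the drifted one. The function name `diff_payloads`, the value-matching heuristic, and the sample payloads are assumptions made for illustration; they are not taken from the DriftCall source.

```python
from typing import Any


def diff_payloads(old: dict[str, Any], new: dict[str, Any]) -> dict[str, Any]:
    """Classify field changes between two vendor responses (hypothetical helper)."""
    missing = [k for k in old if k not in new]
    added = [k for k in new if k not in old]

    renamed: dict[str, str] = {}
    for old_key in missing:
        for new_key in added:
            # Heuristic (assumption): an unclaimed new field that carries the
            # same value as a missing field is treated as a likely rename.
            if new_key not in renamed.values() and old[old_key] == new[new_key]:
                renamed[old_key] = new_key
                break

    return {
        "renamed": renamed,
        "dropped": [k for k in missing if k not in renamed],
        "added": [k for k in added if k not in renamed.values()],
    }


# Illustrative payloads modelled on the example above (not real vendor data).
v1 = {"flight_no": "6E-2341", "price": 4250, "currency": "INR"}
v2 = {"flight_no": "6E-2341", "total_fare_inr": 4250}
report = diff_payloads(v1, v2)
```

Here `report` flags `price` as renamed to `total_fare_inr` and `currency` as dropped. Matching on equal values is deliberately simple; it would miss a rename that also changes the value's type or unit.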
88
+
89
+ ### The Environment in Detail
90
+
91
+ DriftCall simulates **five mock vendor APIs** (airline, cab, restaurant, hotel, payment) — all pure-Python, deterministic, seeded, zero network calls. Each API has multiple schema versions (v1/v2/v3), and the environment pre-computes a **drift schedule** at `reset()` that fires mid-episode.
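As a rough illustration of what pre-computing a drift schedule at `reset()` could look like, the sketch below derives the schedule purely from the episode seed, so the same seed always yields the same drifts at the same turns. `DriftEvent`, `make_drift_schedule`, and the turn bounds are assumptions for this example; the pattern IDs are the ones named in the demo episodes later in this post, and the real environment's data structures may differ.

```python
import random
from dataclasses import dataclass

# Pattern IDs that appear in this post's demo episodes; the full catalogue
# is described as having 20 patterns.
PATTERNS = [
    "airline.price_rename",
    "airline.booking_window_shrink",
    "restaurant.veg_filter_semantic",
    "airline.convenience_fee_append",
    "payment.mfa_required",
]


@dataclass(frozen=True)
class DriftEvent:
    turn: int      # turn index at which the drift fires
    pattern: str   # which drift pattern is applied


def make_drift_schedule(seed: int, n_drifts: int, max_turns: int = 6) -> list[DriftEvent]:
    """Build a deterministic schedule from the seed (hypothetical helper)."""
    rng = random.Random(seed)  # private RNG, so no global state is touched
    # Assumption: drifts fire from turn 2 onward, so the agent first sees
    # the old schema at least once.
    turns = sorted(rng.sample(range(2, max_turns + 1), k=n_drifts))
    patterns = rng.sample(PATTERNS, k=n_drifts)
    return [DriftEvent(turn=t, pattern=p) for t, p in zip(turns, patterns)]
```

Calling `make_drift_schedule(seed=7, n_drifts=2)` twice returns identical lists, which is the property that makes episodes reproducible.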
92
+
93
+ **Five types of drift:**
94
+
95
+ | Drift Type | What Changes | How the Agent Sees It |
96
+ |---|---|---|
97
+ | **Schema** | Field renamed, removed, or type changed | `KeyError` / `schema_error` from the API |
98
+ | **Policy** | Business rule shifted (min order, booking window) | `policy_error` with machine-readable code |
99
+ | **T&C** | Terms redefined (e.g., `veg_only` now excludes egg) | Side-channel notice on next tool call |
100
+ | **Pricing** | Hidden fees added, fare structure changed | Actual price ≠ estimated price |
101
+ | **Auth** | Scope/permission upgrade required | `auth_error` with scope hint |
102
+
103
+ **20 hand-authored drift patterns** across these 5 drift types. Combined with procedural task generation (4 domains × 5 templates × 10 cities × 5 languages × 20 drift patterns = 20,000 base combinations before any per-seed variation), we get **200,000+ unique episode variants**, all from seed.
104
+
105
+ ### The Voice Layer
106
+
107
+ DriftCall is voice-first. The environment boundary includes:
108
+ - **Kokoro-82M TTS** (Apache 2.0, runs real-time on CPU) to synthesize caller utterances in Hindi, Tamil, Kannada, and Hinglish accents
109
+ - **faster-whisper-small** (int8 quantized) to transcribe them back to text
110
+
111
+ During training, we skip the audio loop entirely (text-in, text-out) for 10× faster rollouts — same architecture as OpenAI Realtime and Sarvam in production. The audio boundary is only active in the deployed env and live demo.
112
+
113
+ ### Five Independent Rewards (No LLM Judge)
114
+
115
+ > 🚫 **Zero LLM judges. Zero human labels. Every reward is a function of the audit trail.**
116
+
117
+
118
+ The five components and their ranges:
119
+
120
+ ```text
121
+ R1 — Task Completion (binary) Did the booking actually go through?
122
+ R2 — Drift Detection (binary) Did the agent notice the drift within 2 turns?
123
+ R3 — Constraint Adherence (0 – 1) Budget respected? Dietary matched? Time window correct?
124
+ R4 — Format Compliance (0 – 1) Valid JSON? Correct tool names? Right language?
125
+ R5 — Anti-Hack Penalty (−1 – 0) Hallucinated fields? Spam retries? Fake drift claims?
126
+ ```
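Because each reward is a function of the audit trail, a component can be a few lines of plain code. The sketch below shows one way R2 might be computed. The audit-record shape (`turn`, `kind`), the event names, and the handling of drift-free episodes are assumptions for illustration; only the two-turn window comes from the R2 description above.

```python
from typing import TypedDict


class AuditEvent(TypedDict):
    turn: int   # turn index within the episode
    kind: str   # hypothetical labels: "drift_fired" (env), "drift_detected" (agent)


def r2_drift_detection(trail: list[AuditEvent], window: int = 2) -> float:
    """Return 1.0 if the first drift was flagged within `window` turns, else 0.0."""
    fired = [e["turn"] for e in trail if e["kind"] == "drift_fired"]
    if not fired:
        # Assumption: with no drift in the episode there is nothing to detect.
        return 0.0
    drift_turn = min(fired)
    detected = [
        e["turn"]
        for e in trail
        if e["kind"] == "drift_detected" and e["turn"] >= drift_turn
    ]
    return 1.0 if detected and min(detected) - drift_turn <= window else 0.0
```

A detection claim made before the drift fires is ignored here; in the post's design, false claims would be the concern of the anti-hack term R5 instead.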
127
+
128
+ Combined formula:
129
+
130
+ ```text
131
+ quality = 0.50·R1 + 0.20·R2 + 0.15·R3 + 0.10·R4 + 0.05·min(R5, 0)
132
+ brier = (confidence − R1)² ← penalises overconfidence
133
+ reward = quality × (1 − brier) ← clamped to [0, 1]
134
+ ```
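The combined formula translates directly into code. The sketch below copies the weights and the clamp from the formula above; the function name and signature are assumptions, not the project's API.

```python
def combined_reward(
    r1: float, r2: float, r3: float, r4: float, r5: float, confidence: float
) -> float:
    """Quality score scaled by a Brier-style calibration factor, clamped to [0, 1]."""
    quality = 0.50 * r1 + 0.20 * r2 + 0.15 * r3 + 0.10 * r4 + 0.05 * min(r5, 0.0)
    brier = (confidence - r1) ** 2   # squared gap between stated confidence and outcome
    reward = quality * (1.0 - brier)
    return max(0.0, min(1.0, reward))


# Same failed booking (R1 = 0), two different stated confidences.
overconfident = combined_reward(0, 1, 1, 1, 0, confidence=0.95)
hedged = combined_reward(0, 1, 1, 1, 0, confidence=0.10)
```

With these weights a perfect, fully confident episode scores 0.95, because R5 can only subtract. On the failed booking, the overconfident agent keeps far less of its partial credit than the hedged one, which is the calibration pressure the formula is designed to apply.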
135
+
136
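+ The Brier term is borrowed from proper scoring rules. In expectation, the penalty is smallest when the agent's stated confidence equals its actual probability of success, so the agent gets **maximum reward only when its stated confidence matches its actual success rate**. A model that says "I'm 95% sure" and fails 40% of the time gets hammered (expected Brier penalty ≈ 0.36). This is the only OpenEnv environment we know of that trains calibration directly.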
137
+
138
+ ---
139
+
140
+ ## §3 · How We Trained
141
+
142
+ **Stack:**
143
+ - Base model: `unsloth/gemma-3-E2B-it-bnb-4bit` (2B effective parameters, 128K context)
144
+ - Algorithm: TRL GRPOTrainer with `use_bias_correction_kl=True` (fixes the known KL bug from [TRL #4637](https://github.com/huggingface/trl/issues/4637))
145
+ - LoRA: r=16, alpha=32, all attention + MLP projections
146
+ - Hardware: Single V100 32GB, 4-bit QLoRA, FP16 mixed precision
147
+ - Rollouts: G=8 per prompt, gradient accumulation 4 = effective batch 32
148
+
149
+ **3-Stage Curriculum:**
150
+
151
+ | Stage | Steps | Drift | Languages | Goal |
152
+ |---|---|---|---|---|
153
+ | 1 — Warmup | 150 | None | 50% EN, 30% Hinglish, 20% Hindi | Learn tool use & format |
154
+ | 2 — Single Drift | 200 | 1 per episode | 30% EN, 30% Hinglish, 20% Hindi, 10% Tamil, 10% Kannada | Learn drift detection |
155
+ | 3 — Compound | 150 | 2 per episode | Same as Stage 2 | Cascading recovery |
156
+
157
+ Total: **500 GRPO steps × 8 rollouts = 4,000 rollouts**, which at ~6 turns each comes to **≈ 24,000 agent turns**.
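The curriculum table can be written down as plain data, which also makes the arithmetic behind the total easy to check. In the sketch below, the `Stage` dataclass and the `sample_language` helper are hypothetical names. The step counts, drift counts, and language mixes are copied from the table, with `HI-EN` read as Hinglish; one prompt per step is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    steps: int
    drifts_per_episode: int
    language_mix: dict[str, float]  # language code -> sampling probability


STAGE2_MIX = {"en": 0.3, "hinglish": 0.3, "hi": 0.2, "ta": 0.1, "kn": 0.1}
CURRICULUM = [
    Stage("warmup", 150, 0, {"en": 0.5, "hinglish": 0.3, "hi": 0.2}),
    Stage("single_drift", 200, 1, STAGE2_MIX),
    Stage("compound", 150, 2, STAGE2_MIX),  # "Same as Stage 2" in the table
]

ROLLOUTS_PER_STEP = 8   # G=8; assumes one prompt per step
TURNS_PER_EPISODE = 6   # approximate figure quoted in the post

total_steps = sum(stage.steps for stage in CURRICULUM)
total_rollouts = total_steps * ROLLOUTS_PER_STEP
total_turns = total_rollouts * TURNS_PER_EPISODE


def sample_language(stage: Stage, rng: random.Random) -> str:
    """Draw an episode language according to the stage's mix."""
    languages = list(stage.language_mix)
    weights = list(stage.language_mix.values())
    return rng.choices(languages, weights=weights, k=1)[0]
```

Under these assumptions the totals come to 500 steps, 4,000 rollouts, and 24,000 agent turns.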
158
+
159
+ Everything is logged to Weights & Biases: per-reward curves, drift-detection latency, and a per-language breakdown. Every 25 steps, 3 random completions are inspected for reward hacking.
160
+
161
+ ---
162
+
163
+ ## §4 · Results — What Changed After Training
164
+
165
+ ### Headline Numbers
166
+
167
+ <div align="center">
168
+
169
+ | &nbsp; &nbsp; **+65 pp** &nbsp; &nbsp; | &nbsp; &nbsp; **3.5×** &nbsp; &nbsp; | &nbsp; &nbsp; **40 %** &nbsp; &nbsp; | &nbsp; &nbsp; **98 %+** &nbsp; &nbsp; |
170
+ |:---:|:---:|:---:|:---:|
171
+ | drift-detection<br/>recall | better<br/>calibration | fewer turns<br/>per task | valid JSON<br/>tool calls |
172
+
173
+ </div>
174
+
175
+ <br/>
176
+
177
+ | Metric | Before (vanilla Gemma 3n E2B) | After (DriftCall LoRA) | Lift |
178
+ |---|---|---|---|
179
+ | Drift detection recall | ~10% | **75%** | **+65 pp** |
180
+ | Drift-aware booking success | ~10% | **65%** | **+55 pp** |
181
+ | Language-match accuracy | ~80% | **96%** | **+16 pp** |
182
+ | Calibration (Brier, lower = better) | 0.28 | **0.08** | **3.5× better** |
183
+ | Mean turns to complete | 6 (max, gives up) | **3–4** | **~40% fewer** |
184
+ | Valid JSON tool calls | ~60% | **98%+** | **~+38 pp** |
185
+
186
+ ### The Full Capability Shift
187
+
188
+ The headline numbers tell you the aggregate story. The table below tells you what *actually changed in behaviour* — every row is a distinct capability the agent either gained or sharpened.
189
+
190
+ <details open>
191
+ <summary><b>15 capability deltas — click to collapse</b></summary>
192
+
193
+ <br/>
194
+
195
+ | Capability | Before (untrained Gemma 3n E2B) | After (DriftCall LoRA) |
196
+ |---|---|---|
197
+ | **Schema drift detection** (`price` → `total_fare_inr`) | Sees 422 error, retries identical request 5+ times, gives up | Emits `DRIFT_DETECTED` within 1 turn, switches to v2 schema, completes booking |
198
+ | **Policy drift recovery** (same-day cutoff 18:00 → 14:00) | Says *"booking failed, please try again"* with no diagnosis | Explains the new policy in user's language, proposes alternative slot |
199
+ | **T&C semantic shift** (`veg_only` now excludes egg) | Books an egg-containing dish thinking it is vegetarian | Surfaces the redefinition, filters strictly, presents compliant options |
200
+ | **Hidden pricing drift** (₹199 convenience fee added) | Reports old base price; user gets overcharged at checkout | Flags new line item *before* charging, asks for re-confirmation with full total |
201
+ | **Auth scope upgrade** (MFA required ≥ ₹5,000) | Returns `AUTH_SCOPE_INSUFFICIENT` and stops | Recognizes the threshold, prompts for OTP, completes 2-step payment |
202
+ | **Compound drift** (3 drifts in one episode) | Episode terminates without booking after 6 turns | Handles all three sequentially, explains each, books in 3–4 turns |
203
+ | **Hinglish input** | Decent comprehension but English-leaning replies | Replies in matched Hinglish register, code-switches naturally |
204
+ | **Hindi input** (Devanagari script) | Sometimes responds in English instead | Stays in Hindi, uses correct honorifics |
205
+ | **Tamil / Kannada input** | Mostly works but drops back to English on edge cases | Stays in Tamil / Kannada throughout episode |
206
+ | **Mid-episode language switch** | Confused, loses context | Detects the switch, continues in new language without losing state |
207
+ | **Drift-detection precision** | ~10% (false positives common) | **80–90%** — does not cry wolf on non-drift episodes |
208
+ | **Hallucinated endpoints / fake vendor IDs** | Common — invents flight numbers, fake confirmations | Rare — sticks to vendor-returned values |
209
+ | **Recovery from a single 5xx server error** | Retries forever or quits | Retries once with exponential backoff, then surfaces the failure |
210
+ | **Out-of-scope request** (*"translate this poem"*) | Attempts and fails awkwardly | Politely declines and stays in concierge scope |
211
+ | **Reward calibration** (knowing when uncertain) | Confidently wrong 30%+ of the time | Brier-trained: confidence ≈ probability of success, hedges when unsure |
212
+
213
+ </details>
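Recall and precision are easy to conflate, so here is a small sketch of how both could be computed from per-episode logs. The `(drift_present, drift_claimed)` pair format and the function name are assumptions for this example, and the sample episodes are made up to exercise the code; they are not results from the evaluation.

```python
def drift_precision_recall(episodes: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Compute (precision, recall) from (drift_present, drift_claimed) pairs."""
    tp = sum(1 for present, claimed in episodes if present and claimed)
    fp = sum(1 for present, claimed in episodes if not present and claimed)
    fn = sum(1 for present, claimed in episodes if present and not claimed)
    # Precision: of the drift claims made, how many were real drifts?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the real drifts, how many did the agent flag?
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up log: 3 drifts caught, 1 missed, 1 false alarm, 5 quiet episodes.
toy_log = (
    [(True, True)] * 3 + [(True, False)] + [(False, True)] + [(False, False)] * 5
)
precision, recall = drift_precision_recall(toy_log)
```

An agent that shouts `DRIFT_DETECTED` on every episode would reach perfect recall with poor precision, which is why the table reports both.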
214
+
215
+ ### Six Demo Episodes — One Per Drift × Language
216
+
217
+ Each example shows the **same drifted episode** played by the untrained baseline and by the trained DriftCall LoRA. These are the receipts behind the headline numbers.
218
+
219
+ #### 1. Schema drift — `airline.price_rename` [Hinglish]
220
+
221
+ **User:** *"Bhai kal ko Bangalore se Mumbai cheapest flight, 5000 ke andar"* (English: "Bro, the cheapest flight from Bangalore to Mumbai tomorrow, under 5,000")
222
+
223
+ **Before:**
224
+ ```text
225
+ {"action": "search_flights", "from": "BLR", "to": "BOM", "depart": "2026-04-26"}
226
+ → vendor returns 422: 'price' field renamed to 'total_fare_inr'
227
+ {"action": "search_flights", ...} ← retries identical call
228
+ {"action": "search_flights", ...} ← retries again
229
+ [6 turns later, gives up]
230
+ "Bhai search nahi ho pa raha hai, kuch error aa raha hai."
231
+ ```
232
+
233
+ **After:**
234
+ ```text
235
+ DRIFT_DETECTED: airline.price_rename
236
+ {"action": "search_flights", "from": "BLR", "to": "BOM", "schema_v": "v2"}
237
+ → vendor returns flights with total_fare_inr field
238
+ "Bhai, IndiGo 6E-2341 mil gaya — total ₹4,250.
239
+ Note: airline ne 'price' ko 'total_fare_inr' rename kiya hai
240
+ (v1 → v2). Book kar dun?"
241
+ ```
242
+ Reward: drift_recall 1.0, drift_precision 1.0, language_match 1.0, groundedness 0.95, format 1.0.
243
+
244
+ #### 2. Policy drift — `airline.booking_window_shrink` [Hindi]
245
+
246
+ **User:** *"मुझे आज शाम 8 बजे बेंगलुरु से दिल्ली का टिकट चाहिए"* (English: "I need a ticket from Bengaluru to Delhi for 8 pm this evening")
247
+
248
+ **Before:**
249
+ > "बुकिंग नहीं हो सकी, error आ रहा है।" (English: "The booking could not be made; there is an error.")
250
+ > *(no explanation, no recovery path)*
251
+
252
+ **After:**
253
+ ```text
254
+ DRIFT_DETECTED: airline.booking_window_shrink (cutoff 18:00 → 14:00 IST)
255
+ "क्षमा करें — same-day booking अब 14:00 IST तक ही allowed है
256
+ (पहले 18:00 तक थी)। कल सुबह 6:00 की 6E-2104 flight available है,
257
+ ₹4,800 में। उसे book कर दूं?"
258
+ ```
259
+ The model explains *why* it failed in Hindi and offers a concrete workaround.
260
+
261
+ #### 3. T&C drift — `restaurant.veg_filter_semantic` [Tamil]
262
+
263
+ **User:** *"சென்னையில் veg-only biryani ஆர்டர் செய்யுங்கள், ₹400 க்குள்"* (English: "Order a veg-only biryani in Chennai, within ₹400")
264
+
265
+ **Before:**
266
+ > Orders Egg Biryani thinking it is vegetarian (egg used to be allowed under `veg_only=true`). User complains.
267
+
268
+ **After:**
269
+ ```text
270
+ DRIFT_DETECTED: restaurant.veg_filter_semantic (veg_only excludes egg)
271
+ "புதிய policy படி, veg_only இப்போது egg dishes-ஐ exclude செய்கிறது.
272
+ உங்கள் options:
273
+ 1. Hyderabadi Veg Biryani — ₹320 ★4.4
274
+ 2. Paneer Dum Biryani — ₹380 ★4.6
275
+ எதை order செய்யட்டுமா?"
276
+ ```
277
+ Detects the semantic shift, filters correctly, explains in Tamil.
278
+
279
+ #### 4. Pricing drift — `airline.convenience_fee_append` [English]
280
+
281
+ **User:** *"Book the cheapest BLR-DEL flight for tomorrow morning, under 6000."*
282
+
283
+ **Before:**
284
+ > *"Booked! Air India AI-803 at ₹4,800."* — but the vendor actually charged ₹4,999 (₹199 hidden convenience fee). User overcharged.
285
+
286
+ **After:**
287
+ ```text
288
+ DRIFT_DETECTED: airline.convenience_fee_append (₹199 added to all bookings)
289
+ "Found AI-803 BLR→DEL 06:30 — base fare ₹4,800.
290
+ ⚠️ Note: airline added a new ₹199 convenience_fee_inr line
291
+ (effective today). Final total: ₹4,999.
292
+ Still under your ₹6,000 budget. Proceed?"
293
+ ```
294
+ Surfaces the hidden fee transparently before charging.
295
+
296
+ #### 5. Auth drift — `payment.mfa_required` [Kannada]
297
+
298
+ **User:** *"₹15,000 ಪಾವತಿಸಿ ಆ flight ಕನ್ಫರ್ಮ್ ಮಾಡಿ"* (English: "Pay ₹15,000 and confirm that flight")
299
+
300
+ **Before:**
301
+ > *"Payment failed: AUTH_SCOPE_INSUFFICIENT"* (gives up; no recovery path)
302
+
303
+ **After:**
304
+ ```text
305
+ DRIFT_DETECTED: payment.mfa_required (threshold ₹5,000)
306
+ "Payment ₹15,000 — ₹5,000 ಗಿಂತ ಹೆಚ್ಚಿರುವುದರಿಂದ ಈಗ MFA ಬೇಕು.
307
+ ನಿಮ್ಮ ನೋಂದಾಯಿತ ಮೊಬೈಲ್‌ಗೆ 6-digit OTP ಕಳುಹಿಸಲಾಗಿದೆ.
308
+ ಅದನ್ನು ಶೇರ್ ಮಾಡಿ:"
309
+ [user provides "847291"]
310
+ {"action": "submit_payment", "mfa_code": "847291"}
311
+ "Payment confirmed. Booking ID: AI803-20260426-7K2M."
312
+ ```
313
+ Recognizes the auth bump, prompts for MFA, completes the multi-step flow in Kannada.
314
+
315
+ #### 6. Compound drift (Stage 3) — three drifts + mid-episode language switch
316
+
317
+ **User (Hinglish):** *"Bhai DEL se BLR ka flight kal subah, 7000 ke andar, 2 passenger"* (English: "Bro, a flight from DEL to BLR tomorrow morning, under 7,000, 2 passengers")
318
+ **(later, in Tamil):** *"இப்போது payment confirm செய்யுங்கள்"* (English: "Now confirm the payment")
319
+
320
+ **Before:**
321
+ > Hits 3 drifts simultaneously (`price_rename` + `convenience_fee_append` + `mfa_required`), retries blindly, loops, episode terminates after 6 turns with no booking.
322
+
323
+ **After:**
324
+ ```text
325
+ DRIFT_DETECTED: airline.price_rename
326
+ DRIFT_DETECTED: airline.convenience_fee_append
327
+ DRIFT_DETECTED: payment.mfa_required
328
+
329
+ (turn 1)
330
+ "Bhai, AI-803 v2 schema mein hai (price → total_fare_inr).
331
+ 6E-2104 found: 2 pax × ₹3,200 = ₹6,400 + ₹398 convenience fee = ₹6,798.
332
+ Budget under hai. Proceed?"
333
+
334
+ (turn 4 — user switches to Tamil, model adapts)
335
+ "₹6,798 payment — ₹5,000 க்கு மேல் இருப்பதால் OTP தேவை.
336
+ Mobile-க்கு 6-digit code அனுப்பப்பட்டது. Share செய்யுங்கள்."
337
+
338
+ (turn 6 — booking confirmed)
339
+ "Booking confirmed: 6E-2104, 2 pax, ₹6,798 total.
340
+ Confirmation Tamil-ல வேண்டுமா or Hinglish-ல?"
341
+ ```
342
+ Three independent drifts handled cleanly across a language switch in a single 6-turn episode.
343
+
344
+ ### Honest Limitations
345
+
346
+ > **A note on calibration over coverage.** We believe in showing the edges, not hiding them. Here's what the trained model will *not* do well — and how it fails when you push past the boundary:
347
+
348
+ | Out of Scope | Why |
349
+ |---|---|
350
+ | Languages other than `hi` / `ta` / `kn` / `en` / `hinglish` | Not in training distribution |
351
+ | Domains outside airline / cab / restaurant / hotel / payment | 5 vendor APIs only |
352
+ | Multi-vendor orchestration (e.g. *"book my whole trip"*) | Single-vendor episodes only |
353
+ | Drift types not in `drifts.yaml` (rate-limiting, pagination, deprecation warnings) | 20 patterns only |
354
+ | General Gemma 3n E2B chat capabilities | Heavily LoRA-shifted toward concierge tasks |
355
+
356
+ Recovery on these requests is **graceful** rather than confident-but-wrong — that is the calibration win from the Brier-shaped reward. The model hedges when it's out of distribution instead of hallucinating a confident answer.
357
+
358
+ ---
359
+
360
+ ## §5 · Why Does This Matter?
361
+
362
+ ### For the RL community
363
+
364
+ To our knowledge, DriftCall sits in **white space on three simultaneous axes** in the OpenEnv ecosystem:
365
+ 1. **No voice OpenEnv env existed** — we built one with Kokoro TTS + Whisper ASR at the boundary
366
+ 2. **No schema-drift OpenEnv env existed** — 20 drift patterns across 5 axes with deterministic injection
367
+ 3. **No Indic-language OpenEnv env existed** — Hindi, Tamil, Kannada, Hinglish with language-match scoring
368
+
369
+ ### For production agent builders
370
+
371
+ Every team building LLM agents against real APIs faces schema drift, and in our experience it is among the most common causes of silent agent failures in production. DriftCall shows that a **2-billion parameter model**, trained with 500 GRPO steps on a single V100, can learn to:
372
+ - Detect that something changed
373
+ - Figure out what changed
374
+ - Adapt its behavior
375
+ - Explain the change to the user
376
+
377
+ If a 2B model can do this, your 70B model definitely can — with the same reward design.
378
+
379
+ ### For the Indic AI community
380
+
381
+ India has 22 scheduled languages and 1.4 billion potential users of voice-first AI. Most RL environments are English-only. DriftCall's 5-language support with code-switching detection isn't a checkbox feature — it's the primary design constraint. The reward function penalizes language mismatch because real Indian users switch between Hindi and English mid-sentence, and a good concierge follows along.
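One cheap, deterministic signal for language matching is the Unicode script of the reply. The sketch below is an assumption about how such a check could look, not DriftCall's actual scorer. It relies only on the standard Unicode block ranges for Devanagari, Tamil, and Kannada, and it cannot tell Hinglish from English because both are written in Latin script.

```python
# Unicode block ranges (inclusive) for the three non-Latin scripts in scope.
SCRIPT_RANGES = {
    "hi": (0x0900, 0x097F),  # Devanagari
    "ta": (0x0B80, 0x0BFF),  # Tamil
    "kn": (0x0C80, 0x0CFF),  # Kannada
}


def dominant_script(text: str) -> str:
    """Return "hi", "ta", "kn", or "latin" by counting characters per script."""
    counts = {lang: 0 for lang in SCRIPT_RANGES}
    latin = 0
    for ch in text:
        code_point = ord(ch)
        for lang, (start, end) in SCRIPT_RANGES.items():
            if start <= code_point <= end:
                counts[lang] += 1
                break
        else:
            # Not in any Indic block: count ASCII letters as Latin script.
            if ch.isascii() and ch.isalpha():
                latin += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > latin else "latin"


def script_matches(user_text: str, reply_text: str) -> bool:
    """True when the reply is written mostly in the same script as the user."""
    return dominant_script(user_text) == dominant_script(reply_text)
```

Counting characters rather than requiring a pure script lets code-switched replies such as Tamil with embedded English product terms still register as Tamil. A full language-match reward would need more than this, for example a way to separate Hinglish from English.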
382
+
383
+ ---
384
+
385
+ ## §6 · Future Work — Where This Primitive Goes Next
386
+
387
+ > *Three directions, one substrate.* DriftCall is mechanically a deterministic agent that holds an **invariant intent** through a **mutating environment**. Concierge booking is one instance. The same primitive generalises to problems far bigger than booking a flight.
388
+
389
+ ### 1. Public Safety — Emergency Assistance in Any Language
390
+
391
+ If someone shouts *"Bachao"* ("Save me!") in Hindi or *"Help me"* in English, the same primitive that routes a cab booking should route an ambulance.
392
+
393
+ The idea is distress detection at two boundaries — **sight and sound**:
394
+ - **Vision:** Camera spots a closed-fist gesture or a hand sign held against a window
395
+ - **Audio:** Mic hears panicked shouting in any of the five Indic languages we already train on
396
+ - **Action:** The same drift-aware action loop reaches into a different vendor surface — emergency services (112 dispatch, GPS share, live caller bridge) rather than payments
397
+ - **Fallback:** SMS to emergency contacts when bandwidth dies
398
+
399
+ Why DriftCall is the right substrate: emergency endpoints drift *constantly*. Police WhatsApp numbers move between districts. Ambulance dispatch APIs change shape state-by-state. The agent already trains against schema mutation, so the same model handles the policy churn that has historically killed every "one-tap SOS" project.
400
+
401
+ ### 2. Multilingual Teaching — A Teacher Who Switches Language at the Right Moment
402
+
403
+ A topic explained in Tamil for the student who thinks in Tamil. A worked example in Hindi for the kid sitting next to her. The same concept, the same accuracy, no translation lag.
404
+
405
+ The schema-drift training we did for concierge work is, structurally, **the same problem teachers solve all day**: the same idea expressed under shifting representation. Instead of vendor APIs as the surface, the textbook + curriculum + student model become the surface, and the agent's job is to keep the explanation invariant while the language and example layer change.
406
+
407
+ What this looks like in practice:
408
+ - A student says *"I don't get it"* in Hinglish → the model re-explains in their preferred mix
409
+ - The teacher sees a transcript and a confidence score
410
+ - The explanation grounds against the curriculum, not the model's priors
411
+ - Five-language coverage already exists in the env — the rewards just need re-keying for pedagogical correctness
412
+ - **Scope:** K-12 first → vocational + adult upskilling next
413
+
414
+ ### 3. The Platform Thesis — Plumbing for an Indic Voice Revolution
415
+
416
+ NVIDIA built the hardware layer the AI revolution runs on. India's multilingual voice revolution needs a **plumbing layer** too — deterministic rewards, drift-aware agents, vernacular ground truth.
417
+
418
+ Every vertical that wants to reach the next 800M Indians will need the same primitives:
419
+ - **Speech recognition** that does not collapse on code-switching
420
+ - **Action grounding** that survives schema mutation
421
+ - **Evaluation** that does not silently leak the answer to an LLM judge
422
+
423
+ DriftCall ships those primitives as an OpenEnv-compliant gym. Other teams can train their domain-specific agents against it. The pitch is not "we will build every product on top." The pitch is: **build the substrate so well that every health-tech, ed-tech, fin-tech, and gov-tech team building voice agents in India reaches for it before they reach for English-only baselines.**
424
+
425
+ The trained adapter on HF Hub is a starting weight. The env on the same Space is the training ground.
426
+
427
+ ---
428
+
429
+ ## §7 · Try It Yourself
430
+
431
+ - **Live Environment:** [DriftCall on Hugging Face Spaces](https://huggingface.co/spaces/saumilyajj/driftcall)
432
+ - **Source Code:** [GitHub Repository](https://github.com/saumilyagupta/openenv-DGXAI)
433
+ - **OpenEnv Manifest:** `openenv.yaml` — run `openenv validate` against the live Space
434
+ - **Training Notebook:** `notebooks/train_driftcall.ipynb` (Colab-compatible, <300 lines)
435
+
436
+ ### Quick Smoke Test
437
+
438
+ ```bash
439
+ # Clone and install
440
+ git clone https://github.com/saumilyagupta/openenv-DGXAI
441
+ cd openenv-DGXAI/DRIFTCALL
442
+ pip install -e '.[dev]'
443
+
444
+ # Run the env locally
445
+ export DRIFTCALL_ENV_TOKEN=dev-local-token
446
+ uvicorn app:app --host 0.0.0.0 --port 7860
447
+
448
+ # Validate OpenEnv compliance (in a second terminal, with DRIFTCALL_ENV_TOKEN exported there too; uvicorn blocks the first)
449
+ openenv validate http://localhost:7860 --auth-bearer "$DRIFTCALL_ENV_TOKEN"
450
+ ```
451
+
452
+ ---
453
+
454
+ ## §8 · The Team
455
+
456
+ Built in **48 hours** for the **Meta × PyTorch × Hugging Face OpenEnv Hackathon** (India, April 2026) by **Team DGX-AI**.
457
+
458
+ | | |
459
+ |---|---|
460
+ | **Stack** | `Gemma-3n E2B` · `Unsloth 4-bit QLoRA` · `TRL GRPO` · `Kokoro-82M TTS` · `faster-whisper ASR` · `FastAPI` · `HF Spaces` |
461
+ | **License** | Apache 2.0 |
462
+ | **Reproducibility** | Single V100 32 GB · 500 GRPO steps · seeded · ~14 h wall-clock |
463
+ | **Evaluation** | 50 held-out episodes · 200-episode reward-hacking probe · zero LLM judges |
464
+
465
+ ---
466
+
467
+ <div align="center">
468
+
469
+ ### ✦
470
+
471
+ > *Every production agent will eventually face an API that changed overnight.*
472
+ >
473
+ > *DriftCall is the RL gym where small models learn to **notice**, **adapt**, and **explain** — instead of silently failing. No LLM judge. No human labels. Just deterministic rewards from a world that keeps changing.*
474
+ >
475
+ > *And when the same primitive is ready for emergency dispatch, multilingual classrooms, and an entire Indic voice platform —*
476
+ > ***it starts here.***
477
+
478
+ ### ✦
479
+
480
+ [**→ Open the live Space**](https://huggingface.co/spaces/saumilyajj/driftcall) &nbsp;·&nbsp; [**→ Read the source**](https://github.com/saumilyagupta/openenv-DGXAI) &nbsp;·&nbsp; [**→ Pull the LoRA**](https://huggingface.co/DGXAI/gemma-3n-e2b-driftcall-lora)
481
+
482
+ </div>