# `docs/modules/pitch_demo.md` — Pitch & Storytelling Artifacts

**Owner:** D (Deploy & Story) · **Batch:** D3-Surface · **Depends on:** `docs/modules/deploy_demo_space.md`, `docs/modules/evaluation.md`, `docs/modules/datasets.md`, `docs/modules/audio.md`, `docs/modules/training.md`, `docs/modules/env.md`, `docs/modules/rewards.md`, `docs/modules/drift_injector.md` · **Cites:** DESIGN.md §15, §1.3, §2.4, §13, §9, §10

---

## 1. Purpose

This module specifies every **storytelling asset** DriftCall needs to convert a strong technical artifact into a winning hackathon submission. It is the sole owner of four deliverables from DESIGN.md §13:

1. **3-minute live pitch** (+ 2-minute Q&A) delivered onsite to the 11-judge panel (DESIGN.md §2.4) on Apr 26, 2026.
2. **Hugging Face blog post** (< 2-minute read) published under the team org, linked from the HF Space cards and the GitHub README.
3. **YouTube video** (< 2 minutes) for async judges, social distribution, and the HF Hub model card's `video` metadata field.
4. **Pitch deck** (≤ 5 slides) used as the live-pitch screen-share backing the verbal script.

Together these assets target the **30% Storytelling** and **20% Showing Improvement** criteria from DESIGN.md §1.3, which jointly account for half of the judging weight. The environment, training, and reward artifacts score on the remaining 50% (40% Environment Innovation + 10% Reward/Pipeline Quality); those are owned elsewhere. This module's single job is to make those artifacts *legible* to a room with variable attention and variable language coverage.

Concretely, this doc:

- Locks the **verbatim 3-minute script** from DESIGN.md §15, with explicit B-roll / screen-share cues per beat, so the presenter can rehearse against a stopwatch.
- Specifies the **Q&A prep** (8 anticipated questions: 5 from DESIGN.md §15 + 3 new) with pre-written 20-second answers.
- Specifies the **blog post outline** (5 sections + audio embeds + code links) at word-count granularity.
- Specifies the **YouTube video script** with scene breakdown, B-roll cues, and captioning rules (every Indic clip carries English captions per DESIGN.md §14 risk-12 mitigation).
- Specifies the **pitch deck** (5 slides: Hook, Architecture, Curves, Before/After, Close).
- Names the **exact files, sizes, durations, file formats, and fallback paths** so that if any component fails live (mic dead, trained checkpoint fails to load, network drops), the pitch still completes on time.

The three non-negotiables: a tight script (every second justified), English captions on every Indic clip (per risk-12), and a **3-minute hard ceiling** on the live pitch (over-running loses points and forfeits Q&A questions). No placeholders.

---

## 2. Interface

This module's "interface" is the set of **artifacts** it publishes, their **consumers**, their **storage locations**, and their **delivery deadlines**. There is no Python module imported by other modules; everything here is markdown, media files, and slide files produced by Person D.

### 2.1 `PitchArtifact` — the bundle contract

The complete deliverable set is a single logical bundle, checked into `DRIFTCALL/pitch_assets/` in the repo, mirrored to HF Hub paths, and referenced from the HF Space READMEs. All file names are **locked** — other modules (notably `deploy_demo_space.md §2.1`) reference them by exact path.
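Because other modules resolve these paths verbatim, a pre-publish check that every locked file exists is cheap insurance before each deadline. A minimal Python sketch, assuming the bundle root is a local checkout: `LOCKED_PATHS` is transcribed by hand from the tree below, `check_bundle` is a hypothetical helper (not an existing DriftCall function), and reading the "≤ 50 MB" video budget as 50,000,000 bytes is an assumption (the stricter of the decimal and binary readings).

```python
from pathlib import Path

# Locked relative paths, transcribed by hand from the §2.1 tree.
# Hypothetical helper data: keep in sync with the tree if a file is ever renamed.
LOCKED_PATHS = [
    "script/pitch_3min.md",
    "script/qa_prep.md",
    "script/presenter_cheatsheet.md",
    "deck/driftcall_deck.pdf",
    "deck/driftcall_deck.pptx",
    *(f"deck/slides/{n}.png" for n in
      ("01_hook", "02_architecture", "03_curves", "04_before_after", "05_close")),
    "video/driftcall_demo.mp4",
    "video/driftcall_demo.srt",
    "video/driftcall_demo_script.md",
    *(f"video/broll/{n}.mov" for n in
      ("base_model_keyerror", "trained_model_adapts", "wandb_curves", "gradio_ui_tour")),
    "blog/post.md",
    "blog/post_assets/audio_hindi_brief.wav",
    "blog/post_assets/audio_trained_reply.wav",
    "blog/post_assets/fig_reward_curves.png",
    "blog/frontmatter.yaml",
    *(f"audio_samples/{n}.wav" for n in
      ("hindi_brief", "hinglish_brief", "tamil_brief", "kannada_brief", "trained_reply_hinglish")),
]

# "≤ 50 MB" video budget, read conservatively as 50,000,000 bytes (assumption).
MAX_VIDEO_BYTES = 50_000_000


def check_bundle(root: Path) -> list[str]:
    """Return a list of problems with the bundle under `root`; empty means complete."""
    # Every locked path must exist as a regular file.
    problems = [f"missing: {rel}" for rel in LOCKED_PATHS if not (root / rel).is_file()]
    # The demo video must also respect the size budget so it embeds without a CDN.
    video = root / "video" / "driftcall_demo.mp4"
    if video.is_file():
        size = video.stat().st_size
        if size > MAX_VIDEO_BYTES:
            problems.append(f"too large: video/driftcall_demo.mp4 is {size} bytes")
    return problems
```

Calling `check_bundle(Path("DRIFTCALL/pitch_assets"))` before each deadline in §2.4 returns an empty list once the bundle is complete; each `missing:` line names the exact locked path still to be produced.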
```
DRIFTCALL/pitch_assets/
├── script/
│   ├── pitch_3min.md              # §3.1 verbatim script, with B-roll cue blocks
│   ├── qa_prep.md                 # §3.2 eight-question Q&A deck
│   └── presenter_cheatsheet.md    # one-page stopwatch + backup-path card
├── deck/
│   ├── driftcall_deck.pdf         # 5-slide PDF, 1920×1080, exported from Keynote/PPT
│   ├── driftcall_deck.pptx        # source, editable
│   └── slides/                    # individual PNGs for the blog + video
│       ├── 01_hook.png
│       ├── 02_architecture.png
│       ├── 03_curves.png
│       ├── 04_before_after.png
│       └── 05_close.png
├── video/
│   ├── driftcall_demo.mp4         # < 2 min, 1920×1080, H.264, ≤ 50 MB
│   ├── driftcall_demo.srt         # English captions track (all Indic → English)
│   ├── driftcall_demo_script.md   # §3.3 scene-by-scene video script
│   └── broll/                     # raw screen-recordings used in cuts
│       ├── base_model_keyerror.mov
│       ├── trained_model_adapts.mov
│       ├── wandb_curves.mov
│       └── gradio_ui_tour.mov
├── blog/
│   ├── post.md                    # HF blog post, < 2-minute read (~550 words)
│   ├── post_assets/
│   │   ├── audio_hindi_brief.wav      # mirror of demo/pitch_assets/hindi_brief.wav
│   │   ├── audio_trained_reply.wav    # post-drift adaptive reply
│   │   └── fig_reward_curves.png      # from evaluation.md §(curves)
│   └── frontmatter.yaml           # HF blog frontmatter (author, tags, thumbnail)
└── audio_samples/
    ├── hindi_brief.wav            # canonical 0:00 hook clip, 16 kHz mono
    ├── hinglish_brief.wav         # backup brief (DESIGN.md §16.B line 1)
    ├── tamil_brief.wav            # backup for language-switch demo
    ├── kannada_brief.wav          # backup
    └── trained_reply_hinglish.wav # the 2:00-2:40 adaptive reply
```

### 2.2 Consumers (who reads what)

| Artifact | Primary consumer | Secondary consumer | When |
|---|---|---|---|
| `pitch_3min.md` | Presenter (live) | Rehearsal critics | Apr 26 pitch slot |
| `qa_prep.md` | Presenter (live) | Any teammate fielding judge questions | Apr 26 pitch + Q&A |
| `driftcall_deck.pdf` | 11 judges (screen-share) | HF community (blog embed) | Apr 26 pitch |
| `driftcall_demo.mp4` | Async judges, HF social | YouTube viewers, LinkedIn | Published before pitch |
| `post.md` | HF community, blog readers | Judges doing pre-read | Published before pitch |
| `audio_samples/*.wav` | Pitch deck, blog, video | Backup if live mic fails | Embedded everywhere |

### 2.3 Distribution endpoints

| Endpoint | Asset | Path |
|---|---|---|
| HF Hub blog | `post.md` + `post_assets/` | `https://huggingface.co/blog//driftcall-indic-drift-rl` |
| HF Space (demo) README | Embeds `driftcall_demo.mp4` and links deck + blog | `/driftcall-demo` README frontmatter `video` and `links` fields |
| HF Space (env) README | Links blog + video | `/driftcall-env` README |
| HF Hub model card | Embeds video + blog | `/gemma-3n-e2b-driftcall-lora` model card `video` field |
| HF Hub dataset card | Links blog + video | `/driftcall-indic-briefs` dataset card |
| YouTube | `driftcall_demo.mp4` | Unlisted initially; made public at pitch start |
| GitHub repo `README.md` | Thumbnails + "Watch the pitch" button | Repo root |

### 2.4 Deadlines (IST, onsite Apr 25–26)

| Artifact | Deadline | Owner |
|---|---|---|
| `audio_samples/*.wav` (all 5) | Apr 26 08:00 IST | Person D (uses `audio.md §2.1` TTS offline) |
| `driftcall_deck.pdf` v1 | Apr 26 10:00 IST | Person D |
| `driftcall_demo.mp4` + `.srt` | Apr 26 12:00 IST | Person D |
| `post.md` published to HF blog | Apr 26 13:00 IST | Person D |
| Live rehearsal (2 full passes with stopwatch) | Apr 26 14:00 IST | Person D + 1 critic teammate |
| `pitch_3min.md` final-lock | Apr 26 15:00 IST | Person D |
| Pitch slot | Apr 26, TBD by organizers | Person D presents |

---

## 3. Behavior spec

This section is the authoritative **content** of each asset. Bodies are either verbatim (the 3-min script) or structural (outlines with word counts, cue lists, and scene tables).

### 3.1 The 3-minute pitch script (verbatim, from DESIGN.md §15)

Total duration: **3:00 hard ceiling**. Every line below is timed against rehearsal. The presenter holds a stopwatch visible only to themselves.
If at 2:40 the script has not reached "The Close", cut to the close — partial before/after is better than an over-run.

Each beat below specifies:

- **Spoken line** — verbatim from DESIGN.md §15.
- **Screen-share** — what is on the 1920×1080 projector.
- **Audio cue** — what plays through speakers (if anything).
- **B-roll** — any pre-recorded video playing in a picture-in-picture corner.

#### Beat 1 · 0:00 – 0:20 · The Hook (20 s)

- **Spoken (presenter steps on stage, deck shows slide `01_hook.png`):**

  > *[Play `audio_samples/hindi_brief.wav` — 4 s — Hindi voice clip: "Bhai Friday ko Bangalore jaana hai, 8000 rupees max, 6pm ke baad"]*
  >
  > "This is Gemma 3n E2B, untrained. It books the flight confidently. But mid-conversation, the airline's API renames `price` to `total_fare_inr`.
  >
  > *[Trace panel on screen shows: `KeyError: 'price'` — base model returns garbage]*
  >
  > Every engineer in this room has been burned by schema drift. We built an RL environment that teaches small models to survive it."

- **Screen-share:** slide `01_hook.png` — title "DriftCall" + a single still of the base-model trace panel with the red `KeyError` highlighted. English caption text bar along the bottom of the slide reproduces the Hindi brief in English — canonical gloss: `"Friday I need to go to Bangalore, under ₹8000, after 6pm"` — so non-Hindi judges follow the payload immediately (DESIGN.md §14 risk-12 mitigation).
- **Audio cue:** `audio_samples/hindi_brief.wav` plays at 0:02. Duration 4 s. If the in-room speakers fail, the presenter reads the English caption aloud and skips the audio — that fallback is rehearsed (see §5 error modes).
- **B-roll:** none during the hook; clean slate focuses attention on the audio.

#### Beat 2 · 0:20 – 1:00 · The Architecture (40 s)

- **Spoken:**

  > "DriftCall is an OpenEnv environment with four mock Indian consumer APIs — airline, cab, hotel, restaurant. Twenty drift patterns fire mid-episode: schemas rename, policies shift, T&Cs update, pricing restructures, auth scopes upgrade. The agent receives voice briefs in Hindi, Tamil, Kannada, and Hinglish through Whisper; it speaks back through Kokoro.
  >
  > Five independent rewards: task completion, drift detection, constraint adherence, format, and an anti-hacking penalty. All deterministic. No LLM judge. 200,000 distinct procedural episodes."

- **Screen-share:** slide `02_architecture.png` — a simplified redraw of DESIGN.md §3.1's high-level diagram, with the three boxes (Training, Deployed Env, Demo) but with the training box greyed and the env + demo boxes in full color. Reward list is rendered as five icon chips along the bottom.
- **Audio cue:** none.
- **B-roll:** a 30-second muted loop of `broll/gradio_ui_tour.mov` in the bottom-right PiP, showing mic activation → transcript appearing → trace panel scrolling. Starts at 0:22, loops.

#### Beat 3 · 1:00 – 2:00 · The Training Curves (60 s)

- **Spoken:**

  > "Five hundred GRPO steps on a single V100. Stage 1: learn tool use. Stage 2: single drift per episode. Stage 3: compound drift. Task completion climbs from 18% to 64%. Drift detection goes from 8% to 71%. Latency from drift-event to adaptation drops from 4.2 turns to 1.6."

- **Screen-share:** slide `03_curves.png` — three plots side-by-side at 1920×1080:
  1. Per-reward stack (R1–R5, step 0 → 500) — from `evaluation.md` final-eval output.
  2. Drift-detection latency (4.2 → 1.6 turns) — from `evaluation.md` latency curve.
  3. Per-language breakdown (Hindi / Tamil / Kannada / Hinglish bars for R1 before vs after) — from `evaluation.md` per-language breakdown.
- **Audio cue:** none.
- **B-roll:** `broll/wandb_curves.mov` — a 20-second time-lapse of the actual WandB dashboard during Stage 2 training, played once between 1:10 and 1:30 full-screen over the static slide; the static slide returns at 1:30 for the "18% to 64%" line so judges see the exact before/after numbers.
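The beat windows in §3.1 are contiguous slices of 20, 40, 60, 40, and 20 seconds (Beats 4 and 5 follow below), so the stopwatch lines for `script/presenter_cheatsheet.md` can be derived rather than typed by hand. A minimal Python sketch, assuming the cheat sheet is a plain list of `start-end  beat` lines; that layout and the `Beat` / `cue_sheet` names are illustrative, not part of the spec:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beat:
    name: str
    seconds: int
    trimmable: bool  # only Beats 2 and 3 may lose a sentence (§3.1)


# Beat lengths as listed in §3.1 (Beats 1-5).
BEATS = [
    Beat("Hook", 20, trimmable=False),
    Beat("Architecture", 40, trimmable=True),
    Beat("Training Curves", 60, trimmable=True),
    Beat("Before/After", 40, trimmable=False),
    Beat("Close", 20, trimmable=False),
]
HARD_CEILING_S = 180     # 3:00 hard ceiling
CUT_TO_CLOSE_AT_S = 160  # "if at 2:40 the script has not reached The Close, cut to the close"


def mmss(seconds: int) -> str:
    """Format seconds as M:SS, the way the stopwatch reads."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def cue_sheet(beats: list[Beat]) -> list[str]:
    """One line per beat: its start-end window, flagged if it is a permitted cut point."""
    lines, start = [], 0
    for beat in beats:
        end = start + beat.seconds
        note = " (trim here if over)" if beat.trimmable else ""
        lines.append(f"{mmss(start)}-{mmss(end)}  {beat.name}{note}")
        start = end
    return lines


# Sanity checks on the schedule itself: the beats fill exactly 3:00,
# and The Close starts exactly at the 2:40 cut-to-close mark.
assert sum(b.seconds for b in BEATS) == HARD_CEILING_S
assert sum(b.seconds for b in BEATS[:-1]) == CUT_TO_CLOSE_AT_S

for line in cue_sheet(BEATS):
    print(line)
```

This prints five lines, from `0:00-0:20  Hook` to `2:40-3:00  Close`; the two asserts fail loudly if someone edits one beat length without rebalancing the others.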
#### Beat 4 · 2:00 – 2:40 · The Before/After (40 s)

- **Spoken:**

  > *[Replay `audio_samples/hindi_brief.wav` from 0:00 — same clip]*
  >
  > "Same clip, trained checkpoint. Watch what happens after the drift fires.
  >
  > *[Trained model speaks `audio_samples/trained_reply_hinglish.wav` in Hinglish — 6 s — "The price field appears to have changed — using the new `total_fare_inr` field. Confirming flight 6E-2345 at ₹7,200."]*
  >
  > It caught the rename. It adapted. It completed the booking."

- **Screen-share:** slide `04_before_after.png` — split screen. Left column: base-model transcript ending in `KeyError`. Right column: trained-model transcript showing the same turns but the turn-5 response reading the new `total_fare_inr` field. Both columns have English captions under every Indic utterance (risk-12 mitigation). Reward bar at the bottom compares R1 + R2 for both runs side-by-side (base: 0.00 + 0.00 = 0.00; trained: 1.00 + 1.00 = 2.00, scaled per `rewards.md` to a total).
- **Audio cue:** the Hindi brief replays at 2:02 (4 s), then `trained_reply_hinglish.wav` plays at 2:20 (6 s).
- **B-roll:** `broll/trained_model_adapts.mov` — a 20-second screen-recording of the actual demo Space (`deploy_demo_space.md §2.2` `infer_turn`) being run against the trained adapter with a manual drift toggle. Plays in the right column full-height between 2:08 and 2:28, replacing the static trained-column text with a live-recorded walkthrough, then returns to the static split-screen for the reward bar reveal at 2:30.

#### Beat 5 · 2:40 – 3:00 · The Close (20 s)

- **Spoken:**

  > "Zero voice OpenEnv environments existed before this. Zero schema-drift environments. Zero Indic environments. We built all three in one, in 48 hours. Model, env, dataset, full training traces on HF Hub. Apache 2.0. That's DriftCall."

- **Screen-share:** slide `05_close.png` — four logos in a row (HF Space env, HF Space demo, HF Hub model, HF Hub dataset) each with its URL underneath.
  The word "Apache 2.0" in large type bottom-center. QR code bottom-right linking to the HF blog post.
- **Audio cue:** none. Silence is the punctuation.
- **B-roll:** none.

**Total:** 20 + 40 + 60 + 40 + 20 = 180 s. The script is physically tight; there is **no slack** for improvisation. Rehearse with a stopwatch; if a pass runs over 3:00, cut one sentence from Beat 2 or Beat 3 (the architecture or curves beat — never cut Hook or Before/After).

### 3.2 Q&A prep (8 questions, 20 s each)

Expected Q&A window: **2 minutes after the 3-minute pitch**. Each answer is pre-written to fit in ≤ 20 s spoken, leaving room for 5–6 total questions. Questions 1–5 are lifted verbatim from DESIGN.md §15 ("Q&A Prep — Anticipated Questions"); questions 6–8 are new and target likely probes from the Indic / HF / Sarvam judge subpanel (DESIGN.md §2.4).

Format for each entry: **Q (source)** → **A (spoken answer, ≤ 20 s)** → **Proof-link (URL to cite if pressed)**.

| # | Q | A (≤ 20 s) | Proof-link |
|---|---|---|---|
| 1 | **Why not use audio directly as GRPO input?** (DESIGN.md §15 Q1) | "TRL GRPO plus multimodal processors are not production-ready yet. We transcribe at the env boundary — same architecture as OpenAI Realtime, Pipecat, and Sarvam. The environment is genuinely voice-driven; training is derisked." | `docs/modules/training.md §10` + TRL issue #4637 |
| 2 | **How do you prevent reward hacking?** (DESIGN.md §15 Q2) | "Five independent rewards plus asymmetric penalties. R2 requires either a specific field-name reference or a correct follow-up call — you can't fake it. Our probe report on 200 held-out episodes found zero exploits." | `docs/modules/rewards.md §7` + probe report PDF on GitHub |
| 3 | **Can this scale to larger models?** (DESIGN.md §15 Q3) | "Yes. LoRA adapters transfer. Same env and rewards — swap in Gemma 4 E4B or larger. Our 128K context handles longer multi-drift conversations with no env change." | `docs/modules/training.md §10.1` LoRA config |
| 4 | **Why Indic?** (DESIGN.md §15 Q4) | "Four of eleven judges are Indic-LLM specialists, but more importantly — Indic is where language variance crosses cultural context crosses ambiguity. That's where the RL signal lives." | DESIGN.md §2.4 judge panel table |
| 5 | **What's the biggest limitation?** (DESIGN.md §15 Q5) | "Whisper on code-mixed Hinglish is noisy about 12% of the time. Our rewards score semantic match, not exact strings, so training is robust. Post-hackathon: swap to Sarvam ASR for production." | DESIGN.md §9.3 + risk-3 in §14 |
| 6 | **What about Sarvam for ASR instead of Whisper?** (new — anticipates Aashay Sachdeva, Sarvam) | "Sarvam is the production choice. For the hackathon we used `faster-whisper-small` because it runs on CPU basic — the env Space is free tier. The ASR adapter in `audio.md` is a swappable interface; moving to Sarvam is a one-line engine change, no retraining." | `docs/modules/audio.md §2.2` ASR interface |
| 7 | **Why not use Gemma 4 E4B?** (new — anticipates Adarsh Shirawalmath, small-model judge) | "V100 FP16 OOMs on E4B with GRPO at sequence-length 4096 even with 4-bit. E2B fits, converges in 500 steps, and the LoRA transfers to E4B at demo time — which is what the Demo Space already does." | DESIGN.md §9 What-NOT-to-do + `training.md §10.5` |
| 8 | **Show me the 10th drift pattern.** (new — anticipates a technical probe from Ben Burtenshaw / Patronus) | "`cab.surge_restructure` — the cab API replaces a flat `surge_multiplier` field with a nested `{base, surge_component, time_band}` object, turn 3 or later. Our trained model probes the schema and restructures the price calculation. Full list is in `drift_injector.md §6.3`." | `docs/modules/drift_injector.md §6.3` pattern 10 |

**Failure-mode answer** (if the presenter doesn't know): "Good question — that's in our `drift_injector.md` design doc — I'll point you to the specific pattern after Q&A." Never bluff.
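Whether an answer actually fits in 20 s can be estimated from its word count before the first stopwatch rehearsal. A minimal Python sketch, assuming a conversational pace of 150 words per minute (an assumption; replace it with the presenter's measured rate from rehearsal). The names `spoken_words`, `estimated_seconds`, and `over_budget` are illustrative:

```python
WORDS_PER_MINUTE = 150  # assumed speaking pace; recalibrate from a stopwatch rehearsal
ANSWER_BUDGET_S = 20.0  # per-answer ceiling from §3.2


def spoken_words(text: str) -> int:
    """Count whitespace-separated tokens containing a letter or digit.

    Bare punctuation tokens (dashes, arrows) are skipped so they do not inflate the estimate.
    """
    return sum(1 for tok in text.split() if any(ch.isalnum() for ch in tok))


def estimated_seconds(text: str, wpm: float = WORDS_PER_MINUTE) -> float:
    """Estimated speaking time in seconds at the given words-per-minute rate."""
    return spoken_words(text) * 60.0 / wpm


def over_budget(answers: dict[int, str], budget_s: float = ANSWER_BUDGET_S) -> dict[int, float]:
    """Map question number to estimated seconds, for answers that exceed the budget."""
    flagged = {}
    for number, text in answers.items():
        seconds = estimated_seconds(text)
        if seconds > budget_s:
            flagged[number] = seconds
    return flagged
```

At 150 wpm the 20 s budget is 50 words. Answer 5 in the table, for instance, is 30 words, or about 12 s; anything `over_budget` returns should be trimmed before it lands in `qa_prep.md`.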
### 3.3 YouTube video script (< 2 min)

Total duration: **1:55** (5 s buffer under the 2-min cap). 1920×1080, 30 fps, H.264, target ≤ 50 MB so it embeds in the HF Space README without a CDN. An SRT caption track is mandatory (English translations of every Indic utterance — risk-12 mitigation).

Scene table:

| Scene | Time | Visual | Audio | Captions |
|---|---|---|---|---|
| 1. Cold open | 0:00 – 0:08 | Black screen → title card "DriftCall" in white + subtitle "voice-first Indic RL with schema drift" | 3 s of silence then `hindi_brief.wav` | `[Hindi brief]: "Friday I need to go to Bangalore, under ₹8000, after 6pm"` |
| 2. Problem statement | 0:08 – 0:25 | Screen-record of base model running the brief → shows `KeyError: 'price'` in red in trace panel (`broll/base_model_keyerror.mov`) | Voice-over (English): "Untrained Gemma 3n E2B confidently books the flight. Then the airline renames a field mid-conversation. The model crashes." | VO captions |
| 3. Environment tour | 0:25 – 0:50 | 2D animation: 4 vendor boxes (airline/cab/hotel/restaurant) with drift lightning-bolts firing between them. Zoom into the reward card showing R1–R5. | VO: "DriftCall is an OpenEnv environment. Four Indian consumer APIs. Twenty drift patterns. Five deterministic rewards. Two hundred thousand procedural episodes." | VO captions + on-screen labels for each reward |
| 4. Training curves | 0:50 – 1:15 | `broll/wandb_curves.mov` sped 2×, with three callouts overlaid: "R1: 18% → 64%", "R2: 8% → 71%", "Latency: 4.2 → 1.6 turns" | VO: "Five hundred GRPO steps on a single V100. Three curriculum stages. Here's the agent learning to detect and adapt to drift." | VO captions |
| 5. Before / After | 1:15 – 1:42 | Split screen: left = base model transcript ending in KeyError; right = `broll/trained_model_adapts.mov` | `hindi_brief.wav` at 1:15, then `trained_reply_hinglish.wav` at 1:24 | On-screen: left-column KeyError label; right-column English gloss of the Hinglish reply |
| 6. Close | 1:42 – 1:55 | Four HF logos (env Space, demo Space, model, dataset) with URLs. "Apache 2.0" large. QR code. | VO: "Environment, model, dataset, training traces — all on Hugging Face Hub. Apache 2.0. Try it." | VO captions |

**Captioning rule (locked):** every Indic utterance in the video — spoken, in trace panels, or on slides — carries a burned-in English caption on the frame *and* a matching SRT cue. SRT-only captioning is not sufficient; the burned-in caption covers the case where a judge watches with captions disabled.

### 3.4 Blog post outline (HF blog, < 2-min read)

Target length: **550 words** (about 2.2 minutes at 250 wpm, so the < 2-minute target assumes a reading pace of at least 275 wpm). Published at `https://huggingface.co/blog//driftcall-indic-drift-rl`. Frontmatter in `blog/frontmatter.yaml` sets author, tags `["openenv", "rl", "indic", "drift-detection", "voice", "gemma"]`, and thumbnail `slides/04_before_after.png`.

Five-section structure (word counts target total 550):

1. **The bug every production agent has hit** (~100 w) — opens with the same Hindi brief as the pitch (audio embed: `audio_hindi_brief.wav`; canonical English gloss rendered inline beside the embed: `"Friday I need to go to Bangalore, under ₹8000, after 6pm"`). Describes schema drift as a universal bug. Names Patronus TRAIL + FinanceBench as prior art (DESIGN.md §2.2 sponsor alignment). Ends with a one-sentence thesis: "We built an OpenEnv environment that makes drift the training signal."
2. **The environment** (~120 w) — a 2-paragraph description mirroring Beat 2 of the pitch (vendors, drift axes, 5 rewards, 200K episodes). Includes one code block copy-pasted from DESIGN.md §16.A.2 (reset/step cycle) + links to `env.md` and the env Space.
3. **Training curves** (~150 w) — embeds `fig_reward_curves.png`. Describes the 3-stage curriculum in one paragraph, cites the numbers (18% → 64% R1, 8% → 71% R2, 4.2 → 1.6 latency) in a bullet list, and links to the WandB project and `training.md`.
   One sentence on the V100 budget ("500 GRPO steps, single V100, 14 h") hits the reproducibility angle (Sanyam Bhutani / Yash Khare judge preferences, DESIGN.md §2.4).
4. **Before / After** (~100 w) — two audio embeds (`audio_hindi_brief.wav` + `audio_trained_reply.wav`). One paragraph describing the adaptation ("the model probes the schema, names the new field, completes the booking"). Links to the Demo Space for interactive play.
5. **Try it** (~80 w) — bullet list of four HF links (env Space, demo Space, model, dataset) + GitHub repo + Colab notebook. "Apache 2.0" bolded in the last line. Two-sentence credit to the team + hackathon.

**Code links embedded in prose** (locked paths, verified to resolve at publish time):

- Env Space: `https://huggingface.co/spaces//driftcall-env`
- Demo Space: `https://huggingface.co/spaces//driftcall-demo`
- Model: `https://huggingface.co//gemma-3n-e2b-driftcall-lora`
- Dataset: `https://huggingface.co/datasets//driftcall-indic-briefs`
- Colab: `https://colab.research.google.com/github//driftcall/blob/main/notebooks/driftcall_minimal_grpo.ipynb`
- GitHub: `https://github.com//driftcall`

**Audio-embed rule:** HF blog's markdown renders `