title: DriftCall Env
emoji: 🧭
colorFrom: indigo
colorTo: pink
sdk: docker
app_port: 7860
pinned: false
short_description: OpenEnv - Indic voice concierge under schema drift.
license: apache-2.0

DriftCall

DriftCall is a voice-first Indic consumer-concierge RL environment. The agent must book flights, schedule rides, and file complaints across five mock vendor APIs while those APIs undergo deterministic mid-episode drift of five kinds: schema, pricing, T&C, policy, and auth. It ships as an OpenEnv-compliant REST environment plus an in-process Python trainer; a trained LoRA adapter for Gemma 3n E2B is published alongside.
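To make "deterministic mid-episode schema drift" concrete, here is a minimal sketch. It is not DriftCall's implementation: `DriftEvent`, `plan_drift`, `apply_schema_drift`, the field names `price` and `fare_inr`, and the seeding scheme are all hypothetical. The only property taken from the description above is that the same seed yields the same drift at the same step.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftEvent:
    """Hypothetical record of one schema drift: rename a field at a given step."""
    step: int
    old_field: str
    new_field: str


def plan_drift(seed: int, max_steps: int) -> DriftEvent:
    """Pick the drift step from a seeded RNG so every replay of a seed matches."""
    if max_steps < 3:
        raise ValueError("need at least 3 steps to drift mid-episode")
    rng = random.Random(seed)
    # Steps are numbered 0..max_steps-1; never drift on the first or last one.
    step = rng.randint(1, max_steps - 2)
    return DriftEvent(step=step, old_field="price", new_field="fare_inr")


def apply_schema_drift(response: dict, event: DriftEvent, current_step: int) -> dict:
    """Return a copy of the vendor response as the agent would see it."""
    if current_step < event.step or event.old_field not in response:
        return dict(response)
    drifted = dict(response)
    # From the drift step onward the old key disappears and the new key carries its value.
    drifted[event.new_field] = drifted.pop(event.old_field)
    return drifted
```

An agent that keeps reading `price` after the drift step would get a `KeyError` in this sketch, which is the kind of failure the environment is meant to train against.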

Architecture at a glance

  • Env Space (this repo): FastAPI + OpenEnv REST on CPU-basic. Kokoro-82M TTS + faster-whisper-small ASR are baked into the image; no outbound network at runtime.
  • Trainer: in-process GRPO (TRL 0.23+, Unsloth 2026.4.5+) on a single V100. Text-in / text-out; audio is an env-boundary concern.
  • Demo Space: Gradio 5 on ZeroGPU, base Gemma 3n E2B + trained LoRA adapter switchable via a toggle.

Quickstart

# 1. Install the dev toolchain.
python3.11 -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'

# 2. Run the tests.
python3 -m pytest tests/ -v

# 3. Serve the env locally.
export DRIFTCALL_ENV_TOKEN=dev-local-token
uvicorn app:app --host 0.0.0.0 --port 7860

# 4. Validate against the OpenEnv schema.
openenv validate http://localhost:7860 --auth-bearer "$DRIFTCALL_ENV_TOKEN"
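
Once the server from step 3 is up, a client has to present the same bearer token. The sketch below only builds such a request with the standard library. The `/reset` path and the empty JSON body are assumptions about the REST surface, not something this README specifies, and `build_reset_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_reset_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a bearer-authenticated POST to the env.

    The /reset path and the empty JSON body are assumptions for illustration.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/reset",
        data=json.dumps({}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text (needs a live server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the env running locally, `send(build_reset_request("http://localhost:7860", "dev-local-token"))` would issue the call; adjust the path to whatever the served schema actually exposes.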

Notebook

notebooks/train_driftcall.ipynb is built from the numbered cells under cells/. Rebuild with:

python3 notebooks/build_notebook.py

Weights & Biases (optional)

Training runs auto-log to wandb. Configuration is resolved in this order (highest priority first):

  1. Environment variables, set on the host or in your shell:
    export WANDB_API_KEY=<your-key-from-wandb.ai/authorize>
    export WANDB_PROJECT=driftcall              # default
    export WANDB_ENTITY=<your-team>             # optional
    export WANDB_MODE=online                    # online | offline | disabled
    
  2. cells/_secrets.py hardcoded fallback, used when env vars are unset. Edit the constant in that file to rotate the key (private repo).
  3. None: init_wandb() raises at run time if WANDB_MODE != "disabled" and no API key is reachable.
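
The resolution order above can be sketched as a small function. This is an illustration, not the code in `init_wandb()`: `resolve_wandb_api_key` and `FALLBACK_WANDB_API_KEY` are hypothetical names, returning `None` in disabled mode is an assumed design choice, and treating an unset `WANDB_MODE` as `online` is also an assumption.

```python
import os
from typing import Mapping, Optional

# Hypothetical stand-in for the constant kept in cells/_secrets.py.
FALLBACK_WANDB_API_KEY: Optional[str] = None


def resolve_wandb_api_key(
    env: Mapping[str, str] = os.environ,
    fallback: Optional[str] = FALLBACK_WANDB_API_KEY,
) -> Optional[str]:
    """Return the API key to use, or None when logging is disabled."""
    # Assumption: an unset WANDB_MODE behaves like "online".
    if env.get("WANDB_MODE", "online") == "disabled":
        return None
    # 1. Environment variable wins; 2. otherwise the hardcoded fallback.
    key = env.get("WANDB_API_KEY") or fallback
    if not key:
        # 3. Nothing reachable: fail loudly, as the list above describes.
        raise RuntimeError(
            "No W&B API key found; set WANDB_API_KEY or WANDB_MODE=disabled."
        )
    return key
```

Passing `env` and `fallback` explicitly keeps the function testable without touching the real environment.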

Disable for local dev / CI:

export WANDB_MODE=disabled

Custom metrics logged each training step (training.md §3.3.3):

  • train/beta_adaptive: current KL coefficient β (mutated by AdaptiveKLCallback)
  • train/kl_measured: measured KL between policy and reference
  • train/kl_target: target KL (default = BETA_KL = 0.04)
  • train/beta_clamped_to_min: 1 if β was floored at beta_min this step
  • train/beta_clamped_to_max: 1 if β was capped at beta_max this step
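
For readers unfamiliar with adaptive KL control, the sketch below shows one common scheme, the clipped proportional controller from the RLHF literature, and how it would produce the five metrics listed above. It is not AdaptiveKLCallback itself: the update rule, the class name `AdaptiveKLSketch`, and every default except the 0.04 target are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveKLSketch:
    """Illustrative controller; not the project's AdaptiveKLCallback."""
    beta: float = 0.04       # starting coefficient (assumed)
    kl_target: float = 0.04  # target KL, matching the documented default
    beta_min: float = 1e-3   # assumed bound
    beta_max: float = 1.0    # assumed bound
    horizon: float = 100.0   # assumed smoothing horizon, in steps

    def update(self, kl_measured: float) -> dict:
        """Nudge beta toward the KL target and report the logged metrics."""
        # Relative error, clipped so one noisy step cannot swing beta too far.
        error = max(-0.2, min(0.2, kl_measured / self.kl_target - 1.0))
        proposed = self.beta * (1.0 + error / self.horizon)
        # Clamp into [beta_min, beta_max]; the flags record whether a bound was hit.
        self.beta = max(self.beta_min, min(self.beta_max, proposed))
        return {
            "train/beta_adaptive": self.beta,
            "train/kl_measured": kl_measured,
            "train/kl_target": self.kl_target,
            "train/beta_clamped_to_min": int(proposed < self.beta_min),
            "train/beta_clamped_to_max": int(proposed > self.beta_max),
        }
```

When measured KL exceeds the target, β grows and the policy is pulled back toward the reference; when KL is below target, β shrinks. The two clamp flags make it easy to spot runs where the controller is saturated.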

Run tags (set at wandb.init): stage{N}, gemma-3n-e2b, bf16 or fp16, adaptive-kl or static-kl, seed{N}.
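
A tag list in that shape could be assembled as follows. `build_run_tags` is a hypothetical helper written for this example, not a function from the repo; only the tag strings themselves come from the line above.

```python
def build_run_tags(stage: int, precision: str, adaptive_kl: bool, seed: int) -> list[str]:
    """Assemble run tags in the documented order (hypothetical helper)."""
    if precision not in ("bf16", "fp16"):
        raise ValueError(f"precision must be 'bf16' or 'fp16', got {precision!r}")
    return [
        f"stage{stage}",
        "gemma-3n-e2b",
        precision,
        "adaptive-kl" if adaptive_kl else "static-kl",
        f"seed{seed}",
    ]
```

The result can be passed as the `tags` argument of `wandb.init`, which makes runs filterable by stage, precision, KL mode, and seed in the W&B UI.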

License

Apache License 2.0. See LICENSE (included at repo root when this artifact is published as a Space).