Spaces: build-small-hackathon/lesson-agent

MSG committed 15 days ago

Commit

6cea344

1 Parent(s): 514a84f

Feat/finetuning model (#18)

* notebook path

* modal plan

* finetune app

* pyproject and notebook

* usage modal

* usage volume modal

* usage build

* finetune app modal fix

* modal error build

* finetune app volumes

* server app container

* server app and experiment

* common fix

* evals

* common fix

* docs gpu server

* loop common stuff

* wip finetuning

* wip finetuning

* wip finetuning

* server app

* server app fix

* server app and docs

* docs

* docs and fix import

* wip experiment

* next todo

* experimentnal and fix pyproject

* experimentnal and fix pyproject

* fix stuff

* update readme

Files changed (21)

.cursor/plans/modal_finetune_benchmark_ac96d473.plan.md +268 -0
.env.example +4 -0
README.md +32 -5
TODO.md +44 -0
pyproject.toml +4 -0
research/USAGE.md +59 -0
research/data/science-tutor-chat.jsonl +10 -0
research/evals/configs/eval_profiles.yaml +19 -0
research/evals/configs/lm_eval_instructions.yaml +1 -1
research/evals/configs/lm_eval_math.yaml +18 -0
research/evals/configs/lm_eval_science.yaml +19 -0
research/evals/pyproject.toml +1 -1
research/modal/README.md +774 -0
research/modal/SERVER.md +204 -0
research/modal/_common.py +520 -0
research/modal/experiments.yaml +131 -0
research/modal/finetune_app.py +370 -0
research/modal/server_app.py +472 -0
{notebook → research/notebook}/gemma-finetune.ipynb +0 -0
research/notebook/minicpm5-modal-finetune.ipynb +216 -0
uv.lock +244 -11

.cursor/plans/modal_finetune_benchmark_ac96d473.plan.md ADDED

	@@ -0,0 +1,268 @@

+---
+name: Modal Finetune Benchmark
+overview: Add a Modal GPU pipeline that runs existing `research/finetune.py` and `slm-lm-eval` on OpenBMB MiniCPM5-1B across multiple datasets, persists checkpoints to Modal Volumes, and provides a companion Modal Notebook for interactive exploration — targeting both the Modal partner track and the Well-Tuned finetuning track.
+todos:
+  - id: modal-scaffold
+    content: Create research/modal/finetune_app.py with Image, Volumes, HF secret, GPU functions
+    status: completed
+  - id: experiments-yaml
+    content: Add research/modal/experiments.yaml with lesson/alpaca/smoltalk job matrix + smoke limits
+    status: completed
+  - id: train-worker
+    content: Implement finetune_one() subprocess wrapper around research/finetune.py with volume commit
+    status: completed
+  - id: eval-worker
+    content: Implement run_lm_eval() subprocess wrapper for baseline + post-train comparison
+    status: completed
+  - id: sweep-entrypoint
+    content: Add @app.local_entrypoint sweep() to run baseline, map trainings, then eval each checkpoint
+    status: completed
+  - id: modal-notebook
+    content: Create research/notebook/minicpm5-modal-finetune.ipynb for interactive OpenBMB MiniCPM5-1B demo
+    status: completed
+  - id: docs-deps
+    content: Add research/modal/README.md, modal dependency group in pyproject.toml, HF_TOKEN note in .env.example
+    status: completed
+isProject: false
+---
+# Modal Finetuning + Benchmark Plan
+## Goal
+Run GPU fine-tuning and academic benchmarks **without local CUDA**, reusing your existing scripts:
+- Training: [`research/finetune.py`](research/finetune.py) (LoRA/QLoRA on `openbmb/MiniCPM5-1B`)
+- Benchmarks: `slm-lm-eval` via [`research/evals/`](research/evals/) (ARC, HellaSwag, GSM8K, …)
+- Datasets: lesson chat (default), plus Hub sets already documented in finetune docstring
+Deliverables for hackathon tracks:
+| Track | What judges see |
+|-------|-----------------|
+| **Modal** | `modal run` job + Modal Volume/Notebook link in README |
+| **Well-Tuned / Finetuning** | Before/after `lm-eval` on base vs LoRA adapter, weights in `models/finetuned/` or HF Hub |
+## Current state (no Modal yet)
+- [`research/finetune.py`](research/finetune.py) is a self-contained CLI: it resolves `minicpm5-1b` from [`models.yaml`](models.yaml) and supports `--dataset`, `--format`, `--mode`, and an optional `--lm-eval-after`.
+- Eval harness lives in workspace package `slm-evals`; smoke config at [`research/evals/configs/lm_eval_smoke.yaml`](research/evals/configs/lm_eval_smoke.yaml).
+- [`research/notebook/gemma-finetune.ipynb`](research/notebook/gemma-finetune.ipynb) has early OpenBMB load cells but no training loop — good skeleton for a Modal Notebook.
+- Root [`pyproject.toml`](pyproject.toml) already defines `finetune` and `lm-eval` dependency groups (torch, peft, bitsandbytes, lm-eval).
+```mermaid
+flowchart TB
+  subgraph local [Your laptop]
+    cli["modal run research/modal/finetune_app.py"]
+    nb["Modal GPU Notebook"]
+  end
+  subgraph modal [Modal cloud]
+    img["Image: torch + uv sync finetune/lm-eval"]
+    fn_train["@app.function gpu=A10G finetune_one"]
+    fn_eval["@app.function gpu=A10G run_lm_eval"]
+    vol_hf["Volume: hf-cache"]
+    vol_out["Volume: slm-finetune"]
+  end
+  subgraph repo_scripts [Mounted repo]
+    ft["research/finetune.py"]
+    eval["slm-lm-eval"]
+    data["research/data/*.jsonl"]
+  end
+  cli --> fn_train
+  cli --> fn_eval
+  nb --> ft
+  fn_train --> ft
+  fn_eval --> eval
+  fn_train --> vol_hf
+  fn_train --> vol_out
+  fn_eval --> vol_hf
+  fn_eval --> vol_out
+  img --> fn_train
+  img --> fn_eval
+```
+## Architecture
+### 1. New Modal module: `research/modal/`
+Create a small Modal package (2–3 files, no refactor of `finetune.py`):
+| File | Role |
+|------|------|
+| `research/modal/finetune_app.py` | Main `modal.App`, image, volumes, `@app.function` workers |
+| `research/modal/experiments.yaml` | Dataset sweep matrix (name, hub id, format, max_samples) |
+| `research/modal/README.md` | Setup (`modal setup`), secrets, run commands |
+**Image** (per [Modal CUDA guide](https://modal.com/docs/guide/cuda)):
+```python
+image = (
+    modal.Image.debian_slim(python_version="3.12")
+    .apt_install("git")
+    .pip_install("uv")
+    .add_local_file("pyproject.toml", "/repo/pyproject.toml", copy=True)
+    .add_local_file("uv.lock", "/repo/uv.lock", copy=True)
+    # ... copy workspace members needed for finetune + slm-evals
+    .run_commands(
+        "cd /repo && uv sync --frozen --group finetune --group lm-eval --package slm-evals"
+    )
+    .add_local_dir("research", remote_path="/repo/research")
+    .add_local_dir("libs/inference", remote_path="/repo/libs/inference")
+    .add_local_file("models.yaml", "/repo/models.yaml")
+)
+```
+Use `pip_install("torch")` on the image **or** let `uv sync` pull torch; either works on Modal because the [driver API is pre-installed](https://modal.com/docs/guide/cuda).
+**Volumes** (persist across runs):
+- `hf-cache` → mount at `/root/.cache/huggingface` (model + dataset cache)
+- `slm-finetune` → mount at `/vol/finetuned` (adapters, `training_results.json`, lm-eval `results/`)
+**Secrets**: `modal.Secret.from_name("huggingface")` with `HF_TOKEN` for gated models and faster Hub downloads.
+**GPU**: `gpu="A10G"` default (24 GB is plenty for MiniCPM5-1B LoRA at `max_len=1024`). Use `gpu="T4"` for QLoRA smoke tests; bump to `A100` only if you scale `batch_size` or `max_len`.
+### 2. Training worker — wrap existing CLI
+Do **not** rewrite training logic. Each Modal function shells into your script:
+```python
+@app.function(gpu="A10G", volumes={...}, secrets=[...], timeout=7200)
+def finetune_one(job: dict) -> dict:
+    out = f"/vol/finetuned/{job['name']}"
+    cmd = [
+        "uv", "run", "python", "research/finetune.py",
+        "--preset", "minicpm5-1b",
+        "--mode", job.get("mode", "lora"),
+        "--dataset", job["dataset"],
+        "--format", job["format"],
+        "--out", out,
+        "--trust_remote_code",  # also implied by the preset; alternatively set TRUST_REMOTE_CODE=true in env
+        *optional_flags(job),
+    ]
+    subprocess.run(cmd, cwd="/repo", check=True, env={**os.environ, "HF_HOME": "/root/.cache/huggingface"})
+    vol_finetune.commit()
+    return json.loads(Path(out, "training_results.json").read_text())
+```
+Key env vars to pass through (already supported by [`finetune.py`](research/finetune.py)):
+- `FINETUNE_DATASET_CONFIG`, `FINETUNE_DATASET_SPLIT`, `FINETUNE_MAX_SAMPLES`
+- `TRUST_REMOTE_CODE=true` (required for `openbmb/MiniCPM5-1B`)
+### 3. Benchmark worker — baseline + per-checkpoint
+Separate function so you can re-eval without re-training:
+```python
+@app.function(gpu="A10G", volumes={...}, timeout=3600)
+def run_lm_eval(*, experiment_name: str, preset: str | None = None,
+                model_path: str | None = None, adapter_path: str | None = None,
+                config: str = "research/evals/configs/lm_eval_smoke.yaml",
+                compare_to: str | None = None) -> dict:
+    # uv run --package slm-evals slm-lm-eval ...
+```
+**Suggested experiment matrix** in `experiments.yaml`:
+| Job name | Dataset | Format | Notes |
+|----------|---------|--------|-------|
+| `lesson-lora` | `research/data/education-lesson-chat.jsonl` | `chat` | Primary Well-Tuned story |
+| `alpaca-lora` | `tatsu-lab/alpaca` | `alpaca` | General instruction |
+| `smoltalk-lora` | `HuggingFaceTB/smoltalk` | `chat` | `dataset_config: all`, `split: train[:500]` |
+Smoke flags for hackathon time budget: `--max_steps 100` or `FINETUNE_MAX_SAMPLES=200`, plus lm-eval `limit: 25` from [`lm_eval_smoke.yaml`](research/evals/configs/lm_eval_smoke.yaml).
+### 4. Orchestration — `local_entrypoint`
+```python
+@app.local_entrypoint()
+def sweep(train: bool = True, eval_only: bool = False):
+  with open("research/modal/experiments.yaml") as f:
+    jobs = yaml.safe_load(f)
+  baseline = None  # stays None when the baseline eval is skipped
+  if not eval_only:
+    baseline = run_lm_eval.remote(
+      experiment_name="minicpm5-1b__baseline",
+      preset="minicpm5-1b",
+      config="research/evals/configs/lm_eval_compare_study.yaml",
+    )
+  # .map() returns results in input order, so zip pairs each result with its job
+  for job, result in zip(jobs["finetune"], finetune_one.map(jobs["finetune"])):
+    run_lm_eval.remote(
+      experiment_name=f"{result['preset']}__{job['name']}",
+      model_path="openbmb/MiniCPM5-1B",
+      adapter_path=result["output_dir"],
+      compare_to=baseline["results_json"] if baseline else None,
+    )
+```
+Use `.map()` for parallel dataset runs only if budget allows; otherwise sequential `for job in jobs: finetune_one.remote(job)`.
+### 5. Modal GPU Notebook (OpenBMB)
+Create [`research/notebook/minicpm5-modal-finetune.ipynb`](research/notebook/minicpm5-modal-finetune.ipynb):
+1. **Setup cell** — `pip install` / `uv sync` finetune group; verify `nvidia-smi` (Modal Notebooks have GPU per [Modal intro](https://modal.com/docs/guide)).
+2. **Clone or mount repo** — `git clone` your hackathon repo or upload `research/finetune.py` + `models.yaml` + lesson JSONL.
+3. **Smoke train** — `%run research/finetune.py --preset minicpm5-1b --mode lora --max_steps 20`
+4. **Inline eval** — `%run` or subprocess `slm-lm-eval --profile smoke --preset minicpm5-1b-lesson-lora` (after registering adapter path in a temp preset or passing `--model` + `--adapter`).
+5. **Sample generation** — reuse the smoke block at end of `finetune.py`.
+Notebook is the **demo video** surface; `finetune_app.py` is the **reproducible** surface for judges.
+Optional: use Modal Sandbox [`Sandbox.exec`](https://modal.com/docs/guide/sandbox-spawn) only for one-off shell probes (`nvidia-smi`, `python -c "import torch"`) — not for full training (Functions + Volumes are the right primitive).
+### 6. Pulling results back locally
+After `modal run`:
+```bash
+modal volume get slm-finetune minicpm5-1b-lesson-lora ./models/finetuned/minicpm5-1b-lesson-lora
+modal volume get slm-finetune results/lm_eval ./results/lm_eval
+```
+Then wire Space via existing preset [`minicpm5-1b-lesson-lora`](models.yaml) (`adapter_path: ./models/finetuned/minicpm5-1b-lora`).
+Optional stretch: push adapter to `build-small-hackathon/<your-space>-lora` with `huggingface_hub` in a post-train Modal function.
+## Setup checklist (one-time)
+1. `pip install modal && modal setup` ([getting started](https://modal.com/docs/guide))
+2. `modal secret create huggingface HF_TOKEN=<token>`
+3. `uv sync --group finetune --group lm-eval` locally (validates lockfile before image build)
+4. First smoke: `modal run research/modal/finetune_app.py --max-steps 20 --dataset lesson`
+## Hackathon submission narrative
+Document in root README or `research/modal/README.md`:
+1. **Modal track** — link to Modal app name, example `modal run` output, screenshot of Volume or Notebook.
+2. **Finetuning track** — table from `comparison.md` / `summary.md` showing base vs lesson-LoRA on same lm-eval config (fair comparison per [`research/USAGE.md`](research/USAGE.md) verification checklist).
+3. **Space integration** — `ACTIVE_MODEL=minicpm5-1b-lesson-lora` after downloading adapter.
+## Files to add (minimal diff)
+- `research/modal/finetune_app.py` — Modal app (~150 lines)
+- `research/modal/experiments.yaml` — 3 dataset jobs + eval config pointers
+- `research/modal/README.md` — commands only
+- `research/notebook/minicpm5-modal-finetune.ipynb` — notebook path
+- Root `pyproject.toml` — add optional `modal` dependency group: `modal>=0.73`
+- `.env.example` — note `HF_TOKEN` for Modal secret (no token in repo)
+## What we intentionally skip
+- Refactoring `finetune.py` into an importable library (subprocess wrapper is enough)
+- Running agentic benchmarks (BFCL/GAIA) on Modal first pass — heavier deps; add later if time
+- Modal Sandboxes for training loops — Functions are simpler and support GPU + Volumes
+## Risk mitigations
+| Risk | Mitigation |
+|------|------------|
+| OpenBMB `trust_remote_code` | Set `TRUST_REMOTE_CODE=true` in Modal function env |
+| Image build slow | Cache `hf-cache` Volume; pin `uv.lock` |
+| OOM on small GPU | `--mode qlora`, `max_len=512`, `batch_size=1` (auto in [`_apply_low_vram_defaults`](research/finetune.py)) |
+| lm-eval path assumptions | Run from `/repo` cwd; `slm_evals` resolves `_REPO_ROOT` four parents up from its module |
+| Volume not persisted | Call `volume.commit()` after train/eval |
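
Editorial note on the plan above: the training worker calls an `optional_flags(job)` helper that the plan never defines. A minimal sketch of what such a helper could look like, assuming each job dict may carry `max_steps`, `max_len`, and `batch_size` keys that map one-to-one onto `--<key>` flags of `research/finetune.py`. Only `--max_steps` is named in the plan; the other two flag names are guesses and the mapping table itself is hypothetical.

```python
from typing import Any

# Hypothetical mapping: job key -> CLI flag. Only --max_steps appears in the
# plan; --max_len and --batch_size are assumed to exist on finetune.py.
_OPTIONAL_FLAGS = {
    "max_steps": "--max_steps",
    "max_len": "--max_len",
    "batch_size": "--batch_size",
}


def optional_flags(job: dict[str, Any]) -> list[str]:
    """Turn the optional keys present in a job dict into CLI arguments."""
    args: list[str] = []
    for key, flag in _OPTIONAL_FLAGS.items():
        value = job.get(key)
        if value is not None:  # skip keys the job does not set
            args += [flag, str(value)]
    return args


if __name__ == "__main__":
    # A smoke job that only caps the number of training steps.
    print(optional_flags({"name": "lesson-lora", "max_steps": 20}))
```

Keeping the mapping in one table means a new per-job override only needs one extra entry, and unknown keys in `experiments.yaml` are ignored rather than passed to the CLI.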

.env.example CHANGED

@@ -39,6 +39,10 @@ ALLOW_MODEL_SWITCH=false
 # ACTIVE_MODEL=gemma-merged-local
 # MODEL_ID=./gemma_merged_model
 # --- Fine-tuning (research/finetune.py) ---
 # FINETUNE_PRESET=minicpm5-1b
 # FINETUNE_MODEL=openbmb/MiniCPM5-1B

 # ACTIVE_MODEL=gemma-merged-local
 # MODEL_ID=./gemma_merged_model
+# --- Modal (research/modal/finetune_app.py) ---
+# Create secret: modal secret create huggingface HF_TOKEN=<token>
+# HF_TOKEN=hf_...
 # --- Fine-tuning (research/finetune.py) ---
 # FINETUNE_PRESET=minicpm5-1b
 # FINETUNE_MODEL=openbmb/MiniCPM5-1B

README.md CHANGED

@@ -48,6 +48,9 @@ uv run --package gradio-space python -m gradio_space.app
 Open [http://localhost:7860](http://localhost:7860).
 ### Studio UI (Off Brand track)
 The default landing page is a **custom AI Studio workspace** at `/` — not default Gradio chrome. It uses **Gradio 6 Server mode** (`gradio.Server`): Material 3 layout, sidebar + workspace (Research → Slides → Language lessons), and `@server.api` endpoints wired to the same Python backends as Classic.
@@ -57,8 +60,25 @@ The default landing page is a **custom AI Studio workspace** at `/` — not defa
 See [apps/gradio-space/README.md](apps/gradio-space/README.md) for API names and a 2-minute judge demo script.
-- **Lesson slides** — topic, grade, slide count → downloadable PowerPoint
-- **Research Agent** — scrape/index sources into MemRAG, then ask questions offline with citations
 ## How it works
@@ -108,9 +128,15 @@ See [`.env.example`](.env.example) and [`models.yaml`](models.yaml) for model pr
 A root `Dockerfile` is kept for a later **Docker SDK** deploy (flip README to `sdk: docker`). See [USAGE.md](USAGE.md).
-## Hackathon checklist
-- **Track:** Backyard AI — lesson slide builder for a teacher you know
 - Space live under build-small-hackathon
 - Demo video: [YouTube](https://www.youtube.com/watch?v=bwtOiZvJ-7k) — real user enters topic → download `.pptx` → show agent trace
 - Social post published
@@ -123,7 +149,8 @@ A root `Dockerfile` is kept for a later **Docker SDK** deploy (flip README to `s
 - **OpenBMB** — `openbmb/MiniCPM5-1B`
 - **Sharing is Caring** — upload traces with `scripts/upload_trace.py`
 - **Off-the-Grid** — local inference only (no cloud LLM API)
-- **Well-Tuned** — optional fine-tuned preset in `models.yaml` (Phase 2)
 ## Agent trace upload

 Open [http://localhost:7860](http://localhost:7860).
+- **Lesson slides** — topic, grade, slide count → downloadable PowerPoint
+- **Research Agent** — scrape/index sources into MemRAG, then ask questions offline with citations
 ### Studio UI (Off Brand track)
 The default landing page is a **custom AI Studio workspace** at `/` — not default Gradio chrome. It uses **Gradio 6 Server mode** (`gradio.Server`): Material 3 layout, sidebar + workspace (Research → Slides → Language lessons), and `@server.api` endpoints wired to the same Python backends as Classic.
 See [apps/gradio-space/README.md](apps/gradio-space/README.md) for API names and a 2-minute judge demo script.
+### Modal + Fine-tuning track (Well-Tuned)
+Cloud GPU **train → eval → gate → publish** for a skill-matrix of QLoRA adapters on `openbmb/MiniCPM5-1B` — no local CUDA required. Each job in [`research/modal/experiments.yaml`](research/modal/experiments.yaml) (math, science, coding, reasoning, teaching, …) fine-tunes with [`research/finetune.py`](research/finetune.py), benchmarks with `slm-lm-eval`, gates on per-skill `goals`, and publishes passing adapters to the Hub.
+- **Modal (partner track)** — `modal run` / warm GPU worker, Volume artifacts, optional [Modal Notebook](research/notebook/minicpm5-modal-finetune.ipynb)
+- **Well-Tuned badge** — before/after lm-eval per skill + gated Hub publish (`MSGEncrypted/minicpm5-1b-<skill>-lora`)
+Full runbook: [`research/modal/README.md`](research/modal/README.md) · agent loop: [`research/modal/SERVER.md`](research/modal/SERVER.md) · local research overview: [`research/USAGE.md`](research/USAGE.md)
+```bash
+uv sync --group modal
+modal setup && modal secret create huggingface HF_TOKEN=<token>
+modal run research/modal/server_app.py --ping                       # health check
+modal run research/modal/server_app.py --job math-lora --max-steps 20 --no-publish   # cheap smoke
+modal run research/modal/server_app.py --pipeline                   # full sweep: baselines → train → eval → gate → publish
+```
+Pull a passing adapter into the Space: `modal volume get slm-finetune math-lora ./models/finetuned/minicpm5-1b-lora`, then set `ACTIVE_MODEL=minicpm5-1b-lesson-lora`.
 ## How it works
 A root `Dockerfile` is kept for a later **Docker SDK** deploy (flip README to `sdk: docker`). See [USAGE.md](USAGE.md).
+## Hackathon tracks & checklist
+| Track | What we ship |
+| ----- | ------------ |
+| **Backyard AI** (primary) | Lesson slide builder for a teacher you know — topic + grade → downloadable `.pptx` |
+| **Off Brand** | Custom Studio UI at `/` (Gradio 6 Server mode, not default Gradio chrome) |
+| **Modal** (partner) | GPU `train → eval → gate → publish` on [Modal](https://modal.com) — [`research/modal/`](research/modal/) |
+| **Well-Tuned** (finetuning) | Skill-matrix QLoRA adapters on MiniCPM5-1B, lm-eval gates, Hub publish |
 - Space live under build-small-hackathon
 - Demo video: [YouTube](https://www.youtube.com/watch?v=bwtOiZvJ-7k) — real user enters topic → download `.pptx` → show agent trace
 - Social post published
 - **OpenBMB** — `openbmb/MiniCPM5-1B`
 - **Sharing is Caring** — upload traces with `scripts/upload_trace.py`
 - **Off-the-Grid** — local inference only (no cloud LLM API)
+- **Well-Tuned** — per-skill QLoRA adapters trained + gated + published via the [Modal + Fine-tuning track](#modal--fine-tuning-track-well-tuned)
+- **Modal** — same pipeline; see [`research/modal/README.md`](research/modal/README.md)
 ## Agent trace upload

TODO.md ADDED

	@@ -0,0 +1,44 @@

+# Hackathon badge/track TODO
+Strategy: one **Backyard AI** submission, stacking as many merit badges, sponsor
+awards, and special awards as credibly fit the small-model / local-first story.
+Deadline: **June 15, 2026** (Space + demo video + social post).
+This PR (`feat/finetuning_model`) focuses on **🎯 Well-Tuned** + **Modal**. Everything
+below is parked for follow-up PRs.
+## In this PR (finetuning + Modal) — done here
+- [x] Make published adapters **public** so judges can verify the Well-Tuned badge
+      (`research/modal/experiments.yaml`: `private: false`).
+- [x] Add hackathon discoverability tags + license to the published model card
+      (`research/modal/_common.py: render_model_card`).
+## 🦙 Llama Champion badge (cheap, high value)
+- [ ] Run the Space on the **llama.cpp / GGUF** backend (`libs/inference/src/inference/llama_cpp.py`).
+- [ ] Confirm MiniCPM5-1B has a GGUF (or convert/quantize one) — keep OpenBMB story intact.
+- [ ] Document the llama.cpp path in README + Space (which `ACTIVE_MODEL` preset).
+## 📓 Field Notes badge (cheapest miss — no blog exists yet)
+- [ ] Write a blog post / report on the fine-tuning + Modal pipeline:
+      skill-matrix QLoRA -> lm-eval -> per-skill gate -> Hub publish.
+- [ ] Publish it (HF blog / personal) and link from README.
+- [ ] This badge + the others clinches **Bonus Quest Champion ($2k)**.
+## README + submission hygiene
+- [ ] Update README badge checklist to reflect full strategy (add Llama Champion, Field Notes).
+- [ ] Best Demo: polished demo video (real teacher -> topic -> .pptx download -> trace).
+- [ ] Social post published (required for submission).
+- [ ] Community Choice: share the Space widely.
+## Decided NOT to chase (conflicts with MiniCPM / local-first core)
+- OpenAI Track — requires OpenAI models; collides with Tiny Titan / OpenBMB / Off-the-Grid.
+- NVIDIA Nemotron — requires Nemotron model; same conflict.
+- Thousand Token Wood — different main track; can't be in both.
+## Badge scorecard (target = all 6 + Bonus Quest Champion)
+- [x] 🔌 Off the Grid — local inference only
+- [x] 🎨 Off-Brand — custom Studio UI (Gradio 6 Server mode)
+- [x] 📡 Sharing is Caring — agent trace upload
+- [~] 🎯 Well-Tuned — pipeline ready; needs a passing public adapter on the Hub
+- [ ] 🦙 Llama Champion — see above
+- [ ] 📓 Field Notes — see above

pyproject.toml CHANGED

@@ -28,6 +28,10 @@ evals = [
 lm-eval = [
     "slm-evals[lm-eval]",
 ]
 [tool.uv.workspace]
 members = [

 lm-eval = [
     "slm-evals[lm-eval]",
 ]
+modal = [
+    "modal>=0.73.0",
+    "pyyaml>=6.0",
+]
 [tool.uv.workspace]
 members = [

research/USAGE.md CHANGED

@@ -27,6 +27,65 @@ uv sync --group lm-eval
 | `finetune` | `research/finetune.py` | `peft`, `datasets`, `bitsandbytes` (QLoRA) |
 | `evals` | `slm-evals` workspace member | `slm-benchmark` CLI |
 | `lm-eval` | `slm-evals[lm-eval]` | `slm-lm-eval` CLI (GSM8K, ARC, HellaSwag, …) |
 ---

 | `finetune` | `research/finetune.py` | `peft`, `datasets`, `bitsandbytes` (QLoRA) |
 | `evals` | `slm-evals` workspace member | `slm-benchmark` CLI |
 | `lm-eval` | `slm-evals[lm-eval]` | `slm-lm-eval` CLI (GSM8K, ARC, HellaSwag, …) |
+| `modal` | `research/modal/finetune_app.py` | Cloud GPU train + eval via [Modal](https://modal.com/docs/guide) |
+| `modal` | `research/modal/server_app.py` | Long-lived warm GPU worker for human/AI iteration loops |
+---
+## 0. Modal cloud GPU (`research/modal/`)
+Run a **skill-matrix** of QLoRA fine-tunes **without local CUDA**: each job in
+[`modal/experiments.yaml`](modal/experiments.yaml) trains one adapter for a
+category (math, science, coding, reasoning, teaching, instructions), evaluates
+it against a matching `slm-lm-eval` profile vs. a per-profile baseline, checks
+the result against `goals`, and — only if the gate passes — publishes the
+adapter to the Hugging Face Hub. Adapters + results are saved to Modal Volume
+`slm-finetune`.
+```bash
+uv sync --group modal
+modal setup
+modal secret create huggingface HF_TOKEN=<token>   # needs write access for Hub publish
+# Smoke run for one skill: baseline -> train -> eval -> gate -> publish -> pull
+modal run research/modal/finetune_app.py --job math-lora --max-steps 20
+# Whole skill matrix
+modal run research/modal/finetune_app.py
+# One category, train+eval only (no Hub push)
+modal run research/modal/finetune_app.py --category science --no-publish
+# Re-check the gate and publish an already-evaluated job
+modal run research/modal/finetune_app.py::publish_only --job math-lora
+# Pull adapters + lm-eval results without re-running anything
+modal run research/modal/finetune_app.py::pull --category math
+```
+Set real values for `defaults.hub_org` and each job's `publish.hub_repo` in
+`experiments.yaml` (placeholder: `your-hf-username`) before publishing — repos
+are created automatically. Jobs with no `goals` (e.g. `alpaca-lora`) are
+trained/evaluated but never gated or published (local-only).
+For a multi-hour session on **one warm GPU** (iterative human/AI loop without
+re-downloading weights each run), use `research/modal/server_app.py` instead —
+same skill-matrix pipeline (`--job`/`--category`/`--pipeline`/`--publish-only`)
+on a deployed `GpuWorker`.
+Full guide: **[modal/README.md](modal/README.md)** · **Agent loop:** **[modal/SERVER.md](modal/SERVER.md)** · [Modal Volumes](https://modal.com/docs/guide/volumes) · [Modal Notebooks](https://modal.com/docs/guide/notebooks)
+**Iterative loop (one warm GPU, many runs):**
+```bash
+modal deploy research/modal/server_app.py
+modal run -d research/modal/server_app.py --hours 6          # keep worker alive
+modal run research/modal/server_app.py --ping                # verify
+modal run research/modal/server_app.py --job lesson-lora --max-steps 20
+modal app stop slm-gpu-worker -y                             # when done
+```
+Interactive notebook: upload [`research/notebook/minicpm5-modal-finetune.ipynb`](notebook/minicpm5-modal-finetune.ipynb) at [modal.com/notebooks](https://modal.com/notebooks), attach GPU + Volume `slm-finetune` + Secret `huggingface`.
 ---
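
Editorial note on the USAGE.md section above: it says each job is checked against its `goals` and published only if the gate passes, but the diff does not show the `goals` schema. The sketch below assumes `goals` maps a metric name to a minimum score and that the eval results are a flat `metric -> score` dict; both shapes, and the `passes_gate` name, are hypothetical and may not match `_common.py`.

```python
def passes_gate(scores: dict[str, float], goals: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (passed, failures) for one job.

    Assumed schema: goals = {metric: minimum_score}, scores = {metric: score}.
    A job without goals is never publishable, matching the USAGE.md note that
    such jobs are trained and evaluated but stay local-only.
    """
    if not goals:
        return False, ["no goals defined: train/eval only, never published"]
    failures: list[str] = []
    for metric, minimum in goals.items():
        score = scores.get(metric)
        if score is None:
            failures.append(f"{metric}: missing from eval results")
        elif score < minimum:
            failures.append(f"{metric}: {score:.3f} < {minimum:.3f}")
    return not failures, failures


if __name__ == "__main__":
    ok, reasons = passes_gate({"gsm8k": 0.31, "arc_challenge": 0.40}, {"gsm8k": 0.30})
    print("publish" if ok else "hold", reasons)
```

Returning the failure reasons alongside the boolean keeps the gate decision auditable in logs.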

research/data/science-tutor-chat.jsonl ADDED

	@@ -0,0 +1,10 @@

+{"messages": [{"role": "system", "content": "You are a friendly science tutor for middle-school students. Explain concepts in simple terms and give one concrete example."}, {"role": "user", "content": "What is photosynthesis?"}, {"role": "assistant", "content": "Photosynthesis is how plants make their own food. Using sunlight, water, and carbon dioxide from the air, a plant's leaves produce glucose (sugar) for energy and release oxygen as a byproduct. For example, a tree's leaves absorb sunlight during the day and release the oxygen we breathe."}]}
+{"messages": [{"role": "system", "content": "You are a friendly science tutor for middle-school students. Explain concepts in simple terms and give one concrete example."}, {"role": "user", "content": "Why does ice float on water?"}, {"role": "assistant", "content": "Ice floats because it is less dense than liquid water. When water freezes, its molecules arrange into a crystal structure that takes up more space, so the same mass of ice has more volume than liquid water. For example, an ice cube floats in a glass of water with most of it submerged but the top sticking out."}]}
+{"messages": [{"role": "system", "content": "You are a friendly science tutor for middle-school students. Explain concepts in simple terms and give one concrete example."}, {"role": "user", "content": "What causes the water cycle?"}, {"role": "assistant", "content": "The water cycle is driven by the sun's heat. The sun evaporates water from oceans and lakes into vapor, which rises and cools to form clouds (condensation). When clouds get heavy, water falls back as rain or snow (precipitation), and it eventually flows back to the oceans. For example, rain that falls on mountains flows downhill into rivers and back to the sea."}]}
+{"messages": [{"role": "system", "content": "You are a friendly science tutor for middle-school students. Explain concepts in simple terms and give one concrete example."}, {"role": "user", "content": "What is Newton's first law of motion?"}, {"role": "assistant", "content": "Newton's first law says an object at rest stays at rest, and an object in motion stays in motion at the same speed and direction, unless a force acts on it. This is also called inertia. For example, a hockey puck slides across ice and only slows down because of friction, not because it 'runs out' of motion on its own."}]}
+{"messages": [{"role": "system", "content": "You are a friendly science tutor for middle-school students. Explain concepts in simple terms and give one concrete example."}, {"role": "user", "content": "What are the three states of matter?"}, {"role": "assistant", "content": "The three common states of matter are solid, liquid, and gas. Solids have a fixed shape and volume, liquids take the shape of their container but keep their volume, and gases expand to fill their container. For example, water can be ice (solid), liquid water, or steam (gas), depending on its temperature."}]}
+{"messages": [{"role": "system", "content": "You are a friendly science tutor for middle-school students. Explain concepts in simple terms and give one concrete example."}, {"role": "user", "content": "How does a simple electric circuit work?"}, {"role": "assistant", "content": "A simple circuit needs a power source, a conductor (wire), and a load (like a bulb), all connected in a loop. Electrons flow from the power source through the wire, through the load where their energy is converted (e.g., into light), and back to the source. For example, connecting a battery to a light bulb with two wires lets current flow and lights the bulb; cutting one wire breaks the circuit and the bulb turns off."}]}
+{"messages": [{"role": "system", "content": "You are a friendly science tutor for middle-school students. Explain concepts in simple terms and give one concrete example."}, {"role": "user", "content": "What is the difference between a producer and a consumer in an ecosystem?"}, {"role": "assistant", "content": "Producers, like plants and algae, make their own food using sunlight through photosynthesis. Consumers cannot make their own food and must eat other organisms to get energy. For example, grass is a producer, a rabbit that eats the grass is a consumer, and a fox that eats the rabbit is also a consumer."}]}
+{"messages": [{"role": "system", "content": "You are a friendly science tutor for middle-school students. Explain concepts in simple terms and give one concrete example."}, {"role": "user", "content": "Why do we see different phases of the Moon?"}, {"role": "assistant", "content": "The Moon's phases happen because we see different amounts of its sunlit side as it orbits Earth. The Moon doesn't change shape; only the portion we can see lit by the sun changes. For example, during a full moon, the entire visible side is lit because the Moon is on the opposite side of Earth from the sun, while during a new moon the lit side faces away from us."}]}
+{"messages": [{"role": "system", "content": "You are a friendly science tutor for middle-school students. Explain concepts in simple terms and give one concrete example."}, {"role": "user", "content": "What is the function of the human heart?"}, {"role": "assistant", "content": "The heart is a muscular pump that circulates blood through the body. It pushes oxygen-rich blood from the lungs out to the body's tissues, and pumps oxygen-poor blood back to the lungs to pick up more oxygen. For example, each heartbeat you feel is the heart contracting to push blood through your blood vessels."}]}
+{"messages": [{"role": "system", "content": "You are a friendly science tutor for middle-school students. Explain concepts in simple terms and give one concrete example."}, {"role": "user", "content": "What is an acid and a base?"}, {"role": "assistant", "content": "Acids are substances that release hydrogen ions (H+) in water and taste sour, while bases release hydroxide ions (OH-) and taste bitter or feel slippery. Acidity is measured on the pH scale, where values below 7 are acidic and above 7 are basic. For example, lemon juice is acidic (low pH) and baking soda dissolved in water is basic (high pH)."}]}
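
Editorial note on the dataset above: every record is one JSON object with a `messages` list of `system` / `user` / `assistant` turns. A small sanity check along these lines can catch malformed lines before they reach a GPU run. It is a sketch of the shape seen in this file, not the validation `research/finetune.py` actually performs, and `validate_chat_line` is a hypothetical name.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_chat_line(line: str) -> list[str]:
    """Return the problems found in one JSONL record (empty list if valid)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems: list[str] = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i}: not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {msg.get('role')!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i}: empty content")
    # Supervised chat tuning needs a target, so the last turn should be the assistant's.
    if isinstance(messages[-1], dict) and messages[-1].get("role") != "assistant":
        problems.append("last message is not an assistant turn")
    return problems


if __name__ == "__main__":
    sample = json.dumps({"messages": [
        {"role": "user", "content": "What is photosynthesis?"},
        {"role": "assistant", "content": "It is how plants make their own food."},
    ]})
    print(validate_chat_line(sample))
```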

research/evals/configs/eval_profiles.yaml CHANGED

@@ -25,6 +25,25 @@ profiles:
       - arc_challenge
       - hellaswag
+  math:
+    tool: slm-lm-eval
+    claim: Better math reasoning
+    description: Grade-school math word problems (GSM8K) + abstract reasoning QA.
+    config: lm_eval_math.yaml
+    tasks:
+      - gsm8k
+      - arc_challenge
+  science:
+    tool: slm-lm-eval
+    claim: Better science knowledge
+    description: Science fact recall (SciQ, OpenBookQA) + science reasoning QA.
+    config: lm_eval_science.yaml
+    tasks:
+      - sciq
+      - openbookqa
+      - arc_challenge
   understanding:
     tool: slm-lm-eval
     claim: Better language understanding

research/evals/configs/lm_eval_instructions.yaml CHANGED

@@ -1,6 +1,6 @@
 # Instruction following profile — IFEval (verifiable constraints)
 # Run: slm-lm-eval --profile instructions --preset minicpm5-1b
-# Requires lm-eval extras; install with: uv sync --group lm-eval
+# Requires lm-eval[ifeval] extras; install with: uv sync --group lm-eval
 profile: instructions
 claim: Better instruction following

research/evals/configs/lm_eval_math.yaml ADDED

	@@ -0,0 +1,18 @@

+# Math profile — grade-school word problems + abstract reasoning QA
+# Run: slm-lm-eval --profile math --preset minicpm5-1b --experiment-name math-baseline
+profile: math
+claim: Better math reasoning
+tasks:
+  - gsm8k
+  - arc_challenge
+num_fewshot: 5
+limit: 100
+seed: 42
+batch_size: auto
+device: auto
+dtype: bfloat16
+trust_remote_code: true
+output_dir: results/lm_eval

research/evals/configs/lm_eval_science.yaml ADDED

	@@ -0,0 +1,19 @@

+# Science profile — fact recall + elementary science reasoning
+# Run: slm-lm-eval --profile science --preset minicpm5-1b --experiment-name science-baseline
+profile: science
+claim: Better science knowledge
+tasks:
+  - sciq
+  - openbookqa
+  - arc_challenge
+num_fewshot: 0
+limit: 100
+seed: 42
+batch_size: auto
+device: auto
+dtype: bfloat16
+trust_remote_code: true
+output_dir: results/lm_eval

research/evals/pyproject.toml CHANGED

@@ -19,7 +19,7 @@ dependencies = [
 [project.optional-dependencies]
 lm-eval = [
-    "lm-eval[hf]>=0.4.9",
+    "lm-eval[hf,ifeval]>=0.4.9",
 ]
 [project.scripts]

research/modal/README.md ADDED

	@@ -0,0 +1,774 @@

+# Modal finetune + benchmark
+GPU fine-tuning + benchmarking + Hub publishing on [Modal](https://modal.com/docs/guide) for `openbmb/MiniCPM5-1B`, wrapping existing [`research/finetune.py`](../finetune.py) and `slm-lm-eval`.
+Use this when you have no local CUDA but want a hackathon-quality
+**train → eval → gate → publish** loop for a whole **skill matrix** of QLoRA
+adapters (math, science, coding, reasoning, teaching, instructions).
+| Track | What you ship |
+| ----- | ------------- |
+| **Modal** | `modal run` skill-matrix pipeline, Volume artifacts, optional Modal Notebook |
+| **Well-Tuned** | Per-skill before/after `lm-eval` + gated Hub publish for each LoRA |
+---
+## Layout
+```text
+research/modal/
+├── _common.py         # Shared image, volumes, command builders, gate + publish helpers
+├── finetune_app.py    # One-shot batch pipeline (slm-finetune-benchmark): main, publish_only, pull
+├── server_app.py      # Long-lived GPU worker (slm-gpu-worker): GpuWorker.run_pipeline
+├── experiments.yaml   # Skill matrix: jobs, eval_profile, goals, publish
+├── README.md          # Full Modal docs (this file)
+└── SERVER.md          # Human + AI agent loop runbook (quick reference)
+```
+Interactive path: [`research/notebook/minicpm5-modal-finetune.ipynb`](../notebook/minicpm5-modal-finetune.ipynb) (Modal GPU Notebook).
+### Which app to use
+| App | CLI | Best for |
+| --- | --- | --- |
+| **`finetune_app.py`** | `modal run research/modal/finetune_app.py` | Full sweep, CI-style batch, parallel jobs |
+| **`server_app.py`** | `modal deploy` + `modal run research/modal/server_app.py` | Multi-hour session, iterative human/AI loops on **one warm GPU** |
+Both apps share [`_common.py`](_common.py): same image, `hf-cache` / `slm-finetune` volumes, and wrappers around [`research/finetune.py`](../finetune.py) + `slm-lm-eval`.
+---
+## One-time setup
+```bash
+# Modal CLI + auth
+pip install modal
+modal setup
+# HF token (downloads + Hub upload). Same token as huggingface-cli login.
+modal secret create huggingface HF_TOKEN=<your-hf-token>
+# Optional: validate deps before first image build
+uv sync --group finetune --group lm-eval --package slm-evals
+uv sync --group modal   # local orchestration only
+```
+`HF_TOKEN` must be a [write token](https://huggingface.co/settings/tokens) if you plan to push adapters to the Hub.
+---
+## Run training + benchmarks
+Run all commands from the **repo root**. `finetune_app.py` runs the full **skill-matrix
+pipeline**: per-profile baseline lm-eval → finetune each job's QLoRA adapter →
+post-train lm-eval vs. that baseline → check `goals` (gate) → publish to the
+Hugging Face Hub if the gate passes → pull adapter + results to your laptop.
+```bash
+# Full sweep: every job in experiments.yaml
+modal run research/modal/finetune_app.py
+# One skill (cheap smoke run)
+modal run research/modal/finetune_app.py --job math-lora --max-steps 20
+# One category (e.g. all "science" jobs)
+modal run research/modal/finetune_app.py --category science
+# Re-run lm-eval (+ gate + publish) only — adapter already on Volume
+modal run research/modal/finetune_app.py --eval-only --job math-lora
+# Train + eval but skip the Hub push and the local download
+modal run research/modal/finetune_app.py --no-publish --no-pull
+# Train/eval jobs in parallel (one GPU per job — higher cost)
+modal run research/modal/finetune_app.py --parallel
+# Re-run just the gate + Hub publish for an already-evaluated job
+modal run research/modal/finetune_app.py::publish_only --job math-lora
+# Pull adapters + lm-eval results for a category without re-running anything
+modal run research/modal/finetune_app.py::pull --category math
+```
+Jobs live in [`experiments.yaml`](experiments.yaml) — a **skill matrix**, one
+QLoRA adapter per category, each evaluated against the matching
+`eval_profile` from [`research/evals/configs/eval_profiles.yaml`](../evals/configs/eval_profiles.yaml):
+| Job | Category | Dataset (format) | Eval profile | `goals` task | Publish |
+| --- | -------- | ----------------- | ------------ | ------------- | ------- |
+| `teaching-lora` | teaching | `research/data/education-lesson-chat.jsonl` (`chat`) | `instructions` | `ifeval` | ✅ |
+| `science-lora` | science | `research/data/science-tutor-chat.jsonl` (`chat`) | `science` | `sciq` (+ `arc_challenge` guard) | ✅ |
+| `math-lora` | math | `TIGER-Lab/MathInstruct` (`alpaca`) | `math` | `gsm8k` (+ `arc_challenge` guard) | ✅ |
+| `coding-lora` | coding | `iamtarun/python_code_instructions_18k_alpaca` (`alpaca`) | `code` | `mbpp` | ✅ |
+| `reasoning-lora` | reasoning | `HuggingFaceTB/smoltalk` (`chat`) | `reasoning` | `gsm8k` (+ `hellaswag` guard) | ✅ |
+| `alpaca-lora` | instructions | `tatsu-lab/alpaca` (`alpaca`) | `instructions` | — (no `goals`) | local-only |
+Before publishing, replace `defaults.hub_org` and each job's `publish.hub_repo`
+in `experiments.yaml` with your Hugging Face username/org (defaults to the
+placeholder `your-hf-username`).
+Edit `defaults.max_steps`, per-job `gpu`, or per-job `max_samples` /
+`dataset_split` in `experiments.yaml` to balance cost vs quality. See
+[Benchmark gate & Hugging Face Hub publish](#benchmark-gate--hugging-face-hub-publish)
+for the `goals`/`publish` schema.
+### CLI flags (`finetune_app.py`)
+`main` (default entrypoint — full pipeline):
+| Flag | Default | Meaning |
+| ---- | ------- | ------- |
+| `--train` / `--no-train` | train on | Run finetune jobs |
+| `--eval-only` | off | Skip train + baselines; eval existing Volume checkpoints |
+| `--parallel` | off | `finetune_one.spawn()` per job instead of sequential |
+| `--job` | all jobs | Run one job name from `experiments.yaml` |
+| `--category` | all categories | Run all jobs with this `category` |
+| `--max-steps` | from YAML | Override training steps |
+| `--publish` / `--no-publish` | publish on | Push to `publish.hub_repo` if the gate passes |
+| `--pull` / `--no-pull` | pull on | `modal volume get` the adapter + lm-eval results after each job |
+`publish_only` (separate entrypoint — `::publish_only`):
+| Flag | Default | Meaning |
+| ---- | ------- | ------- |
+| `--job` | required | Re-check the gate against existing results and publish if it passes |
+`pull` (separate entrypoint — `::pull`):
+| Flag | Default | Meaning |
+| ---- | ------- | ------- |
+| `--job` | — | Pull one job's adapter + results |
+| `--category` | — | Pull all jobs in a category |
+| `--dest` | `models/finetuned` | Local destination directory |
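
The `--job` and `--category` filters described above can be pictured with a small sketch. This is not the code in `finetune_app.py`; the `JOBS` list and `select_jobs` helper are hypothetical, and only the `name` and `category` fields documented in this README are assumed.

```python
from typing import Optional

# Hypothetical stand-in for the parsed job list from experiments.yaml.
# Only the `name` and `category` fields shown in this README are used.
JOBS = [
    {"name": "teaching-lora", "category": "teaching"},
    {"name": "science-lora", "category": "science"},
    {"name": "math-lora", "category": "math"},
    {"name": "alpaca-lora", "category": "instructions"},
]


def select_jobs(jobs: list, job: Optional[str] = None,
                category: Optional[str] = None) -> list:
    """Return jobs matching --job and/or --category (all jobs if neither is set)."""
    selected = [
        j for j in jobs
        if (job is None or j["name"] == job)
        and (category is None or j["category"] == category)
    ]
    if not selected:
        # Fail loudly rather than silently running an empty sweep.
        raise ValueError(f"no job matches job={job!r}, category={category!r}")
    return selected


if __name__ == "__main__":
    print([j["name"] for j in select_jobs(JOBS, category="science")])
```

Raising on an empty selection is a design choice of this sketch: a typo in `--job` then fails before any GPU time is spent.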
+---
+## GPU worker (`server_app.py`) — human + AI agent loops
+Use this when you want **one warm A10G container** for several hours and many train/eval commands **without** reinstalling deps or re-downloading HF weights each time.
+**Quick runbook:** see [`SERVER.md`](SERVER.md) (copy-paste commands for humans and coding agents).
+### Deploy once
+```bash
+modal deploy research/modal/server_app.py
+```
+App name: **`slm-gpu-worker`**. Dashboard: `modal app list` or the URL printed after deploy.
+`GpuWorker` keeps `min_containers=1` while deployed, mounts `hf-cache` + `slm-finetune`, and reuses the same container for sequential `.remote()` calls when possible.
+### Two-terminal loop (recommended)
+**Terminal 1 — keep worker alive** (default 4h; blocks unless detached):
+```bash
+modal run research/modal/server_app.py
+# or free your terminal:
+modal run -d research/modal/server_app.py --hours 6
+```
+**Terminal 2 — run experiments on the warm GPU** (repeat as often as you like):
+```bash
+# Full skill-matrix pipeline for one job on the warm container:
+# per-profile baseline → train → eval → gate → publish → pull
+modal run research/modal/server_app.py --job math-lora --max-steps 20
+# All jobs in a category
+modal run research/modal/server_app.py --category science
+# Whole matrix, but skip the Hub push
+modal run research/modal/server_app.py --pipeline --no-publish
+# Re-eval (+ gate + publish) an existing adapter on Volume
+modal run research/modal/server_app.py --eval-only --job math-lora
+# Re-check the gate and publish using already-computed results
+modal run research/modal/server_app.py --publish-only --job math-lora
+# Arbitrary command in /repo (same env as finetune.py)
+modal run research/modal/server_app.py --cmd "uv run python research/finetune.py --help"
+# Health check
+modal run research/modal/server_app.py --ping
+```
+Task flags (`--job`, `--category`, `--cmd`, `--pipeline`, `--eval-only`, `--publish-only`, `--ping`) automatically disable the default keep-alive mode.
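
One way to read that rule is as a single boolean decision. The sketch below is an assumption about the behaviour described here, not the actual entrypoint in `server_app.py`; the function name `should_serve` is hypothetical.

```python
def should_serve(*, job=None, category=None, cmd=None, pipeline=False,
                 eval_only=False, publish_only=False, ping=False,
                 no_serve=False) -> bool:
    """Return True when the run should enter keep-alive mode.

    Keep-alive is the default; any task flag, or an explicit --no-serve,
    turns it off.
    """
    task_requested = any(
        [job, category, cmd, pipeline, eval_only, publish_only, ping]
    )
    return not (task_requested or no_serve)


if __name__ == "__main__":
    print(should_serve())                 # no flags: keep the worker alive
    print(should_serve(job="math-lora"))  # task flag: run the task instead
```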
+### CLI flags (`server_app.py`)
+| Flag | Default | Meaning |
+| ---- | ------- | ------- |
+| *(none)* | `serve=True` | Keep `GpuWorker` alive (`keep_alive`) |
+| `--hours` | `4` | Keep-alive duration |
+| `--no-serve` | — | Skip keep-alive (auto when any task flag is set) |
+| `--job` | — | Run the skill-matrix pipeline for one job |
+| `--category` | — | Run the skill-matrix pipeline for all jobs in a category |
+| `--pipeline` | off | Run the skill-matrix pipeline for all jobs |
+| `--max-steps` | from YAML | Override training steps |
+| `--eval-only` | off | Pipeline eval/gate/publish path only (skip baselines + train) |
+| `--publish` / `--no-publish` | publish on | Push to `publish.hub_repo` if the gate passes |
+| `--publish-only` | off | Re-check the gate against existing results and publish (requires `--job`) |
+| `--pull` / `--no-pull` | pull on | `modal volume get` adapter + results after the pipeline |
+| `--cmd` | — | Shell command (parsed with `shlex`) |
+| `--ping` | off | Return worker status JSON |
+### `GpuWorker` methods (for notebooks / Python callers)
+After `modal deploy`, call from Python:
+```python
+import modal
+Worker = modal.Cls.from_name("slm-gpu-worker", "GpuWorker")
+w = Worker()
+w.ping.remote()
+w.finetune.remote({"name": "math-lora", "dataset": "...", "format": "alpaca", "max_steps": 20})
+w.lm_eval.remote(experiment_name="math-lora__math", config="research/evals/configs/lm_eval_math.yaml", adapter_path="/vol/finetuned/math-lora")
+w.exec_cmd.remote(["uv", "run", "python", "research/finetune.py", "--help"])
+w.run_pipeline.remote(job_names=["math-lora"], max_steps=20)
+# Gate + publish (only pushes to the Hub if gate_result["passed"])
+gate = w.check_gate.remote(
+    candidate_results_path="/vol/finetuned/results/lm_eval/math-lora__math/results.json",
+    baseline_results_path="/vol/finetuned/results/lm_eval/minicpm5-1b__baseline__math/results.json",
+    goals={"task": "gsm8k", "min_score": 0.05, "min_improve": 0.02},
+)
+w.publish_adapter.remote(job=..., adapter_dir="/vol/finetuned/math-lora", gate_result=gate, ...)
+```
+Inside the class, `run_pipeline` chains `lm_eval` (baselines) → `finetune` → `lm_eval` (candidate) → `check_gate` → `publish_adapter` via `.local()`, so everything runs in the **same** container without extra cold starts.
+### Persistence (what survives between commands)
+| Layer | Survives | Notes |
+| ----- | -------- | ----- |
+| **Image** (`uv sync` baked in) | Across all runs | Rebuilds only when image definition changes |
+| **`hf-cache` Volume** | Across runs | Base weights + datasets; committed after each job |
+| **`slm-finetune` Volume** | Across runs | Adapters + lm-eval results |
+| **Warm container** | While deployed + idle < `scaledown_window` | `min_containers=1`; max idle grace **3600s** (Modal limit) |
+| **`keep_alive` loop** | Up to `--hours` | Container stays active; no scale-down during loop |
+### Stop / logs
+```bash
+modal app logs slm-gpu-worker -f          # stream logs
+modal app stop slm-gpu-worker             # stop deployed app + warm pool
+modal app stop slm-gpu-worker -y          # no confirmation prompt
+```
+Refs: [`modal app`](https://modal.com/docs/reference/cli/app) · [`modal run`](https://modal.com/docs/reference/cli/run) · [`modal shell`](https://modal.com/docs/reference/cli/shell)
+### Agent loop pattern
+For an AI agent iterating on finetune hyperparameters or eval configs:
+1. Ensure worker is up: `modal run research/modal/server_app.py --ping` → `{"status": "ok"}`.
+2. If ping fails, human or agent runs `modal deploy research/modal/server_app.py` then `modal run -d research/modal/server_app.py --hours 6`.
+3. Agent runs smoke train+eval+gate (no publish yet): `--job math-lora --max-steps 5 --no-publish`.
+4. Agent re-evals without retraining: `--eval-only --job math-lora`.
+5. Agent reads results: `modal volume get slm-finetune results/lm_eval/math-lora__math ./results/lm_eval/math-lora__math` or `modal volume ls slm-finetune`.
+6. Agent adjusts `experiments.yaml`'s `goals`/`max_steps`/`max_samples`, repeats from step 3.
+7. Once the gate passes and `hub_org`/`hub_repo` are real: `--publish-only --job math-lora`, or just drop `--no-publish`.
+8. When done: `modal app stop slm-gpu-worker` (optional, stops GPU billing from warm pool).
+See [`SERVER.md`](SERVER.md) for a structured checklist and error recovery table.
+---
+## What gets saved on Modal
+Modal persists artifacts on [**Volumes**](https://modal.com/docs/guide/volumes) — a distributed filesystem optimized for write-once, read-many workloads like model checkpoints. Files written only to the container disk (outside the mount path) are **not** saved.
+| Volume | Mount in container | Contents |
+| ------ | ------------------ | -------- |
+| `slm-finetune` | `/vol/finetuned` | LoRA adapters, `training_results.json`, lm-eval `results/` |
+| `hf-cache` | `/root/.cache/huggingface` | Cached base weights + datasets |
+Volumes are created lazily on first run (`create_if_missing=True` in [`finetune_app.py`](finetune_app.py)).
+### Commits and visibility
+Per the [Volumes guide](https://modal.com/docs/guide/volumes):
+- **`volume.commit()`** — persist writes so other containers and `modal volume get` can see them. Our workers call this after each train/eval job.
+- **Background commits** — Modal also snapshots attached Volumes every few seconds and on container shutdown, but explicit `commit()` is safest before download.
+- **`volume.reload()`** — needed only if the *same* container must see writes from another container without restarting. Each `finetune_one.remote()` / `run_lm_eval.remote()` starts fresh and mounts the latest committed state.
+Training writes under `/vol/finetuned/...` (the mount), not `/repo/models/...`. That matches Modal’s [model checkpointing](https://modal.com/docs/guide/volumes#model-checkpointing) pattern: point `finetune.py --out` at the Volume path.
+### Per-job adapter layout
+Each finetune job writes to a Volume path named after the job (e.g. `math-lora/`).
+lm-eval results live under `results/lm_eval/`, named
+`<job_name>__<eval_profile>` for candidates and `<preset>__baseline__<eval_profile>`
+for the shared per-profile baselines:
+```text
+slm-finetune (Volume)
+├── math-lora/
+│   ├── adapter_config.json
+│   ├── adapter_model.safetensors   # or adapter_model.bin
+│   ├── tokenizer files…
+│   ├── training_results.json
+│   └── README.md                   # model card, written by publish_adapter
+├── science-lora/
+├── coding-lora/
+├── reasoning-lora/
+├── teaching-lora/
+├── alpaca-lora/
+└── results/lm_eval/
+    ├── minicpm5-1b__baseline__math/        # shared by all "math" profile jobs
+    ├── minicpm5-1b__baseline__science/
+    ├── minicpm5-1b__baseline__instructions/
+    ├── math-lora__math/
+    ├── science-lora__science/
+    └── ...
+```
+Because `eval_profile` is shared across jobs (e.g. `teaching-lora` and
+`alpaca-lora` both use `instructions`), the `instructions` baseline is computed
+once per pipeline run and reused for both jobs' gates.
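
The naming scheme and the shared-baseline behaviour can be sketched as follows. The helper names (`candidate_dir`, `baseline_dir`, `plan_baselines`) are hypothetical and the default preset `minicpm5-1b` is taken from the layout above; the real helpers in `_common.py` may differ.

```python
def candidate_dir(job_name: str, eval_profile: str) -> str:
    """Results directory for a finetuned candidate: <job_name>__<eval_profile>."""
    return f"results/lm_eval/{job_name}__{eval_profile}"


def baseline_dir(preset: str, eval_profile: str) -> str:
    """Results directory for a shared baseline: <preset>__baseline__<eval_profile>."""
    return f"results/lm_eval/{preset}__baseline__{eval_profile}"


def plan_baselines(jobs: list, preset: str = "minicpm5-1b") -> dict:
    """Map each distinct eval_profile to its single baseline directory."""
    plan = {}
    for job in jobs:
        profile = job["eval_profile"]
        # setdefault keeps the first entry, so a profile is planned only once.
        plan.setdefault(profile, baseline_dir(preset, profile))
    return plan


if __name__ == "__main__":
    jobs = [
        {"name": "teaching-lora", "eval_profile": "instructions"},
        {"name": "alpaca-lora", "eval_profile": "instructions"},
        {"name": "math-lora", "eval_profile": "math"},
    ]
    for profile, path in plan_baselines(jobs).items():
        print(profile, "->", path)
```

With three jobs spanning two profiles, only two baseline runs are planned.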
+---
+## Volume CLI (browse, download, upload)
+Official reference: [Modal Volumes guide](https://modal.com/docs/guide/volumes) · [CLI reference](https://modal.com/docs/reference/cli/volume)
+### Create or list volumes
+```bash
+modal volume list
+modal volume create slm-finetune    # optional; app creates on first run
+modal volume ls slm-finetune
+modal volume ls slm-finetune lesson-lora
+```
+### Browse in a shell
+Volumes are mounted under `/mnt` in an interactive shell:
+```bash
+modal shell --volume slm-finetune
+# inside shell:
+ls /mnt/slm-finetune
+ls /mnt/slm-finetune/lesson-lora
+du -sh /mnt/slm-finetune/lesson-lora
+```
+Use `du` for size — Volumes do not report accurate `df` / `disk_usage()` values ([docs](https://modal.com/docs/guide/volumes#disk-usage-reporting)).
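
If you would rather measure a folder from Python (for example in a notebook cell), a rough equivalent is to sum file sizes yourself. This is a minimal sketch: `dir_size_bytes` is a hypothetical helper, and it reports apparent file sizes, which can differ slightly from the block-based figure `du` prints.

```python
import os
import tempfile


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    # Demo on a throwaway directory; on Modal you would pass a mounted
    # path such as /mnt/slm-finetune/lesson-lora instead.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "adapter_model.safetensors"), "wb") as fh:
            fh.write(b"\0" * 1024)
        print(dir_size_bytes(root))
```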
+### Download LoRA to your machine
+**Use the CLI for adapter weights.** The Modal web UI only supports downloads up to **16 MB** per file; `adapter_model.safetensors` is usually larger ([docs](https://modal.com/docs/guide/volumes#downloading-a-file-from-a-volume)).
+```bash
+mkdir -p ./models/finetuned
+# One job folder → local path expected by models.yaml
+modal volume get slm-finetune lesson-lora ./models/finetuned/minicpm5-1b-lora
+# lm-eval artifacts
+mkdir -p ./results
+modal volume get slm-finetune results/lm_eval ./results/lm_eval
+# Entire volume (large)
+modal volume get slm-finetune / ./modal-artifacts
+```
+Job folders use the **job name** from `experiments.yaml` (`lesson-lora`), not `minicpm5-1b-lora`. Root [`models.yaml`](../../models.yaml) preset `minicpm5-1b-lesson-lora` expects `./models/finetuned/minicpm5-1b-lora`.
+If you downloaded to a different folder name:
+```bash
+modal volume get slm-finetune lesson-lora ./models/finetuned/lesson-lora
+cp -r ./models/finetuned/lesson-lora ./models/finetuned/minicpm5-1b-lora
+```
+### Upload to a Volume from local
+Push a local adapter or merged checkpoint back to Modal ([`modal volume put`](https://modal.com/docs/reference/cli/volume)):
+```bash
+modal volume put slm-finetune ./models/finetuned/minicpm5-1b-lora lesson-lora
+```
+Or from Python ([`batch_upload`](https://modal.com/docs/guide/volumes#using-a-volume-from-local-code)):
+```python
+import modal
+vol = modal.Volume.from_name("slm-finetune")
+with vol.batch_upload() as batch:
+    batch.put_directory(
+        "./models/finetuned/minicpm5-1b-lora",
+        "/lesson-lora",
+    )
+```
+### Copy within a Volume
+```bash
+modal volume cp slm-finetune lesson-lora lesson-lora-backup
+```
+### Parallel training note
+With `--parallel`, multiple jobs write to **different** folders on the same Volume. On Volumes v1, avoid more than ~5 concurrent writers/commits ([docs](https://modal.com/docs/guide/volumes#volume-commits-and-reloads)). Prefer sequential runs unless you use Volumes v2 (`modal volume create --version=2`).
+---
+## Use downloaded weights locally
+```bash
+# Gradio / inference preset
+export ACTIVE_MODEL=minicpm5-1b-lesson-lora
+uv run --package gradio-space python -m gradio_space.app
+# lm-eval on downloaded adapter
+uv run --package slm-evals slm-lm-eval \
+  --config research/evals/configs/lm_eval_smoke.yaml \
+  --preset minicpm5-1b-lesson-lora \
+  --experiment-name minicpm5-1b-lora__local-check
+```
+### Optional: merge LoRA into full weights locally
+Adapters are small; merged weights are easier for some deploy targets.
+```bash
+uv run python research/finetune.py \
+  --merge ./models/finetuned/minicpm5-1b-lora \
+  --out ./models/finetuned/minicpm5-1b-lora-merged
+```
+Then use preset `minicpm5-1b-lesson-merged` or `--model ./models/finetuned/minicpm5-1b-lora-merged`.
+---
+## Benchmark gate & Hugging Face Hub publish
+`finetune_app.py` / `server_app.py` publish adapters to the Hub **automatically**,
+but only when a job's lm-eval results pass its `goals`. This is the
+"only ship it if it's actually better" gate.
+### `goals` schema (per job in `experiments.yaml`)
+```yaml
+goals:
+  task: gsm8k          # lm-eval task name, scored via primary_metric() (same as summary.md)
+  min_score: 0.05      # candidate score must be >= this
+  min_improve: 0.02    # candidate - baseline must be >= this (baseline = per-profile baseline run)
+  guard_tasks:          # optional regression guards — must NOT regress more than max_regress
+    - task: arc_challenge
+      max_regress: 0.03
+```
+A job with no `goals` (e.g. `alpaca-lora`) is never gated and never published —
+it's local-only (still trained, evaluated, and pulled to your laptop).
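
To make the three `goals` conditions concrete, here is a minimal sketch of the comparison. It is not the repo's `evaluate_gate`: the function name `sketch_gate` is hypothetical, scores are passed in as plain `{task: score}` dicts rather than read from `results.json`, and the shape of each entry in `checks` is an assumption.

```python
# Illustrative goals block, mirroring the YAML schema above.
GOALS = {
    "task": "gsm8k",
    "min_score": 0.05,
    "min_improve": 0.02,
    "guard_tasks": [{"task": "arc_challenge", "max_regress": 0.03}],
}

# Made-up baseline scores, for demonstration only.
BASELINE = {"gsm8k": 0.10, "arc_challenge": 0.40}


def sketch_gate(candidate: dict, baseline: dict, goals: dict) -> dict:
    """Return {"passed": bool, "checks": [...]} for one job's goals."""
    task = goals["task"]
    score = candidate[task]
    improve = score - baseline[task]
    checks = [
        # Absolute floor on the candidate's score.
        {"name": "min_score", "task": task, "value": score,
         "passed": score >= goals.get("min_score", float("-inf"))},
        # Required gain over the per-profile baseline.
        {"name": "min_improve", "task": task, "value": improve,
         "passed": improve >= goals.get("min_improve", float("-inf"))},
    ]
    for guard in goals.get("guard_tasks", []):
        # Regression = how far the candidate fell below the baseline.
        regress = baseline[guard["task"]] - candidate[guard["task"]]
        checks.append({"name": "max_regress", "task": guard["task"],
                       "value": regress,
                       "passed": regress <= guard["max_regress"]})
    return {"passed": all(c["passed"] for c in checks), "checks": checks}


if __name__ == "__main__":
    candidate = {"gsm8k": 0.15, "arc_challenge": 0.39}
    print(sketch_gate(candidate, BASELINE, GOALS)["passed"])
```

A candidate that improves `gsm8k` but drops `arc_challenge` by more than `max_regress` fails the gate, which is the point of the guard tasks.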
+### `publish` schema (per job)
+```yaml
+publish:
+  hub_repo: your-hf-username/minicpm5-1b-math-lora
+  private: false  # public so judges can verify the Well-Tuned badge; set true to keep it hidden
+```
+### What happens on a passing gate
+1. `run_lm_eval` writes `results/lm_eval/<job>__<profile>/results.json`.
+2. `check_gate` compares it against `results/lm_eval/<preset>__baseline__<profile>/results.json`
+   using the `goals` above → `{"passed": bool, "checks": [...]}`.
+3. If `passed` and `publish` is set, `publish_adapter`:
+   - renders a model card (`README.md`) into the adapter directory — base model,
+     gate checks table, full lm-eval baseline-vs-candidate-vs-delta table,
+     training stats, and a PEFT load snippet
+   - `huggingface_hub.HfApi().create_repo(..., exist_ok=True)` +
+     `upload_folder(...)` to `publish.hub_repo`
+If the gate fails, nothing is pushed — rerun with different `max_steps` /
+dataset / `goals`, then `modal run research/modal/finetune_app.py::publish_only --job <name>`
+once it passes (re-checks the gate against the latest results before publishing).
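
The publish step can be sketched as a guard around the two Hub calls named above. This is an illustration, not the repo's `publish_adapter`: `publish_if_passed` and `RecordingApi` are hypothetical, the model-card rendering is omitted, and the `"no publish block"` reason string is illustrative. The `api` argument is anything exposing `create_repo` and `upload_folder`, so the sketch runs as a dry run without network access.

```python
from typing import Optional


class RecordingApi:
    """Dry-run stand-in that records calls instead of contacting the Hub."""

    def __init__(self):
        self.calls = []

    def create_repo(self, **kwargs):
        self.calls.append(("create_repo", kwargs))

    def upload_folder(self, **kwargs):
        self.calls.append(("upload_folder", kwargs))


def publish_if_passed(api, gate: dict, adapter_dir: str,
                      publish: Optional[dict]) -> dict:
    """Push the adapter folder only when the gate passed and publish is set."""
    if not gate.get("passed"):
        return {"published": False, "reason": "gate failed"}
    if not publish:
        return {"published": False, "reason": "no publish block"}
    repo_id = publish["hub_repo"]
    # exist_ok=True makes repo creation idempotent across reruns.
    api.create_repo(repo_id=repo_id, repo_type="model",
                    private=publish.get("private", False), exist_ok=True)
    api.upload_folder(folder_path=adapter_dir, repo_id=repo_id,
                      repo_type="model")
    return {"published": True, "repo": repo_id}


if __name__ == "__main__":
    api = RecordingApi()
    result = publish_if_passed(
        api, {"passed": True}, "/vol/finetuned/math-lora",
        {"hub_repo": "your-hf-username/minicpm5-1b-math-lora", "private": False},
    )
    print(result, [name for name, _ in api.calls])
```

For a real upload, pass `huggingface_hub.HfApi()` as `api` with a write-scoped `HF_TOKEN` in the environment.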
+### Setup
+```bash
+huggingface-cli login
+# or: export HF_TOKEN=hf_...   (needs write access; same token as `modal secret create huggingface`)
+```
+Set real values for `defaults.hub_org` and each job's `publish.hub_repo` in
+`experiments.yaml` before running with `--publish` (the default). Repos are
+created automatically (`exist_ok=True`) — no need to pre-create them on huggingface.co.
+---
+## Manual Hugging Face Hub publish (fallback)
+Use this if you'd rather download an adapter and push it yourself — e.g. for
+**merged full weights**, or adapters trained before the gate/publish pipeline
+existed.
+### Prerequisites
+```bash
+huggingface-cli login
+# or: export HF_TOKEN=hf_...
+```
+Create an empty model repo on Hugging Face (e.g. `your-user/minicpm5-1b-lesson-lora`).
+### Option A — Upload LoRA adapter (recommended)
+After `modal volume get`:
+```bash
+ADAPTER=./models/finetuned/minicpm5-1b-lora
+REPO=your-user/minicpm5-1b-lesson-lora
+huggingface-cli upload "$REPO" "$ADAPTER" . \
+  --repo-type model \
+  --commit-message "Lesson LoRA from Modal finetune"
+```
+Add a minimal `README.md` in the adapter folder before upload (or edit on the Hub) documenting the base model:
+```markdown
+# MiniCPM5-1B lesson LoRA
+- Base model: [openbmb/MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B)
+- Dataset: education lesson chat (Build Small hackathon)
+- Load with PEFT: `PeftModel.from_pretrained(base, "your-user/minicpm5-1b-lesson-lora")`
+```
+**Load from Hub in Python:**
+```python
+from peft import PeftModel
+from transformers import AutoModelForCausalLM, AutoTokenizer
+base = "openbmb/MiniCPM5-1B"
+adapter = "your-user/minicpm5-1b-lesson-lora"
+tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    base, torch_dtype="auto", device_map="auto", trust_remote_code=True
+)
+model = PeftModel.from_pretrained(model, adapter)
+```
+### Option B — Upload merged weights
+```bash
+uv run python research/finetune.py \
+  --merge ./models/finetuned/minicpm5-1b-lora \
+  --out ./models/finetuned/minicpm5-1b-lora-merged
+huggingface-cli upload your-user/minicpm5-1b-lesson-merged \
+  ./models/finetuned/minicpm5-1b-lora-merged . \
+  --repo-type model
+```
+Consumers set `MODEL_ID=your-user/minicpm5-1b-lesson-merged` with no adapter.
+### Option C — Upload from Modal shell (no local download)
+Browse the Volume in a shell ([docs](https://modal.com/docs/guide/volumes#using-a-volume-from-outside-of-modal)):
+```bash
+modal shell --volume slm-finetune
+```
+Inside the shell (volume at `/mnt/slm-finetune`):
+```bash
+pip install huggingface_hub
+export HF_TOKEN=...   # write token
+huggingface-cli upload your-user/minicpm5-1b-lesson-lora \
+  /mnt/slm-finetune/lesson-lora . --repo-type model
+```
+Downloading to your laptop first (Option A) usually makes the adapter easier to review before publishing.
+### Use on Hugging Face Space
+**LoRA on Space (Gradio SDK):**
+1. Upload adapter repo (Option A).
+2. In Space **Settings → Repository secrets**, set `HF_TOKEN` if the base model needs it.
+3. In Space env vars:
+```bash
+ACTIVE_MODEL=minicpm5-1b
+# Override adapter via custom preset or env — e.g. add to models.yaml on Space:
+# adapter_path: your-user/minicpm5-1b-lesson-lora  # Hub id works if peft resolves it
+```
+For the shipped Space, the reliable path is: download adapter → commit into repo under `models/finetuned/` → `ACTIVE_MODEL=minicpm5-1b-lesson-lora`, or upload **merged** weights and point `MODEL_ID` at your Hub repo.
+**Merged on Space:**
+```bash
+ACTIVE_MODEL=custom
+MODEL_ID=your-user/minicpm5-1b-lesson-merged
+TRUST_REMOTE_CODE=true
+```
+---
+## Modal Notebooks (interactive GPU)
+Official guide: [Modal Notebooks](https://modal.com/docs/guide/notebooks)
+Use a hosted Jupyter kernel on Modal for demos, pair programming, and quick experiments. For reproducible sweeps and CI-style runs, prefer `modal run research/modal/finetune_app.py`.
+### Getting started
+1. Open [modal.com/notebooks](https://modal.com/notebooks) and **upload** [`research/notebook/minicpm5-modal-finetune.ipynb`](../notebook/minicpm5-modal-finetune.ipynb) (or create a notebook and copy the cells).
+2. In the **sidebar → Compute profile**, enable a **GPU** (e.g. A10G). Notebooks are serverless: you pay only while the kernel runs; idle shutdown defaults to 10 minutes.
+3. Attach resources in the sidebar **Files** panel:
+   - **Volume** `slm-finetune` → appears under `/mnt/slm-finetune` (share checkpoints with `modal run` jobs)
+   - **Secret** `huggingface` → injects `HF_TOKEN` for Hub downloads
+4. Run cells top to bottom.
+The default notebook image includes PyTorch, Transformers, and NumPy. Install extras with:
+```python
+%uv pip install uv peft bitsandbytes datasets
+```
+### Persist checkpoints on a Volume
+The container filesystem is **ephemeral**. Anything under `/root` is lost when the kernel stops. Write adapters to an attached Volume:
+```python
+OUT = "/mnt/slm-finetune/lesson-lora-notebook"  # survives kernel restarts
+```
+After training, download from the **Files** panel (⬇) or locally:
+```bash
+modal volume get slm-finetune lesson-lora-notebook ./models/finetuned/minicpm5-1b-lora
+```
+### Custom image (optional, full repo deps)
+To match the `modal run` environment exactly, deploy the app image once:
+```bash
+modal deploy research/modal/finetune_app.py
+```
+Then in the notebook sidebar, search for function `finetune_one` from app `slm-finetune-benchmark` and select that image as the kernel.
+Or call deployed functions from a cell with [`%modal` magic](https://modal.com/docs/guide/notebooks#cell-magic):
+```python
+%modal from slm-finetune-benchmark import finetune_one
+finetune_one.remote({
+    "name": "lesson-lora",
+    "dataset": "research/data/education-lesson-chat.jsonl",
+    "format": "chat",
+    "max_steps": 20,
+})
+```
+(Requires `modal deploy` and the repo baked into the image.)
+### Share for hackathon judges
+Use **Share** in the notebook editor → **public unlisted link** → **Can view and run** so reviewers can fork and execute without a Modal account ([docs](https://modal.com/docs/guide/notebooks#access-and-sharing)).
+### Notebook vs `modal run`
+| | Modal Notebook | `modal run finetune_app.py` |
+| --- | --- | --- |
+| Best for | Demo video, exploration | Reproducible sweep, Volume + lm-eval pipeline |
+| GPU | Sidebar compute profile | `gpu="A10G"` on functions |
+| Persistence | Attach Volume in sidebar | `slm-finetune` Volume auto-mounted |
+| Cost | Per kernel uptime | Per function invocation |
+---
+## Architecture
+```mermaid
+flowchart LR
+  subgraph batch [finetune_app.py — batch]
+    laptop1["modal run finetune_app\n--job/--category"] --> base["run_lm_eval\n(per-profile baseline)"]
+    laptop1 --> train["finetune_one"]
+    train --> eval["run_lm_eval\n(candidate)"]
+    eval --> gate["check_gate\n(goals)"]
+    gate -- passed --> pub["publish_adapter"]
+  end
+  subgraph worker [server_app.py — warm loop]
+    laptop2["modal run server_app\n--job/--category/--pipeline"] --> gpu["GpuWorker A10G"]
+    gpu --> rp["run_pipeline\n(baseline -> train -> eval -> gate -> publish)"]
+  end
+  base --> vol["Volume slm-finetune"]
+  train --> vol
+  eval --> vol
+  gate --> vol
+  rp --> vol
+  gpu --> hfc["Volume hf-cache"]
+  pub --> hub["Hugging Face Hub\n(publish.hub_repo)"]
+  rp --> hub
+  vol --> get["modal volume get\n(pull)"]
+  get --> local["models/finetuned/<job>"]
+  local --> space["HF Space ACTIVE_MODEL"]
+```
+| Resource | Role |
+| -------- | ---- |
+| App `slm-finetune-benchmark` | One-shot batch pipeline (`finetune_app.py`): `main`, `publish_only`, `pull` |
+| App `slm-gpu-worker` | Long-lived GPU worker (`server_app.py`): `GpuWorker.run_pipeline` |
+| GPU `A10G` (or per-job `gpu:` override) | Default for train + eval |
+| Secret `huggingface` | `HF_TOKEN` for HF downloads + Hub publish |
+| [`_common.py`](_common.py) | Shared image, volumes, command builders, gate (`evaluate_gate`/`check_gate_files`), publish (`publish_adapter_files`, `render_model_card`) |
+| [`experiments.yaml`](experiments.yaml) | Skill matrix: jobs, `eval_profile`, `goals`, `publish` |
+| [`eval_profiles.yaml`](../evals/configs/eval_profiles.yaml) | Maps `eval_profile` → lm-eval config + task list |
+| [`finetune.py`](../finetune.py) | Training logic (unchanged) |
+| `slm-lm-eval` | Academic benchmarks |
+---
+## Troubleshooting
+| Symptom | Fix |
+| ------- | --- |
+| `Secret huggingface not found` | `modal secret create huggingface HF_TOKEN=...` |
+| Volume empty after run | Job may have failed; `modal volume ls slm-finetune`; ensure writes went to `/vol/finetuned` not `/repo` |
+| `modal volume get` missing files | Confirm the job finished and its `commit()` completed; to read another container's writes from a running container, call `volume.reload()` first |
+| Large file won't download in UI | Use `modal volume get` CLI (16 MB UI limit) |
+| `modal volume get` path wrong | Job name = top-level folder (e.g. `math-lora`, not `minicpm5-1b-lora`) |
+| Gate fails / `published: false, reason: "gate failed"` | Check `gate.checks` in the output; adjust `goals` (`min_score`/`min_improve`/`guard_tasks`), `max_steps`, or dataset, then rerun |
+| `published: false, reason: "no publish config..."` | Job has no `publish:` block in `experiments.yaml` (intentional for local-only jobs like `alpaca-lora`) |
+| `Unknown eval_profile ...` | Check `eval_profile` in `experiments.yaml` matches a key in `research/evals/configs/eval_profiles.yaml` |
+| Hub upload 403 | Use a write `HF_TOKEN`; repos are created automatically (`exist_ok=True`), no need to pre-create |
+| Still publishing to `your-hf-username/...` | Edit `defaults.hub_org` and each job's `publish.hub_repo` in `experiments.yaml` |
+| Space cannot find adapter | Use merged weights or copy adapter into repo `models/finetuned/` |
+| Image build slow | `hf-cache` Volume caches weights across runs |
+| OOM on GPU | Set `mode: qlora` for the job in `experiments.yaml`; lower `max_len` in finetune; or set a per-job `gpu:` with more VRAM |
+| `scaledown_window` deploy error | Must be 2–3600s (we use 3600); see `_common.py` |
+| `server_app` ping fails | `modal deploy research/modal/server_app.py`; start keep-alive: `modal run -d research/modal/server_app.py` |
+| Jobs hit different containers | Deploy first; use `server_app.py` not `finetune_app.py` for warm loop |
+| Worker still billing after done | `modal app stop slm-gpu-worker` |
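
When a job reports `gate failed`, it can help to reproduce the arithmetic by hand. The sketch below is a simplified, self-contained version of the three `goals` checks (`min_score`, `min_improve`, `guard_tasks`), assuming the per-task scores have already been extracted as floats. `gate_passes` is a hypothetical helper written for illustration; it is not the `evaluate_gate` function in `_common.py`, which also picks the primary metric from raw lm-eval output.

```python
def gate_passes(
    cand: dict[str, float],
    base: dict[str, float],
    goals: dict,
) -> tuple[bool, list[str]]:
    """Return (passed, failed_checks) for hypothetical per-task scores."""
    failures: list[str] = []
    checks_run = 0
    task = goals["task"]
    c, b = cand.get(task), base.get(task)

    # Absolute floor: the candidate score must reach min_score.
    if goals.get("min_score") is not None:
        checks_run += 1
        if c is None or c < goals["min_score"]:
            failures.append(f"{task} >= {goals['min_score']}")

    # Relative gain: candidate - baseline must reach min_improve.
    if goals.get("min_improve") is not None:
        checks_run += 1
        if c is None or b is None or (c - b) < goals["min_improve"]:
            failures.append(f"{task} improve >= {goals['min_improve']}")

    # Regression guards: baseline - candidate must stay within max_regress.
    for guard in goals.get("guard_tasks", []):
        checks_run += 1
        gc, gb = cand.get(guard["task"]), base.get(guard["task"])
        if gc is None or gb is None or (gb - gc) > guard["max_regress"]:
            failures.append(f"{guard['task']} regress <= {guard['max_regress']}")

    # Goals that define no checks never pass, so nothing publishes by accident.
    if checks_run == 0:
        failures.append("goals defined no checks")

    return (not failures, failures)
```

A missing baseline score makes both the improvement check and every guard fail, so a skipped baseline run is a common cause of a failed gate even when the candidate score looks fine.
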
+---
+## Hackathon checklist
+1. Link or screenshot of Modal app run (`slm-finetune-benchmark` or `slm-gpu-worker`), including the `--- summary ---` table (skill, category, gate, published, hub_repo).
+2. `results/lm_eval/<job>__<profile>/comparison.md` — baseline vs candidate per skill.
+3. At least one adapter with `goals` that passed the gate and published to the Hub (model card auto-generated).
+4. Adapter on Volume or Hub + `ACTIVE_MODEL=minicpm5-1b-<skill>-lora` on Space.
+5. Optional: Notebook recording of smoke train cell.
+See also: [`SERVER.md`](SERVER.md) · [research/USAGE.md](../USAGE.md) · [Modal Volumes](https://modal.com/docs/guide/volumes) · [Modal Notebooks](https://modal.com/docs/guide/notebooks) · [Modal CUDA](https://modal.com/docs/guide/cuda)

research/modal/SERVER.md ADDED

	@@ -0,0 +1,204 @@

+# GPU worker runbook (`server_app.py`)
+Long-lived Modal GPU for iterative finetune / eval loops. Intended for **humans** and **AI coding agents** running many experiments from the same warm container.
+**Full docs:** [README.md](README.md) · **Code:** [`server_app.py`](server_app.py) · **Jobs:** [`experiments.yaml`](experiments.yaml)
+---
+## Prerequisites
+Run from **repo root**.
+```bash
+pip install modal
+modal setup
+modal secret create huggingface HF_TOKEN=<your-hf-token>   # once
+modal deploy research/modal/server_app.py                 # once per image change
+```
+| Name | Value |
+| ---- | ----- |
+| App | `slm-gpu-worker` |
+| Class | `GpuWorker` |
+| GPU | `A10G` |
+| Volumes | `hf-cache` → `/root/.cache/huggingface`, `slm-finetune` → `/vol/finetuned` |
+---
+## Start session (human or agent)
+```bash
+# Option A: blocks the terminal (default 4h keep-alive)
+modal run research/modal/server_app.py
+# Option B: detached — preferred for agent loops
+modal run -d research/modal/server_app.py --hours 6
+# Verify worker
+modal run research/modal/server_app.py --ping
+# → {"status": "ok", "app": "slm-gpu-worker"}
+```
+---
+## Experiment commands (repeat freely)
+All commands use the deployed warm worker when `modal deploy` has been run.
+```bash
+# --- Train ---
+modal run research/modal/server_app.py --job lesson-lora --max-steps 20
+modal run research/modal/server_app.py --job alpaca-lora --max-steps 50
+modal run research/modal/server_app.py --job smoltalk-lora --max-steps 50
+# --- Eval only (adapter must exist on Volume) ---
+modal run research/modal/server_app.py --eval-only --job lesson-lora
+modal run research/modal/server_app.py --eval-only   # all jobs in experiments.yaml
+# --- Full pipeline (same container: baseline → train → eval) ---
+modal run research/modal/server_app.py --pipeline --job lesson-lora --max-steps 20
+modal run research/modal/server_app.py --pipeline --job lesson-lora --max-steps 20 --skip-baseline
+# --- Custom finetune.py flags ---
+modal run research/modal/server_app.py --cmd \
+  "uv run python research/finetune.py --preset minicpm5-1b --mode lora \
+   --dataset research/data/education-lesson-chat.jsonl --format chat \
+   --out /vol/finetuned/lesson-lora --max_steps 10"
+# --- Custom lm-eval ---
+modal run research/modal/server_app.py --cmd \
+  "uv run --package slm-evals slm-lm-eval \
+   --config research/evals/configs/lm_eval_smoke.yaml \
+   --experiment-name lesson-lora__manual \
+   --output-dir /vol/finetuned/results/lm_eval \
+   --model openbmb/MiniCPM5-1B \
+   --adapter /vol/finetuned/lesson-lora"
+```
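+Job names and datasets are defined in [`experiments.yaml`](experiments.yaml); a `--job` value must match a `name` entry there (for example `teaching-lora` or `math-lora`).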
+---
+## Inspect results (human or agent)
+```bash
+# List Volume
+modal volume ls slm-finetune
+modal volume ls slm-finetune lesson-lora
+modal volume ls slm-finetune results/lm_eval
+# Download to laptop
+modal volume get slm-finetune lesson-lora ./models/finetuned/minicpm5-1b-lora
+modal volume get slm-finetune results/lm_eval ./results/lm_eval
+# Stream worker logs
+modal app logs slm-gpu-worker -f
+```
+Key artifacts on Volume:
+| Path | Content |
+| ---- | ------- |
+| `/vol/finetuned/<job>/` | LoRA adapter + `training_results.json` |
+| `/vol/finetuned/results/lm_eval/<exp>/` | `results.json`, `summary.md`, `comparison.md` |
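
After `modal volume get`, a downloaded `results.json` can be summarized locally. The sketch below assumes the file has a top-level `results` mapping from task name to a metrics dict with keys such as `acc,none`, which is the shape `_common.py` reads. `headline_scores` is a hypothetical helper, and taking the first non-stderr numeric metric is a simplification of the priority list used in `_common.py`.

```python
import json
from pathlib import Path


def headline_scores(results_path: str) -> dict[str, tuple[str, float]]:
    """Map each task to (metric_name, score) using its first numeric, non-stderr metric."""
    payload = json.loads(Path(results_path).read_text())
    scores: dict[str, tuple[str, float]] = {}
    for task, metrics in payload.get("results", {}).items():
        for key, value in metrics.items():
            # Skip standard-error entries and non-numeric fields such as "alias".
            if "stderr" in key or isinstance(value, bool):
                continue
            if isinstance(value, (int, float)):
                scores[task] = (key, float(value))
                break
    return scores


if __name__ == "__main__":
    # Example path only; adjust to the experiment directory you pulled.
    path = "results/lm_eval/lesson-lora__manual/results.json"
    if Path(path).is_file():
        for task, (metric, score) in sorted(headline_scores(path).items()):
            print(f"{task}\t{metric}\t{score:.4f}")
```
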
+---
+## End session
+```bash
+modal app stop slm-gpu-worker -y
+```
+Stops the deployed app and warm GPU pool. Volume data is retained.
+---
+## AI agent loop (structured)
+Use this sequence when an agent is iterating on training or eval without local CUDA.
+```
+1. CHECK   modal run research/modal/server_app.py --ping
+2. BOOT    if ping fails → modal deploy ... then modal run -d ... --hours 6
+3. SMOKE   modal run ... --job lesson-lora --max-steps 5
+4. EVAL    modal run ... --eval-only --job lesson-lora
+5. READ    modal volume ls slm-finetune results/lm_eval
+           modal volume get ... (or read comparison.md locally after get)
+6. ADJUST  edit experiments.yaml OR pass --max-steps / --lm-eval-config
+7. GOTO 3  until metrics acceptable
+8. PULL    modal volume get slm-finetune lesson-lora ./models/finetuned/minicpm5-1b-lora
+9. STOP    modal app stop slm-gpu-worker -y   (optional, saves GPU cost)
+```
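
Steps 1, 3, and 4 of the loop above can be sketched as a small driver. It uses only the CLI invocations shown in this runbook and assumes that `modal run` exits non-zero when a step fails, which is an assumption about the CLI rather than something this document states. `run_cli` and `agent_iteration` are hypothetical names; the runner is injectable so the sequence can be dry-run without a GPU.

```python
import subprocess
from typing import Callable, Sequence

APP = "research/modal/server_app.py"


def run_cli(args: Sequence[str]) -> int:
    """Run a command and return its exit code (output streams to the terminal)."""
    return subprocess.run(list(args)).returncode


def agent_iteration(
    job: str,
    max_steps: int,
    run: Callable[[Sequence[str]], int] = run_cli,
) -> bool:
    """One CHECK -> SMOKE -> EVAL pass; True only if every command exits 0."""
    steps = [
        ["modal", "run", APP, "--ping"],
        ["modal", "run", APP, "--job", job, "--max-steps", str(max_steps)],
        ["modal", "run", APP, "--eval-only", "--job", job],
    ]
    for cmd in steps:
        # Stop at the first failing step so the caller can BOOT or ADJUST.
        if run(cmd) != 0:
            return False
    return True
```

Passing a fake `run` (for example one that records commands and returns 0) lets an agent verify the command sequence before spending GPU time.
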
+### Agent decision rules
+| Situation | Action |
+| --------- | ------ |
+| First time in repo | `modal deploy research/modal/server_app.py` |
+| `ping` returns ok | Skip boot; run task commands |
+| `ping` fails / timeout | `modal run -d research/modal/server_app.py --hours 6`, retry ping |
+| Train OOM | `--cmd` with `--mode qlora`, lower `max_len` in finetune, or use a GPU with more VRAM |
+| Eval missing adapter | Train first, or `modal volume ls slm-finetune <job>` |
+| Need batch parallel GPUs | Use `finetune_app.py --parallel` instead |
+| Need one-shot CI sweep | Use `finetune_app.py` (not server) |
+| Image / code changed | Re-run `modal deploy research/modal/server_app.py` |
+### Python API (agents in Modal notebook or scripts)
+```python
+import modal
+Worker = modal.Cls.from_name("slm-gpu-worker", "GpuWorker")
+w = Worker()
+assert w.ping.remote()["status"] == "ok"
+w.finetune.remote({
+    "name": "lesson-lora",
+    "preset": "minicpm5-1b",
+    "mode": "lora",
+    "dataset": "research/data/education-lesson-chat.jsonl",
+    "format": "chat",
+    "max_steps": 20,
+})
+w.run_pipeline.remote(job_names=["lesson-lora"], max_steps=20)
+```
+---
+## `finetune_app.py` vs `server_app.py`
+| | `finetune_app.py` | `server_app.py` |
+| --- | --- | --- |
+| App name | `slm-finetune-benchmark` | `slm-gpu-worker` |
+| Container | New per function call | Warm pool, reused |
+| Deploy | Optional | **Required** for cross-terminal reuse |
+| Parallel jobs | `--parallel` (3 GPUs) | Sequential on one GPU |
+| Best for | Full sweep, reproducible batch | Interactive / agent iteration |
+| Entry | `modal run research/modal/finetune_app.py` | `modal deploy` + `modal run research/modal/server_app.py` |
+---
+## Troubleshooting
+| Symptom | Fix |
+| ------- | --- |
+| `scaledown_window must be between 2 and 3600` | Already fixed in `_common.py` (3600 max) |
+| Deploy succeeds but ping fails | Wait ~30s for warm pool; check `modal app list` |
+| Command uses cold container | Run `modal deploy` first; confirm app name `slm-gpu-worker` |
+| HF download every run | `hf-cache` volume should mount; first run populates cache |
+| Writes not visible | Paths must be under `/vol/finetuned/`, not `/repo/models/` |
+| GPU still billing overnight | `modal app stop slm-gpu-worker` |
+---
+## References
+- [Modal Volumes](https://modal.com/docs/guide/volumes)
+- [Modal Images](https://modal.com/docs/guide/images)
+- [modal run](https://modal.com/docs/reference/cli/run)
+- [modal app stop](https://modal.com/docs/reference/cli/app#modal-app-stop)
+- [modal shell](https://modal.com/docs/reference/cli/shell) — debug: `modal shell research/modal/server_app.py::GpuWorker.finetune`

research/modal/_common.py ADDED

	@@ -0,0 +1,520 @@

+"""Shared Modal image, volumes, and command builders for finetune + server apps."""
+from __future__ import annotations
+import json
+import os
+from pathlib import Path
+from typing import Any
+import modal
+import yaml
+_file = Path(__file__).resolve()
+try:
+    LOCAL_REPO_ROOT = _file.parents[2]
+except IndexError:
+    LOCAL_REPO_ROOT = Path("/repo")
+if (_file.parent / "experiments.yaml").is_file():
+    EXPERIMENTS_PATH = _file.parent / "experiments.yaml"
+else:
+    EXPERIMENTS_PATH = Path("/repo/research/modal/experiments.yaml")
+_EVAL_PROFILES_REL = "research/evals/configs/eval_profiles.yaml"
+if (LOCAL_REPO_ROOT / _EVAL_PROFILES_REL).is_file():
+    EVAL_PROFILES_PATH = LOCAL_REPO_ROOT / _EVAL_PROFILES_REL
+else:
+    EVAL_PROFILES_PATH = Path("/repo") / _EVAL_PROFILES_REL
+REPO_ROOT = LOCAL_REPO_ROOT
+HF_CACHE_PATH = "/root/.cache/huggingface"
+FINETUNE_VOL_PATH = "/vol/finetuned"
+LM_EVAL_OUTPUT = f"{FINETUNE_VOL_PATH}/results/lm_eval"
+BASE_MODEL_ID = "openbmb/MiniCPM5-1B"
+BASELINE_EXPERIMENT = "minicpm5-1b__modal-baseline"
+BASELINE_RESULTS_JSON = f"{LM_EVAL_OUTPUT}/{BASELINE_EXPERIMENT}/results.json"
+# Metric keys to prefer when picking a task's "primary" score, in priority
+# order. Covers lm-eval-harness multiple-choice (acc), generation (exact_match),
+# and code (pass@1) tasks so gates and model cards pick a real score, not a stderr.
+_METRIC_PRIORITY = (
+    "acc,none",
+    "acc_norm,none",
+    "exact_match,strict-match",
+    "exact_match,flexible-extract",
+    "pass_at_1,create_test",
+    "pass_at_1,none",
+    "f1,none",
+    "bleu,none",
+)
+hf_cache_vol = modal.Volume.from_name("hf-cache", create_if_missing=True)
+finetune_vol = modal.Volume.from_name("slm-finetune", create_if_missing=True)
+hf_secret = modal.Secret.from_name("huggingface")
+image = (
+    modal.Image.debian_slim(python_version="3.12")
+    .apt_install("git", "build-essential")
+    .pip_install("uv", "pyyaml", "huggingface_hub")
+    .add_local_dir(
+        str(REPO_ROOT),
+        remote_path="/repo",
+        copy=True,
+        ignore=[
+            ".git/**",
+            ".venv/**",
+            "models/**",
+            "results/**",
+            "outputs/**",
+            "**/__pycache__/**",
+            "**/.pytest_cache/**",
+            "**/node_modules/**",
+        ],
+    )
+    .run_commands(
+        "cd /repo && uv sync --frozen --group finetune --group lm-eval --no-dev"
+    )
+)
+COMMON_ENV = {
+    "TRUST_REMOTE_CODE": "true",
+    "HF_HOME": HF_CACHE_PATH,
+    "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
+}
+DEFAULT_GPU = "A10G"
+DEFAULT_KEEPALIVE_HOURS = 4.0
+DEFAULT_SCALEDOWN_WINDOW = 3600  # max allowed by Modal (1h idle before scale-down)
+DEFAULT_WORKER_TIMEOUT = 14400  # 4h per method call
+def repo_env() -> dict[str, str]:
+    return {**os.environ, **COMMON_ENV}
+def reload_volumes() -> None:
+    finetune_vol.reload()
+    hf_cache_vol.reload()
+def commit_volumes() -> None:
+    finetune_vol.commit()
+    hf_cache_vol.commit()
+def load_experiments() -> dict[str, Any]:
+    with EXPERIMENTS_PATH.open() as f:
+        return yaml.safe_load(f) or {}
+def apply_defaults(job: dict[str, Any], defaults: dict[str, Any]) -> dict[str, Any]:
+    return {**defaults, **job}
+def build_finetune_cmd(job: dict[str, Any], out_dir: str) -> list[str]:
+    cmd = [
+        "uv",
+        "run",
+        "python",
+        "research/finetune.py",
+        "--preset",
+        job.get("preset", "minicpm5-1b"),
+        "--mode",
+        job.get("mode", "lora"),
+        "--dataset",
+        job["dataset"],
+        "--format",
+        job["format"],
+        "--out",
+        out_dir,
+    ]
+    if job.get("max_steps") is not None:
+        cmd.extend(["--max_steps", str(int(job["max_steps"]))])
+    if job.get("epochs") is not None:
+        cmd.extend(["--epochs", str(job["epochs"])])
+    if job.get("dataset_config"):
+        cmd.extend(["--dataset-config", job["dataset_config"]])
+    if job.get("dataset_split"):
+        cmd.extend(["--dataset-split", str(job["dataset_split"])])
+    if job.get("max_samples") is not None:
+        cmd.extend(["--dataset-max-samples", str(int(job["max_samples"]))])
+    return cmd
+def build_lm_eval_cmd(
+    *,
+    experiment_name: str,
+    config: str,
+    preset: str | None = None,
+    model_path: str | None = None,
+    adapter_path: str | None = None,
+    compare_to: str | None = None,
+) -> list[str]:
+    cmd = [
+        "uv",
+        "run",
+        "--package",
+        "slm-evals",
+        "slm-lm-eval",
+        "--config",
+        config,
+        "--experiment-name",
+        experiment_name,
+        "--output-dir",
+        LM_EVAL_OUTPUT,
+    ]
+    if preset:
+        cmd.extend(["--preset", preset])
+    if model_path:
+        cmd.extend(["--model", model_path])
+    if adapter_path:
+        cmd.extend(["--adapter", adapter_path])
+    if compare_to:
+        cmd.extend(["--compare-to", compare_to])
+    return cmd
+def prepare_jobs(
+    *,
+    job: str | None = None,
+    category: str | None = None,
+    max_steps: int | None = None,
+) -> tuple[dict[str, Any], list[dict[str, Any]]]:
+    spec = load_experiments()
+    defaults = spec.get("defaults", {})
+    jobs = spec.get("finetune", [])
+    if job:
+        jobs = [j for j in jobs if j.get("name") == job]
+        if not jobs:
+            raise SystemExit(
+                f"Unknown job {job!r}; check research/modal/experiments.yaml"
+            )
+    if category:
+        jobs = [j for j in jobs if j.get("category") == category]
+        if not jobs:
+            raise SystemExit(f"No jobs with category {category!r}")
+    prepared: list[dict[str, Any]] = []
+    for raw in jobs:
+        merged = apply_defaults(raw, defaults)
+        if max_steps is not None:
+            merged["max_steps"] = max_steps
+        prepared.append(merged)
+    return defaults, prepared
+def job_gpu(job: dict[str, Any]) -> str:
+    return job.get("gpu") or DEFAULT_GPU
+def config_for_profile(profile: str) -> str:
+    """Map an eval_profiles.yaml profile name to its config path (relative to repo root)."""
+    with EVAL_PROFILES_PATH.open() as f:
+        catalog = yaml.safe_load(f) or {}
+    meta = (catalog.get("profiles") or {}).get(profile)
+    if not meta or not meta.get("config"):
+        known = ", ".join(sorted((catalog.get("profiles") or {})))
+        raise SystemExit(
+            f"Unknown eval_profile {profile!r}; check {_EVAL_PROFILES_REL} (known: {known})"
+        )
+    return f"research/evals/configs/{meta['config']}"
+def primary_metric(task_metrics: dict[str, Any]) -> tuple[str, float] | None:
+    """Pick a task's headline (metric_name, score), matching slm_evals summary tables."""
+    for key in _METRIC_PRIORITY:
+        if key in task_metrics and isinstance(task_metrics[key], (int, float)):
+            return key, float(task_metrics[key])
+    for key, value in task_metrics.items():
+        if "stderr" in key:
+            continue
+        if isinstance(value, (int, float)):
+            return key, float(value)
+    return None
+def evaluate_gate(
+    *,
+    candidate: dict[str, Any],
+    baseline: dict[str, Any] | None,
+    goals: dict[str, Any],
+) -> dict[str, Any]:
+    """Check a candidate's lm-eval results dict against `goals` (Hub publish gate).
+    `goals` schema:
+        task: <lm-eval task name>       # scored via primary_metric(), same as summary.md
+        min_score: <float, optional>    # candidate score must be >= this
+        min_improve: <float, optional>  # candidate - baseline must be >= this
+        guard_tasks:                     # optional regression guards
+          - task: <lm-eval task name>
+            max_regress: <float>         # baseline - candidate must be <= this
+    """
+    cand_tasks = candidate.get("results", {})
+    base_tasks = (baseline or {}).get("results", {})
+    def _score(tasks: dict[str, Any], task_name: str) -> float | None:
+        metrics = tasks.get(task_name)
+        if not metrics:
+            return None
+        picked = primary_metric(metrics)
+        return picked[1] if picked else None
+    checks: list[dict[str, Any]] = []
+    passed = True
+    task = goals["task"]
+    cand_score = _score(cand_tasks, task)
+    base_score = _score(base_tasks, task)
+    if goals.get("min_score") is not None:
+        ok = cand_score is not None and cand_score >= goals["min_score"]
+        checks.append({"check": f"{task} >= {goals['min_score']}", "value": cand_score, "ok": ok})
+        passed = passed and ok
+    if goals.get("min_improve") is not None:
+        delta = (
+            cand_score - base_score
+            if (cand_score is not None and base_score is not None)
+            else None
+        )
+        ok = delta is not None and delta >= goals["min_improve"]
+        checks.append(
+            {"check": f"{task} improve >= {goals['min_improve']}", "value": delta, "ok": ok}
+        )
+        passed = passed and ok
+    for guard in goals.get("guard_tasks", []):
+        g_task = guard["task"]
+        g_cand = _score(cand_tasks, g_task)
+        g_base = _score(base_tasks, g_task)
+        regress = g_base - g_cand if (g_cand is not None and g_base is not None) else None
+        ok = regress is not None and regress <= guard["max_regress"]
+        checks.append(
+            {"check": f"{g_task} regress <= {guard['max_regress']}", "value": regress, "ok": ok}
+        )
+        passed = passed and ok
+    if not checks:
+        passed = False
+        checks.append({"check": "goals defined no checks", "value": None, "ok": False})
+    return {
+        "passed": passed,
+        "checks": checks,
+        "task": task,
+        "candidate_score": cand_score,
+        "baseline_score": base_score,
+    }
+def pull_artifacts(job_name: str, exp_name: str, dest: str = "models/finetuned") -> None:
+    """Download an adapter and its lm-eval results from the `slm-finetune` Volume (run locally)."""
+    import subprocess
+    local_dir = f"{dest}/{job_name}"
+    print(f"--- pulling {job_name} -> {local_dir} ---")
+    subprocess.run(
+        ["modal", "volume", "get", "slm-finetune", job_name, local_dir, "--force"],
+        check=False,
+    )
+    results_dir = f"results/lm_eval/{exp_name}"
+    print(f"--- pulling {results_dir} ---")
+    subprocess.run(
+        ["modal", "volume", "get", "slm-finetune", results_dir, results_dir, "--force"],
+        check=False,
+    )
+def check_gate_files(
+    *,
+    candidate_results_path: str,
+    baseline_results_path: str | None,
+    goals: dict[str, Any],
+) -> dict[str, Any]:
+    """Like evaluate_gate(), but reads results.json files (run inside a volume-mounted function)."""
+    cand_path = Path(candidate_results_path)
+    if not cand_path.is_file():
+        return {"passed": False, "checks": [], "reason": f"missing results file: {cand_path}"}
+    candidate = json.loads(cand_path.read_text())
+    baseline = None
+    if baseline_results_path and Path(baseline_results_path).is_file():
+        baseline = json.loads(Path(baseline_results_path).read_text())
+    return evaluate_gate(candidate=candidate, baseline=baseline, goals=goals)
+def render_model_card(
+    *,
+    job: dict[str, Any],
+    gate_result: dict[str, Any],
+    candidate: dict[str, Any],
+    baseline: dict[str, Any] | None,
+    training_payload: dict[str, Any] | None,
+) -> str:
+    def _fmt(v: float | None) -> str:
+        return "—" if v is None else f"{v:.4f}"
+    cand_tasks = candidate.get("results", {})
+    base_tasks = (baseline or {}).get("results", {})
+    base_model = (training_payload or {}).get("model") or BASE_MODEL_ID
+    lines = [
+        "---",
+        "library_name: peft",
+        f"base_model: {base_model}",
+        "license: apache-2.0",
+        "tags:",
+        "  - lora",
+        "  - qlora",
+        "  - build-small-hackathon",
+        "  - well-tuned",
+        f"  - {job.get('category', 'general')}",
+        "---",
+        "",
+        f"# {job['name']}",
+        "",
+        f"QLoRA adapter for **{job.get('category', 'general')}**, fine-tuned from "
+        f"`{base_model}` on `{job['dataset']}` (format: `{job['format']}`).",
+        "",
+        "Trained, evaluated, and gated on [Modal](https://modal.com/docs/guide) via "
+        "`research/modal/` (app `slm-finetune-benchmark`).",
+        "",
+        "## Benchmark gate",
+        "",
+        f"- eval profile: `{job.get('eval_profile')}`",
+        f"- gate: {'**PASSED**' if gate_result.get('passed') else '**FAILED**'}",
+        "",
+        "| check | value | result |",
+        "| --- | ---: | --- |",
+    ]
+    for c in gate_result.get("checks", []):
+        lines.append(f"| {c['check']} | {_fmt(c['value'])} | {'pass' if c['ok'] else 'fail'} |")
+    if not gate_result.get("checks"):
+        lines.append("| — | — | — |")
+    lines.extend(
+        [
+            "",
+            "## lm-eval results",
+            "",
+            "| task | metric | baseline | candidate | delta |",
+            "| --- | --- | ---: | ---: | ---: |",
+        ]
+    )
+    for task in sorted(set(cand_tasks) | set(base_tasks)):
+        c = primary_metric(cand_tasks.get(task, {}))
+        b = primary_metric(base_tasks.get(task, {}))
+        metric_name = (c or b or (None, None))[0] or "—"
+        c_val = c[1] if c else None
+        b_val = b[1] if b else None
+        delta = c_val - b_val if (c_val is not None and b_val is not None) else None
+        sign = "+" if (delta is not None and delta >= 0) else ""
+        delta_str = "—" if delta is None else f"{sign}{delta:.4f}"
+        lines.append(f"| {task} | {metric_name} | {_fmt(b_val)} | {_fmt(c_val)} | {delta_str} |")
+    if training_payload:
+        lines.extend(
+            [
+                "",
+                "## Training",
+                "",
+                f"- dataset: `{training_payload.get('dataset')}`",
+                f"- mode: `{training_payload.get('mode')}`",
+                f"- samples: {training_payload.get('samples')}",
+                f"- final train loss: {training_payload.get('metrics', {}).get('final_train_loss')}",
+                f"- eval loss: {training_payload.get('metrics', {}).get('eval_loss')}",
+            ]
+        )
+    lines.extend(
+        [
+            "",
+            "## Load with PEFT",
+            "",
+            "```python",
+            "from peft import PeftModel",
+            "from transformers import AutoModelForCausalLM, AutoTokenizer",
+            "",
+            f'base = "{base_model}"',
+            f'adapter = "{job.get("publish", {}).get("hub_repo", "<hub-repo>")}"',
+            "",
+            "tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)",
+            "model = AutoModelForCausalLM.from_pretrained(",
+            '    base, torch_dtype="auto", device_map="auto", trust_remote_code=True',
+            ")",
+            "model = PeftModel.from_pretrained(model, adapter)",
+            "```",
+            "",
+        ]
+    )
+    return "\n".join(lines) + "\n"
+def publish_adapter_files(
+    *,
+    job: dict[str, Any],
+    adapter_dir: str,
+    gate_result: dict[str, Any],
+    candidate_results_path: str,
+    baseline_results_path: str | None,
+) -> dict[str, Any]:
+    """Write a model card and push the adapter to the Hub — only if the gate passed.
+    Run inside a function with `finetune_vol` mounted and `hf_secret` set.
+    """
+    publish_cfg = job.get("publish")
+    if not publish_cfg:
+        return {"published": False, "reason": "no publish config for this job"}
+    if not gate_result.get("passed"):
+        return {"published": False, "reason": "gate failed", "gate": gate_result}
+    adapter_path = Path(adapter_dir)
+    if not adapter_path.is_dir():
+        return {"published": False, "reason": f"adapter dir missing: {adapter_dir}"}
+    candidate = {}
+    cand_path = Path(candidate_results_path)
+    if cand_path.is_file():
+        candidate = json.loads(cand_path.read_text())
+    baseline = None
+    if baseline_results_path and Path(baseline_results_path).is_file():
+        baseline = json.loads(Path(baseline_results_path).read_text())
+    training_payload = None
+    training_results_path = adapter_path / "training_results.json"
+    if training_results_path.is_file():
+        training_payload = json.loads(training_results_path.read_text())
+    card = render_model_card(
+        job=job,
+        gate_result=gate_result,
+        candidate=candidate,
+        baseline=baseline,
+        training_payload=training_payload,
+    )
+    (adapter_path / "README.md").write_text(card)
+    commit_volumes()
+    from huggingface_hub import HfApi
+    repo_id = publish_cfg["hub_repo"]
+    private = publish_cfg.get("private", True)
+    api = HfApi()
+    api.create_repo(repo_id=repo_id, repo_type="model", private=private, exist_ok=True)
+    api.upload_folder(
+        folder_path=str(adapter_path),
+        repo_id=repo_id,
+        repo_type="model",
+        commit_message=f"Publish {job['name']} (gate passed: {gate_result.get('task')})",
+    )
+    return {"published": True, "repo_id": repo_id, "url": f"https://huggingface.co/{repo_id}"}

research/modal/experiments.yaml ADDED

	@@ -0,0 +1,131 @@

+# Skill matrix for the Modal finetune + lm-eval + publish pipeline.
+#
+# Each entry trains one QLoRA adapter for a skill/category, evaluates it
+# against the matching slm-lm-eval profile (vs. a per-profile baseline),
+# checks the result against `goals`, and — only if the gate passes —
+# publishes the adapter to `publish.hub_repo` on the Hugging Face Hub.
+#
+# Smoke limits (max_steps, max_samples, eval `limit` in the profile configs)
+# keep hackathon runs affordable; bump them for full runs.
+#
+# publish.private is `false` so passing adapters land on the Hub publicly: the
+# Well-Tuned badge requires a judge-visible, fine-tuned published model.
+#
+# Workflows (see research/modal/README.md):
+#   modal run research/modal/finetune_app.py                    # full sweep: baselines -> train -> eval -> gate -> publish -> pull
+#   modal run research/modal/finetune_app.py --job math-lora     # one skill
+#   modal run research/modal/finetune_app.py --category math     # one category
+#   modal run research/modal/finetune_app.py --eval-only --job math-lora
+#   modal run research/modal/finetune_app.py --no-publish        # train+eval, skip Hub push
+#   modal run research/modal/finetune_app.py::publish_only --job math-lora
+#   modal run research/modal/finetune_app.py::pull --category math
+defaults:
+  preset: minicpm5-1b
+  mode: qlora
+  gpu: A10G          # QLoRA fits on T4 too; override per job with `gpu: T4` for cheaper runs
+  max_steps: 100
+  # Hugging Face namespace for published adapters.
+  hub_org: MSGEncrypted
+finetune:
+  # --- teaching: lesson-planning agent chat data (Well-Tuned primary) ---
+  - name: teaching-lora
+    category: teaching
+    dataset: research/data/education-lesson-chat.jsonl
+    format: chat
+    description: Lesson-planning agent chat data (local)
+    eval_profile: instructions
+    goals:
+      task: ifeval
+      min_score: 0.15
+      min_improve: 0.02
+    publish:
+      hub_repo: MSGEncrypted/minicpm5-1b-teaching-lora
+      private: false
+  # --- science: factual + explanatory science tutoring ---
+  - name: science-lora
+    category: science
+    dataset: research/data/science-tutor-chat.jsonl
+    format: chat
+    description: Science tutor Q&A chat data (local)
+    eval_profile: science
+    goals:
+      task: sciq
+      min_score: 0.50
+      min_improve: 0.02
+      guard_tasks:
+        - task: arc_challenge
+          max_regress: 0.03
+    publish:
+      hub_repo: MSGEncrypted/minicpm5-1b-science-lora
+      private: false
+  # --- math: grade-school word problems + instruction-style math solutions ---
+  - name: math-lora
+    category: math
+    dataset: TIGER-Lab/MathInstruct
+    format: alpaca
+    dataset_split: "train[:1000]"
+    max_samples: 1000
+    description: Math instruction tuning (Hub, instruction/output columns)
+    eval_profile: math
+    goals:
+      task: gsm8k
+      min_score: 0.05
+      min_improve: 0.02
+      guard_tasks:
+        - task: arc_challenge
+          max_regress: 0.03
+    publish:
+      hub_repo: MSGEncrypted/minicpm5-1b-math-lora
+      private: false
+  # --- coding: Python instruction-following code generation ---
+  - name: coding-lora
+    category: coding
+    dataset: iamtarun/python_code_instructions_18k_alpaca
+    format: alpaca
+    dataset_split: "train[:1000]"
+    max_samples: 1000
+    description: Python code instruction tuning (Hub, alpaca columns)
+    eval_profile: code
+    goals:
+      task: mbpp
+      min_score: 0.05
+      min_improve: 0.01
+    publish:
+      hub_repo: MSGEncrypted/minicpm5-1b-coding-lora
+      private: false
+  # --- reasoning: multi-turn chat with reasoning-heavy conversations ---
+  - name: reasoning-lora
+    category: reasoning
+    dataset: HuggingFaceTB/smoltalk
+    format: chat
+    dataset_config: all
+    dataset_split: "train[:500]"
+    max_samples: 500
+    description: Multi-turn reasoning/chat subset (Hub)
+    eval_profile: reasoning
+    goals:
+      task: gsm8k
+      min_score: 0.05
+      min_improve: 0.01
+      guard_tasks:
+        - task: hellaswag
+          max_regress: 0.03
+    publish:
+      hub_repo: MSGEncrypted/minicpm5-1b-reasoning-lora
+      private: false
+  # --- general instructions baseline: no goals/publish -> local-only adapter ---
+  - name: alpaca-lora
+    category: instructions
+    dataset: tatsu-lab/alpaca
+    format: alpaca
+    dataset_split: train
+    max_samples: 200
+    description: General instruction tuning baseline (Hub, local-only)
+    eval_profile: instructions
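
Reviewer note: the `goals` blocks above (`task`, `min_score`, `min_improve`, `guard_tasks` with `max_regress`) are consumed by `check_gate_files` in `_common.py`, which this part of the diff does not show. The sketch below is one plausible reading of those fields, not the actual implementation. The function name `evaluate_goals`, the flat `{task: score}` input shape, and the choice to skip checks when a baseline score is missing are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


def evaluate_goals(
    candidate: dict[str, float],
    baseline: dict[str, float] | None,
    goals: dict[str, Any],
) -> dict[str, Any]:
    """Hypothetical gate check; NOT the real check_gate_files from _common.py."""
    task = goals["task"]
    reasons: list[str] = []

    score = candidate.get(task)
    if score is None:
        reasons.append(f"no candidate score for {task}")
    else:
        # Absolute floor on the primary task.
        min_score = goals.get("min_score")
        if min_score is not None and score < min_score:
            reasons.append(f"{task}: {score:.3f} < min_score {min_score}")
        # Required improvement over the baseline (assumed skipped without a baseline).
        min_improve = goals.get("min_improve")
        if min_improve is not None and baseline is not None and task in baseline:
            delta = score - baseline[task]
            if delta < min_improve:
                reasons.append(f"{task}: improved {delta:.3f} < min_improve {min_improve}")

    # Guard tasks: the adapter must not regress by more than max_regress.
    for guard in goals.get("guard_tasks", []):
        name = guard["task"]
        if baseline is None or name not in baseline or name not in candidate:
            continue  # assumption: a guard is skipped when either score is missing
        regress = baseline[name] - candidate[name]
        if regress > guard["max_regress"]:
            reasons.append(f"{name}: regressed {regress:.3f} > {guard['max_regress']}")

    return {"passed": not reasons, "reasons": reasons}


if __name__ == "__main__":
    math_goals = {
        "task": "gsm8k",
        "min_score": 0.05,
        "min_improve": 0.02,
        "guard_tasks": [{"task": "arc_challenge", "max_regress": 0.03}],
    }
    print(evaluate_goals(
        {"gsm8k": 0.15, "arc_challenge": 0.39},
        {"gsm8k": 0.10, "arc_challenge": 0.40},
        math_goals,
    ))
```

Under this reading, a job such as `alpaca-lora` that has no `goals` never reaches the gate and is therefore never published.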

research/modal/finetune_app.py ADDED

	@@ -0,0 +1,370 @@

+"""
+Modal GPU pipeline for research/finetune.py + slm-lm-eval.
+Skill-matrix pipeline: train -> eval -> gate -> publish.
+Each job in experiments.yaml fine-tunes one QLoRA adapter for a skill
+(math, science, coding, reasoning, teaching, ...), evaluates it against the
+matching slm-lm-eval profile vs. a per-profile baseline, checks the result
+against `goals`, and (only if the gate passes) publishes the adapter to the
+Hugging Face Hub.
+Run from repo root:
+    modal run research/modal/finetune_app.py
+    modal run research/modal/finetune_app.py --eval-only
+    modal run research/modal/finetune_app.py --job math-lora --max-steps 20
+    modal run research/modal/finetune_app.py --category science
+    modal run research/modal/finetune_app.py --no-publish --no-pull
+    modal run research/modal/finetune_app.py::publish_only --job math-lora
+    modal run research/modal/finetune_app.py::pull --category math
+"""
+from __future__ import annotations
+import json
+import subprocess
+import sys
+from pathlib import Path
+from typing import Any
+import modal
+# Make `_common` importable both locally (sibling file) and in the Modal
+# container, where the entrypoint lands at /root but the repo is baked into the
+# image at /repo (see add_local_dir in _common.py).
+for _candidate in (Path(__file__).resolve().parent, Path("/repo/research/modal")):
+    if _candidate.is_dir() and str(_candidate) not in sys.path:
+        sys.path.insert(0, str(_candidate))
+from _common import (
+    BASE_MODEL_ID,
+    FINETUNE_VOL_PATH,
+    HF_CACHE_PATH,
+    LM_EVAL_OUTPUT,
+    build_finetune_cmd,
+    build_lm_eval_cmd,
+    check_gate_files,
+    commit_volumes,
+    config_for_profile,
+    finetune_vol,
+    hf_cache_vol,
+    hf_secret,
+    image,
+    job_gpu,
+    load_experiments,
+    prepare_jobs,
+    publish_adapter_files,
+    pull_artifacts,
+    reload_volumes,
+    repo_env,
+)
+APP_NAME = "slm-finetune-benchmark"
+app = modal.App(APP_NAME, image=image)
+@app.function(
+    gpu="A10G",
+    volumes={
+        HF_CACHE_PATH: hf_cache_vol,
+        FINETUNE_VOL_PATH: finetune_vol,
+    },
+    secrets=[hf_secret],
+    timeout=7200,
+)
+def finetune_one(job: dict[str, Any]) -> dict[str, Any]:
+    """Fine-tune one dataset job; persist adapter to Modal Volume."""
+    name = job["name"]
+    out_dir = f"{FINETUNE_VOL_PATH}/{name}"
+    Path(out_dir).mkdir(parents=True, exist_ok=True)
+    cmd = build_finetune_cmd(job, out_dir)
+    print("Running:", " ".join(cmd))
+    subprocess.run(cmd, cwd="/repo", check=True, env=repo_env())
+    commit_volumes()
+    results_path = Path(out_dir) / "training_results.json"
+    payload = json.loads(results_path.read_text())
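+    payload["job_name"] = name
+    # main() reads train_payload["output_dir"]; set it here, as GpuWorker.finetune does.
+    payload["output_dir"] = out_dir
+    return payload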
+@app.function(
+    gpu="A10G",
+    volumes={
+        HF_CACHE_PATH: hf_cache_vol,
+        FINETUNE_VOL_PATH: finetune_vol,
+    },
+    secrets=[hf_secret],
+    timeout=3600,
+)
+def run_lm_eval(
+    *,
+    experiment_name: str,
+    config: str = "research/evals/configs/lm_eval_smoke.yaml",
+    preset: str | None = None,
+    model_path: str | None = None,
+    adapter_path: str | None = None,
+    compare_to: str | None = None,
+) -> dict[str, Any]:
+    """Run slm-lm-eval on base model or finetuned checkpoint."""
+    reload_volumes()
+    if adapter_path:
+        adapter_dir = Path(adapter_path)
+        adapter_cfg = adapter_dir / "adapter_config.json"
+        if not adapter_cfg.is_file():
+            raise FileNotFoundError(
+                f"LoRA adapter not visible at {adapter_path} "
+                f"(missing {adapter_cfg.name}). "
+                "If training just finished, retry after volume commit/reload."
+            )
+    cmd = build_lm_eval_cmd(
+        experiment_name=experiment_name,
+        config=config,
+        preset=preset,
+        model_path=model_path,
+        adapter_path=adapter_path,
+        compare_to=compare_to,
+    )
+    print("Running:", " ".join(cmd))
+    proc = subprocess.run(cmd, cwd="/repo", check=False, env=repo_env())
+    commit_volumes()
+    out_root = Path(LM_EVAL_OUTPUT) / experiment_name
+    results_json = out_root / "results.json"
+    summary_md = out_root / "summary.md"
+    comparison_md = out_root / "comparison.md"
+    return {
+        "experiment_name": experiment_name,
+        "config": config,
+        "preset": preset,
+        "model_path": model_path,
+        "adapter_path": adapter_path,
+        "compare_to": compare_to,
+        "results_json": str(results_json),
+        "summary_md": str(summary_md),
+        "comparison_md": str(comparison_md) if comparison_md.is_file() else None,
+        "exit_code": proc.returncode,
+        "ok": proc.returncode == 0 and results_json.is_file(),
+    }
+@app.function(volumes={FINETUNE_VOL_PATH: finetune_vol}, timeout=300)
+def check_gate(
+    *,
+    candidate_results_path: str,
+    baseline_results_path: str | None,
+    goals: dict[str, Any],
+) -> dict[str, Any]:
+    """Check a candidate's lm-eval results against `goals` (Hub publish gate)."""
+    reload_volumes()
+    return check_gate_files(
+        candidate_results_path=candidate_results_path,
+        baseline_results_path=baseline_results_path,
+        goals=goals,
+    )
+@app.function(
+    volumes={FINETUNE_VOL_PATH: finetune_vol},
+    secrets=[hf_secret],
+    timeout=900,
+)
+def publish_adapter(
+    *,
+    job: dict[str, Any],
+    adapter_dir: str,
+    gate_result: dict[str, Any],
+    candidate_results_path: str,
+    baseline_results_path: str | None,
+) -> dict[str, Any]:
+    """Write a model card and push the adapter to the Hub, but only if the gate passed."""
+    reload_volumes()
+    return publish_adapter_files(
+        job=job,
+        adapter_dir=adapter_dir,
+        gate_result=gate_result,
+        candidate_results_path=candidate_results_path,
+        baseline_results_path=baseline_results_path,
+    )
+def _print_summary(rows: list[dict[str, Any]]) -> None:
+    print("\n--- summary ---")
+    print(f"{'skill':<18} {'category':<12} {'gate':<6} {'published':<10} hub_repo")
+    for row in rows:
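+        # Jobs without `goals` never run the gate; show "-" instead of a misleading "fail".
+        if "gate_passed" not in row:
+            gate = "-"
+        else:
+            gate = "PASS" if row["gate_passed"] else "fail"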
+        published = "yes" if row.get("published") else "no"
+        print(
+            f"{row['name']:<18} {row.get('category') or '-':<12} {gate:<6} "
+            f"{published:<10} {row.get('hub_repo') or '-'}"
+        )
+@app.local_entrypoint()
+def main(
+    train: bool = True,
+    eval_only: bool = False,
+    parallel: bool = False,
+    job: str | None = None,
+    category: str | None = None,
+    max_steps: int | None = None,
+    publish: bool = True,
+    pull: bool = True,
+):
+    """
+    Skill-matrix pipeline: per-profile baselines -> train -> eval -> gate -> publish -> pull.
+    Examples:
+        modal run research/modal/finetune_app.py
+        modal run research/modal/finetune_app.py --job math-lora --max-steps 20
+        modal run research/modal/finetune_app.py --category science
+        modal run research/modal/finetune_app.py --eval-only --job math-lora
+        modal run research/modal/finetune_app.py --no-publish --no-pull
+    """
+    defaults, prepared = prepare_jobs(job=job, category=category, max_steps=max_steps)
+    if not prepared:
+        raise SystemExit("No matching jobs; check --job/--category and experiments.yaml")
+    preset = defaults.get("preset", "minicpm5-1b")
+    profiles = sorted({j.get("eval_profile", "compare_study") for j in prepared})
+    baselines_ok: dict[str, bool] = {}
+    if not eval_only:
+        print(f"--- baselines ({', '.join(profiles)}) ---")
+        for profile in profiles:
+            result = run_lm_eval.remote(
+                experiment_name=f"{preset}__baseline__{profile}",
+                config=config_for_profile(profile),
+                preset=preset,
+            )
+            print(json.dumps(result, indent=2))
+            baselines_ok[profile] = bool(result.get("ok"))
+    train_results: dict[str, dict[str, Any]] = {}
+    if train and not eval_only:
+        print(f"--- finetune ({len(prepared)} job(s), parallel={parallel}) ---")
+        if parallel:
+            handles = {
+                j["name"]: finetune_one.with_options(gpu=job_gpu(j)).spawn(j)
+                for j in prepared
+            }
+            for name, handle in handles.items():
+                train_results[name] = handle.get()
+                print(json.dumps(train_results[name], indent=2))
+        else:
+            for j in prepared:
+                print(f"Training {j['name']}...")
+                result = finetune_one.with_options(gpu=job_gpu(j)).remote(j)
+                train_results[j["name"]] = result
+                print(json.dumps(result, indent=2))
+    print("--- post-train lm-eval / gate / publish ---")
+    summary: list[dict[str, Any]] = []
+    for j in prepared:
+        job_name = j["name"]
+        profile = j.get("eval_profile", "compare_study")
+        train_payload = train_results.get(job_name)
+        adapter_path = (
+            train_payload["output_dir"] if train_payload else f"{FINETUNE_VOL_PATH}/{job_name}"
+        )
+        baseline_path = f"{LM_EVAL_OUTPUT}/{preset}__baseline__{profile}/results.json"
+        compare_to = baseline_path if baselines_ok.get(profile) else None
+        exp_name = f"{job_name}__{profile}"
+        eval_result = run_lm_eval.remote(
+            experiment_name=exp_name,
+            config=config_for_profile(profile),
+            model_path=BASE_MODEL_ID,
+            adapter_path=adapter_path,
+            compare_to=compare_to,
+        )
+        print(json.dumps(eval_result, indent=2))
+        row: dict[str, Any] = {
+            "name": job_name,
+            "category": j.get("category"),
+            "profile": profile,
+        }
+        gate_result: dict[str, Any] | None = None
+        if j.get("goals"):
+            if eval_result.get("ok"):
+                gate_result = check_gate.remote(
+                    candidate_results_path=eval_result["results_json"],
+                    baseline_results_path=baseline_path,
+                    goals=j["goals"],
+                )
+                print(json.dumps(gate_result, indent=2))
+            row["gate_passed"] = bool(gate_result and gate_result.get("passed"))
+        if j.get("publish"):
+            row["hub_repo"] = j["publish"].get("hub_repo")
+            if publish and gate_result is not None:
+                publish_result = publish_adapter.remote(
+                    job=j,
+                    adapter_dir=adapter_path,
+                    gate_result=gate_result,
+                    candidate_results_path=eval_result["results_json"],
+                    baseline_results_path=baseline_path,
+                )
+                print(json.dumps(publish_result, indent=2))
+                row["published"] = publish_result.get("published")
+        summary.append(row)
+        if pull:
+            pull_artifacts(job_name, exp_name)
+    _print_summary(summary)
+@app.local_entrypoint()
+def publish_only(job: str):
+    """Re-run the gate and Hub publish for a job using already-computed results (no train/eval)."""
+    defaults, prepared = prepare_jobs(job=job)
+    if not prepared:
+        raise SystemExit(f"No job named {job!r} in experiments.yaml")
+    j = prepared[0]
+    if not j.get("goals"):
+        raise SystemExit(f"Job {job!r} has no `goals`; nothing to gate on")
+    if not j.get("publish"):
+        raise SystemExit(f"Job {job!r} has no `publish` config")
+    preset = defaults.get("preset", "minicpm5-1b")
+    profile = j.get("eval_profile", "compare_study")
+    adapter_path = f"{FINETUNE_VOL_PATH}/{job}"
+    candidate_results_path = f"{LM_EVAL_OUTPUT}/{job}__{profile}/results.json"
+    baseline_results_path = f"{LM_EVAL_OUTPUT}/{preset}__baseline__{profile}/results.json"
+    gate_result = check_gate.remote(
+        candidate_results_path=candidate_results_path,
+        baseline_results_path=baseline_results_path,
+        goals=j["goals"],
+    )
+    print(json.dumps(gate_result, indent=2))
+    publish_result = publish_adapter.remote(
+        job=j,
+        adapter_dir=adapter_path,
+        gate_result=gate_result,
+        candidate_results_path=candidate_results_path,
+        baseline_results_path=baseline_results_path,
+    )
+    print(json.dumps(publish_result, indent=2))
+@app.local_entrypoint()
+def pull(job: str | None = None, category: str | None = None, dest: str = "models/finetuned"):
+    """Download adapters and their lm-eval results from the `slm-finetune` Volume."""
+    _, prepared = prepare_jobs(job=job, category=category)
+    if not prepared:
+        raise SystemExit("No matching jobs; pass --job or --category")
+    for j in prepared:
+        profile = j.get("eval_profile", "compare_study")
+        pull_artifacts(j["name"], f"{j['name']}__{profile}", dest)
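
Reviewer note: `main`, `publish_only`, and `pull` each rebuild the same experiment names and `results.json` paths with inline f-strings. A small helper could keep that naming convention in one place. The sketch below is illustrative only: the helper names are hypothetical, and `/vol/lm-eval` is a placeholder because the real `LM_EVAL_OUTPUT` value lives in `_common.py` and is not shown in this diff.

```python
from pathlib import PurePosixPath


def baseline_experiment(preset: str, profile: str) -> str:
    """Experiment name for the per-profile baseline run, e.g. minicpm5-1b__baseline__math."""
    return f"{preset}__baseline__{profile}"


def candidate_experiment(job_name: str, profile: str) -> str:
    """Experiment name for a fine-tuned adapter, e.g. math-lora__math."""
    return f"{job_name}__{profile}"


def results_json(output_root: str, experiment_name: str) -> str:
    """Path of the lm-eval results file for one experiment (POSIX paths, as in the container)."""
    return str(PurePosixPath(output_root) / experiment_name / "results.json")


if __name__ == "__main__":
    root = "/vol/lm-eval"  # placeholder for LM_EVAL_OUTPUT
    print(results_json(root, baseline_experiment("minicpm5-1b", "math")))
    print(results_json(root, candidate_experiment("math-lora", "math")))
```

Centralizing the names this way would also guard against `publish_only` looking for results under a slightly different directory than `main` wrote them to.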

research/modal/server_app.py ADDED

	@@ -0,0 +1,472 @@

+"""
+Long-lived Modal GPU worker — reuse one warm container for many finetune / eval runs.
+Deploy once (enables min_containers warm pool across separate CLI invocations):
+    modal deploy research/modal/server_app.py
+Default: keep a GPU worker alive for several hours (blocks local terminal):
+    modal run research/modal/server_app.py
+    modal run research/modal/server_app.py --hours 6
+Detached keep-alive (local terminal free):
+    modal run -d research/modal/server_app.py --hours 6
+Run the skill-matrix pipeline on the warm worker (separate terminal, same
+container when deployed) — per-profile baselines -> finetune -> eval -> gate -> publish:
+    modal run research/modal/server_app.py --job math-lora --max-steps 20
+    modal run research/modal/server_app.py --category science
+    modal run research/modal/server_app.py --pipeline --no-publish
+    modal run research/modal/server_app.py --eval-only --job math-lora
+    modal run research/modal/server_app.py --publish-only --job math-lora
+    modal run research/modal/server_app.py --cmd "uv run python research/finetune.py --help"
+Stop deployed app:
+    modal app stop slm-gpu-worker
+"""
+from __future__ import annotations
+import json
+import shlex
+import subprocess
+import sys
+import time
+from pathlib import Path
+from typing import Any
+import modal
+# Make `_common` importable both locally (sibling file) and in the Modal
+# container, where the entrypoint lands at /root but the repo is baked into the
+# image at /repo (see add_local_dir in _common.py).
+for _candidate in (Path(__file__).resolve().parent, Path("/repo/research/modal")):
+    if _candidate.is_dir() and str(_candidate) not in sys.path:
+        sys.path.insert(0, str(_candidate))
+from _common import (
+    BASE_MODEL_ID,
+    DEFAULT_GPU,
+    DEFAULT_KEEPALIVE_HOURS,
+    DEFAULT_SCALEDOWN_WINDOW,
+    DEFAULT_WORKER_TIMEOUT,
+    FINETUNE_VOL_PATH,
+    HF_CACHE_PATH,
+    LM_EVAL_OUTPUT,
+    apply_defaults,
+    build_finetune_cmd,
+    build_lm_eval_cmd,
+    check_gate_files,
+    commit_volumes,
+    config_for_profile,
+    finetune_vol,
+    hf_cache_vol,
+    hf_secret,
+    image,
+    load_experiments,
+    prepare_jobs,
+    publish_adapter_files,
+    pull_artifacts,
+    reload_volumes,
+    repo_env,
+)
+APP_NAME = "slm-gpu-worker"
+app = modal.App(APP_NAME, image=image)
+@app.cls(
+    gpu=DEFAULT_GPU,
+    volumes={
+        HF_CACHE_PATH: hf_cache_vol,
+        FINETUNE_VOL_PATH: finetune_vol,
+    },
+    secrets=[hf_secret],
+    timeout=DEFAULT_WORKER_TIMEOUT,
+    scaledown_window=DEFAULT_SCALEDOWN_WINDOW,
+    min_containers=1,
+)
+class GpuWorker:
+    """Single warm GPU container for sequential finetune / lm-eval / shell commands."""
+    @modal.enter()
+    def startup(self) -> None:
+        reload_volumes()
+        print(
+            f"GpuWorker ready (HF cache={HF_CACHE_PATH}, finetune vol={FINETUNE_VOL_PATH})"
+        )
+    @modal.method()
+    def ping(self) -> dict[str, str]:
+        return {"status": "ok", "app": APP_NAME}
+    @modal.method()
+    def keep_alive(self, hours: float = DEFAULT_KEEPALIVE_HOURS) -> dict[str, Any]:
+        """Hold this container open; cheap heartbeat so scaledown_window stays fresh."""
+        deadline = time.time() + hours * 3600
+        ticks = 0
+        while time.time() < deadline:
+            remaining = int(deadline - time.time())
+            if ticks % 5 == 0:
+                print(f"keep_alive: {remaining}s remaining")
+            time.sleep(60)
+            ticks += 1
+        return {"status": "done", "hours": hours}
+    @modal.method()
+    def exec_cmd(self, argv: list[str], cwd: str = "/repo") -> dict[str, Any]:
+        """Run an arbitrary command in the repo (same env as finetune.py)."""
+        print("Running:", " ".join(argv))
+        proc = subprocess.run(
+            argv,
+            cwd=cwd,
+            check=False,
+            env=repo_env(),
+            capture_output=True,
+            text=True,
+        )
+        commit_volumes()
+        return {
+            "argv": argv,
+            "exit_code": proc.returncode,
+            "ok": proc.returncode == 0,
+            "stdout": proc.stdout,
+            "stderr": proc.stderr,
+        }
+    @modal.method()
+    def finetune(self, job: dict[str, Any]) -> dict[str, Any]:
+        """Fine-tune one dataset job via research/finetune.py."""
+        name = job["name"]
+        out_dir = f"{FINETUNE_VOL_PATH}/{name}"
+        Path(out_dir).mkdir(parents=True, exist_ok=True)
+        cmd = build_finetune_cmd(job, out_dir)
+        print("Running:", " ".join(cmd))
+        subprocess.run(cmd, cwd="/repo", check=True, env=repo_env())
+        commit_volumes()
+        results_path = Path(out_dir) / "training_results.json"
+        payload = json.loads(results_path.read_text())
+        payload["job_name"] = name
+        payload["output_dir"] = out_dir
+        return payload
+    @modal.method()
+    def lm_eval(
+        self,
+        *,
+        experiment_name: str,
+        config: str = "research/evals/configs/lm_eval_smoke.yaml",
+        preset: str | None = None,
+        model_path: str | None = None,
+        adapter_path: str | None = None,
+        compare_to: str | None = None,
+    ) -> dict[str, Any]:
+        """Run slm-lm-eval on base model or finetuned checkpoint."""
+        if adapter_path:
+            adapter_dir = Path(adapter_path)
+            adapter_cfg = adapter_dir / "adapter_config.json"
+            if not adapter_cfg.is_file():
+                raise FileNotFoundError(
+                    f"LoRA adapter not visible at {adapter_path} "
+                    f"(missing {adapter_cfg.name})."
+                )
+        cmd = build_lm_eval_cmd(
+            experiment_name=experiment_name,
+            config=config,
+            preset=preset,
+            model_path=model_path,
+            adapter_path=adapter_path,
+            compare_to=compare_to,
+        )
+        print("Running:", " ".join(cmd))
+        proc = subprocess.run(cmd, cwd="/repo", check=False, env=repo_env())
+        commit_volumes()
+        out_root = Path(LM_EVAL_OUTPUT) / experiment_name
+        results_json = out_root / "results.json"
+        summary_md = out_root / "summary.md"
+        comparison_md = out_root / "comparison.md"
+        return {
+            "experiment_name": experiment_name,
+            "config": config,
+            "preset": preset,
+            "model_path": model_path,
+            "adapter_path": adapter_path,
+            "compare_to": compare_to,
+            "results_json": str(results_json),
+            "summary_md": str(summary_md),
+            "comparison_md": str(comparison_md) if comparison_md.is_file() else None,
+            "exit_code": proc.returncode,
+            # Match finetune_app.run_lm_eval: a run only counts as ok if results.json exists.
+            "ok": proc.returncode == 0 and results_json.is_file(),
+        }
+    @modal.method()
+    def check_gate(
+        self,
+        *,
+        candidate_results_path: str,
+        baseline_results_path: str | None,
+        goals: dict[str, Any],
+    ) -> dict[str, Any]:
+        """Check a candidate's lm-eval results against `goals` (Hub publish gate)."""
+        return check_gate_files(
+            candidate_results_path=candidate_results_path,
+            baseline_results_path=baseline_results_path,
+            goals=goals,
+        )
+    @modal.method()
+    def publish_adapter(
+        self,
+        *,
+        job: dict[str, Any],
+        adapter_dir: str,
+        gate_result: dict[str, Any],
+        candidate_results_path: str,
+        baseline_results_path: str | None,
+    ) -> dict[str, Any]:
+        """Write a model card and push the adapter to the Hub, but only if the gate passed."""
+        return publish_adapter_files(
+            job=job,
+            adapter_dir=adapter_dir,
+            gate_result=gate_result,
+            candidate_results_path=candidate_results_path,
+            baseline_results_path=baseline_results_path,
+        )
+    @modal.method()
+    def run_pipeline(
+        self,
+        *,
+        job_names: list[str] | None = None,
+        category: str | None = None,
+        max_steps: int | None = None,
+        train: bool = True,
+        eval_only: bool = False,
+        publish: bool = True,
+    ) -> dict[str, Any]:
+        """Per-profile baselines -> finetune -> eval -> gate -> publish (same container)."""
+        spec = load_experiments()
+        defaults = spec.get("defaults", {})
+        jobs = spec.get("finetune", [])
+        if job_names:
+            jobs = [j for j in jobs if j.get("name") in job_names]
+            if not jobs:
+                raise ValueError(f"No matching jobs in experiments.yaml: {job_names}")
+        if category:
+            jobs = [j for j in jobs if j.get("category") == category]
+            if not jobs:
+                raise ValueError(f"No jobs with category {category!r}")
+        if not jobs:
+            raise ValueError("No jobs matched job_names/category")
+        preset = defaults.get("preset", "minicpm5-1b")
+        prepared: list[dict[str, Any]] = []
+        for raw in jobs:
+            merged = apply_defaults(raw, defaults)
+            if max_steps is not None:
+                merged["max_steps"] = max_steps
+            prepared.append(merged)
+        profiles = sorted({j.get("eval_profile", "compare_study") for j in prepared})
+        baselines_ok: dict[str, bool] = {}
+        if not eval_only:
+            for profile in profiles:
+                result = self.lm_eval.local(
+                    experiment_name=f"{preset}__baseline__{profile}",
+                    config=config_for_profile(profile),
+                    preset=preset,
+                )
+                baselines_ok[profile] = bool(result.get("ok"))
+        train_results: dict[str, dict[str, Any]] = {}
+        if train and not eval_only:
+            for j in prepared:
+                train_results[j["name"]] = self.finetune.local(j)
+        rows: list[dict[str, Any]] = []
+        for j in prepared:
+            job_name = j["name"]
+            profile = j.get("eval_profile", "compare_study")
+            train_payload = train_results.get(job_name)
+            adapter_path = (
+                train_payload["output_dir"]
+                if train_payload
+                else f"{FINETUNE_VOL_PATH}/{job_name}"
+            )
+            baseline_path = f"{LM_EVAL_OUTPUT}/{preset}__baseline__{profile}/results.json"
+            compare_to = baseline_path if baselines_ok.get(profile) else None
+            exp_name = f"{job_name}__{profile}"
+            eval_result = self.lm_eval.local(
+                experiment_name=exp_name,
+                config=config_for_profile(profile),
+                model_path=BASE_MODEL_ID,
+                adapter_path=adapter_path,
+                compare_to=compare_to,
+            )
+            row: dict[str, Any] = {
+                "name": job_name,
+                "category": j.get("category"),
+                "profile": profile,
+                "eval": eval_result,
+            }
+            gate_result: dict[str, Any] | None = None
+            if j.get("goals"):
+                if eval_result.get("ok"):
+                    gate_result = self.check_gate.local(
+                        candidate_results_path=eval_result["results_json"],
+                        baseline_results_path=baseline_path,
+                        goals=j["goals"],
+                    )
+                row["gate"] = gate_result
+            if j.get("publish") and publish and gate_result is not None:
+                row["publish"] = self.publish_adapter.local(
+                    job=j,
+                    adapter_dir=adapter_path,
+                    gate_result=gate_result,
+                    candidate_results_path=eval_result["results_json"],
+                    baseline_results_path=baseline_path,
+                )
+            rows.append(row)
+        return {"jobs": rows}
+def _worker() -> GpuWorker:
+    """Prefer deployed warm worker; fall back to ephemeral cls for first deploy."""
+    try:
+        cls = modal.Cls.from_name(APP_NAME, "GpuWorker")
+        return cls()
+    except modal.exception.NotFoundError:
+        return GpuWorker()
+@app.local_entrypoint()
+def main(
+    serve: bool = True,
+    hours: float = DEFAULT_KEEPALIVE_HOURS,
+    cmd: str | None = None,
+    job: str | None = None,
+    category: str | None = None,
+    max_steps: int | None = None,
+    eval_only: bool = False,
+    pipeline: bool = False,
+    publish: bool = True,
+    publish_only: bool = False,
+    pull: bool = True,
+    ping: bool = False,
+):
+    """
+    GPU worker CLI.
+    With no task flags, keeps one container alive (default). With --job/--category,
+    --cmd, --eval-only, --pipeline, or --publish-only, runs that task on the warm
+    worker instead. --pipeline (and --job/--category/--eval-only) run the skill-matrix
+    pipeline: per-profile baselines -> finetune -> eval -> gate -> publish.
+    Examples:
+        modal deploy research/modal/server_app.py
+        modal run research/modal/server_app.py
+        modal run research/modal/server_app.py --pipeline --job math-lora --max-steps 20
+        modal run research/modal/server_app.py --pipeline --category science --no-publish
+        modal run research/modal/server_app.py --eval-only --job math-lora
+        modal run research/modal/server_app.py --publish-only --job math-lora
+        modal run research/modal/server_app.py --cmd "uv run python research/finetune.py --help"
+    """
+    has_task = bool(cmd or job or category or eval_only or pipeline or publish_only or ping)
+    if has_task:
+        serve = False
+    worker = _worker()
+    if ping:
+        print(json.dumps(worker.ping.remote(), indent=2))
+        return
+    if cmd:
+        argv = shlex.split(cmd)
+        result = worker.exec_cmd.remote(argv)
+        if result.get("stdout"):
+            print(result["stdout"], end="")
+        if result.get("stderr"):
+            print(result["stderr"], end="", file=sys.stderr)
+        if not result.get("ok"):
+            raise SystemExit(result.get("exit_code", 1))
+        return
+    if publish_only:
+        if not job:
+            raise SystemExit("--publish-only requires --job")
+        defaults, prepared = prepare_jobs(job=job)
+        if not prepared:
+            raise SystemExit(f"No job named {job!r} in experiments.yaml")
+        j = prepared[0]
+        if not j.get("goals") or not j.get("publish"):
+            raise SystemExit(f"Job {job!r} needs `goals` and `publish` in experiments.yaml")
+        preset = defaults.get("preset", "minicpm5-1b")
+        profile = j.get("eval_profile", "compare_study")
+        adapter_path = f"{FINETUNE_VOL_PATH}/{job}"
+        candidate_results_path = f"{LM_EVAL_OUTPUT}/{job}__{profile}/results.json"
+        baseline_results_path = f"{LM_EVAL_OUTPUT}/{preset}__baseline__{profile}/results.json"
+        gate_result = worker.check_gate.remote(
+            candidate_results_path=candidate_results_path,
+            baseline_results_path=baseline_results_path,
+            goals=j["goals"],
+        )
+        print(json.dumps(gate_result, indent=2))
+        result = worker.publish_adapter.remote(
+            job=j,
+            adapter_dir=adapter_path,
+            gate_result=gate_result,
+            candidate_results_path=candidate_results_path,
+            baseline_results_path=baseline_results_path,
+        )
+        print(json.dumps(result, indent=2))
+        return
+    if pipeline or job or category or eval_only:
+        job_names = [job] if job else None
+        result = worker.run_pipeline.remote(
+            job_names=job_names,
+            category=category,
+            max_steps=max_steps,
+            train=not eval_only,
+            eval_only=eval_only,
+            publish=publish,
+        )
+        print(json.dumps(result, indent=2))
+        if pull:
+            for row in result.get("jobs", []):
+                pull_artifacts(row["name"], f"{row['name']}__{row['profile']}")
+        return
+    if serve:
+        print(
+            f"Keeping GpuWorker alive for {hours}h "
+            "(deploy with `modal deploy` so other terminals reuse this container)"
+        )
+        worker.ping.remote()
+        result = worker.keep_alive.remote(hours=hours)
+        print(json.dumps(result, indent=2))
+        return
+    raise SystemExit(
+        "Nothing to do. Use default serve mode, or pass --job, --category, --cmd, "
+        "--pipeline, --eval-only, --publish-only, or --ping."
+    )
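
Reviewer note: the flag handling in `main` above amounts to a fixed precedence: `--ping`, then `--cmd`, then `--publish-only`, then the pipeline flags, and only then the default keep-alive. The sketch below restates that order as a pure function so it can be unit-tested without Modal. `resolve_mode` and its string return values are hypothetical names introduced here, not part of the commit.

```python
from __future__ import annotations

import shlex


def resolve_mode(
    *,
    cmd: str | None = None,
    job: str | None = None,
    category: str | None = None,
    eval_only: bool = False,
    pipeline: bool = False,
    publish_only: bool = False,
    ping: bool = False,
    serve: bool = True,
) -> str:
    """Return which branch of main() would run for a given flag combination."""
    if ping:
        return "ping"
    if cmd:
        return "cmd"
    if publish_only:
        if not job:
            raise SystemExit("--publish-only requires --job")
        return "publish_only"
    if pipeline or job or category or eval_only:
        return "pipeline"
    if serve:
        return "serve"
    raise SystemExit("Nothing to do.")


if __name__ == "__main__":
    print(resolve_mode())                               # no task flags: keep-alive
    print(resolve_mode(job="math-lora"))                # --job alone implies the pipeline
    print(resolve_mode(publish_only=True, job="math-lora"))
    # --cmd strings are tokenized with shlex before being sent to exec_cmd.
    print(shlex.split("uv run python research/finetune.py --help"))
```

One consequence of this order is that `--ping` combined with any other flag only pings, and `--cmd` takes priority over `--job`.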

{notebook → research/notebook}/gemma-finetune.ipynb RENAMED

File without changes

research/notebook/minicpm5-modal-finetune.ipynb ADDED

	@@ -0,0 +1,216 @@

+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# MiniCPM5-1B fine-tune on Modal Notebooks\n",
+        "\n",
+        "Interactive path for the **Modal** + **Well-Tuned** hackathon tracks.\n",
+        "\n",
+        "**Setup (sidebar before running cells):**\n",
+        "1. [modal.com/notebooks](https://modal.com/notebooks) — upload this `.ipynb`\n",
+        "2. **Compute profile** → enable GPU (e.g. A10G)\n",
+        "3. **Files** → attach Volume `slm-finetune` (mounts at `/mnt/slm-finetune`)\n",
+        "4. **Secrets** → attach `huggingface` (`HF_TOKEN`)\n",
+        "\n",
+        "Model: [openbmb/MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B)\n",
+        "\n",
+        "Docs: [Modal Notebooks](https://modal.com/docs/guide/notebooks) · [Volumes](https://modal.com/docs/guide/volumes)\n",
+        "\n",
+        "For reproducible sweeps: `modal run research/modal/finetune_app.py`"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {},
+      "source": [
+        "# Verify GPU (Modal Notebooks provide CUDA)\n",
+        "!nvidia-smi"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "metadata": {},
+      "source": [
+        "# Clone repo (replace with your fork URL) or upload files via the Files panel\n",
+        "!git clone https://github.com/YOUR_USER/small-model-hackathon.git /root/repo 2>/dev/null || true\n",
+        "%cd /root/repo\n",
+        "!pwd && ls research/finetune.py models.yaml"
+      ],
+      "execution_count": null,
+      "outputs": [],
+      "id": "29c67e82"
+    },
+    {
+      "cell_type": "code",
+      "metadata": {},
+      "source": [
+        "# Install project deps (Modal default image has torch/transformers; add finetune stack)\n",
+        "%uv pip install uv peft datasets bitsandbytes accelerate\n",
+        "!uv sync --frozen --group finetune --group lm-eval --no-dev"
+      ],
+      "execution_count": null,
+      "outputs": [],
+      "id": "42516dfe"
+    },
+    {
+      "cell_type": "code",
+      "metadata": {},
+      "source": [
+        "import os\n",
+        "from pathlib import Path\n",
+        "\n",
+        "os.environ[\"TRUST_REMOTE_CODE\"] = \"true\"\n",
+        "os.environ.setdefault(\"PYTORCH_CUDA_ALLOC_CONF\", \"expandable_segments:True\")\n",
+        "\n",
+        "# Persist on attached Volume (ephemeral container disk is lost on kernel stop)\n",
+        "VOL = Path(\"/mnt/slm-finetune\")\n",
+        "OUT = VOL / \"lesson-lora-notebook\" if VOL.is_dir() else Path(\"./models/finetuned/minicpm5-1b-lesson-lora\")\n",
+        "OUT.mkdir(parents=True, exist_ok=True)\n",
+        "print(f\"Checkpoint dir: {OUT}\")"
+      ],
+      "execution_count": null,
+      "outputs": [],
+      "id": "52e970ac"
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Smoke fine-tune (LoRA, 20 steps)\n",
+        "\n",
+        "Trains on the lesson-agent chat dataset (`research/data/education-lesson-chat.jsonl`); pass a different `--dataset` to train on another file."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {},
+      "source": [
+        "!uv run python research/finetune.py \\\n",
+        "  --preset minicpm5-1b \\\n",
+        "  --mode lora \\\n",
+        "  --dataset research/data/education-lesson-chat.jsonl \\\n",
+        "  --format chat \\\n",
+        "  --out {OUT} \\\n",
+        "  --max_steps 20"
+      ],
+      "execution_count": null,
+      "outputs": [],
+      "id": "dfcb2f58"
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Baseline lm-eval (smoke)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {},
+      "source": [
+        "!uv run --package slm-evals slm-lm-eval \\\n",
+        "  --config research/evals/configs/lm_eval_smoke.yaml \\\n",
+        "  --preset minicpm5-1b \\\n",
+        "  --experiment-name minicpm5-1b__notebook-baseline"
+      ],
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Post-train lm-eval (adapter)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {},
+      "source": [
+        "!uv run --package slm-evals slm-lm-eval \\\n",
+        "  --config research/evals/configs/lm_eval_smoke.yaml \\\n",
+        "  --model openbmb/MiniCPM5-1B \\\n",
+        "  --adapter {OUT} \\\n",
+        "  --experiment-name minicpm5-1b-lesson-lora__notebook \\\n",
+        "  --compare-to results/lm_eval/minicpm5-1b__notebook-baseline/results.json"
+      ],
+      "execution_count": null,
+      "outputs": [],
+      "id": "d1a14c50"
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Sample generation"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "metadata": {},
+      "source": [
+        "import torch\n",
+        "from peft import PeftModel\n",
+        "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
+        "\n",
+        "base_id = \"openbmb/MiniCPM5-1B\"\n",
+        "adapter_dir = str(OUT)\n",
+        "\n",
+        "tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)\n",
+        "base = AutoModelForCausalLM.from_pretrained(\n",
+        "    base_id, torch_dtype=torch.bfloat16, device_map=\"auto\", trust_remote_code=True\n",
+        ")\n",
+        "model = PeftModel.from_pretrained(base, adapter_dir)\n",
+        "model.eval()\n",
+        "\n",
+        "prompt = \"Explain photosynthesis in one short paragraph for a 10-year-old.\"\n",
+        "if tokenizer.chat_template:\n",
+        "    text = tokenizer.apply_chat_template(\n",
+        "        [{\"role\": \"user\", \"content\": prompt}],\n",
+        "        tokenize=False,\n",
+        "        add_generation_prompt=True,\n",
+        "    )\n",
+        "else:\n",
+        "    text = prompt\n",
+        "\n",
+        "ids = tokenizer(text, return_tensors=\"pt\").to(model.device)\n",
+        "out = model.generate(**ids, max_new_tokens=120, do_sample=True, temperature=0.7)\n",
+        "print(tokenizer.decode(out[0][ids[\"input_ids\"].shape[1]:], skip_special_tokens=True))"
+      ],
+      "execution_count": null,
+      "outputs": [],
+      "id": "07706c76"
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## After training\n",
+        "\n",
+        "- **Volume attached?** Use the Files panel (⬇) or run locally: `modal volume get slm-finetune lesson-lora-notebook ./models/finetuned/minicpm5-1b-lora`\n",
+        "- **Hub:** `huggingface-cli upload your-user/minicpm5-1b-lesson-lora <path-to-OUT> . --repo-type model`\n",
+        "- **Share notebook:** Share → public link → \"Can view and run\" for hackathon judges\n",
+        "\n",
+        "Full docs: `research/modal/README.md` in the repo."
+      ],
+      "id": "8cd6b7dd"
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python",
+      "version": "3.12.0"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+}
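
The notebook's last cell suggests downloading the adapter from the Volume or uploading it to the Hub. A quick check that the output directory actually holds a loadable adapter can save a wasted upload. The sketch below is not part of the commit: `missing_adapter_files` is a hypothetical helper, and it assumes the fine-tune script writes the final adapter at the top level of `OUT` using PEFT's usual file names (the sample-generation cell loads the adapter from `OUT` directly, which suggests this layout).

```python
from pathlib import Path

# File names PEFT's save_pretrained() normally writes for a LoRA adapter.
# Treating them as required is an assumption of this sketch, not a rule from the repo.
CONFIG_NAME = "adapter_config.json"
WEIGHT_NAMES = ("adapter_model.safetensors", "adapter_model.bin")


def missing_adapter_files(adapter_dir: Path) -> list[str]:
    """Return the expected adapter files absent from adapter_dir (empty list = looks complete)."""
    root = Path(adapter_dir)
    missing = []
    if not (root / CONFIG_NAME).is_file():
        missing.append(CONFIG_NAME)
    # Either weight format is acceptable; report a gap only if neither is present.
    if not any((root / name).is_file() for name in WEIGHT_NAMES):
        missing.append(" or ".join(WEIGHT_NAMES))
    return missing
```

In the notebook this could be called as `missing_adapter_files(OUT)` right after the smoke fine-tune; an empty list means the directory looks ready for `modal volume get` or a Hub upload.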

uv.lock CHANGED

@@ -404,6 +404,46 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/d8/ef/e7e485ce5e4ba3843a0a92feb767c7b6098fd6e65ce752918074d175ae71/brotlicffi-1.2.0.1-cp38-abi3-win_amd64.whl", hash = "sha256:da2e82a08e7778b8bc539d27ca03cdd684113e81394bfaaad8d0dfc6a17ddede", size = 379026, upload-time = "2026-03-05T19:54:04.322Z" },
 ]
 [[package]]
 name = "certifi"
 version = "2026.5.20"
@@ -1174,6 +1214,19 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/28/27/3d6dcadc8a3214d8522c1e7f6a19554e33659be44546d44a2f7572ac7d2a/groovy-0.1.2-py3-none-any.whl", hash = "sha256:7f7975bab18c729a257a8b1ae9dcd70b7cafb1720481beae47719af57c35fa64", size = 14090, upload-time = "2025-02-28T20:24:55.152Z" },
 ]
 [[package]]
 name = "h11"
 version = "0.16.0"
@@ -1345,6 +1398,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/1e/5e/d4e9f1a599fb8e573b7b87160658329fbf28d19eac2718f51fc3def3aa5a/idna-3.18-py3-none-any.whl", hash = "sha256:7f952cbe720b688055e3f87de14f5c3e5fdaa8bc3928985c4077ca689de849a2", size = 65455, upload-time = "2026-06-02T14:34:06.319Z" },
 ]
 [[package]]
 name = "inference"
 version = "0.1.0"
@@ -1500,6 +1562,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/b5/91/53255615acd2a1eaca307ede3c90eb550bae9c94581f8c00081b6b1c8f44/kiwisolver-1.5.0-graalpy312-graalpy250_312_native-win_amd64.whl", hash = "sha256:1f1489f769582498610e015a8ef2d36f28f505ab3096d0e16b4858a9ec214f57", size = 75987, upload-time = "2026-03-09T13:15:39.65Z" },
 ]
 [[package]]
 name = "lazy-loader"
 version = "0.5"
@@ -1606,6 +1677,11 @@ hf = [
     { name = "torch" },
     { name = "transformers" },
 ]
 [[package]]
 name = "lxml"
@@ -1854,6 +1930,30 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/b3/38/89ba8ad64ae25be8de66a6d463314cf1eb366222074cfda9ee839c56a4b4/mdurl-0.1.2-py3-none-any.whl", hash = "sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8", size = 9979, upload-time = "2022-08-14T12:40:09.779Z" },
 ]
 [[package]]
 name = "more-itertools"
 version = "11.1.0"
@@ -2754,17 +2854,17 @@ wheels = [
 [[package]]
 name = "protobuf"
-version = "7.35.1"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/da/01/9ef0afd7999eb9badb3a768b4aedd78c86d4c65cfaf1958ab276199e76b4/protobuf-7.35.1.tar.gz", hash = "sha256:ce115a26fe0c39a2c29973d914d327e516a6455464489fe3cd1e51a1b354f81a", size = 458717, upload-time = "2026-06-11T21:55:40.257Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/10/03/8aeeb7458d22546bf64b5250ca1daeb5ff757d900e8e4a7476c6f0db843e/protobuf-7.35.1-cp310-abi3-macosx_10_9_universal2.whl", hash = "sha256:24f857477359a85c0c235261b8ba905fd51b2562f4a64ca1df5473f29850cbf6", size = 433226, upload-time = "2026-06-11T21:55:31.719Z" },
-    { url = "https://files.pythonhosted.org/packages/37/4b/dfb89eb0e652a1ff073c39a59fb5e3a83cfe9b57a2c83fa6d78270101767/protobuf-7.35.1-cp310-abi3-manylinux2014_aarch64.whl", hash = "sha256:11d6b0ec246892d85215b0a13ca6e0233cf5284b68f0ac02646427f4ff88a799", size = 328847, upload-time = "2026-06-11T21:55:34.035Z" },
-    { url = "https://files.pythonhosted.org/packages/0f/58/dc12f2cd484951524af6e3382c785869b9b3fb5e52ee95ae23add53ee8f9/protobuf-7.35.1-cp310-abi3-manylinux2014_s390x.whl", hash = "sha256:b73f9489a4b8b1c9cb1f8ed951c736392592edb24b9d6819f36d2e10b171d5b4", size = 344030, upload-time = "2026-06-11T21:55:34.941Z" },
-    { url = "https://files.pythonhosted.org/packages/e4/be/5b3cfe508bfab6761414ff944e3366eb13be4fd71efcd69450f89ba39f43/protobuf-7.35.1-cp310-abi3-manylinux2014_x86_64.whl", hash = "sha256:74758715c53d7158fb76caf4f0cfdacc5329a4b1bb994f865d6cf302d413a1c4", size = 327130, upload-time = "2026-06-11T21:55:35.921Z" },
-    { url = "https://files.pythonhosted.org/packages/d8/bc/6d6c7ba8709c85f8f2c390b2b118d6fb08a783676a572271851bf45a7d22/protobuf-7.35.1-cp310-abi3-win32.whl", hash = "sha256:353652e4efd0bca5b5fc2656abf8307ef351f0cf938c9eba09f0e09c20a25c30", size = 428945, upload-time = "2026-06-11T21:55:37.034Z" },
-    { url = "https://files.pythonhosted.org/packages/0a/19/8d0cb6f20a1ef7b18f1c8986ad5783f22f84cce39c6ce9a6e645ea55192e/protobuf-7.35.1-cp310-abi3-win_amd64.whl", hash = "sha256:230a75ddfc2de4806e56696ce9640c1cdfdb6543b7cfce98d42a4c0a0e7bdb87", size = 439996, upload-time = "2026-06-11T21:55:38.123Z" },
-    { url = "https://files.pythonhosted.org/packages/19/c7/5f7c636ec43e0c545e28d1f1db71990108306f7bdcb89f069ba97e428e7f/protobuf-7.35.1-py3-none-any.whl", hash = "sha256:4bc97768d8fe4ad6743c8a19403e314511ed9f6d13205b687e52421c023ac1b9", size = 171659, upload-time = "2026-06-11T21:55:39.155Z" },
 ]
 [[package]]
@@ -3581,7 +3681,7 @@ dependencies = [
 [package.optional-dependencies]
 lm-eval = [
-    { name = "lm-eval", extra = ["hf"] },
 ]
 [package.metadata]
@@ -3590,7 +3690,7 @@ requires-dist = [
     { name = "bitsandbytes", specifier = ">=0.43.0" },
     { name = "datasets", specifier = ">=2.19.0" },
     { name = "huggingface-hub", specifier = ">=0.22.0" },
-    { name = "lm-eval", extras = ["hf"], marker = "extra == 'lm-eval'", specifier = ">=0.4.9" },
     { name = "pandas", specifier = ">=2.0.0" },
     { name = "peft", specifier = ">=0.14.0" },
     { name = "pyyaml", specifier = ">=6.0" },
@@ -3628,6 +3728,10 @@ finetune = [
 lm-eval = [
     { name = "slm-evals", extra = ["lm-eval"] },
 ]
 [package.metadata]
 requires-dist = [
@@ -3650,6 +3754,10 @@ finetune = [
     { name = "peft", specifier = ">=0.14.0" },
 ]
 lm-eval = [{ name = "slm-evals", extras = ["lm-eval"], editable = "research/evals" }]
 [[package]]
 name = "socksio"
@@ -3792,6 +3900,18 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/a2/09/77d55d46fd61b4a135c444fc97158ef34a095e5681d0a6c10b75bf356191/sympy-1.14.0-py3-none-any.whl", hash = "sha256:e091cc3e99d2141a0ba2847328f5479b05d94a6635cb96148ccb3f34671bd8f5", size = 6299353, upload-time = "2025-04-27T18:04:59.103Z" },
 ]
 [[package]]
 name = "tabledata"
 version = "1.3.5"
@@ -3867,6 +3987,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/72/f4/0de46cfa12cdcbcd464cc59fde36912af405696f687e53a091fb432f694c/tokenizers-0.22.2-cp39-abi3-win_arm64.whl", hash = "sha256:9ce725d22864a1e965217204946f830c37876eee3b2ba6fc6255e8e903d5fcbc", size = 2612133, upload-time = "2026-01-05T10:45:17.232Z" },
 ]
 [[package]]
 name = "tomlkit"
 version = "0.14.0"
@@ -4053,6 +4182,24 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/3f/f9/2b3ff4e56e5fa7debfaf9eb135d0da96f3e9a1d5b27222223c7296336e5f/typer-0.25.1-py3-none-any.whl", hash = "sha256:75caa44ed46a03fb2dab8808753ffacdbfea88495e74c85a28c5eefcf5f39c89", size = 58409, upload-time = "2026-04-30T19:32:18.271Z" },
 ]
 [[package]]
 name = "typing-extensions"
 version = "4.15.0"
@@ -4117,6 +4264,92 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/88/fa/e1388bbcf24ef3274f45c0c1c7b501fd14971037c1b6ee23610553307497/uvicorn-0.49.0-py3-none-any.whl", hash = "sha256:ba3d14c3ee7e41c6c654c46c9eb489d33213cdd30aa1696eab1374337c13f68f", size = 71376, upload-time = "2026-06-03T22:01:29.037Z" },
 ]
 [[package]]
 name = "word2number"
 version = "1.1"

     { url = "https://files.pythonhosted.org/packages/d8/ef/e7e485ce5e4ba3843a0a92feb767c7b6098fd6e65ce752918074d175ae71/brotlicffi-1.2.0.1-cp38-abi3-win_amd64.whl", hash = "sha256:da2e82a08e7778b8bc539d27ca03cdd684113e81394bfaaad8d0dfc6a17ddede", size = 379026, upload-time = "2026-03-05T19:54:04.322Z" },
 ]
+[[package]]
+name = "cbor2"
+version = "6.1.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/75/af/473c241e41c142ea06ebef8d1f660fa6ff928fb97210e7bec8ee5974f8cd/cbor2-6.1.2.tar.gz", hash = "sha256:6b43037a66947dee5af0abb1a4c3a13b3abac5a4a3f32f9771efbbcd030fd909", size = 86760, upload-time = "2026-06-02T19:01:29.333Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/5e/0c/a857b6ca032282b564cf25de18ad92fe0614e8b3fa3422eb10e32a873939/cbor2-6.1.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:92b158d3ff9d9dce70eeb09786a6e518e3cb0ecb927fd23e9a0f7fc4b175c01a", size = 409592, upload-time = "2026-06-02T19:00:44.556Z" },
+    { url = "https://files.pythonhosted.org/packages/29/db/e0518153b3228159d9373f3b5785d7ea2d68898e27ee1bce7d03f0b5f7aa/cbor2-6.1.2-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:d29a11044b07048e19f39a87fe8fea7ea865eb0ace50dc4c29513d52d40e2ddf", size = 454598, upload-time = "2026-06-02T19:00:45.784Z" },
+    { url = "https://files.pythonhosted.org/packages/29/67/62127b22edc6011ba55b76a28ab7c2219a45d01871a8199532e0978b26d1/cbor2-6.1.2-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:a106f174eda34d8937a621c7f3e6044586cb209170cdc8da0ffbea89d1d6e385", size = 467380, upload-time = "2026-06-02T19:00:47.196Z" },
+    { url = "https://files.pythonhosted.org/packages/7c/95/7992d8ec904c116ad547abb4960cc3fde695d5853c66596b1465d14d2f7b/cbor2-6.1.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:2ea16a25cc457a92879ff7a36cc50b587bddba09d8176bf1a94803eec5aa27eb", size = 521672, upload-time = "2026-06-02T19:00:48.656Z" },
+    { url = "https://files.pythonhosted.org/packages/cb/cf/80cc4be132a523f0c92fb4c71813577bb393abea9e27990ca74605e0e930/cbor2-6.1.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:2652a94224980d47f2a3866dd35b1afe532ecdfaf91f8cfcec39a026c457a844", size = 534402, upload-time = "2026-06-02T19:00:50.064Z" },
+    { url = "https://files.pythonhosted.org/packages/b1/ea/99e466d8bef61a0775a1d8538ae6c9d95f4533fadc01f8f7814cb7ab80ad/cbor2-6.1.2-cp312-cp312-win32.whl", hash = "sha256:618666292900487db4a5abcade3150105c9c9fdd22576e6ff297c9a72eef0c6a", size = 283225, upload-time = "2026-06-02T19:00:51.406Z" },
+    { url = "https://files.pythonhosted.org/packages/14/13/e6a677bdc499e43049006cb54fe605b0f7aef621402d31354cc42ef293c9/cbor2-6.1.2-cp312-cp312-win_amd64.whl", hash = "sha256:c61c0b2e2cee64497e6c62d1976bc212f62ac0cd2b5b903613610d79b8b06b60", size = 300844, upload-time = "2026-06-02T19:00:52.628Z" },
+    { url = "https://files.pythonhosted.org/packages/77/4a/08bd8461f8e2e1ce1de5ae2768f2b7ca39a090e3156c1ee0d9b5fd86e70d/cbor2-6.1.2-cp312-cp312-win_arm64.whl", hash = "sha256:c871e7266ddc545b258e6f8e5300396985dc485d7ccf8bb4777385782f302153", size = 289040, upload-time = "2026-06-02T19:00:53.971Z" },
+    { url = "https://files.pythonhosted.org/packages/2b/dc/bc045c8f36317e4e5f7a60d94b36833139909fc32e3a65f44bc61a36def0/cbor2-6.1.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f1aa38c422d87ea61849b2a823b10b64053fb4da8763f19ac78ea9a69d682b2a", size = 408846, upload-time = "2026-06-02T19:00:55.476Z" },
+    { url = "https://files.pythonhosted.org/packages/2b/36/d66f5f0dd98ecbdcfc7da1fbd423f7b3782a27719f0062a560476f00b334/cbor2-6.1.2-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:ff7d0bd8ff432832338a8d2430aee34f8a082342480ff537c0ba90e2b8ff7894", size = 454624, upload-time = "2026-06-02T19:00:56.744Z" },
+    { url = "https://files.pythonhosted.org/packages/38/6b/4884b9cf03db14dc5007825d5d1bf8678a75c49d4268d8e0c1c6e9580104/cbor2-6.1.2-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:c1eedf3290d88a5f663bd8b4b8f0f0e2103d0594c293fa5f4e62e53100972309", size = 466585, upload-time = "2026-06-02T19:00:58.209Z" },
+    { url = "https://files.pythonhosted.org/packages/50/f6/36a15beb3915f56a79d6e9213c6d40c0f5cb90cd3462923f555d78068847/cbor2-6.1.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:3049b04bddf9a5a2d0e5bb25dccdaf4552fcaf607b404e249d4f78f010fcc7d0", size = 521678, upload-time = "2026-06-02T19:00:59.524Z" },
+    { url = "https://files.pythonhosted.org/packages/c6/3f/e899313371ebeb7a191d751de97ccd8242abc24bbc9d8e2c58e04475cfb0/cbor2-6.1.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:96eb687a62040401668f06a85de8f47361ef44574de1493899e0ec678109fc04", size = 534044, upload-time = "2026-06-02T19:01:00.875Z" },
+    { url = "https://files.pythonhosted.org/packages/1e/5e/1a872acdeb1ab9a884ec3460f73a43e02154dc20d8ccb627bbd60f4c0ea1/cbor2-6.1.2-cp313-cp313-win32.whl", hash = "sha256:03440b505882280023db1fedcee6844804e9968bb50f9eb4ff12aaf27777fcfe", size = 282328, upload-time = "2026-06-02T19:01:02.347Z" },
+    { url = "https://files.pythonhosted.org/packages/70/79/29721bc15d38889e7bec214ede2346ee15970bedcc5e6ce1fa30f21e9a4e/cbor2-6.1.2-cp313-cp313-win_amd64.whl", hash = "sha256:d2c8da2c0f821827dcc9eb59a5c9351791a8aa3b389a2ea7ca64c4f97bcb94cf", size = 300313, upload-time = "2026-06-02T19:01:03.69Z" },
+    { url = "https://files.pythonhosted.org/packages/07/98/a13b424fb2f14fe332b57f71f479953b2f291a051f797d42ddab9fcd2027/cbor2-6.1.2-cp313-cp313-win_arm64.whl", hash = "sha256:8e1478d3b980ddfcaf56e27cecbfe13057e0f67d5e8240fe8a398815acb9c4bf", size = 288725, upload-time = "2026-06-02T19:01:04.933Z" },
+    { url = "https://files.pythonhosted.org/packages/62/72/949bdc7422acd868a2355ae032561a104973fb5de284b36a237b85780dc9/cbor2-6.1.2-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:b0b65314a0b18c47651e17792447171a858dd77e3f161c451ad850d63f8718a9", size = 407436, upload-time = "2026-06-02T19:01:06.259Z" },
+    { url = "https://files.pythonhosted.org/packages/2f/bd/5969f9263102d1c15aa370b39802e4a87b1d1703fdb51588daf38b5fbe7e/cbor2-6.1.2-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:8904deb2849bae40cea970e114398a19da371e1048ae1409e64f167a1205daf6", size = 453507, upload-time = "2026-06-02T19:01:07.795Z" },
+    { url = "https://files.pythonhosted.org/packages/93/a5/227b785692a8374e3dbdf1fe76d1a9af48239855abd68a4111a1458fd81b/cbor2-6.1.2-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:b29d58d8ce00535354d873df170a3e9f0f0a02af65d12102d2552e2129c65dc8", size = 464875, upload-time = "2026-06-02T19:01:09.222Z" },
+    { url = "https://files.pythonhosted.org/packages/6d/48/a06527c3fbed4c32816abba4540e432fe9cd7e739a37fef0f205bd0f1e44/cbor2-6.1.2-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:27be1cc0abc42f154a48a315c92feb2bfb50397e51c70860460438ea172198a5", size = 519940, upload-time = "2026-06-02T19:01:10.795Z" },
+    { url = "https://files.pythonhosted.org/packages/31/1b/0e3f0dac7140d4b94ffbcef765fa4cce0caa1d942060101149de998fa7be/cbor2-6.1.2-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:b8d87fb8a33ff1971cb01511e74b044767cbba1ba536d3dc0b0c48f0d1b62237", size = 532612, upload-time = "2026-06-02T19:01:12.363Z" },
+    { url = "https://files.pythonhosted.org/packages/35/2f/5af245e7667b65c6e4a714bb5d89c84de5573b857eba9137533d54bc2e4f/cbor2-6.1.2-cp314-cp314-win32.whl", hash = "sha256:72ba0ea913ca1a8d916867f1b7d414f140982d2873e5d92f8f51de437e08979e", size = 285886, upload-time = "2026-06-02T19:01:13.658Z" },
+    { url = "https://files.pythonhosted.org/packages/d9/0a/6303f3e19730450c5a82b97cd2c0ed54855f9108502041305b4c641116cd/cbor2-6.1.2-cp314-cp314-win_amd64.whl", hash = "sha256:c02b7d94fe9914798a346a2f089f0f7f85be71d120d40080916d131fa0bd0442", size = 308808, upload-time = "2026-06-02T19:01:14.944Z" },
+    { url = "https://files.pythonhosted.org/packages/cd/61/48f9c5545223dad9d2ea2061a76da739b4047a461297b621fc80ce0f65c0/cbor2-6.1.2-cp314-cp314-win_arm64.whl", hash = "sha256:2af1309865000c401755fd4fdd5550f74ac34c3f79eb7db15f3956714769a5a9", size = 299522, upload-time = "2026-06-02T19:01:16.393Z" },
+    { url = "https://files.pythonhosted.org/packages/b2/2b/efcc6578b4e6142fb8ec9212c0dee5030345db2092f26aa960236067e717/cbor2-6.1.2-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:9f26e08dd78ee77d103065543a65cfb838948fa8735180ad4d81d939950a1420", size = 402925, upload-time = "2026-06-02T19:01:17.979Z" },
+    { url = "https://files.pythonhosted.org/packages/58/f6/58c86aa6246b3e7de473d8ff79ac8cc986e95cafe208899a70d6916012d7/cbor2-6.1.2-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:596e238f24bf9ede11a1ad08d2115fe78105ed6dda42ce1dd35872e7e91974fd", size = 446201, upload-time = "2026-06-02T19:01:19.481Z" },
+    { url = "https://files.pythonhosted.org/packages/c8/12/3b90820583e9860e35cb5e91f3b2cd2ab1bbdf1c57fc63aa572952f5f75f/cbor2-6.1.2-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:08a62f69fe0f0ee1428d901423853b56bb5c775430f798401f8fac4b9affdecc", size = 460193, upload-time = "2026-06-02T19:01:20.876Z" },
+    { url = "https://files.pythonhosted.org/packages/ed/88/c1e841ffb39a8e7163d7d432f7ea0e59b812c5134a449c75b6b8eb8aad08/cbor2-6.1.2-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:6ca0080e4d8ab0d67c0518ac995d03151a1274b5c295c9e619fb6057c91ae49e", size = 511446, upload-time = "2026-06-02T19:01:22.18Z" },
+    { url = "https://files.pythonhosted.org/packages/db/0a/f1ede587a388f127b9fc3d8ecb2f5d948654fed9fc7698f8b05fd90986bf/cbor2-6.1.2-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:b44eb2f3ea1c8d9cb3e39c345204ec4d9489f8149b78eb5e058b13b14a8c7b07", size = 527683, upload-time = "2026-06-02T19:01:23.639Z" },
+    { url = "https://files.pythonhosted.org/packages/1a/89/e3210ea45855a8d6173821f712a71a90d23dea0c134c4017c6f666a04fdf/cbor2-6.1.2-cp314-cp314t-win32.whl", hash = "sha256:f93179b4b1ba958b5c37b56969b8f07b4fcf44a83319f47559c59f28a1c564a4", size = 280419, upload-time = "2026-06-02T19:01:25.365Z" },
+    { url = "https://files.pythonhosted.org/packages/96/84/b555de26cc01108a72ed1df8eb7ca1d63495a3727045f0f93318dc5f99a8/cbor2-6.1.2-cp314-cp314t-win_amd64.whl", hash = "sha256:3c6c3d6598c268abf7068ae75b23b19f708e7a4aa294341b356deb65cb2664f1", size = 302514, upload-time = "2026-06-02T19:01:26.782Z" },
+    { url = "https://files.pythonhosted.org/packages/d4/6e/5556939414c0d2bffed7c7a53cf2b32181b55a795944d19835d513a7bc88/cbor2-6.1.2-cp314-cp314t-win_arm64.whl", hash = "sha256:8c2202fd1906f978bff3f97b21351815753dd9a8fcf4612a5113b6b257089059", size = 290058, upload-time = "2026-06-02T19:01:28.077Z" },
+]
 [[package]]
 name = "certifi"
 version = "2026.5.20"
     { url = "https://files.pythonhosted.org/packages/28/27/3d6dcadc8a3214d8522c1e7f6a19554e33659be44546d44a2f7572ac7d2a/groovy-0.1.2-py3-none-any.whl", hash = "sha256:7f7975bab18c729a257a8b1ae9dcd70b7cafb1720481beae47719af57c35fa64", size = 14090, upload-time = "2025-02-28T20:24:55.152Z" },
 ]
+[[package]]
+name = "grpclib"
+version = "0.4.9"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "h2" },
+    { name = "multidict" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/5b/28/5a2c299ec82a876a252c5919aa895a6f1d1d35c96417c5ce4a4660dc3a80/grpclib-0.4.9.tar.gz", hash = "sha256:cc589c330fa81004c6400a52a566407574498cb5b055fa927013361e21466c46", size = 84798, upload-time = "2025-12-14T22:23:14.349Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/5c/90/b0cbbd9efcc82816c58f31a34963071aa19fb792a212a5d9caf8e0fc3097/grpclib-0.4.9-py3-none-any.whl", hash = "sha256:7762ec1c8ed94dfad597475152dd35cbd11aecaaca2f243e29702435ca24cf0e", size = 77063, upload-time = "2025-12-14T22:23:13.224Z" },
+]
 [[package]]
 name = "h11"
 version = "0.16.0"
     { url = "https://files.pythonhosted.org/packages/1e/5e/d4e9f1a599fb8e573b7b87160658329fbf28d19eac2718f51fc3def3aa5a/idna-3.18-py3-none-any.whl", hash = "sha256:7f952cbe720b688055e3f87de14f5c3e5fdaa8bc3928985c4077ca689de849a2", size = 65455, upload-time = "2026-06-02T14:34:06.319Z" },
 ]
+[[package]]
+name = "immutabledict"
+version = "4.3.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/1d/e6/718471048fea0366c3e3d1df3acfd914ca66d571cdffcf6d37bbcd725708/immutabledict-4.3.1.tar.gz", hash = "sha256:f844a669106cfdc73f47b1a9da003782fb17dc955a54c80972e0d93d1c63c514", size = 7806, upload-time = "2026-02-15T10:32:34.668Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/a3/ce/f9018bf69ae91b273b6391a095e7c93fa5e1617f25b6ba81ad4b20c9df10/immutabledict-4.3.1-py3-none-any.whl", hash = "sha256:c9facdc0ff30fdb8e35bd16532026cac472a549e182c94fa201b51b25e4bf7bf", size = 5000, upload-time = "2026-02-15T10:32:33.672Z" },
+]
 [[package]]
 name = "inference"
 version = "0.1.0"
     { url = "https://files.pythonhosted.org/packages/b5/91/53255615acd2a1eaca307ede3c90eb550bae9c94581f8c00081b6b1c8f44/kiwisolver-1.5.0-graalpy312-graalpy250_312_native-win_amd64.whl", hash = "sha256:1f1489f769582498610e015a8ef2d36f28f505ab3096d0e16b4858a9ec214f57", size = 75987, upload-time = "2026-03-09T13:15:39.65Z" },
 ]
+[[package]]
+name = "langdetect"
+version = "1.0.9"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "six" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/0e/72/a3add0e4eec4eb9e2569554f7c70f4a3c27712f40e3284d483e88094cc0e/langdetect-1.0.9.tar.gz", hash = "sha256:cbc1fef89f8d062739774bd51eda3da3274006b3661d199c2655f6b3f6d605a0", size = 981474, upload-time = "2021-05-07T07:54:13.562Z" }
 [[package]]
 name = "lazy-loader"
 version = "0.5"
     { name = "torch" },
     { name = "transformers" },
 ]
+ifeval = [
+    { name = "immutabledict" },
+    { name = "langdetect" },
+    { name = "nltk" },
+]
 [[package]]
 name = "lxml"
     { url = "https://files.pythonhosted.org/packages/b3/38/89ba8ad64ae25be8de66a6d463314cf1eb366222074cfda9ee839c56a4b4/mdurl-0.1.2-py3-none-any.whl", hash = "sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8", size = 9979, upload-time = "2022-08-14T12:40:09.779Z" },
 ]
+[[package]]
+name = "modal"
+version = "1.5.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "aiohttp" },
+    { name = "cbor2" },
+    { name = "certifi" },
+    { name = "click" },
+    { name = "grpclib" },
+    { name = "protobuf" },
+    { name = "rich" },
+    { name = "synchronicity" },
+    { name = "toml" },
+    { name = "types-certifi" },
+    { name = "types-toml" },
+    { name = "typing-extensions" },
+    { name = "watchfiles" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/59/f9/87425e60db2a8597b248417772b409c49ca3a05ff6b1282a21cd7d856f09/modal-1.5.0.tar.gz", hash = "sha256:15033cf84f5f4f9f8a3dcf47a768cfcca36d1ad38ab7b3459fd3cbc29aa84a77", size = 771722, upload-time = "2026-06-09T22:37:27.5Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/5c/71/85e476e7d32c0a648d5aa97c4335ac02357d059c2bb734cf175b08446597/modal-1.5.0-py3-none-any.whl", hash = "sha256:9c5687eff775d1372bd70b87e43499e40777a1de160f23786c00807bf342fcb6", size = 882122, upload-time = "2026-06-09T22:37:24.608Z" },
+]
 [[package]]
 name = "more-itertools"
 version = "11.1.0"
 [[package]]
 name = "protobuf"
+version = "6.33.6"
 source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/66/70/e908e9c5e52ef7c3a6c7902c9dfbb34c7e29c25d2f81ade3856445fd5c94/protobuf-6.33.6.tar.gz", hash = "sha256:a6768d25248312c297558af96a9f9c929e8c4cee0659cb07e780731095f38135", size = 444531, upload-time = "2026-03-18T19:05:00.988Z" }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/fc/9f/2f509339e89cfa6f6a4c4ff50438db9ca488dec341f7e454adad60150b00/protobuf-6.33.6-cp310-abi3-win32.whl", hash = "sha256:7d29d9b65f8afef196f8334e80d6bc1d5d4adedb449971fefd3723824e6e77d3", size = 425739, upload-time = "2026-03-18T19:04:48.373Z" },
+    { url = "https://files.pythonhosted.org/packages/76/5d/683efcd4798e0030c1bab27374fd13a89f7c2515fb1f3123efdfaa5eab57/protobuf-6.33.6-cp310-abi3-win_amd64.whl", hash = "sha256:0cd27b587afca21b7cfa59a74dcbd48a50f0a6400cfb59391340ad729d91d326", size = 437089, upload-time = "2026-03-18T19:04:50.381Z" },
+    { url = "https://files.pythonhosted.org/packages/5c/01/a3c3ed5cd186f39e7880f8303cc51385a198a81469d53d0fdecf1f64d929/protobuf-6.33.6-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:9720e6961b251bde64edfdab7d500725a2af5280f3f4c87e57c0208376aa8c3a", size = 427737, upload-time = "2026-03-18T19:04:51.866Z" },
+    { url = "https://files.pythonhosted.org/packages/ee/90/b3c01fdec7d2f627b3a6884243ba328c1217ed2d978def5c12dc50d328a3/protobuf-6.33.6-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:e2afbae9b8e1825e3529f88d514754e094278bb95eadc0e199751cdd9a2e82a2", size = 324610, upload-time = "2026-03-18T19:04:53.096Z" },
+    { url = "https://files.pythonhosted.org/packages/9b/ca/25afc144934014700c52e05103c2421997482d561f3101ff352e1292fb81/protobuf-6.33.6-cp39-abi3-manylinux2014_s390x.whl", hash = "sha256:c96c37eec15086b79762ed265d59ab204dabc53056e3443e702d2681f4b39ce3", size = 339381, upload-time = "2026-03-18T19:04:54.616Z" },
+    { url = "https://files.pythonhosted.org/packages/16/92/d1e32e3e0d894fe00b15ce28ad4944ab692713f2e7f0a99787405e43533a/protobuf-6.33.6-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:e9db7e292e0ab79dd108d7f1a94fe31601ce1ee3f7b79e0692043423020b0593", size = 323436, upload-time = "2026-03-18T19:04:55.768Z" },
+    { url = "https://files.pythonhosted.org/packages/c4/72/02445137af02769918a93807b2b7890047c32bfb9f90371cbc12688819eb/protobuf-6.33.6-py3-none-any.whl", hash = "sha256:77179e006c476e69bf8e8ce866640091ec42e1beb80b213c3900006ecfba6901", size = 170656, upload-time = "2026-03-18T19:04:59.826Z" },
 ]
 [[package]]
 [package.optional-dependencies]
 lm-eval = [
+    { name = "lm-eval", extra = ["hf", "ifeval"] },
 ]
 [package.metadata]
     { name = "bitsandbytes", specifier = ">=0.43.0" },
     { name = "datasets", specifier = ">=2.19.0" },
     { name = "huggingface-hub", specifier = ">=0.22.0" },
+    { name = "lm-eval", extras = ["hf", "ifeval"], marker = "extra == 'lm-eval'", specifier = ">=0.4.9" },
     { name = "pandas", specifier = ">=2.0.0" },
     { name = "peft", specifier = ">=0.14.0" },
     { name = "pyyaml", specifier = ">=6.0" },
 lm-eval = [
     { name = "slm-evals", extra = ["lm-eval"] },
 ]
+modal = [
+    { name = "modal" },
+    { name = "pyyaml" },
+]
 [package.metadata]
 requires-dist = [
     { name = "peft", specifier = ">=0.14.0" },
 ]
 lm-eval = [{ name = "slm-evals", extras = ["lm-eval"], editable = "research/evals" }]
+modal = [
+    { name = "modal", specifier = ">=0.73.0" },
+    { name = "pyyaml", specifier = ">=6.0" },
+]
 [[package]]
 name = "socksio"
     { url = "https://files.pythonhosted.org/packages/a2/09/77d55d46fd61b4a135c444fc97158ef34a095e5681d0a6c10b75bf356191/sympy-1.14.0-py3-none-any.whl", hash = "sha256:e091cc3e99d2141a0ba2847328f5479b05d94a6635cb96148ccb3f34671bd8f5", size = 6299353, upload-time = "2025-04-27T18:04:59.103Z" },
 ]
+[[package]]
+name = "synchronicity"
+version = "0.12.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/ec/d5/e96e6082790c92480380f28aa53e111844cdac7b0f75846f4772cb535a43/synchronicity-0.12.3.tar.gz", hash = "sha256:0d4228b85eaf2805f23b4615b2039a9d24ea811646e2d9f8d0c033094eb85841", size = 60261, upload-time = "2026-05-28T12:33:50.206Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/57/ea/531a6ea751cbd989da386144810b1b8f529b0aae8c1a9beda8b40966c9c2/synchronicity-0.12.3-py3-none-any.whl", hash = "sha256:e476818cd14102136f41622c619de548f0000c024485fc18521c8fe908ea7574", size = 40982, upload-time = "2026-05-28T12:33:49.125Z" },
+]
 [[package]]
 name = "tabledata"
 version = "1.3.5"
     { url = "https://files.pythonhosted.org/packages/72/f4/0de46cfa12cdcbcd464cc59fde36912af405696f687e53a091fb432f694c/tokenizers-0.22.2-cp39-abi3-win_arm64.whl", hash = "sha256:9ce725d22864a1e965217204946f830c37876eee3b2ba6fc6255e8e903d5fcbc", size = 2612133, upload-time = "2026-01-05T10:45:17.232Z" },
 ]
+[[package]]
+name = "toml"
+version = "0.10.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/be/ba/1f744cdc819428fc6b5084ec34d9b30660f6f9daaf70eead706e3203ec3c/toml-0.10.2.tar.gz", hash = "sha256:b3bda1d108d5dd99f4a20d24d9c348e91c4db7ab1b749200bded2f839ccbe68f", size = 22253, upload-time = "2020-11-01T01:40:22.204Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/44/6f/7120676b6d73228c96e17f1f794d8ab046fc910d781c8d151120c3f1569e/toml-0.10.2-py2.py3-none-any.whl", hash = "sha256:806143ae5bfb6a3c6e736a764057db0e6a0e05e338b5630894a5f779cabb4f9b", size = 16588, upload-time = "2020-11-01T01:40:20.672Z" },
+]
 [[package]]
 name = "tomlkit"
 version = "0.14.0"
     { url = "https://files.pythonhosted.org/packages/3f/f9/2b3ff4e56e5fa7debfaf9eb135d0da96f3e9a1d5b27222223c7296336e5f/typer-0.25.1-py3-none-any.whl", hash = "sha256:75caa44ed46a03fb2dab8808753ffacdbfea88495e74c85a28c5eefcf5f39c89", size = 58409, upload-time = "2026-04-30T19:32:18.271Z" },
 ]
+[[package]]
+name = "types-certifi"
+version = "2021.10.8.3"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/52/68/943c3aeaf14624712a0357c4a67814dba5cea36d194f5c764dad7959a00c/types-certifi-2021.10.8.3.tar.gz", hash = "sha256:72cf7798d165bc0b76e1c10dd1ea3097c7063c42c21d664523b928e88b554a4f", size = 2095, upload-time = "2022-06-09T15:19:05.244Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/b5/63/2463d89481e811f007b0e1cd0a91e52e141b47f9de724d20db7b861dcfec/types_certifi-2021.10.8.3-py3-none-any.whl", hash = "sha256:b2d1e325e69f71f7c78e5943d410e650b4707bb0ef32e4ddf3da37f54176e88a", size = 2136, upload-time = "2022-06-09T15:19:03.127Z" },
+]
+[[package]]
+name = "types-toml"
+version = "0.10.8.20260518"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/4b/11/6ece999e91f2ccb848ab4420f3f4816e78ac0541f739e6864affdaaa5737/types_toml-0.10.8.20260518.tar.gz", hash = "sha256:80e10facd24fdeda9d5c672187d72be3ac284843788d67f5aae59e3e016db6fe", size = 9419, upload-time = "2026-05-18T06:02:16.719Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/91/25/489751806bf5c95e4007f8e17409199c54d31e49ffbea07c5729b1286c8e/types_toml-0.10.8.20260518-py3-none-any.whl", hash = "sha256:0e564ab05f6fde62a315b3b5a9b6624fda569399795d30a37e64705a70459303", size = 9669, upload-time = "2026-05-18T06:02:15.86Z" },
+]
 [[package]]
 name = "typing-extensions"
 version = "4.15.0"
     { url = "https://files.pythonhosted.org/packages/88/fa/e1388bbcf24ef3274f45c0c1c7b501fd14971037c1b6ee23610553307497/uvicorn-0.49.0-py3-none-any.whl", hash = "sha256:ba3d14c3ee7e41c6c654c46c9eb489d33213cdd30aa1696eab1374337c13f68f", size = 71376, upload-time = "2026-06-03T22:01:29.037Z" },
 ]
+[[package]]
+name = "watchfiles"
+version = "1.2.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "anyio" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/cd/41/5e1a4bb12aac5f1493fa1bdc11154eca3b258ca4eba65d39c473fe19d8e9/watchfiles-1.2.0.tar.gz", hash = "sha256:c995fba777f1ea992f090f9236e9284cf7a5d1a0130dd5a3d82c598cacd76838", size = 108252, upload-time = "2026-05-18T04:32:04.251Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/b8/2f/e42c992d2afda3108ea1c02acecc991b9f31d05c14adc2a7cee9ee211fc4/watchfiles-1.2.0-cp312-cp312-macosx_10_12_x86_64.whl", hash = "sha256:bc13eb17538be00c874699dc0abe4ee2bc8d50bb1166a6b9e175ef3fd7eb8f26", size = 400115, upload-time = "2026-05-18T04:32:02.06Z" },
+    { url = "https://files.pythonhosted.org/packages/5f/8f/6af2ea19065c91d8b0ea3516fdfc8c0d349f407e8e9fbf4e5a17360de8ad/watchfiles-1.2.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:2d95ddc1eb6914154253d239089900813f6a767e174b8e6a50e7fdacb7e4236c", size = 393659, upload-time = "2026-05-18T04:30:50.951Z" },
+    { url = "https://files.pythonhosted.org/packages/13/01/b32a967c56fb3e3e5be3db52c3d3b87fa4513aa367d8ed1ad96d42952e5f/watchfiles-1.2.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:8f70d8b291ef6e88d19b1f297a6905ddb978888d9272b0d05e6f53309856bcfc", size = 453207, upload-time = "2026-05-18T04:31:04.231Z" },
+    { url = "https://files.pythonhosted.org/packages/04/98/97557a812180338cb1abd32e1cffcc4588f59b5f23e0cb006b2ba95ba64a/watchfiles-1.2.0-cp312-cp312-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:56d8641cf834c2836922899105bd3ce3d0dfc69291d52edf0b4d0436829b34c0", size = 459273, upload-time = "2026-05-18T04:31:50.377Z" },
+    { url = "https://files.pythonhosted.org/packages/e8/a8/b4b08dcb7653b8087c6586f7ce649505900e866bbcfe40dc9587af02e686/watchfiles-1.2.0-cp312-cp312-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:2581a94056e55d7d0a31a823ea92bf73749c489ca2285bfdc0fbe6b2bb49d50c", size = 489927, upload-time = "2026-05-18T04:31:42.485Z" },
+    { url = "https://files.pythonhosted.org/packages/50/94/3dceea03545d2e5ddfd839f0ddd5e1cecbf1697b5a428d5ba11cef6af95d/watchfiles-1.2.0-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:41bc1199f7523b3f82843c88cbb979180c949caef0342cf90968f178e5d49b01", size = 570476, upload-time = "2026-05-18T04:31:03.071Z" },
+    { url = "https://files.pythonhosted.org/packages/cc/f2/d39a5450c3532092b91f81d274360e613c2371bc874a89c7a1a3c5e8d138/watchfiles-1.2.0-cp312-cp312-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:7571e4464cb6e434958f867f7f730b8ab0b75e3f8e5eac0499168486ab3c33a8", size = 465650, upload-time = "2026-05-18T04:30:12.701Z" },
+    { url = "https://files.pythonhosted.org/packages/22/24/ed72f68cbc1333ca9b9f2200aa048bb6658ae41709bc1caad4310f4bdffd/watchfiles-1.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e53a384f76b631c3ae5334ce6a52f0baa3a911eb94a4eac7f160079868b716d5", size = 456398, upload-time = "2026-05-18T04:30:13.784Z" },
+    { url = "https://files.pythonhosted.org/packages/0d/64/982ef4a4e5bab5b6e5b6becc8cd5e732f6130a78b855f0abec6439a9a135/watchfiles-1.2.0-cp312-cp312-manylinux_2_31_riscv64.whl", hash = "sha256:d20029a60a71a052a24c4db7673bc4de39ab89adbaccbfb5d67987c5d73f424d", size = 465140, upload-time = "2026-05-18T04:31:52.111Z" },
+    { url = "https://files.pythonhosted.org/packages/a0/0c/95282abf4ed680b6096010bcfc30c5fa7a041fc5aa5a2ad17a2cc6c75bba/watchfiles-1.2.0-cp312-cp312-musllinux_1_1_aarch64.whl", hash = "sha256:2cb93af48550faf1cea04c303107c8b75833de7013e57ce27d3b8d21d8d0f58c", size = 630259, upload-time = "2026-05-18T04:31:25.676Z" },
+    { url = "https://files.pythonhosted.org/packages/30/45/607c1de1530c4bdcf2cf1d1ecc2505ddba5d96bd43ba9f2b0e79876f850f/watchfiles-1.2.0-cp312-cp312-musllinux_1_1_x86_64.whl", hash = "sha256:2995c176de7692b86a2e4c58d9ec718f753150a979cb4a754e2b4ffa38e70906", size = 659859, upload-time = "2026-05-18T04:30:24.333Z" },
+    { url = "https://files.pythonhosted.org/packages/fa/08/d9e2e0f9e8e6791d33aefc694ad7eefa7f901f63caff84a81ded38692f9c/watchfiles-1.2.0-cp312-cp312-win32.whl", hash = "sha256:7a2cffd17d27d2ecbb310c2b1d8174f222a5495b1a721894afa88ec11e25b898", size = 275480, upload-time = "2026-05-18T04:30:31.307Z" },
+    { url = "https://files.pythonhosted.org/packages/1c/e6/9d42569c0102645cc8cea5d8c7d8a1e9d4ada2cb7f05f75e554b8aa2202a/watchfiles-1.2.0-cp312-cp312-win_amd64.whl", hash = "sha256:f155b3a1b2a5fc89cdc70d47ee5d54e3b75e88efa34982028a35daef9ba00379", size = 288718, upload-time = "2026-05-18T04:32:10.745Z" },
+    { url = "https://files.pythonhosted.org/packages/0a/26/88e0dc6ee3898169d7fa22bb6a69cabf2502d2ee25cb8c876d1262d204f8/watchfiles-1.2.0-cp312-cp312-win_arm64.whl", hash = "sha256:8fa585ede612ee9f9e91b18bebf9ba11b9ae29a4e3a0d0cf6fca3e382133f0d5", size = 281026, upload-time = "2026-05-18T04:30:22.23Z" },
+    { url = "https://files.pythonhosted.org/packages/d1/4d/70a7feced9f87e2ff26dba42667290f41694fc64646c67261fbb8cab5d5c/watchfiles-1.2.0-cp313-cp313-macosx_10_12_x86_64.whl", hash = "sha256:01ea8d66f0693b9b60a6541c8d10263091ca9a9060d242f3c1f3143f9aad2c98", size = 399730, upload-time = "2026-05-18T04:31:38.162Z" },
+    { url = "https://files.pythonhosted.org/packages/31/3a/0da302f2307aee316922806ebd5726c542cbd787c938271cf14a074c7daf/watchfiles-1.2.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:7ba0480b9a74af058f43b337e937a451e109295c420916d68ad24e3dc02f5e44", size = 392842, upload-time = "2026-05-18T04:30:27.051Z" },
+    { url = "https://files.pythonhosted.org/packages/db/ef/d5bdb705c224dbc256aa0c1ec47bf4e61ec52558f2afb44a71a1fe4d7015/watchfiles-1.2.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4f34e26a19f91f710c08e0183429f0d1d15df734e6bc78c31e77b9ea9c433658", size = 452989, upload-time = "2026-05-18T04:31:11.945Z" },
+    { url = "https://files.pythonhosted.org/packages/71/29/5495f2c1661949ef7a35e4d71111d129cfe7606414a26887a919d0a55406/watchfiles-1.2.0-cp313-cp313-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:b4e77f6a55f858504069abd35d336a637555c09bca453dde1ee1e5ada8a6a1fb", size = 458978, upload-time = "2026-05-18T04:30:52.606Z" },
+    { url = "https://files.pythonhosted.org/packages/d5/8c/7f9c07c433811c2fffd93e13fdfb7135de9aab5f2ae41be08960fa0047dc/watchfiles-1.2.0-cp313-cp313-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:0cb4d80e212f116474a545c21c912b445f16bb0cef9e6a73a498164223e14e2f", size = 490248, upload-time = "2026-05-18T04:31:36.003Z" },
+    { url = "https://files.pythonhosted.org/packages/3c/11/d93632febc52fbc21be90231bb7c17fd5387f46c9076fd40a5f9c2ae6910/watchfiles-1.2.0-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:b974946a10af379d425e2eef5b62f5c6ebeaccf91d45eaad6f5b27ecd4f91aa0", size = 571847, upload-time = "2026-05-18T04:31:10.862Z" },
+    { url = "https://files.pythonhosted.org/packages/55/b4/383173e73aabb07ad1d9c7aa859d95437ac46a6d6a1e11005facda0c9d19/watchfiles-1.2.0-cp313-cp313-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:86bc13c25a8d1fcd70b51d0ce7c9b65e90de5666fcbfd3e34957cc73ee19aeb5", size = 465974, upload-time = "2026-05-18T04:30:17.006Z" },
+    { url = "https://files.pythonhosted.org/packages/a7/6c/89b1a230a78f57c52dd8893adb1f92f94411721b6ec12596c56d98c74356/watchfiles-1.2.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ca148d73dea36c9763aaa351e4d7a51780ec1584217c45276f4fe8239c768b71", size = 454782, upload-time = "2026-05-18T04:30:35.656Z" },
+    { url = "https://files.pythonhosted.org/packages/24/62/1732118367cfff0a9fce3bf62ff4bfded09ef5df21d9d446b858b3f70a96/watchfiles-1.2.0-cp313-cp313-manylinux_2_31_riscv64.whl", hash = "sha256:c525543d91961c6955b2636b308569e84a1d1c5f5f2932041ab9ef46422f43e3", size = 465182, upload-time = "2026-05-18T04:30:20.846Z" },
+    { url = "https://files.pythonhosted.org/packages/28/96/716f7e5f51339bf22963f3345f9f27d7f3b30e2eadc597e257c881dd3c53/watchfiles-1.2.0-cp313-cp313-musllinux_1_1_aarch64.whl", hash = "sha256:a204794696ffb8f9b10fba6f7cb5216d42f3b2b71860ccac6b6e42f5f10973b0", size = 629841, upload-time = "2026-05-18T04:31:05.397Z" },
+    { url = "https://files.pythonhosted.org/packages/4c/fe/c40783950fd771ccf66ab3ec2722d188a9af1c7f96c6e811f36e40c6e03f/watchfiles-1.2.0-cp313-cp313-musllinux_1_1_x86_64.whl", hash = "sha256:10d86db20695afe7997ac9e1717637d6714a8d0220458c33f3d2061f54cec427", size = 658028, upload-time = "2026-05-18T04:31:48.22Z" },
+    { url = "https://files.pythonhosted.org/packages/71/72/4508db1856d1d87fcbb3b63f4839bab1b5682cb0e8d224d122263c09654a/watchfiles-1.2.0-cp313-cp313-win32.whl", hash = "sha256:eb283ee99e21ad6443c8cdb06ac5b34b1308c329cbdf03fa02b445363714c799", size = 275183, upload-time = "2026-05-18T04:30:59.57Z" },
+    { url = "https://files.pythonhosted.org/packages/f9/36/14b76ca57652e5cc5fd1c11f32a261292c08a0d19a00351013c2549cbfb2/watchfiles-1.2.0-cp313-cp313-win_amd64.whl", hash = "sha256:a0f27f01bee51861392bb6b7c4fdb290b27d1eb194e9e28788d68102a0e898d9", size = 288059, upload-time = "2026-05-18T04:32:07.937Z" },
+    { url = "https://files.pythonhosted.org/packages/1b/8d/0a85e395398d8d20fadfe5c5d32c726eee17a519e78fb356f2cf7531bffe/watchfiles-1.2.0-cp313-cp313-win_arm64.whl", hash = "sha256:3651aa7058595e9cfb75d35dd5ada2bf9f48a5b8a0f3562821d3e210c507e077", size = 280186, upload-time = "2026-05-18T04:31:54.484Z" },
+    { url = "https://files.pythonhosted.org/packages/37/68/36db056f1fdcc5f07302f56e631774d6835bcd6fa3ace402304621d5f9e5/watchfiles-1.2.0-cp313-cp313t-macosx_10_12_x86_64.whl", hash = "sha256:faea288b6f0ab1902ef08f4ca6de005dccf856c4e0c4f21b8c5fce02d90a1b08", size = 399031, upload-time = "2026-05-18T04:30:44.576Z" },
+    { url = "https://files.pythonhosted.org/packages/c1/64/01a9d6f66a82a5c101ce939274106cc72759d62427e153f01edd2b9f87c2/watchfiles-1.2.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:01859b11fd9fbca670f4d5da00fbac282cfea9bd67a2125d8b2833a3b5617ea9", size = 391205, upload-time = "2026-05-18T04:30:25.413Z" },
+    { url = "https://files.pythonhosted.org/packages/84/2c/0a44fe058cb4bb7b8ede6b6670698bbb7c0400740e378d00022189b7b31d/watchfiles-1.2.0-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:fff610d7bb2256a317bb1e96f0d7862c7aa8076733ee5df0fd41bbe76a24a4f4", size = 451892, upload-time = "2026-05-18T04:32:14.005Z" },
+    { url = "https://files.pythonhosted.org/packages/67/a1/351e0d56cd35e6488b5c8b4fb11a809a5bc923e8fe8fed9faf8920be0c89/watchfiles-1.2.0-cp313-cp313t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:b141a4891c995a039cd89e9a49e62df1dc8a559a5d1a6e4c7106d16c12777a55", size = 458867, upload-time = "2026-05-18T04:31:22.279Z" },
+    { url = "https://files.pythonhosted.org/packages/d5/7d/9d09605187f1b838998624049fcf8bf47b73c1a3b76901fcac1782f62277/watchfiles-1.2.0-cp313-cp313t-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f22943b7770483f6ea0721c6b11d022947a98eb0acae14694de034f4d0d38925", size = 490217, upload-time = "2026-05-18T04:31:43.657Z" },
+    { url = "https://files.pythonhosted.org/packages/60/5d/a17a16eccb182f04188cd308ec24b1a71a9b5c4e7098269cf35d9fa56d02/watchfiles-1.2.0-cp313-cp313t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:1bc6195825b7dcd217968bb1f801a60fd4c16e8eeab5bedc7fe917d7d5995ab4", size = 571458, upload-time = "2026-05-18T04:32:11.875Z" },
+    { url = "https://files.pythonhosted.org/packages/d3/3d/4dd457062083ab1938e5dfd45032eb425cee2ac817287ca8ff4356183e5d/watchfiles-1.2.0-cp313-cp313t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:d4a4b147f5dca2a5d325a06a832fb43f345751adfbc63204aec30e0d9ca965a2", size = 464707, upload-time = "2026-05-18T04:30:43.492Z" },
+    { url = "https://files.pythonhosted.org/packages/c6/71/ea8c57b128f5383de74d0c7d2d9c57ad7c9a65a930c451bd25d524b295b7/watchfiles-1.2.0-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4543579a9bdb0c9560039b4ffddbdb39545707659fbc430ce4c10f3f68d557f9", size = 454663, upload-time = "2026-05-18T04:30:16.061Z" },
+    { url = "https://files.pythonhosted.org/packages/53/fd/2e812bf938406d7db351f0703ddd3fc6c061cf30d96153a77bc79a943a44/watchfiles-1.2.0-cp313-cp313t-manylinux_2_31_riscv64.whl", hash = "sha256:20aa0e708b920bde876a4aa82dc7dd6ebea228a63a67cda6632c2fc87b787efa", size = 463537, upload-time = "2026-05-18T04:31:44.9Z" },
+    { url = "https://files.pythonhosted.org/packages/86/56/d17a7f1dd1bc3035f1072694a551301272f1739c2d8e319c927cb9e29b38/watchfiles-1.2.0-cp313-cp313t-musllinux_1_1_aarch64.whl", hash = "sha256:d413349d565dab74297f2a63e84a097936be69bf8f3b3801f27f380e32040f44", size = 629194, upload-time = "2026-05-18T04:31:14.141Z" },
+    { url = "https://files.pythonhosted.org/packages/be/06/f1ff66bf5cae50aa4062779a0ecd0bbaf15e466195719074078947d9a17d/watchfiles-1.2.0-cp313-cp313t-musllinux_1_1_x86_64.whl", hash = "sha256:f28b2725eb8cce327b9b3ab02415c853011dc55c95832fe90de6bc56f5315f72", size = 656194, upload-time = "2026-05-18T04:31:47.14Z" },
+    { url = "https://files.pythonhosted.org/packages/e7/54/a9c7ea9a82a4ac65e7004c0a03920b5cdd2f9c3b678757d9cd425aa51d53/watchfiles-1.2.0-cp314-cp314-macosx_10_12_x86_64.whl", hash = "sha256:b8c8358484d5fa12ef34f05b7f4168eaf1932f408725ff6d023c33ec17bd79d4", size = 400205, upload-time = "2026-05-18T04:32:05.153Z" },
+    { url = "https://files.pythonhosted.org/packages/aa/5d/c9ab3534374a4a67450696905d6ef16a04405448b8dc52bd752ae50423d4/watchfiles-1.2.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:9f04b092229ad2c50126dd3c922c8822e51e605993764a33058d4a791ab42281", size = 392508, upload-time = "2026-05-18T04:30:54.849Z" },
+    { url = "https://files.pythonhosted.org/packages/26/ca/1ad30103535cf0cecd7b993e8d50edc5351b1820e38f2d22e3df58962feb/watchfiles-1.2.0-cp314-cp314-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:7a7ce236284f002a156f70add88efe5c70879cccbb658be0822c54b1306fc09d", size = 452448, upload-time = "2026-05-18T04:30:53.727Z" },
+    { url = "https://files.pythonhosted.org/packages/37/a1/ceee2cdf2afbd715fa07758d39c9859513eae411b23196f7fd039e5feedd/watchfiles-1.2.0-cp314-cp314-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:b9909cc2b48468b575eefa944919e1fe8a36c5849d5c7c168f80a8c1db69398e", size = 459605, upload-time = "2026-05-18T04:30:23.312Z" },
+    { url = "https://files.pythonhosted.org/packages/e8/f6/421e30fd1cb3907a84ed92ab3f1983e37ba2dca015e9a894a048418417a2/watchfiles-1.2.0-cp314-cp314-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:0a37faaed405c67e28e6be45a1fa4f206ef5a2860f27c237db9fa30704c38242", size = 490757, upload-time = "2026-05-18T04:30:47.358Z" },
+    { url = "https://files.pythonhosted.org/packages/41/b0/55ed1b97ed08be7bba6f9a541cac15f2a858e1d74d2b07b6da70a82aab00/watchfiles-1.2.0-cp314-cp314-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:9649193aa27bd9ff2e80ff29bfaa93085496c7a3a377592823cc58b77ee88add", size = 568672, upload-time = "2026-05-18T04:30:38.915Z" },
+    { url = "https://files.pythonhosted.org/packages/d1/cf/d8ae8a80dd7bafab395ea7681c10237311bbf34d37704a8c744e7cf31fc7/watchfiles-1.2.0-cp314-cp314-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:4e4ff8e37f99cf1da89e255e07c9c4b37c214038c4283707bdec308cb1b0ea1f", size = 464197, upload-time = "2026-05-18T04:30:09.914Z" },
+    { url = "https://files.pythonhosted.org/packages/7c/8a/3076c496ca8dafe0e8cd03fcebdfc47be4b1174b4e5b24ff6e396e6b3af2/watchfiles-1.2.0-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:054dc20fd2e3132b4c3883b4a00d72fd6e1f56fdaf89fccd12e8057d74cd74d7", size = 453181, upload-time = "2026-05-18T04:30:14.829Z" },
+    { url = "https://files.pythonhosted.org/packages/e5/10/9745e17c98e7b8a86454df0a3c7b5686bd650383f1e9f26e4ebcbd6cc0c0/watchfiles-1.2.0-cp314-cp314-manylinux_2_31_riscv64.whl", hash = "sha256:e140ed30ebde76796b686e67c182cff10ea2fbab186fafd1560f74bb5a473a6e", size = 465109, upload-time = "2026-05-18T04:30:28.123Z" },
+    { url = "https://files.pythonhosted.org/packages/8f/95/8ef4a95481d3e0cb52d62a06fa6e972e81424be2d9698b91a2fecca9904c/watchfiles-1.2.0-cp314-cp314-musllinux_1_1_aarch64.whl", hash = "sha256:bb7e52ecf68ba46d22df23467b87cffeb2146908aa523ebfe803019618cfda06", size = 630653, upload-time = "2026-05-18T04:31:49.304Z" },
+    { url = "https://files.pythonhosted.org/packages/fd/e4/3b3bf36b0f829b50c6ebcb8d031583863c59f923d6a6af3d485e470d0fac/watchfiles-1.2.0-cp314-cp314-musllinux_1_1_x86_64.whl", hash = "sha256:23282a321c8baf9b3a3c4afff673f9fe65eb7fdc2338d765ccad9d3d1916a5ba", size = 657838, upload-time = "2026-05-18T04:31:06.497Z" },
+    { url = "https://files.pythonhosted.org/packages/21/b1/6cbbb50c1f3002ab568777d44aa21206dfb8807a840990c4037523b51812/watchfiles-1.2.0-cp314-cp314-win32.whl", hash = "sha256:c0db965c5f79aa49fe672d297cf1febc5ad149b658594944f49a54a2b96270a7", size = 275108, upload-time = "2026-05-18T04:30:06.891Z" },
+    { url = "https://files.pythonhosted.org/packages/92/45/190ce6db8dcb4536682cf75d3889ff1a27182a58cb519d343cb6d9ea63d8/watchfiles-1.2.0-cp314-cp314-win_amd64.whl", hash = "sha256:71283b39fd17e5408eb123bd37aeecfd9d54c81fc184421943208aadb879d103", size = 288441, upload-time = "2026-05-18T04:32:12.901Z" },
+    { url = "https://files.pythonhosted.org/packages/74/0d/3eae1c2313ab08378431d907c3f8095ecca00f3eda33111cf4f0f2591799/watchfiles-1.2.0-cp314-cp314-win_arm64.whl", hash = "sha256:c5c19526f4e54a00f2666a6c0e9e40d582c09e865055ea7378bf0009aab857b3", size = 280684, upload-time = "2026-05-18T04:31:26.902Z" },
+    { url = "https://files.pythonhosted.org/packages/b1/75/fb64e6c25d6b5ca636d03df34ffb1c6e9873303e76d27967e045f8df088f/watchfiles-1.2.0-cp314-cp314t-macosx_10_12_x86_64.whl", hash = "sha256:d73a585accffa5ae39c17264c36ec3166d2fad7000c780f5ef83b2722afb9dd2", size = 398857, upload-time = "2026-05-18T04:32:17.108Z" },
+    { url = "https://files.pythonhosted.org/packages/73/4e/9f7adf01754cbf81843722ccfec169d8f26c69778281a302855cecd2ee08/watchfiles-1.2.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:ae99b14c5f21e026e0e9d96f40e07d8570ebee6cafd9d8fc318354606daa7a28", size = 392413, upload-time = "2026-05-18T04:31:07.911Z" },
+    { url = "https://files.pythonhosted.org/packages/47/c8/bec626bcc2d69f44b9acb24ce7d60ed7b16b73628eea747fcbd169d8edda/watchfiles-1.2.0-cp314-cp314t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4429f3b105524a10b72c3a819b091c495d2811d419c1e1e8df773a5a5974f831", size = 452409, upload-time = "2026-05-18T04:31:20.142Z" },
+    { url = "https://files.pythonhosted.org/packages/00/b7/b6362068e81e7c556d155a34c35d40ac3ef42d747b06d7f6e5bf58e359c2/watchfiles-1.2.0-cp314-cp314t-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:43d818978d06062d9b22c4fab2ebe44cf5213d42dc8e62bda8c2760cfa2eeb33", size = 458827, upload-time = "2026-05-18T04:32:06.219Z" },
+    { url = "https://files.pythonhosted.org/packages/67/f8/9a813fa42afb1e0b4625e75f0479826644d3ee8dc287e093799bc01f390c/watchfiles-1.2.0-cp314-cp314t-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:b9f732dc58b2dbe69e464ccf8fff7a03b0dd0be439da4c0720d3558527d3d6b4", size = 490104, upload-time = "2026-05-18T04:31:56.034Z" },
+    { url = "https://files.pythonhosted.org/packages/2f/bf/27dfb6094ca4c9aad21298b5525b6c53cb36121ee454331d05161e58d130/watchfiles-1.2.0-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:8f200104103feb097de4cab8fe4f5dd18a2026934c7dea98c55a2f5fd6d5a33b", size = 571360, upload-time = "2026-05-18T04:31:57.133Z" },
+    { url = "https://files.pythonhosted.org/packages/fb/39/44a096d67270ea93df91d33877dbe91fbda3aa4f8ec2edf799d93eda8736/watchfiles-1.2.0-cp314-cp314t-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:63ac26eefbf4af1741247d6fb68b11c49a25b2f7413fbd318a83a12aaa9cf666", size = 464644, upload-time = "2026-05-18T04:30:57.33Z" },
+    { url = "https://files.pythonhosted.org/packages/0e/80/c7472203bad6268e3ef1ad260739704847898938ad7ea8b63a5131f46b50/watchfiles-1.2.0-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0c4997d4e4a55f0d02b6cde327322daf3a0400e5df6c6b15948994bf72497925", size = 454771, upload-time = "2026-05-18T04:30:48.736Z" },
+    { url = "https://files.pythonhosted.org/packages/51/cf/3b10b268b4b7f0fc26e9debb5eef1998b515887840f444cd3ec80c688755/watchfiles-1.2.0-cp314-cp314t-manylinux_2_31_riscv64.whl", hash = "sha256:4c887eba18b7945ac73067a8b4a66f21cd46c2539b2bc68588f7be6c7eb6d26b", size = 463494, upload-time = "2026-05-18T04:31:33.826Z" },
+    { url = "https://files.pythonhosted.org/packages/3d/3e/a4302545cd589262a0dc7d140e86f7688eba3f9c72776c27f7e23b8864c4/watchfiles-1.2.0-cp314-cp314t-musllinux_1_1_aarch64.whl", hash = "sha256:3416ff151bb6b5a8d8d11664974fbef4d9305b9b2957839ab5a270468fd8df30", size = 629383, upload-time = "2026-05-18T04:31:15.596Z" },
+    { url = "https://files.pythonhosted.org/packages/db/99/d5649df0a9a410d45b7c882304d0b790903ac9b6e8f2cfd12114e0c6b9f2/watchfiles-1.2.0-cp314-cp314t-musllinux_1_1_x86_64.whl", hash = "sha256:0e831a271c035d89789cffc386b6aa1375f39f1cd25eb7ca0997e4970d152fc5", size = 656093, upload-time = "2026-05-18T04:31:58.707Z" },
+    { url = "https://files.pythonhosted.org/packages/92/b9/362702539275019a54dd2e94511b31a9b89c5f9e6a21966de7eb692549fc/watchfiles-1.2.0-cp315-cp315-macosx_10_12_x86_64.whl", hash = "sha256:37a6721cdf3f65dbb13aa9503510ccb4451603ac837e44d265d7992a597e1374", size = 400109, upload-time = "2026-05-18T04:31:16.879Z" },
+    { url = "https://files.pythonhosted.org/packages/8f/75/71d5ba62db781e5587bded1d944c675374bc4aa37ff33d5018d98e8b6538/watchfiles-1.2.0-cp315-cp315-macosx_11_0_arm64.whl", hash = "sha256:2b37d10b5a63bd4d87e18472d80fa525bd670586fae62e5dd580452764879b65", size = 392167, upload-time = "2026-05-18T04:31:28.058Z" },
+    { url = "https://files.pythonhosted.org/packages/3c/01/c66dd95d0423fe30d31820e2d1d5bda773764131bbb6ac0cb1cf303ac328/watchfiles-1.2.0-cp315-cp315-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:0a105bc2283f67e8fbec74253ec2d94925de92ed72c0393f1206bf326b7b7b69", size = 452372, upload-time = "2026-05-18T04:31:00.836Z" },
+    { url = "https://files.pythonhosted.org/packages/91/15/2fe99557e72f85627c6a8eed50d889e8d101623e060a22ad75b875cb932d/watchfiles-1.2.0-cp315-cp315-manylinux_2_17_armv7l.manylinux2014_armv7l.whl", hash = "sha256:5327989a465505f05cfe06f04fa9d0c2fd5432bb243e10e6f012b1bdca3c8579", size = 459596, upload-time = "2026-05-18T04:31:34.96Z" },
+    { url = "https://files.pythonhosted.org/packages/ed/23/d4acfa0023367428ed48351b3b9b267893037b6cadae55620c61c24bcfd4/watchfiles-1.2.0-cp315-cp315-manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:ecb47f183a8025b2aa18b546725c3657e542112ae9c0613a2af79b4fa8d04ad7", size = 490869, upload-time = "2026-05-18T04:31:59.923Z" },
+    { url = "https://files.pythonhosted.org/packages/a4/5f/3164cbdce06c9fb95c4f7b9e2f9760b5e2797af43a9ecc317ef42a23a278/watchfiles-1.2.0-cp315-cp315-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:8520a4ab0e37f770afc34459c4f8f7019e153f9124dc101c15538365875d1ab2", size = 571641, upload-time = "2026-05-18T04:32:00.948Z" },
+    { url = "https://files.pythonhosted.org/packages/41/e6/85d3731c55e65cd7690f3f803d24c139588aaf863e4bf2148fe7a7fa1a19/watchfiles-1.2.0-cp315-cp315-manylinux_2_17_s390x.manylinux2014_s390x.whl", hash = "sha256:71cd71740ed2c15211ebb237ced4e39a1cdf6f80566e5fe95428da1626f4fde6", size = 464444, upload-time = "2026-05-18T04:30:34.298Z" },
+    { url = "https://files.pythonhosted.org/packages/f4/7d/562641012b8b09872742c3b8adf9629ec479fd78f8d68ae4a0c13da8add6/watchfiles-1.2.0-cp315-cp315-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:f88af53d6ddaf72179ef613ddc905e6f4785f712b49b80b3bef9f3525e6194b4", size = 453593, upload-time = "2026-05-18T04:31:23.464Z" },
+    { url = "https://files.pythonhosted.org/packages/56/fe/cb8ef3d6f929d14158fdaaad9925985b7310abc9384dcd4d82dd0016fb59/watchfiles-1.2.0-cp315-cp315-manylinux_2_31_riscv64.whl", hash = "sha256:cee9d5efd929efdac5f7e58f72b3376f676b64050a91c5b99a7094c5b2317488", size = 465096, upload-time = "2026-05-18T04:31:30.384Z" },
+    { url = "https://files.pythonhosted.org/packages/25/91/80908e835e100527a9267147b08c0eee1fa6ab0ffec15edc04d1d44885f7/watchfiles-1.2.0-cp315-cp315-musllinux_1_1_aarch64.whl", hash = "sha256:b718bf356bbc15e559bd8ef41782b573b8ae0e3f177ab244b440568d7ea02cfb", size = 630638, upload-time = "2026-05-18T04:30:49.89Z" },
+    { url = "https://files.pythonhosted.org/packages/46/4b/95ab2f256bb4af3cb2eb23b9317bda984ee6e0f11733a5c004a6c95b06e3/watchfiles-1.2.0-cp315-cp315-musllinux_1_1_x86_64.whl", hash = "sha256:922c0e019fe68b3ae392965a766b02a71ba1168c932cebc3733cd52c5fe5b377", size = 657684, upload-time = "2026-05-18T04:31:32.027Z" },
+]
 [[package]]
 name = "word2number"
 version = "1.1"