# Adversarial Structured-Extraction Arena: train an extractor while an adversary attacks your OCR and schema

*OpenEnv India Hackathon 2026 · Theme: multi-agent / robust structured extraction*

---

## Why this problem is hard

Real documents are **messy**: OCR confuses characters (`0`/`O`, `1`/`l`), field names **drift**, and noise can break pipelines that assume clean text and a fixed schema. Static benchmarks rarely model an **active opponent** that perturbs inputs under **rules and a budget**.

We built a **trainable two-agent arena** on **OpenEnv**: one policy **extracts** JSON under a target schema; another **adversary** proposes **executable edits** to the document and schema so the extractor must stay robust.

---

## What the environment does

**Adversarial Structured-Extraction Arena** is declared in [`openenv.yaml`](https://github.com/Hardikjha09/openenv-adversarial-extraction-arena/blob/main/openenv.yaml) and implemented in code as:

- **`AdversarialExtractionEnv`** (`env/extraction_env.py`) — stepping, observations, and rubric-based rewards.
- **Adversary executor** (`env/adversary.py`) — applies structured edits such as `rename_field`, `ocr_noise`, `swap_type`, `inject_distractor`, and more, within a **token budget**.
- **Rubric** (`env/rubric.py`) + **grader** (`grader/`) — scoring is **not** “exact string match only”; gates include valid JSON and schema coverage, with **fuzzy / typed** alignment to gold answers and bonuses for drift awareness.
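
To make the "gates, then fuzzy / typed alignment" idea concrete, here is a minimal sketch. It is not the code in `env/rubric.py`: the function names `field_similarity` and `score_extraction`, the `min_coverage` threshold, and the use of `difflib` for fuzzy matching are all assumptions made for illustration.

```python
import json
from difflib import SequenceMatcher


def field_similarity(pred, gold):
    """Typed comparison: numeric gold values compare by value, others by fuzzy ratio."""
    if isinstance(gold, (int, float)):
        try:
            return 1.0 if float(pred) == float(gold) else 0.0
        except (TypeError, ValueError):
            return 0.0
    a, b = str(pred).strip().lower(), str(gold).strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def score_extraction(raw_output, gold, min_coverage=0.5):
    """Hypothetical gated score in [0, 1]; thresholds are illustrative only."""
    if not gold:
        return 0.0
    # Gate 1: the output must parse as a JSON object.
    try:
        pred = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(pred, dict):
        return 0.0
    # Gate 2: enough of the gold fields must be present.
    coverage = sum(key in pred for key in gold) / len(gold)
    if coverage < min_coverage:
        return 0.0
    # Past the gates: mean per-field similarity, missing fields count as 0.
    total = sum(
        field_similarity(pred[key], value) if key in pred else 0.0
        for key, value in gold.items()
    )
    return total / len(gold)
```

With this shape, a near-miss such as `Asha Ra0` for `Asha Rao` earns partial credit instead of zero, while unparseable output is rejected outright.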

The stack uses **`openenv-core`** and standard Python tooling; training uses **TRL** and **Unsloth** on **Qwen2.5-1.5B-Instruct** with **LoRA** adapters published on the Hub.

---

## Agents: Extractor (E) and Adversary (A)

| Agent | Role | Output |
|--------|------|--------|
| **E (Extractor)** | Read the (possibly perturbed) document and schema | `ExtractorAction` with **`extracted_json`** as a JSON object (training uses markdown-fenced JSON) |
| **A (Adversary)** | Spend budget to stress the extractor | `AdversaryAction` with a **list of edits** the environment applies in order |
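
Because the extractor is trained to emit markdown-fenced JSON, something has to unwrap the fence before the object can be scored. The sketch below shows one way to do that. The helper name `parse_extracted_json` and the fall-back to parsing the raw reply are assumptions, not the repository's actual parsing code.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this example nests safely in markdown
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def parse_extracted_json(model_output):
    """Return the JSON object found in a model reply, or None if it is unusable."""
    match = FENCE_RE.search(model_output)
    # Prefer the fenced payload; otherwise try the whole reply as JSON.
    candidate = match.group(1) if match else model_output
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    # Only a JSON object is a valid extraction; lists and scalars are rejected.
    return parsed if isinstance(parsed, dict) else None
```

Returning `None` rather than raising keeps malformed replies on the "failed the valid-JSON gate" path instead of crashing an episode.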

Both evaluation and the Space demo run **paired inference**: the adversary proposes edits, the environment applies them, and the extractor then sees the perturbed document and schema.
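
The "apply edits in order, within a budget" step can be sketched as follows. This is an illustration only: the edit dictionaries, the per-edit `cost` key, the abstract budget units, and the fixed `0`/`O`, `1`/`l` confusion map are assumptions, and only two of the edit types named above are implemented. The real executor lives in `env/adversary.py`.

```python
# Deterministic stand-in for OCR noise: swap commonly confused characters.
OCR_CONFUSIONS = str.maketrans({"0": "O", "1": "l"})


def apply_edits(document, schema_fields, edits, budget):
    """Apply edits in order; skip any edit whose cost exceeds the remaining budget."""
    doc, fields = document, list(schema_fields)
    for edit in edits:
        cost = edit.get("cost", 1)
        if cost > budget:
            continue  # unaffordable edits are dropped, later cheaper ones may still run
        if edit["type"] == "rename_field":
            fields = [edit["new"] if name == edit["old"] else name for name in fields]
        elif edit["type"] == "ocr_noise":
            doc = doc.translate(OCR_CONFUSIONS)
        else:
            continue  # edit types not covered by this sketch are ignored and not charged
        budget -= cost
    return doc, fields, budget
```

The extractor would then be prompted with the returned document and field list rather than the originals.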

---

## How we trained (real runs, reproducible)

1. **Corpus** — Synthetic **Indian-context** documents and schemas from `data/generator.py` → `data/corpus.json` (generated in Colab or locally; not committed to git due to size).

2. **Extractor SFT** — `training/sft_warmup.py` (TRL `SFTTrainer` + Unsloth). Saves a LoRA adapter (e.g. `checkpoints/sft_warmup`) and **`trainer_log_history.json`**.

3. **Adversary SFT** — `training/sft_adversary.py` teaches the model to emit **valid edit JSON** matching the executor. Supervision uses **heuristic** edit programs sampled per document (same edit types as production). Default slice avoids overlapping the extractor’s first training slice (`--start_idx 200`, `--n_docs 200`).

4. **Optional RL** — `training/grpo_trainer.py` can refine the **extractor** further with GRPO; the main submission path is **SFT + eval**.
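
A small sketch of the data side of step 3: selecting a corpus slice from `--start_idx` / `--n_docs`, and sampling a heuristic edit program as the SFT target. The helper names, the `{"edits": [...]}` target shape, and the assumption that the extractor's first slice is documents 0 to 199 are illustrative and not taken from `training/sft_adversary.py`.

```python
import json
import random

# Edit types named in this write-up; the real executor supports more.
EDIT_TYPES = ["rename_field", "ocr_noise", "swap_type", "inject_distractor"]


def corpus_slice(docs, start_idx, n_docs):
    """Select docs[start_idx : start_idx + n_docs], mirroring the CLI flags."""
    return docs[start_idx:start_idx + n_docs]


def sample_edit_program(rng, max_edits=3):
    """Draw a short random list of edit types and serialise it as a JSON target string."""
    n_edits = rng.randint(1, max_edits)
    edits = [{"type": rng.choice(EDIT_TYPES)} for _ in range(n_edits)]
    return json.dumps({"edits": edits})


docs = list(range(500))                      # stand-in for the loaded corpus
adversary_docs = corpus_slice(docs, 200, 200)  # defaults quoted above
extractor_docs = corpus_slice(docs, 0, 200)    # assumed extractor slice
```

Under that assumption the two slices cover indices 0 to 199 and 200 to 399, so neither model is trained on the other's documents.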

Training is packaged for judges in a **single Colab notebook** that clones the repo, installs dependencies, generates the corpus, runs both SFT jobs, and refreshes loss plots.

---

## Evaluation: proof beyond the demo

`evaluation/run_eval.py` runs many **holdout episodes** with optional **`--adversary_model_path`**, tracks extractor/adversary rewards, maintains **Elo**-style ratings, and writes **`eval_metrics.json`**. Plotting scripts under `plots/` turn logs into **loss** and **eval** figures.
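
For readers unfamiliar with Elo, the standard update looks like the sketch below. How `evaluation/run_eval.py` maps episode rewards to an outcome, and which K-factor it uses, are not stated here; the `extractor_score` input in [0, 1], the zero-sum treatment of the adversary, and `k=32` are assumptions.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(extractor, adversary, extractor_score, k=32.0):
    """Return updated (extractor, adversary) ratings after one episode.

    extractor_score is 1.0 for an extractor win, 0.0 for a loss, 0.5 for a draw;
    the adversary implicitly scores 1 - extractor_score.
    """
    delta = k * (extractor_score - expected_score(extractor, adversary))
    return extractor + delta, adversary - delta
```

For example, two agents at 1500 move to 1516 and 1484 after a clear extractor win, and the rating sum is preserved on every update.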

Quote **aggregate** behavior from `eval_metrics.json` and these plots; the Gradio Space is for **interactive** intuition (and can run on a **GPU** so that the Hub LoRAs load in 4-bit).

---

## Evidence (training + evaluation artifacts)

All links point to the **Hugging Face model repos**, so judges can verify the results without retraining:

**Extractor — [HardikJha/extractor-aea](https://huggingface.co/HardikJha/extractor-aea)**

- [Training loss](https://huggingface.co/HardikJha/extractor-aea/blob/main/plots/sft_loss.png)
- [Eval reward (moving average)](https://huggingface.co/HardikJha/extractor-aea/blob/main/plots/rewards.png)
- [Eval Elo](https://huggingface.co/HardikJha/extractor-aea/blob/main/plots/elo_ratings.png)
- [Eval metrics JSON](https://huggingface.co/HardikJha/extractor-aea/blob/main/eval_metrics.json)
- [SFT trainer log (raw)](https://huggingface.co/HardikJha/extractor-aea/blob/main/trainer_log_history.json)

**Adversary — [HardikJha/adversary-aea](https://huggingface.co/HardikJha/adversary-aea)**

- [Adversary SFT loss](https://huggingface.co/HardikJha/adversary-aea/blob/main/plots/sft_adversary_loss.png)
- [SFT trainer log](https://huggingface.co/HardikJha/adversary-aea/blob/main/trainer_log_history.json)

---

## Try it and reproduce

| Resource | URL |
|----------|-----|
| **Runnable Space (Gradio)** | [https://huggingface.co/spaces/HardikJha/extraction-arena](https://huggingface.co/spaces/HardikJha/extraction-arena) |
| **Training Colab** | [Open in Colab](https://colab.research.google.com/github/Hardikjha09/openenv-adversarial-extraction-arena/blob/main/notebooks/Train_Extractor_Colab.ipynb) |
| **Source code** | [GitHub: openenv-adversarial-extraction-arena](https://github.com/Hardikjha09/openenv-adversarial-extraction-arena) |

**Space tip:** enable **GPU** in Space settings for live **extractor-aea** / **adversary-aea** inference; on CPU the UI still demonstrates perturbations with manual / fallback paths.

---

## Honest limitations (what we are not claiming)

- The adversary’s SFT targets are **synthetic heuristics**, not human red-teaming or full multi-agent RL equilibrium.
- Under **extreme** OCR-style noise, **numeric** and **long ID** fields can still fail or drift, so rubric scores should be read alongside the qualitative demo.
- **`TokenBudgetPenalty`** in the rubric is a shaping term; see `env/rubric.py` for exact weights and gates.

---

## Summary

We contribute an **OpenEnv-grounded**, **two-policy** extraction arena with **public LoRAs**, **logged training**, **eval curves**, and a **public Hugging Face Space**, so judges can **re-run training in Colab** and **inspect real plots** on the Hub. The project is built **on** OpenEnv and TRL/Unsloth rather than a one-off custom RL stack, so the community can extend adversaries, rubrics, and policies in one place.