# Changelog

All notable changes to this project are documented here. The format is based
on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project
adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.2.4] - 2026-06-30

### Fixed

- **"Admitted [redacted]" in the discharge summary.** De-identification **shifts** every
  date (DOB and visit dates) rather than tokenising it, so no `[DATE_n]` token ever
  exists, yet the 1.2.2 prompt told the model to reproduce a date *token*. Seeing dates
  but no token, the model hallucinated `[DATE_1]` or `[DATE_X]`, which the
  `redact_unresolved()` safety net then turned into `[redacted]`.
  The date shift is already **reversible** (`forward: 13/02/26 → 22/03/26`, reverse
  restores it), so the fix is prompt-only: the model is now told to copy the
  admission/visit date **exactly as written** in the note (a plain date, not a token).
  It sees only the shifted date, and `reidentify_out` restores the **true** admission
  date for the clinician (e.g. "Admitted on 13/02/26 …"). `src/deid.py` is unchanged;
  a new test guards the shift round-trip.
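
The shift round-trip described above can be sketched as follows. This is a minimal illustration, not the code in `src/deid.py`: the `shift_dates` helper, the `dd/mm/yy` regex, and the 37-day offset are assumptions chosen so that the example reproduces the `13/02/26 → 22/03/26` mapping quoted in the entry.

```python
import re
from datetime import datetime, timedelta

# dd/mm/yy dates, the format used in the changelog's example (assumed regex).
_DATE_PAT = re.compile(r"\b\d{2}/\d{2}/\d{2}\b")


def shift_dates(text: str, days: int) -> str:
    """Shift every dd/mm/yy date in `text` by `days`; a negative value reverses."""

    def _shift(match: re.Match) -> str:
        try:
            parsed = datetime.strptime(match.group(0), "%d/%m/%y")
        except ValueError:
            return match.group(0)  # not a real calendar date; leave it untouched
        return (parsed + timedelta(days=days)).strftime("%d/%m/%y")

    return _DATE_PAT.sub(_shift, text)


OFFSET_DAYS = 37  # hypothetical offset; it maps 13/02/26 to 22/03/26

note = "Admitted on 13/02/26 with chest pain."
shifted = shift_dates(note, OFFSET_DAYS)        # what the model sees
restored = shift_dates(shifted, -OFFSET_DAYS)   # what the clinician sees
```

Because the model copies the shifted date verbatim, applying the negative offset to its output is enough to recover the true date; no token lookup is involved.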

### Changed

- **The patient is no longer named in the discharge summary.** The card previously
  rendered a `<name> — discharge summary` title (resolved from `person_id`), which both
  duplicated the static "Discharge Summary" header and put the patient's real name on the
  output. The patient name / `{{PATIENT}}` placeholder is removed entirely:
  - **Prompt** — the output format drops the title line (now three elements: narrative,
    follow-up, grounded); the model is told never to name the patient and to refer to
    them as "the patient" throughout. Other people's surrogate tokens are still restored.
  - **`reidentify_out`** — no longer substitutes a patient name; `State.person_name` and
    the `app/api.py` `person_id → name` lookup (`_PATIENT_NAMES`) are removed (the
    `/process` `person_id` field is kept, accepted-but-unused, for UI compatibility).
  - **UI** — `renderSummary()` drops any stray title line; the card's "Discharge Summary"
    header is the only title.

## [1.2.2] - 2026-06-30

### Fixed

- **Literal `[DATE_X]` reaching the discharge summary.** Two compounding causes: the
  SYSTEM prompt's format example literally contained `Admitted [DATE_X] after …`, which
  the model copied verbatim when the de-identified note had no admission-date token (the
  date is only in the dataset's structured tables, never in the text the model sees); and
  `reidentify_out` only recognised `[LABEL_<digits>]`, so `[DATE_X]` was never flagged or
  redacted. Fixes:
  - **Prompt** — the date example is now the angle-bracket fill-in `<admission date>`,
    with an explicit rule to reproduce a `[DATE_n]` token only if present and otherwise
    omit the date — never emit a literal placeholder.
  - **Defense-in-depth** — new `NoteGuard.redact_unresolved()` (pattern `_ORPHAN_PAT`,
    `[LABEL_<anything>]`) redacts and flags any surrogate-shaped token still present after
    re-identification, including stray template placeholders like `[DATE_X]`.
    `reidentify_out` now routes through it, so none can reach the clinician verbatim.

## [1.2.1] - 2026-06-30

### Fixed

- **NER over-redaction of clinical terms** — the medium spaCy model (`en_core_web_md`)
  shipped in 1.2.0 mislabels abbreviations like "Subcut" as a `PERSON`, and Presidio
  assigns every spaCy entity a flat score of `0.85`, so a confidence threshold cannot separate
  them from real names. Added a clinical-term allow-list (`_NER_STOPWORDS`) applied at
  the detector boundary (`NoteGuard._detect_names`), so those terms are never redacted
  as names while real names alongside them still are. Used in both `deidentify()` and
  `residual_identifiers()` so the eval leak-check is consistent too.
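
A sketch of an allow-list applied at the detector boundary is shown below. The `Span` type, the `filter_name_spans` helper, and the allow-list contents (only "Subcut" is named in this changelog) are hypothetical; the point is that filtering happens on the detected text, since the flat `0.85` score carries no signal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected entity: character offsets, label, and detector score."""
    start: int
    end: int
    label: str
    score: float


# Hypothetical contents; only "Subcut" is taken from the changelog entry.
_NER_STOPWORDS = frozenset({"subcut"})


def filter_name_spans(text: str, spans: list[Span]) -> list[Span]:
    """Drop PERSON spans whose text is an allow-listed clinical term."""
    return [
        s for s in spans
        if not (s.label == "PERSON"
                and text[s.start:s.end].strip().lower() in _NER_STOPWORDS)
    ]


text = "Subcut enoxaparin prescribed by Ethel Duffy."
name_start = text.index("Ethel Duffy")
spans = [
    Span(0, 6, "PERSON", 0.85),                                    # false positive
    Span(name_start, name_start + len("Ethel Duffy"), "PERSON", 0.85),
]
kept = filter_name_spans(text, spans)  # only the real name remains
```

Both spans carry the same score, which is why a threshold cannot tell them apart while a term lookup can.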

## [1.2.0] - 2026-06-29

Follow-up to 1.1.0: actually **raise de-identification recall** so the residual-PII
audit has less to catch. The 1.1.0 panel honestly reported leaked free-text names
(e.g. "Dr Ethel Joanne Duffy") but nothing redacted them in the deployed image,
because Presidio/spaCy NER was wired behind the `_Detector` interface yet never
shipped. This turns it on.

### Added

- **Presidio + spaCy NER in the deployed image** — `Dockerfile` installs the `[nlp]`
  extra (`presidio-analyzer`, `spacy`) and `en_core_web_md`, so free-text PERSON/
  LOCATION names with no vault entry are now tokenised before the model sees them.
- **`[nlp]` optional dependency** in `pyproject.toml` (`pip install -e ".[nlp]"`).
- **`NOTEGUARD_SPACY_MODEL`** env var (default `en_core_web_md`) — switch to
  `en_core_web_lg` for higher recall at ~14× the model size, or `en_core_web_sm`
  for a smaller image.

### Changed

- `src/deid.py` `_build_detector()` now configures the spaCy model explicitly via
  `NlpEngineProvider` (so it no longer hard-defaults to the unshipped `en_core_web_lg`)
  and still degrades gracefully to a no-op stub when Presidio is absent (CI, local).
- Tests pin the no-op detector by default (autouse fixture) so the suite is
  deterministic whether or not the `[nlp]` extra is installed; a new test covers the
  NER redaction path.
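
The explicit model configuration and the graceful fallback can be sketched roughly as follows. `NlpEngineProvider` and `AnalyzerEngine` are Presidio's public classes; the function names, the `NoOpDetector` stub, and the overall structure are assumptions and may differ from the real `_build_detector()`.

```python
import os

DEFAULT_SPACY_MODEL = "en_core_web_md"


def spacy_nlp_config(model_name: str) -> dict:
    """Presidio NlpEngineProvider configuration for one English spaCy model."""
    return {
        "nlp_engine_name": "spacy",
        "models": [{"lang_code": "en", "model_name": model_name}],
    }


class NoOpDetector:
    """Stub used when Presidio is absent: it never reports an entity."""

    def analyze(self, text: str, language: str = "en", **kwargs) -> list:
        return []


def build_detector():
    """Return a Presidio analyzer if the [nlp] extra is installed, else a stub."""
    model_name = os.environ.get("NOTEGUARD_SPACY_MODEL", DEFAULT_SPACY_MODEL)
    try:
        from presidio_analyzer import AnalyzerEngine
        from presidio_analyzer.nlp_engine import NlpEngineProvider
    except ImportError:
        return NoOpDetector()  # CI and local runs without the extra
    provider = NlpEngineProvider(nlp_configuration=spacy_nlp_config(model_name))
    return AnalyzerEngine(
        nlp_engine=provider.create_engine(),
        supported_languages=["en"],
    )
```

Keeping the imports inside the `try` block means the module itself loads without Presidio, and the stub exposes the same `analyze` entry point so callers need no branching.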

### Notes

- NER is a recall **boost**, not a guarantee: `en_core_web_md` misses some names
  (recall is also lower for non-English names — see `docs/tool_card.md`). `scan_pii`
  remains the high-precision backstop that surfaces anything still leaking, and
  `assert_clean()` is unchanged (vault + structured patterns).

## [1.1.0] - 2026-06-29

The trust panel now reports **only** whether reversible pseudonymisation was done
correctly. The previous panel could read `RE-ID RISK 0.0%` while un-redacted free-text
names (e.g. "Dr Ethel Joanne Duffy") reached the model, because the risk number was
computed from the *vault* — and arbitrary pasted names are never in the vault.

### Added

- **`NoteGuard.scan_pii(text)`** (`src/deid.py`) — a vault-independent residual-PII
  audit of de-identified text. Flags structured identifiers that survived (NHS, GMC,
  NMC, email, phone, postcode) and free-text person names via a high-precision
  person-title heuristic (`Dr/Nurse/Consultant/…` + ≥2 Title-Case tokens). Surrogate
  tokens and bare role words are never flagged. Works on notes with no ground-truth
  vault, which is exactly where the old metric was blind.
- **Trust-panel metrics focused on de-id correctness** — `/process` now returns
  `deid_ok` (PASS/FAIL verdict), `residual_pii` (`[{type, text}]` the model still saw),
  `residual_pii_count`, and `reversible`. The UI cards become **De-identification**,
  **Identifiers replaced**, **Residual PII · model input** (with the offending snippets
  listed), and **Reversible**.
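
A vault-independent audit along these lines might look like the sketch below. The regexes are simplified assumptions: only the three titles spelled out in this entry are included (the source elides the rest), only the email recogniser is shown from the structured set, and the string form of `deid_ok` is a guess at the PASS/FAIL verdict.

```python
import re

_PATTERNS = {
    # Title followed by at least two capitalised tokens, e.g. "Dr Ethel Joanne Duffy".
    "PERSON": re.compile(
        r"\b(?:Dr|Nurse|Consultant)\.?\s+[A-Z][A-Za-z'-]+(?:\s+[A-Z][A-Za-z'-]+)+"
    ),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def scan_pii(text: str) -> list[dict]:
    """Return [{type, text}] for identifiers still visible in de-identified text."""
    findings = []
    for label, pattern in _PATTERNS.items():
        findings.extend(
            {"type": label, "text": m.group(0)} for m in pattern.finditer(text)
        )
    return findings


def trust_metrics(deid_text: str) -> dict:
    """Derive the de-id verdict from the audit alone, with no vault involved."""
    residual = scan_pii(deid_text)
    return {
        "deid_ok": "PASS" if not residual else "FAIL",
        "residual_pii": residual,
        "residual_pii_count": len(residual),
    }


leaky = "[PERSON_1] reviewed by Dr Ethel Joanne Duffy. Nurse in charge informed."
report = trust_metrics(leaky)  # flags the un-redacted name, not the bare role word
```

Surrogate tokens such as `[PERSON_1]` and the bare role word "Nurse" do not match either pattern, which mirrors the entry's statement that they are never flagged.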

### Removed

- **Faithfulness and Grounded-sources metrics** from the trust panel — answer-quality
  signals, not de-identification correctness. Dropped the in-graph faithfulness judge
  (`model.invoke`) and Tavily-source extraction from `compute_trust`, and the
  `faithfulness_score` / `sources` state fields. (Tavily still grounds the answer; the
  `eval/run_eval.py` faithfulness evaluator is unchanged — it is a separate experiment.)
- **`metrics.residual_risk`** from `/process` — replaced by the honest `deid_ok` +
  `residual_pii_count`. `/summarise` keeps `residual_risk`/`ok`, now derived from the
  audit (`residual_pii` + `leaked_tokens`).

## [1.0.1] - 2026-06-28

### Changed

- **Default model downgraded to `gemini-2.0-flash`** — `NOTEGUARD_MODEL` default in
  `agent/graph.py` and `eval/run_eval.py` changed from `gemini-2.5-flash` to
  `gemini-2.0-flash`; `.env.example` updated to match.

### Added

- **HF Space auto-deploy** (`.github/workflows/deploy-hf.yml`) — every push to `main`
  mirrors the repo onto the HF Space `chaeyoona/noteguard-agent` as an orphan commit,
  triggering a Docker rebuild on port 7860. Requires the `HF_TOKEN` repo secret. The
  orphan-commit strategy avoids the historical `docs/init.png` blob that HF's
  binary-in-history check rejects.

### Removed

- `docs/CHANGELOG.md` — duplicate of the root `CHANGELOG.md`.
- `docs/plan.md` — historical planning doc; no longer relevant post-1.0.
- `outputs/.gitkeep` — placeholder file for the unused `outputs/` directory.

### Changed (CI)

- GitHub Actions Python matrix trimmed to **3.10 + 3.12** (intermediate versions
  removed).

---

## [1.0.0] - 2026-06-27

First post-hackathon release. The codebase is pruned to exactly the components
that ship in the deployed Hugging Face Space; sponsor-only integrations that never
ran in the deployed image are removed.

### Removed

- **Superlinked retrieval** (`src/retrieve.py`, `[retrieval]` optional dependency,
  and the `retrieve_context` graph node) — the in-memory vector index was excluded
  from the Docker image and lazy-imported behind a graceful fallback, so it never
  executed in the deployed app. Removing it deletes the `retrieved_context` state
  field and the per-request demo index seeding.
- **n8n workflow** (`workflows/noteguard.n8n.json`) — a three-node proxy
  (Webhook → HTTP Request → Respond) to NoteGuard's own REST API. It added no logic
  and sat off the runtime path; any automation platform can still call the REST API.

### Changed

- **Faithfulness judge now scores against the de-identified source note** (`deid_text`)
  instead of Superlinked-retrieved context. The score was previously gated on retrieval,
  so it never populated in the deployed app (which had no retrieval); it now produces a
  live number for every request and matches the definition used by `eval/run_eval.py`.
- `agent/graph.py`: pipeline simplified to
  `deidentify_in → agent → reidentify_out → compute_trust`; `build_graph()` no longer
  takes a `note_index` argument.
- `app/api.py`: `/process` reports `faithfulness` whenever a de-identified note exists;
  FastAPI app version bumped to `1.0.0`.
- `Dockerfile`: dependency comment updated — the image no longer "omits" Superlinked;
  it is simply not a dependency.
- `Makefile`: modernised to the real toolchain — installs via `pip install -e .`
  (no `requirements.txt`), lints/formats with `ruff` (not `black`), uses the `src`
  package and `src/fetch_dataset.py`.
- `pyproject.toml`: version bumped to `1.0.0`; `superlinked` removed from keywords;
  `[retrieval]` optional-dependency group dropped.
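
The simplified node order can be pictured with plain function composition. This sketch deliberately does not use LangGraph; the state keys, the stand-in `agent`, and the `run` helper are hypothetical and only show how state flows through `deidentify_in → agent → reidentify_out → compute_trust`.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]


def deidentify_in(state: State) -> State:
    """Replace known names with numbered surrogates and remember the mapping."""
    text, vault = state["note"], {}
    for i, name in enumerate(state["known_names"], start=1):
        token = f"[PERSON_{i}]"
        if name in text:
            text = text.replace(name, token)
            vault[token] = name
    return {**state, "deid_text": text, "vault": vault}


def agent(state: State) -> State:
    """Stand-in for the model call: it only ever receives the de-identified text."""
    return {**state, "draft": "Summary: " + state["deid_text"]}


def reidentify_out(state: State) -> State:
    """Restore original names in the draft for the clinician."""
    answer = state["draft"]
    for token, name in state["vault"].items():
        answer = answer.replace(token, name)
    return {**state, "answer": answer}


def compute_trust(state: State) -> State:
    """Count replaced identifiers and list any names the model could still see."""
    leaked = [n for n in state["vault"].values() if n in state["deid_text"]]
    return {**state, "identifiers_removed": len(state["vault"]), "leaked": leaked}


PIPELINE: list[Node] = [deidentify_in, agent, reidentify_out, compute_trust]


def run(note: str, known_names: list[str]) -> State:
    state: State = {"note": note, "known_names": known_names}
    for node in PIPELINE:
        state = node(state)
    return state
```

With retrieval gone there is no branch or extra argument to thread through, which is consistent with `build_graph()` dropping its `note_index` parameter.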

---

## [0.2.0] - 2026-06-27

### Added

- **Clinician web UI** (`app/static/index.html`) — single-file, vanilla JS, no build
  step. NHS dark-blue header, segmented toggle (Clinician view / What the AI sees),
  PHI highlighted in red in the clinician view, `[TYPE_N]` monospace chips in the AI
  view, discharge summary pane, trust-panel metric cards.
- **`POST /process` endpoint** (`app/api.py`) — returns `clinician_note`, `ai_note`,
  `identifiers` (original strings for highlighting), `discharge_summary`, and
  `metrics` (`identifiers_removed`, `residual_risk`, `grounded_sources`,
  `faithfulness`).
- **`GET /`** — FastAPI serves `app/static/index.html` directly; no separate static
  server needed.
- **`StaticFiles` mount** at `/static` — allows future CSS/JS assets alongside the
  single-page UI.
- **n8n workflow** (`workflows/noteguard.n8n.json`) — importable three-node workflow
  (Webhook → HTTP Request → Respond to Webhook) that routes ward notes through the
  NoteGuard API without the model ever seeing PHI.
- **Vercel deployment** (`api/index.py`, `api/requirements.txt`, `vercel.json`) —
  the FastAPI app is deployable as a serverless Vercel function. The light dependency
  set omits superlinked/torch to stay under the 250 MB bundle limit; retrieval falls
  back gracefully to Gemini-only mode.

### Changed

- `app/api.py` now serves the clinician web UI in addition to the REST API.
- `agent/graph.py`: `NoteIndex` import made lazy (inside try block) so the module
  loads cleanly in environments where superlinked is unavailable (CI, Vercel).
- `noteguard/__init__.py`: removed `NoteIndex` re-export; only `NoteGuard`,
  `DeidResult`, and `load_known_from_csv` are exported from the package.
- `Makefile` `run` target now starts uvicorn (`uvicorn app.api:app --reload --port 8000`)
  instead of Streamlit.
- `pyproject.toml` version bumped to 0.2.0; `per-file-ignores` added for E402
  (intentional `load_dotenv()` before API-key-consuming imports).

### Removed

- **`app/trust_panel.py`** — Streamlit demo UI retired; superseded by the
  single-file clinician web UI (`app/static/index.html`) served by FastAPI.
- **`streamlit`** removed from `requirements.txt`.

---

## [0.1.0] - 2026-06-27

### Added

- **De-identification core** (`noteguard/deid.py`) — dependency-free NHS-aware
  pipeline: NHS number, GMC/NMC, postcode, DOB, email and phone recognisers;
  vault-from-CSV for ground-truth measurement; consistent surrogates;
  `assert_clean()` hard guarantee; `reidentify()` for clinician-only restoration.
- **Superlinked retrieval node** (`noteguard/retrieve.py`) — in-memory vector
  index (`sentence-transformers/all-MiniLM-L6-v2`) with `assert_clean()` called
  on every document in and every retrieved chunk out.
- **LangGraph agent** (`agent/graph.py`) — full pipeline:
  `deidentify_in → retrieve_context → agent (Gemini + Tavily) → reidentify_out → compute_trust`.
  Trust metrics surfaced in graph state: identifiers removed, residual leakage,
  faithfulness score, source URLs.
- **Streamlit trust panel** (`app/trust_panel.py`) — three-way toggle
  (raw / de-identified / clinician answer) and live trust panel; styled to the
  NHS England identity.
- **LangSmith evaluations** (`eval/run_eval.py`) — `zero_phi_to_model` (must
  score 1.0) and `faithfulness` (LLM-as-judge over de-identified text only).
- Gold RAP packaging: `pyproject.toml`, `Makefile`, `CHANGELOG.md`,
  `CONTRIBUTING.md`, `.pre-commit-config.yaml`, `.editorconfig`, CI workflow,
  and `docs/` (architecture, user guide, RAP compliance).
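
For the de-identification core listed first above, the interplay of a recogniser, consistent surrogates, `assert_clean()` and `reidentify()` can be sketched with the standard library alone. The `MiniGuard` class, the `[NHS_n]` token shape, and the use of the NHS number Modulus 11 check digit to confirm matches are assumptions for illustration, not a description of `noteguard/deid.py`.

```python
import re

_NHS_PAT = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b")


def nhs_checksum_ok(number: str) -> bool:
    """Validate the Modulus 11 check digit of a 10-digit NHS number."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) != 10:
        return False
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    return check != 10 and check == digits[9]


class MiniGuard:
    """Toy guard: one recogniser, consistent surrogates, reversible mapping."""

    def __init__(self) -> None:
        self.vault: dict[str, str] = {}       # surrogate -> first-seen original
        self._by_value: dict[str, str] = {}   # digits-only original -> surrogate

    def deidentify(self, text: str) -> str:
        def _sub(match: re.Match) -> str:
            raw = match.group(0)
            if not nhs_checksum_ok(raw):
                return raw  # ten digits that fail the check are left alone
            key = re.sub(r"\D", "", raw)
            if key not in self._by_value:
                token = f"[NHS_{len(self._by_value) + 1}]"
                self._by_value[key] = token
                self.vault[token] = raw
            return self._by_value[key]  # same number, same surrogate

        return _NHS_PAT.sub(_sub, text)

    def assert_clean(self, text: str) -> None:
        """Raise if any valid NHS number is still present."""
        for match in _NHS_PAT.finditer(text):
            if nhs_checksum_ok(match.group(0)):
                raise ValueError("NHS number survived de-identification")

    def reidentify(self, text: str) -> str:
        for token, raw in self.vault.items():
            text = text.replace(token, raw)
        return text
```

In this sketch both spellings of the same number share one surrogate, so re-identification restores the first-seen formatting for every occurrence rather than the exact original spacing.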