# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

A Hugging Face Spaces app that translates between the 418 languages listed in Table 9 (Section A.1) of the [MADLAD-400](https://arxiv.org/pdf/2309.04662) paper, using Google's MADLAD-400 3B seq2seq model (`google/madlad400-3b-mt`). Built with Gradio and deployed on HF Spaces. Falls back to CPU, with a warning, when no CUDA GPU is available.

## Commands

```bash
# Setup
uv venv --python 3.12
uv pip install -r requirements.txt
uv pip install -r requirements-dev.txt

# Run (launches on http://localhost:7860)
uv run app.py

# Lint and format
uv run ruff check .
uv run ruff format .

# Type check
uv run ty check

# Test
uv run pytest                     # all 81 tests (slow ones require CUDA + model download)
uv run pytest -m "not slow"       # 71 fast tests only
uv run pytest -m slow             # 10 model tests only (CUDA only)

# Generate language mapping (dev only)
uv run scripts/generate_langmap.py <path-to-paper.pdf>
```

## Architecture

**`app.py`** — Single-file application with a Google Translate-style layout.

- **Layout:** the top row holds two symmetric, filterable, region-sorted language dropdowns (source defaults to "English (en)", target to "French (fr)") with a swap button ("⇄") between them. Below that, the input textbox (autofocused) and the output textbox (with a copy button) sit side by side. The Translate button spans the full width below both textboxes and shows "Translating..." during processing. Ctrl+Enter submits from the input.
- **Source language:** the model auto-detects it. The source dropdown exists only for user reference and for the swap button, and its `info=` caption says so.
- **Captions:** both dropdowns and both textboxes carry an `info=` caption (caption text, not HTML/Markdown blocks). The target dropdown's caption warns that quality varies, the input's gives the Ctrl+Enter hint, and the output's gives model/arXiv/license provenance.
- **Model loading:** `@lru_cache` lazily loads the `google/madlad400-3b-mt` tokenizer and model. On ZeroGPU (`SPACES_ZERO_GPU=1`), `_maybe_eager_load()` places the model at module scope so the `spaces` hijack can pack the weights and stream them into workers for fast cold starts. Off ZeroGPU (local, tests, cpu-basic), loading stays lazy, so importing the app never downloads the model.
- **Device and dtype:** `bfloat16` on CUDA, `float32` on CPU. T5/MADLAD is numerically unstable in `float16`, whose narrow range overflows to inf/NaN; bf16 is the format T5 was trained in. MPS is not supported because it produces garbage output with T5 models.
- **Input handling:** translation prepends the target language token and a space to the input text (e.g., `<2fr> Hello`) before tokenization and generation. Whitespace-only or `None` input short-circuits to an empty string before the model loads.
- **Parameter normalization:** `translate()` normalizes the generation params via `_normalize_params`. `None`/`NaN` becomes the default, and the result is then clamped to range. This means neither the cast-less public path nor the ZeroGPU duration callable can crash on a cleared `gr.Number` field. It also caps the `max_new_tokens × num_beams` product (`_MAX_TOKEN_BEAM_PRODUCT = 720`) by trimming the token count, so a request can't outlive the GPU time `_estimate_duration` reserves.
- **Decoding:** greedy by default (deterministic). A non-default `temperature` enables sampling; the comparison uses a tolerance to absorb float spinner drift. `num_beams > 1` uses beam search.
- **Advanced accordion:** a collapsed "Advanced" accordion exposes `max_new_tokens`/`num_beams`/`temperature` as `gr.Number` controls (no sliders). Their defaults mirror `translate()`, so the default surface stays greedy.
- **Right-to-left output:** RTL target scripts are identified by an explicit `RTL_CODES` token set, because `region` is not a usable proxy. They flip the output box to RTL via the Translate-button and swap paths. Ctrl+Enter and `/translate` return a bare string and stay LTR.
- **GPU allocation:** the `@spaces.GPU` decorator allocates GPU on HF Spaces infrastructure. Its `duration` is a callable (`_estimate_duration`) that scales the GPU reservation with `max_new_tokens × num_beams`, capped at 120s.
- **Handlers and API:** both translate handlers (the private Translate-button click and the public submit) carry the advanced params, so Ctrl+Enter and the `/translate` API honor the accordion. The params keep defaults, so existing two-arg callers still work. The submit handler exposes a stable `/translate` API endpoint that returns a bare string. The swap and Translate-button handlers are `api_visibility="private"`, so only `/translate` is public. Both generation handlers use `show_progress="minimal"`.

**`langmap/`** — Package with `langid_mapping.py`, mapping 418 language tokens to `{"name": ..., "region": ...}` dicts. Auto-generated by `scripts/generate_langmap.py` from Table 9 (Section A.1) of the MADLAD-400 paper. Available languages at runtime are the intersection of this mapping and the model's vocabulary.
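Runtime filtering and region-sorted dropdown ordering might look like the sketch below. It rests on several assumptions:

- Mapping keys are the literal model tokens (`<2fr>`).
- The four sample entries and their region names are invented; the real mapping has 418 entries.
- Regions sort alphabetically.
- `vocab` stands in for the tokenizer's vocabulary. For a Hugging Face tokenizer, `set(tokenizer.get_vocab())` is one way to obtain it.
- `available_languages` and `dropdown_choices` are hypothetical helpers.

```python
# Hypothetical excerpt; the generated langmap/langid_mapping.py has 418 entries.
LANGID_MAPPING: dict[str, dict[str, str]] = {
    "<2fr>": {"name": "French", "region": "Europe"},
    "<2en>": {"name": "English", "region": "Europe"},
    "<2ja>": {"name": "Japanese", "region": "Asia"},
    "<2ar>": {"name": "Arabic", "region": "Middle East"},
}


def available_languages(mapping: dict[str, dict[str, str]], vocab: set[str]) -> dict[str, dict[str, str]]:
    """Intersection of the static mapping and the tokens the model actually knows."""
    return {tok: info for tok, info in mapping.items() if tok in vocab}


def dropdown_choices(langs: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """(label, token) pairs ordered by region, then name; labels read like "English (en)"."""
    ordered = sorted(langs.items(), key=lambda item: (item[1]["region"], item[1]["name"]))
    choices = []
    for tok, info in ordered:
        code = tok.removeprefix("<2").removesuffix(">")
        choices.append((f"{info['name']} ({code})", tok))
    return choices
```

Gradio accepts a list of `(label, value)` tuples for `gr.Dropdown(choices=...)`, so the dropdown can show "French (fr)" while passing the token to the handler. Whether `app.py` uses tokens or codes as values is an assumption here.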

**`scripts/`** — `generate_langmap.py` parses the MADLAD-400 paper PDF (Table 9, pages 16-22) using pdfplumber and generates the static language mapping with region assignments. Dev-only tool; requires `requirements-dev.txt` dependencies.

**`tests/`** — 81 tests (71 fast, 10 slow). Slow tests require CUDA and a model download and are auto-skipped without CUDA.

- `test_langmap.py`: 10 fast tests for mapping validation (dict shape, regions, spot-checks).
- `test_app.py`: 10 slow tests (translation with various parameters, language mapping) and 61 fast tests covering:
  - **Model and environment:** signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, `requirements.txt` excluding platform packages.
  - **Generation:**
    - GPU duration estimator, including its signature-mirror contract and `None`-safety.
    - Greedy-by-default decoding with near-1.0 temperature tolerance.
    - Param forwarding into `generate`, and the token×beam cap applied through `translate()`.
    - `_normalize_params` None/NaN/clamp coercion and product cap.
    - Empty/`None`-input short-circuit.
  - **RTL:** output direction on the button and swap paths; the `RTL_CODES` ⊆ langmap invariant.
  - **UI:**
    - Layout with symmetric dropdowns and the swap button.
    - Textbox config, including toolbar buttons and input autofocus.
    - `info=` captions on dropdowns and textboxes, spot-checked by content.
    - The Advanced accordion's `gr.Number` controls and their bounds.
    - No HTML elements, no sliders, locale codes, no title.
  - **Wiring and API:**
    - Handler wiring.
    - `show_progress="minimal"` on generation handlers.
    - Stable `translate` API endpoint carrying the advanced params, with UI-only handlers kept private.
    - Advanced params reaching the public endpoint by `api_visibility`, with the `/translate` input order pinned by label.
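As an example of the fast-test style, the two RTL checks listed above could be written as plain pytest functions like these. Everything here is hypothetical:

- `output_is_rtl` is an invented helper.
- The token sets are small stand-ins for the real `RTL_CODES` and langmap keys, again assuming `<2xx>`-style tokens.

```python
# Hypothetical stand-ins; the real RTL_CODES and langmap live in the repository.
LANGMAP_TOKENS = {"<2en>", "<2fr>", "<2ar>", "<2he>", "<2fa>"}
RTL_CODES = frozenset({"<2ar>", "<2he>", "<2fa>"})


def output_is_rtl(target_token: str) -> bool:
    """Direction flag the Translate-button and swap paths would apply to the output box."""
    return target_token in RTL_CODES


def test_rtl_codes_subset_of_langmap():
    # A typo'd or stale token would silently never match, so every entry must exist in the langmap.
    assert RTL_CODES <= LANGMAP_TOKENS, sorted(RTL_CODES - LANGMAP_TOKENS)


def test_output_direction():
    assert output_is_rtl("<2ar>")
    assert not output_is_rtl("<2fr>")
```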

## Tooling

When working with Python, invoke the relevant `/astral:<skill>` for uv, ty, and ruff to ensure best practices are followed.

- **uv** — Python package manager. Used for venv creation and dependency installation. No `pyproject.toml` (HF Spaces requires `requirements.txt`). `requirements.txt` is the Spaces build manifest. It omits `gradio`/`spaces`, which the Spaces runtime provides on every tier, and pins `torch` to a ZeroGPU-supported version. `requirements-dev.txt` adds `gradio`/`spaces` for local runs plus the dev tooling, so local setup installs both files.
- **Ruff** — linter and formatter (`ruff.toml`). Rules: `E`, `F`, `I`, `UP`, `W`. Line length: 120.
- **ty** — type checker (`ty.toml`). Python 3.12 target.
- **pytest** — test runner (`pytest.ini`). Custom `slow` marker for CUDA-dependent tests.
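The `requirements.txt` rules from the **uv** entry lend themselves to a stdlib-only check, sketched below. It makes these assumptions:

- "Pinned" means an exact `==` specifier.
- Comments and pip option lines (`-r`, `-e`, `--index-url`) are skipped.
- Names are normalized per PEP 503.
- `requirement_names` and `check_spaces_manifest` are hypothetical helpers, not the repository's actual test.

```python
import re

PLATFORM_PACKAGES = frozenset({"gradio", "spaces"})  # provided by the Spaces runtime
_REQ_LINE = re.compile(r"([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*(.*)")


def requirement_names(text: str) -> dict[str, str]:
    """Map each PEP 503-normalized package name to its version specifier ("" if unpinned)."""
    reqs: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank line, comment, or pip option such as -r / --index-url
        match = _REQ_LINE.match(line)
        if match:
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            reqs[name] = match.group(2).strip()
    return reqs


def check_spaces_manifest(text: str) -> list[str]:
    """Return rule violations; an empty list means the manifest follows the rules above."""
    reqs = requirement_names(text)
    problems = [f"{pkg} must not be listed" for pkg in sorted(PLATFORM_PACKAGES.intersection(reqs))]
    if not reqs.get("torch", "").startswith("=="):
        problems.append("torch must be pinned with ==")
    return problems
```

For example, `check_spaces_manifest("torch==2.5.1\ntransformers\n")` returns an empty list. The torch version there is illustrative, not the project's actual pin.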