# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

A Hugging Face Spaces app that translates English text into 22 production-ready languages (BLEU 35+ in both directions, per Section A.10 of the paper) using Google's [MADLAD-400](https://arxiv.org/pdf/2309.04662) 3B Seq2Seq model. Built with Gradio and deployed on HF Spaces. Falls back to CPU with a warning when no CUDA GPU is available.

## Commands

```bash
# Setup
uv venv --python 3.12
source .venv/bin/activate
uv pip install -r requirements.txt

# Run (launches on http://localhost:7860)
python app.py

# Lint and format
ruff check .
ruff format .

# Type check
ty check

# Test
pytest                     # all tests (slow tests require CUDA + model download)
pytest -m "not slow"       # fast tests only
pytest -m slow             # model tests only (CUDA only)
```

## Architecture

**`app.py`** — Single-file application with a Google Translate-style two-column layout. The left column has a static "English" dropdown and an input textbox with an inline clear button; the right column has a searchable target-language dropdown (labels include locale codes, e.g., "French (fr)") and an output textbox with a copy button. The Translate button spans the full width below both columns and shows "Translating..." during processing. Ctrl+Enter submits from the input.

The `google/madlad400-3b-mt` tokenizer and model are loaded lazily via `@lru_cache`, so importing the module triggers no download. The model runs in `float16` on CUDA and `float32` on CPU. MPS is not supported (it produces garbage output with T5 models). Translation prepends a language token followed by a space to the input text (e.g., `<2fr> Hello`) before tokenization and generation. The `@spaces.GPU` decorator allocates a GPU on HF Spaces infrastructure.
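
The loading and prompt conventions above can be sketched roughly as follows. This is a minimal illustration, not the actual `app.py` code: the helper names `pick_device_and_dtype`, `build_prompt`, and `load_model` are hypothetical, and the loader body is a placeholder standing in for the real tokenizer and model construction.

```python
from functools import lru_cache

MODEL_NAME = "google/madlad400-3b-mt"

# Records every real load so the caching behaviour is observable (illustration only).
LOAD_CALLS: list[str] = []


def pick_device_and_dtype(cuda_available: bool) -> tuple[str, str]:
    """Return (device, dtype name): float16 on CUDA, float32 on the CPU fallback."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def build_prompt(text: str, lang_code: str) -> str:
    """Prepend the target-language token and a single space, e.g. '<2fr> Hello'."""
    return f"<2{lang_code}> {text}"


@lru_cache(maxsize=1)
def load_model() -> dict:
    """Hypothetical lazy loader: nothing runs at import time, only on the first call.

    The real application would build the tokenizer and model here; this
    placeholder just records the call and returns a stand-in object.
    """
    LOAD_CALLS.append(MODEL_NAME)
    return {"name": MODEL_NAME}
```

Because the loader is cached, importing the module does no work; the first call pays for the load and every later call returns the same object.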

**`langmap/`** — Package with `langid_mapping.py`, a hand-maintained dictionary mapping 22 Tier 1 production-ready language tokens (BLEU 35+ both directions, Section A.10) to human-readable language names. Available languages at runtime are the intersection of this mapping and the model's vocabulary.
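
A rough sketch of the intersection step, under stated assumptions: the three-entry dictionary, the `<2xx>` key format, and the function names `available_languages` and `dropdown_labels` are all hypothetical, and the vocabulary is shown as a plain set of token strings rather than a real tokenizer vocabulary.

```python
# Hypothetical excerpt; the real dictionary has 22 entries and its exact
# key format is an assumption in this sketch.
LANGID_TO_NAME = {"<2fr>": "French", "<2de>": "German", "<2es>": "Spanish"}


def available_languages(mapping: dict[str, str], vocab: set[str]) -> dict[str, str]:
    """Keep only the mapped language tokens that the model vocabulary contains."""
    return {token: name for token, name in mapping.items() if token in vocab}


def dropdown_labels(available: dict[str, str]) -> list[str]:
    """Build sorted 'Name (code)' labels such as 'French (fr)'."""
    labels = []
    for token, name in available.items():
        code = token.removeprefix("<2").removesuffix(">")
        labels.append(f"{name} ({code})")
    return sorted(labels)


# Example with a made-up vocabulary that lacks the German token.
sample_vocab = {"<2fr>", "<2es>", "Hello"}
usable = available_languages(LANGID_TO_NAME, sample_vocab)
labels = dropdown_labels(usable)
```

Filtering against the vocabulary means a mapping entry with no matching model token never reaches the dropdown.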

**`tests/`** — 43 tests (33 fast, 10 slow). `test_langmap.py` has 8 fast tests for language mapping validation. `test_app.py` has 25 fast tests (signatures, device fallback, UI layout including English dropdown, textbox height, inline clear, translate button outside columns, no HTML elements, filterable dropdown, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
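
One common way to auto-skip marked tests is a collection hook in `conftest.py`, sketched below. Whether this repository uses a hook or per-test `skipif` markers is not stated here, so treat the file placement and the `cuda_available` helper as assumptions.

```python
import importlib.util

import pytest


def cuda_available() -> bool:
    """Report CUDA availability without failing when torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


def pytest_collection_modifyitems(config, items):
    """Attach a skip marker to every test marked 'slow' when CUDA is missing."""
    if cuda_available():
        return
    skip_slow = pytest.mark.skip(reason="slow tests require CUDA and a model download")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

With a hook like this, a plain `pytest` run on a CPU-only machine reports the slow tests as skipped instead of failing them.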

## Tooling

- **uv** — Python package manager. Used for venv creation and dependency installation from `requirements.txt`. No `pyproject.toml`; `requirements.txt` remains the single source of truth (required by HF Spaces).
- **Ruff** — linter and formatter (`ruff.toml`). Rules: `E`, `F`, `I`, `W`. Line length: 120.
- **ty** — type checker (`ty.toml`). Python 3.12 target.
- **pytest** — test runner (`pytest.ini`). Custom `slow` marker for CUDA-dependent tests.