# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

A Hugging Face Spaces app that translates English text into the 183 evaluated languages (Table 14, Section A.9) of Google's [MADLAD-400](https://arxiv.org/pdf/2309.04662) 3B Seq2Seq model. Built with Gradio and deployed on HF Spaces. Falls back to CPU with a warning when no CUDA GPU is available.

## Commands

```bash
# Setup
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run (launches on http://localhost:7860)
python app.py

# Lint and format
ruff check .
ruff format .

# Type check
ty check

# Test
pytest                  # all tests (slow tests require CUDA + model download)
pytest -m "not slow"    # fast tests only
pytest -m slow          # model tests only (CUDA only)
```

## Architecture

**`app.py`**: Single-file application. Uses `@lru_cache` to lazily load the `google/madlad400-3b-mt` tokenizer and model, so importing the module does not trigger a download. Uses `float16` on CUDA and `float32` on CPU. MPS is not supported (it produces garbage output with T5 models). Translation prepends a language token and a space to the input text (e.g., `<2fr> Hello`) before tokenization and generation. The `@spaces.GPU` decorator allocates a GPU on HF Spaces infrastructure.

**`langmap/`**: Package containing `langid_mapping.py`, a hand-maintained dictionary that maps the 183 evaluated language tokens to human-readable language names (sourced from Table 14, Section A.9 of the MADLAD-400 paper; evaluation sets: WMT, Flores-200, NTREX, Gatones). The languages available at runtime are the intersection of this mapping and the model's vocabulary.

**`tests/`**: Pytest suite. `test_langmap.py` contains only fast tests. `test_app.py` contains the slow tests, which require CUDA and a model download and are auto-skipped without CUDA, plus fast tests that verify the module imports without triggering a model download.

## Tooling

- **Ruff**: linter and formatter (`ruff.toml`). Rules: `E`, `F`, `I`, `W`. Line length: 120.
- **ty**: type checker (`ty.toml`). Python 3.12 target.
- **pytest**: test runner (`pytest.ini`). Custom `slow` marker for CUDA-dependent tests.
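
## Illustrative Sketch

The Architecture notes describe three mechanisms: prefixing the input with a language token, intersecting the language mapping with the model vocabulary, and deferring loading with `@lru_cache`. The following is a minimal, dependency-free sketch of those ideas, not the code in `app.py`. All names here (`LANGID_TO_NAME`, `language_token`, `build_prompt`, `available_languages`, `get_vocab`) are hypothetical. The sketch also assumes the mapping is keyed by bare language codes such as `fr`, and it uses a small hard-coded set in place of the real tokenizer vocabulary.

```python
from collections.abc import Set
from functools import lru_cache

# Hypothetical stand-in for the dictionary in langmap/langid_mapping.py.
# Assumption: keys are bare language codes; the real mapping has 183 entries.
LANGID_TO_NAME = {"fr": "French", "de": "German", "xx": "Not In Vocab"}


def language_token(code: str) -> str:
    """Format a language code as a target-language token, e.g. 'fr' -> '<2fr>'."""
    return f"<2{code}>"


def build_prompt(code: str, text: str) -> str:
    """Prepend the language token and a single space to the source text."""
    return f"{language_token(code)} {text}"


def available_languages(mapping: dict[str, str], vocab: Set[str]) -> dict[str, str]:
    """Keep only the mapped languages whose token exists in the vocabulary."""
    return {code: name for code, name in mapping.items() if language_token(code) in vocab}


@lru_cache(maxsize=1)
def get_vocab() -> frozenset[str]:
    """Stand-in for an expensive load: runs on the first call, not at import time."""
    # Assumption: the real app would read this from the loaded tokenizer.
    return frozenset({"<2fr>", "<2de>", "Hello"})


if __name__ == "__main__":
    print(build_prompt("fr", "Hello"))
    print(available_languages(LANGID_TO_NAME, get_vocab()))
```

Because `get_vocab` is wrapped in `lru_cache`, importing the module does no work, and repeated calls return the same cached object. The same pattern is what lets the fast tests import the module without a model download.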