CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

A Hugging Face Spaces app that translates between the 418 languages listed in Table 9 (Section A.1) of the MADLAD-400 paper, using Google's MADLAD-400 3B seq2seq model. Built with Gradio and deployed on HF Spaces. Falls back to CPU with a warning when no CUDA GPU is available.

Commands

# Setup
uv venv --python 3.12
uv pip install -r requirements.txt
uv pip install -r requirements-dev.txt

# Run (launches on http://localhost:7860)
uv run app.py

# Lint and format
uv run ruff check .
uv run ruff format .

# Type check
uv run ty check

# Test
uv run pytest                     # all 78 tests (slow ones require CUDA + model download)
uv run pytest -m "not slow"       # 68 fast tests only
uv run pytest -m slow             # 10 model tests only (CUDA only)

# Generate language mapping (dev only)
uv run scripts/generate_langmap.py <path-to-paper.pdf>

Architecture

app.py: Single-file application with a Google Translate-style layout: top row has two symmetric, filterable, region-sorted language dropdowns (source defaults to "English (en)", target defaults to "French (fr)") with a swap button ("⇄") between them; below that, input textbox (autofocused) and output textbox with copy button side by side. The Translate button spans full width below both textboxes (shows "Translating..." during processing). Ctrl+Enter submits from the input. The model auto-detects source language; the source dropdown is for user reference and the swap button only, which an info= caption discloses. Each control carries an info= caption (caption text, not HTML/Markdown blocks): the target dropdown a quality-varies caveat, the input the Ctrl+Enter hint, the output model/arXiv/license provenance.

Uses @lru_cache for lazy loading of the google/madlad400-3b-mt tokenizer and model. On ZeroGPU (SPACES_ZERO_GPU=1), _maybe_eager_load() places the model at module scope so the spaces hijack can pack weights and stream them into workers for fast cold starts; off-ZeroGPU (local, tests, cpu-basic) it stays lazy, so importing the app never downloads the model. Uses bfloat16 on CUDA (T5/MADLAD is numerically unstable in float16: fp16's narrow range overflows to inf/NaN; bf16 is the format T5 was trained in), float32 on CPU. MPS is not supported (produces garbage output with T5 models).

Translation prepends a target language token with a space to the input text (e.g., <2fr> Hello) before tokenization and generation; whitespace-only or None input short-circuits to an empty string before the model loads. The generation params are normalized in translate() via _normalize_params (None/NaN → default, then clamped to range) so the cast-less public path and the ZeroGPU duration callable can't crash on a cleared gr.Number field.
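The prompt construction and parameter normalization described above can be sketched as follows. This is a minimal illustration, not the code in app.py: the helper names, default values, and bounds are assumptions chosen for the example. Only the `<2fr> Hello` prompt shape and the "None/NaN to default, then clamp" order come from the description.

```python
import math

# Hypothetical defaults and bounds for illustration; the real values live in app.py.
DEFAULTS = {"max_new_tokens": 256, "num_beams": 1, "temperature": 1.0}
BOUNDS = {"max_new_tokens": (1, 1024), "num_beams": (1, 8), "temperature": (0.1, 2.0)}


def normalize_param(name, value):
    """Map None/NaN to the default, then clamp the result into the allowed range."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        value = DEFAULTS[name]
    low, high = BOUNDS[name]
    return min(max(value, low), high)


def build_prompt(text, target_code):
    """Return '<2xx> text', or None when there is nothing to translate."""
    if text is None or not text.strip():
        return None  # the caller would return "" here without loading the model
    return f"<2{target_code}> {text}"
```

Normalizing before anything else runs means a cleared number field (which arrives as None) never reaches arithmetic or the model call.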
Decoding is greedy by default (deterministic); a non-default temperature (tolerance-compared to absorb float spinner drift) enables sampling, and num_beams > 1 uses beam search. A collapsed "Advanced" accordion exposes max_new_tokens/num_beams/temperature as gr.Number controls (no sliders; defaults mirror translate(), so the default surface stays greedy).

Right-to-left target scripts (an explicit RTL_CODES token set, since region is not a usable proxy) flip the output box to RTL via the Translate-button and swap paths; Ctrl+Enter and /translate return a bare string and stay LTR.

The @spaces.GPU decorator allocates GPU on HF Spaces infrastructure; its duration is a callable (_estimate_duration) that scales the GPU reservation with max_new_tokens × num_beams (capped at 120s). Both translate handlers (the private Translate-button click and the public submit) carry the advanced params, so Ctrl+Enter and the /translate API honor the accordion; the params keep defaults, so existing two-arg callers still work. The submit handler exposes a stable /translate API endpoint (returns a bare string); the swap and Translate-button handlers are api_visibility="private", and both generation handlers use show_progress="minimal". Only /translate is public.
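A rough sketch of how the decoding mode and the GPU reservation could be derived from the advanced params. The function names, the default temperature of 1.0, the tolerance, and the base and per-unit timings are assumptions for illustration; only the 120s cap and the max_new_tokens × num_beams scaling come from the description. The returned keys match keyword arguments accepted by the Transformers `generate()` method.

```python
import math

DEFAULT_TEMPERATURE = 1.0  # assumed default; the real one is defined in app.py


def decoding_kwargs(max_new_tokens=256, num_beams=1, temperature=DEFAULT_TEMPERATURE):
    """Build generate() kwargs: greedy unless the temperature was really changed."""
    kwargs = {"max_new_tokens": max_new_tokens, "num_beams": num_beams, "do_sample": False}
    # Compare with a tolerance so float drift from a number spinner
    # (e.g. 1.0000000001) is still treated as the default.
    if not math.isclose(temperature, DEFAULT_TEMPERATURE, abs_tol=1e-6):
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    return kwargs


def estimate_duration(max_new_tokens=256, num_beams=1, base=20.0, per_unit=0.05, cap=120.0):
    """Scale the reservation with max_new_tokens * num_beams, capped at `cap` seconds."""
    # base and per_unit are made-up numbers, not measured timings.
    return min(cap, base + per_unit * max_new_tokens * num_beams)
```

With these assumptions, `decoding_kwargs()` stays greedy, `decoding_kwargs(temperature=0.7)` turns sampling on, and `num_beams=4` requests beam search through the same dictionary.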

langmap/: Package with langid_mapping.py, mapping 418 language tokens to {"name": ..., "region": ...} dicts. Auto-generated by scripts/generate_langmap.py from Table 9 (Section A.1) of the MADLAD-400 paper. Available languages at runtime are the intersection of this mapping and the model's vocabulary.
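The runtime intersection can be pictured with a small sketch. The key format (`<2en>`), the region values, and both helper names are assumptions for illustration; the real mapping is generated from the paper and may be keyed differently.

```python
# Hypothetical excerpt; the generated mapping has 418 entries.
LANGID_MAPPING = {
    "<2en>": {"name": "English", "region": "Europe"},
    "<2fr>": {"name": "French", "region": "Europe"},
    "<2xx>": {"name": "Example", "region": "Nowhere"},  # made-up token, absent from the vocabulary
}


def available_languages(mapping, vocab):
    """Keep only the mapped tokens that the model's vocabulary actually contains."""
    return {token: meta for token, meta in mapping.items() if token in vocab}


def dropdown_choices(languages):
    """Build region-sorted labels such as 'English (en)'."""
    ordered = sorted(languages.items(), key=lambda item: (item[1]["region"], item[1]["name"]))
    # token[2:-1] strips the surrounding '<2' and '>' from '<2en>'.
    return [f"{meta['name']} ({token[2:-1]})" for token, meta in ordered]
```

With a Hugging Face tokenizer, the `vocab` argument could come from `tokenizer.get_vocab()`, which returns a token-to-id dict.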

scripts/: generate_langmap.py parses the MADLAD-400 paper PDF (Table 9, pages 16-22) using pdfplumber and generates the static language mapping with region assignments. Dev-only tool; requires requirements-dev.txt dependencies.

tests/: 78 tests (68 fast, 10 slow). test_langmap.py has 10 fast tests for mapping validation (dict shape, regions, spot-checks). test_app.py has 58 fast tests (signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, GPU duration estimator and its signature-mirror contract + None-safety, greedy-by-default decoding, param forwarding into generate, _normalize_params None/NaN/clamp coercion, empty/None-input short-circuit, RTL output direction on the button and swap paths, RTL_CODES ⊆ langmap invariant, requirements.txt excludes platform packages, UI layout with symmetric dropdowns, swap button, textbox config including toolbar buttons and input autofocus, info= captions on dropdowns and textboxes spot-checked by content, the Advanced accordion's gr.Number controls and their bounds, advanced params reaching the public endpoint by api_visibility with the /translate input order pinned by label, show_progress="minimal" on generation handlers, handler wiring, stable translate API endpoint carrying the advanced params with UI-only handlers kept private, no HTML elements, no sliders, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
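As one example of what a fast test can check without a GPU, the signature-mirror contract might look like the sketch below. It assumes the duration callable is invoked with the same arguments as the decorated function, so the two signatures must stay in step. The stand-in functions and their parameter names are hypothetical and are not copied from app.py.

```python
import inspect


def translate(text, target, max_new_tokens=256, num_beams=1, temperature=1.0):
    """Stand-in for the GPU-decorated translate function."""
    return ""


def estimate_duration(text, target, max_new_tokens=256, num_beams=1, temperature=1.0):
    """Stand-in for the duration callable; it must accept the same arguments."""
    return 60


def signatures_match(func, mirror):
    """True when both callables share parameter names, order, and defaults."""
    left = inspect.signature(func).parameters.values()
    right = inspect.signature(mirror).parameters.values()
    return [(p.name, p.default) for p in left] == [(p.name, p.default) for p in right]


def test_duration_mirrors_translate():
    # Fails as soon as a parameter is added to one function but not the other.
    assert signatures_match(translate, estimate_duration)
```

A check like this needs neither CUDA nor the model download, which is what lets it sit in the fast suite.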

Tooling

When working with Python, invoke the relevant /astral:<skill> for uv, ty, and ruff to ensure best practices are followed.

  • uv: Python package manager. Used for venv creation and dependency installation. No pyproject.toml (HF Spaces requires requirements.txt). requirements.txt is the Spaces build manifest and omits gradio/spaces (provided by the Spaces runtime on every tier) and pins torch to a ZeroGPU-supported version; requirements-dev.txt adds gradio/spaces for local runs plus the dev tooling, so local setup installs both files.
  • Ruff: linter and formatter (ruff.toml). Rules: E, F, I, UP, W. Line length: 120.
  • ty: type checker (ty.toml). Python 3.12 target.
  • pytest: test runner (pytest.ini). Custom slow marker for CUDA-dependent tests.