Daryl Lim
docs: update CLAUDE.md for Google Translate-style UI redesign
a52a55e

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

A Hugging Face Spaces app that translates English text into 22 production-ready languages (BLEU 35+ in both directions, Section A.10) using Google's MADLAD-400 3B Seq2Seq model. Built with Gradio and deployed on HF Spaces. Falls back to CPU with a warning when no CUDA GPU is available.

Commands

# Setup
uv venv --python 3.12
source .venv/bin/activate
uv pip install -r requirements.txt

# Run (launches on http://localhost:7860)
python app.py

# Lint and format
ruff check .
ruff format .

# Type check
ty check

# Test
pytest                     # all tests (slow tests require CUDA + model download)
pytest -m "not slow"       # fast tests only
pytest -m slow             # model tests only (CUDA only)

Architecture

app.py: Single-file application with a Google Translate-style layout: a centered language bar ("English → [Target]") above side-by-side input/output text areas and a translate button. Uses @lru_cache for lazy loading of the google/madlad400-3b-mt tokenizer and model (no download on import). Uses float16 on CUDA, float32 on CPU. MPS is not supported (produces garbage output with T5 models). Translation prepends a language token with a space to the input text (e.g., <2fr> Hello) before tokenization and generation. The @spaces.GPU decorator allocates GPU on HF Spaces infrastructure.
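
A minimal sketch of the three mechanisms described above (lazy loading, dtype selection, prompt construction), not the actual contents of app.py. The function names `pick_device_and_dtype`, `build_prompt`, and `load_model_and_tokenizer` are hypothetical, as is the assumption that a bare language code such as `fr` is wrapped into the `<2fr>` token at call time and that the model is loaded through the `transformers` Auto classes.

```python
from functools import lru_cache

MODEL_NAME = "google/madlad400-3b-mt"


def pick_device_and_dtype(cuda_available: bool) -> tuple[str, str]:
    """Return (device, dtype name): float16 on CUDA, float32 on CPU."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def build_prompt(text: str, lang_code: str) -> str:
    """Prepend the target-language token and a space, e.g. '<2fr> Hello'."""
    return f"<2{lang_code}> {text}"


@lru_cache(maxsize=1)
def load_model_and_tokenizer():
    """Load once, on first call.

    The heavy imports live inside the function, so importing this module
    triggers neither the imports nor a model download.
    """
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    # Assumption: the model is loaded via the Auto class with an explicit dtype.
    model = AutoModelForSeq2SeqLM.from_pretrained(
        MODEL_NAME, torch_dtype=getattr(torch, dtype_name)
    )
    return tokenizer, model.to(device)
```

Because of `@lru_cache`, repeated calls to `load_model_and_tokenizer()` return the same objects, so the Gradio handler can call it on every request without reloading the weights.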

langmap/: Package with langid_mapping.py, a hand-maintained dictionary mapping 22 Tier 1 production-ready language tokens (BLEU 35+ both directions, Section A.10) to human-readable language names. Available languages at runtime are the intersection of this mapping and the model's vocabulary.
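
The runtime intersection could look roughly like the sketch below. The function name `available_languages`, the sample data, and the assumption that the mapping is keyed by full tokens such as `<2fr>` are illustrative only; the real dictionary in langid_mapping.py may be shaped differently.

```python
from collections.abc import Container


def available_languages(mapping: dict[str, str], vocab: Container[str]) -> dict[str, str]:
    """Keep only mapping entries whose language token exists in the model vocabulary."""
    return {token: name for token, name in mapping.items() if token in vocab}


# Hypothetical sample data; the real mapping has 22 entries.
sample_mapping = {"<2fr>": "French", "<2de>": "German", "<2xx>": "Not in vocab"}
sample_vocab = {"<2fr>", "<2de>", "<2es>", "hello"}

languages = available_languages(sample_mapping, sample_vocab)
```

With a real tokenizer, the dictionary returned by `tokenizer.get_vocab()` can be passed as `vocab`, since membership checks on a dict test its keys.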

tests/: 31 tests (21 fast, 10 slow). test_langmap.py has 8 fast tests for language mapping validation. test_app.py has 13 fast tests (signatures, device fallback, UI layout) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
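
One common way to auto-skip marked tests is a collection hook in conftest.py. The sketch below assumes that approach; the repository may instead use `skipif` decorators or a fixture, and the helper name `cuda_available` is hypothetical.

```python
import importlib.util


def cuda_available() -> bool:
    """True only if torch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


def pytest_collection_modifyitems(config, items):
    """Skip every test marked 'slow' when CUDA is unavailable."""
    if cuda_available():
        return
    import pytest  # deferred so cuda_available() is importable without pytest

    skip_slow = pytest.mark.skip(reason="slow tests require CUDA")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

With a hook like this, a plain `pytest` run on a CPU-only machine reports the slow tests as skipped rather than failed.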

Tooling

  • uv: Python package manager. Used for venv creation and dependency installation from requirements.txt. No pyproject.toml; requirements.txt remains the single source of truth (required by HF Spaces).
  • Ruff: linter and formatter (ruff.toml). Rules: E, F, I, W. Line length: 120.
  • ty: type checker (ty.toml). Python 3.12 target.
  • pytest: test runner (pytest.ini). Custom slow marker for CUDA-dependent tests.