Spaces: darylalim/madlad-400-translate

Daryl Lim committed on Apr 8

Commit a52a55e (1 parent: 90fc0f4)

docs: update CLAUDE.md for Google Translate-style UI redesign

Files changed (1)

CLAUDE.md CHANGED

@@ -32,11 +32,11 @@ pytest -m slow             # model tests only (CUDA only)
 ## Architecture
-**`app.py`** — Single-file application with two-column layout: a settings sidebar (target language + generation parameters) and a tabbed main area (Single text / Batch file translation). Uses `@lru_cache` for lazy loading of the `google/madlad400-3b-mt` tokenizer and model (no download on import). Uses `float16` on CUDA, `float32` on CPU. MPS is not supported (produces garbage output with T5 models). Translation prepends a language token with a space to the input text (e.g., `<2fr> Hello`) before tokenization and generation. The `@spaces.GPU` decorator allocates GPU on HF Spaces infrastructure. Batch mode accepts `.txt` (one sentence per line) and `.csv` (requires `text` column) files.
+**`app.py`** — Single-file application with a Google Translate-style layout: a centered language bar ("English → [Target]") above side-by-side input/output text areas and a translate button. Uses `@lru_cache` for lazy loading of the `google/madlad400-3b-mt` tokenizer and model (no download on import). Uses `float16` on CUDA, `float32` on CPU. MPS is not supported (produces garbage output with T5 models). Translation prepends a language token with a space to the input text (e.g., `<2fr> Hello`) before tokenization and generation. The `@spaces.GPU` decorator allocates GPU on HF Spaces infrastructure.
 **`langmap/`** — Package with `langid_mapping.py`, a hand-maintained dictionary mapping 22 Tier 1 production-ready language tokens (BLEU 35+ both directions, Section A.10) to human-readable language names. Available languages at runtime are the intersection of this mapping and the model's vocabulary.
-**`tests/`** — 55 tests (45 fast, 10 slow). `test_langmap.py` has 8 fast tests for language mapping validation. `test_app.py` has 37 fast tests (signatures, device fallback, batch parsing, output writing, guard clauses, UI config) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
+**`tests/`** — 31 tests (21 fast, 10 slow). `test_langmap.py` has 8 fast tests for language mapping validation. `test_app.py` has 13 fast tests (signatures, device fallback, UI layout) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
 ## Tooling