# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

A Hugging Face Spaces app that translates English text to ~400 languages using Google's [MADLAD-400](https://arxiv.org/pdf/2309.04662) 3B Seq2Seq model. Built with Gradio and deployed on HF Spaces. Falls back to CPU with a warning when no GPU is available.
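
A minimal sketch of the CPU fallback, assuming the app prefers CUDA, then Apple's MPS, then CPU. The function name `pick_device` and the warning text are hypothetical, and the real `app.py` may order or word these checks differently. PyTorch is imported only if it is installed, so the sketch also runs in a bare environment.

```python
import importlib.util
import warnings


def pick_device() -> str:
    """Hypothetical helper: return "cuda", "mps", or "cpu", warning on CPU fallback."""
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "cuda"
        if torch.backends.mps.is_available():
            return "mps"
    # No accelerator (or no PyTorch): fall back to CPU and tell the user.
    warnings.warn("No GPU available; running MADLAD-400 3B on CPU will be slow.", stacklevel=2)
    return "cpu"
```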

## Commands

```bash
# Setup
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run (launches on http://localhost:7860)
python app.py

# Lint and format
ruff check .
ruff format .

# Type check
ty check

# Test
pytest                     # all tests (slow tests require GPU + model download)
pytest -m "not slow"       # fast tests only
pytest -m slow             # model tests only
```

## Architecture

**`app.py`**: Single-file application. Uses `@lru_cache` to lazily load the `google/madlad400-3b-mt` tokenizer and model (in `float16` precision) on first use, so importing the module does not trigger a download. Before tokenization and generation, translation prepends a target-language token to the input text (e.g., `<2fr>Hello`). The `@spaces.GPU` decorator allocates a GPU on HF Spaces infrastructure.
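
The lazy-loading and prompt-building steps can be sketched without downloading anything. Here `get_model` and `build_prompt` are hypothetical names, and `get_model` returns a placeholder instead of calling the `transformers` loaders; the point is only that `@lru_cache` defers the work to the first call and reuses the result afterwards.

```python
from functools import lru_cache

load_calls = 0  # how many times the loader body has actually run


@lru_cache(maxsize=1)
def get_model() -> dict[str, str]:
    """Hypothetical stand-in for the cached tokenizer/model loader in app.py.

    The real function would download google/madlad400-3b-mt and load it in
    float16; this placeholder only records that loading happened.
    """
    global load_calls
    load_calls += 1
    return {"model_id": "google/madlad400-3b-mt", "dtype": "float16"}


def build_prompt(lang_token: str, text: str) -> str:
    """Prepend the target-language token, e.g. "<2fr>" + "Hello" -> "<2fr>Hello"."""
    return f"{lang_token}{text}"


assert load_calls == 0  # defining the cached function loads nothing
get_model()
get_model()  # served from the cache; the body does not run again
assert load_calls == 1
print(build_prompt("<2fr>", "Hello"))  # <2fr>Hello
```

Keeping the loader argument-free means every call hits the same cache key, so the model is loaded at most once per process.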

**`langmap/`**: Package containing `langid_mapping.py`, a hand-maintained dictionary mapping ~400 language tokens to human-readable language names (sourced from pages 16–21 of the MADLAD-400 paper). The languages available at runtime are the intersection of this mapping and the model's vocabulary.
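
The runtime intersection can be sketched as a dictionary filter. `available_languages` is a hypothetical helper and the sample data is illustrative only (the `<2zz>` entry is invented to show a token missing from the vocabulary). In the app, the vocabulary would presumably come from the tokenizer, for example via `tokenizer.get_vocab()`.

```python
from collections.abc import Iterable, Mapping


def available_languages(mapping: Mapping[str, str], vocab: Iterable[str]) -> dict[str, str]:
    """Keep only the mapping entries whose language token exists in the vocabulary."""
    vocab_set = set(vocab)  # set lookup keeps the filter linear in len(mapping)
    return {token: name for token, name in mapping.items() if token in vocab_set}


# Illustrative data, not taken from langid_mapping.py or the real vocabulary.
mapping = {"<2fr>": "French", "<2de>": "German", "<2zz>": "Hypothetical"}
vocab = ["<2fr>", "<2de>", "Hello"]
print(available_languages(mapping, vocab))  # {'<2fr>': 'French', '<2de>': 'German'}
```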

**`tests/`**: Pytest suite. `test_langmap.py` holds fast tests; `test_app.py` holds the slow model tests plus fast tests that verify the module imports without triggering a model download. Slow tests require a GPU and the model download, and are auto-skipped when neither CUDA nor MPS is available.
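
One way to check the "no download on import" property is to inspect the `lru_cache` statistics: `cache_info().currsize` stays at 0 until the cached loader has run once. In this self-contained sketch `load_model` is a hypothetical stand-in; a real test would `import app` and inspect its actual cached loader instead.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_model() -> object:
    """Hypothetical stand-in for app.py's cached loader (no real download)."""
    return object()


def test_import_does_not_load_model() -> None:
    # currsize counts cached results; 0 means the loader has never been called.
    assert load_model.cache_info().currsize == 0
```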

## Tooling

- **Ruff**: linter and formatter (`ruff.toml`). Rules: `E`, `F`, `I`, `W`. Line length: 120.
- **ty**: type checker (`ty.toml`). Python 3.12 target.
- **pytest**: test runner (`pytest.ini`). Custom `slow` marker for GPU-dependent tests.