Spaces:
Running on Zero
fix: greedy-decode by default; apply review cleanups
Browse filesFrom a multi-agent review of the session's changes:
- translate(): greedy-decode by default. Previously num_beams==1 always
set do_sample=True, so the default/public /translate path sampled at
temperature 1.0 -> non-deterministic, lower-quality MT. Now sampling is
enabled only when the user sets a non-default temperature; beam search
still ignores it. New test locks both branches.
- tests: assert _estimate_duration mirrors translate()'s signature (the
ZeroGPU duration-callable contract); widen the requirements-exclusion
regex to also split on '[' and ',' so gradio[extra] can't slip through.
- _translate_with_loading: tighten return annotation to
Generator[tuple[object, object], None, None].
51 fast / 61 total. ruff + ty clean. Docs (CLAUDE.md, README) synced.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@@ -25,8 +25,8 @@ uv run ruff format .
|
|
| 25 |
uv run ty check
|
| 26 |
|
| 27 |
# Test
|
| 28 |
-
uv run pytest # all
|
| 29 |
-
uv run pytest -m "not slow" #
|
| 30 |
uv run pytest -m slow # 10 model tests only (CUDA only)
|
| 31 |
|
| 32 |
# Generate language mapping (dev only)
|
|
@@ -35,13 +35,13 @@ uv run scripts/generate_langmap.py <path-to-paper.pdf>
|
|
| 35 |
|
| 36 |
## Architecture
|
| 37 |
|
| 38 |
-
**`app.py`** β Single-file application with a Google Translate-style layout: top row has two symmetric, filterable, region-sorted language dropdowns (source defaults to "English (en)", target defaults to "French (fr)") with a swap button ("β") between them; below that, input textbox (autofocused) and output textbox with copy button side by side. The Translate button spans full width below both textboxes (shows "Translating..." during processing). Ctrl+Enter submits from the input. The model auto-detects source language; the source dropdown is for user reference and the swap button only. Uses `@lru_cache` for lazy loading of the `google/madlad400-3b-mt` tokenizer and model. On ZeroGPU (`SPACES_ZERO_GPU=1`), `_maybe_eager_load()` places the model at module scope so the `spaces` hijack can pack weights and stream them into workers for fast cold starts; off-ZeroGPU (local, tests, cpu-basic) it stays lazy, so importing the app never downloads the model. Uses `bfloat16` on CUDA (T5/MADLAD is numerically unstable in `float16` β fp16's narrow range overflows to inf/NaN; bf16 is the format T5 was trained in), `float32` on CPU. MPS is not supported (produces garbage output with T5 models). Translation prepends a target language token with a space to the input text (e.g., `<2fr> Hello`) before tokenization and generation. The `@spaces.GPU` decorator allocates GPU on HF Spaces infrastructure; its `duration` is a callable (`_estimate_duration`) that scales the GPU reservation with `max_new_tokens Γ num_beams` (capped at 120s). The submit handler exposes a stable `/translate` API endpoint; the swap and Translate-button handlers are `api_visibility="private"`. Only `/translate` is public.
|
| 39 |
|
| 40 |
**`langmap/`** β Package with `langid_mapping.py`, mapping 418 language tokens to `{"name": ..., "region": ...}` dicts. Auto-generated by `scripts/generate_langmap.py` from Table 9 (Section A.1) of the MADLAD-400 paper. Available languages at runtime are the intersection of this mapping and the model's vocabulary.
|
| 41 |
|
| 42 |
**`scripts/`** β `generate_langmap.py` parses the MADLAD-400 paper PDF (Table 9, pages 16-22) using pdfplumber and generates the static language mapping with region assignments. Dev-only tool; requires `requirements-dev.txt` dependencies.
|
| 43 |
|
| 44 |
-
**`tests/`** β
|
| 45 |
|
| 46 |
## Tooling
|
| 47 |
|
|
|
|
| 25 |
uv run ty check
|
| 26 |
|
| 27 |
# Test
|
| 28 |
+
uv run pytest # all 61 tests (slow require CUDA + model download)
|
| 29 |
+
uv run pytest -m "not slow" # 51 fast tests only
|
| 30 |
uv run pytest -m slow # 10 model tests only (CUDA only)
|
| 31 |
|
| 32 |
# Generate language mapping (dev only)
|
|
|
|
| 35 |
|
| 36 |
## Architecture
|
| 37 |
|
| 38 |
+
**`app.py`** β Single-file application with a Google Translate-style layout: top row has two symmetric, filterable, region-sorted language dropdowns (source defaults to "English (en)", target defaults to "French (fr)") with a swap button ("β") between them; below that, input textbox (autofocused) and output textbox with copy button side by side. The Translate button spans full width below both textboxes (shows "Translating..." during processing). Ctrl+Enter submits from the input. The model auto-detects source language; the source dropdown is for user reference and the swap button only. Uses `@lru_cache` for lazy loading of the `google/madlad400-3b-mt` tokenizer and model. On ZeroGPU (`SPACES_ZERO_GPU=1`), `_maybe_eager_load()` places the model at module scope so the `spaces` hijack can pack weights and stream them into workers for fast cold starts; off-ZeroGPU (local, tests, cpu-basic) it stays lazy, so importing the app never downloads the model. Uses `bfloat16` on CUDA (T5/MADLAD is numerically unstable in `float16` β fp16's narrow range overflows to inf/NaN; bf16 is the format T5 was trained in), `float32` on CPU. MPS is not supported (produces garbage output with T5 models). Translation prepends a target language token with a space to the input text (e.g., `<2fr> Hello`) before tokenization and generation. Decoding is greedy by default (deterministic); a non-default `temperature` enables sampling, and `num_beams > 1` uses beam search. The `@spaces.GPU` decorator allocates GPU on HF Spaces infrastructure; its `duration` is a callable (`_estimate_duration`) that scales the GPU reservation with `max_new_tokens Γ num_beams` (capped at 120s). The submit handler exposes a stable `/translate` API endpoint; the swap and Translate-button handlers are `api_visibility="private"`. Only `/translate` is public.
|
| 39 |
|
| 40 |
**`langmap/`** β Package with `langid_mapping.py`, mapping 418 language tokens to `{"name": ..., "region": ...}` dicts. Auto-generated by `scripts/generate_langmap.py` from Table 9 (Section A.1) of the MADLAD-400 paper. Available languages at runtime are the intersection of this mapping and the model's vocabulary.
|
| 41 |
|
| 42 |
**`scripts/`** β `generate_langmap.py` parses the MADLAD-400 paper PDF (Table 9, pages 16-22) using pdfplumber and generates the static language mapping with region assignments. Dev-only tool; requires `requirements-dev.txt` dependencies.
|
| 43 |
|
| 44 |
+
**`tests/`** β 61 tests (51 fast, 10 slow). `test_langmap.py` has 10 fast tests for mapping validation (dict shape, regions, spot-checks). `test_app.py` has 41 fast tests (signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, GPU duration estimator and its signature-mirror contract, greedy-by-default decoding, `requirements.txt` excludes platform packages, UI layout with symmetric dropdowns, swap button, textbox config including toolbar buttons and input autofocus, handler wiring, stable `translate` API endpoint with UI-only handlers kept private, no HTML elements, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
|
| 45 |
|
| 46 |
## Tooling
|
| 47 |
|
|
@@ -39,6 +39,6 @@ The Gradio interface launches at `http://localhost:7860`.
|
|
| 39 |
uv run ruff check . # lint
|
| 40 |
uv run ruff format . # format
|
| 41 |
uv run ty check # type check
|
| 42 |
-
uv run pytest -m "not slow" #
|
| 43 |
-
uv run pytest # all
|
| 44 |
```
|
|
|
|
| 39 |
uv run ruff check . # lint
|
| 40 |
uv run ruff format . # format
|
| 41 |
uv run ty check # type check
|
| 42 |
+
uv run pytest -m "not slow" # 51 fast tests
|
| 43 |
+
uv run pytest # all 61 tests (slow require CUDA + model download)
|
| 44 |
```
|
|
@@ -116,7 +116,9 @@ def translate(
|
|
| 116 |
input_ids = tokenizer(target_code + " " + text, return_tensors="pt").input_ids.to(device)
|
| 117 |
|
| 118 |
generate_kwargs: dict = {"input_ids": input_ids, "max_new_tokens": max_new_tokens, "num_beams": num_beams}
|
| 119 |
-
|
|
|
|
|
|
|
| 120 |
generate_kwargs["do_sample"] = True
|
| 121 |
generate_kwargs["temperature"] = temperature
|
| 122 |
|
|
@@ -135,7 +137,7 @@ def translate(
|
|
| 135 |
def _translate_with_loading(
|
| 136 |
text: str,
|
| 137 |
target_language_name: str,
|
| 138 |
-
) -> Generator[tuple, None, None]:
|
| 139 |
yield gr.update(value="Translating...", interactive=False), gr.update()
|
| 140 |
result = translate(text, target_language_name)
|
| 141 |
yield gr.update(value="Translate", interactive=True), result
|
|
|
|
| 116 |
input_ids = tokenizer(target_code + " " + text, return_tensors="pt").input_ids.to(device)
|
| 117 |
|
| 118 |
generate_kwargs: dict = {"input_ids": input_ids, "max_new_tokens": max_new_tokens, "num_beams": num_beams}
|
| 119 |
+
# Greedy by default (deterministic, higher-quality MT). Only sample when the user
|
| 120 |
+
# explicitly sets a non-default temperature; beam search (num_beams > 1) ignores it.
|
| 121 |
+
if num_beams == 1 and temperature != 1.0:
|
| 122 |
generate_kwargs["do_sample"] = True
|
| 123 |
generate_kwargs["temperature"] = temperature
|
| 124 |
|
|
|
|
| 137 |
def _translate_with_loading(
|
| 138 |
text: str,
|
| 139 |
target_language_name: str,
|
| 140 |
+
) -> Generator[tuple[object, object], None, None]:
|
| 141 |
yield gr.update(value="Translating...", interactive=False), gr.update()
|
| 142 |
result = translate(text, target_language_name)
|
| 143 |
yield gr.update(value="Translate", interactive=True), result
|
|
@@ -136,6 +136,43 @@ def test_estimate_duration_is_input_aware_and_capped():
|
|
| 136 |
assert all(isinstance(d, int) for d in (small, default, heavy))
|
| 137 |
|
| 138 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 139 |
def test_requirements_excludes_platform_packages():
|
| 140 |
"""gradio and spaces are provided by the HF Spaces runtime on every tier; pinning them in
|
| 141 |
requirements.txt drifts the ZeroGPU runtime, so they must stay out (see requirements-dev.txt)."""
|
|
@@ -144,7 +181,7 @@ def test_requirements_excludes_platform_packages():
|
|
| 144 |
|
| 145 |
reqs = Path(__file__).resolve().parent.parent / "requirements.txt"
|
| 146 |
names = {
|
| 147 |
-
re.split(r"[<>=~!;\s]", line.strip())[0].lower()
|
| 148 |
for line in reqs.read_text().splitlines()
|
| 149 |
if line.strip() and not line.strip().startswith("#")
|
| 150 |
}
|
|
|
|
| 136 |
assert all(isinstance(d, int) for d in (small, default, heavy))
|
| 137 |
|
| 138 |
|
| 139 |
+
def test_estimate_duration_mirrors_translate_signature():
|
| 140 |
+
"""ZeroGPU calls the duration callable with translate()'s exact args, so _estimate_duration
|
| 141 |
+
must mirror translate()'s parameter names and order (the load-bearing ZeroGPU contract)."""
|
| 142 |
+
import inspect
|
| 143 |
+
|
| 144 |
+
import app
|
| 145 |
+
|
| 146 |
+
assert list(inspect.signature(app._estimate_duration).parameters) == list(
|
| 147 |
+
inspect.signature(app.translate).parameters
|
| 148 |
+
)
|
| 149 |
+
|
| 150 |
+
|
| 151 |
+
def test_translate_greedy_by_default_samples_on_custom_temperature():
|
| 152 |
+
"""num_beams=1 should greedy-decode at the default temperature (deterministic) and enable
|
| 153 |
+
sampling only when the user sets a non-default temperature."""
|
| 154 |
+
import app
|
| 155 |
+
|
| 156 |
+
def run(temperature):
|
| 157 |
+
model = MagicMock()
|
| 158 |
+
model.device = torch.device("cpu")
|
| 159 |
+
model.generate.return_value = [[0]]
|
| 160 |
+
tokenizer = MagicMock()
|
| 161 |
+
tokenizer.decode.return_value = "out"
|
| 162 |
+
with (
|
| 163 |
+
patch("app._load_model", return_value=model),
|
| 164 |
+
patch("app._load_tokenizer", return_value=tokenizer),
|
| 165 |
+
patch("app._build_language_mappings", return_value=({"French (fr)": "<2fr>"}, ["French (fr)"])),
|
| 166 |
+
):
|
| 167 |
+
app.translate("Hello", "French (fr)", temperature=temperature)
|
| 168 |
+
return model.generate.call_args.kwargs
|
| 169 |
+
|
| 170 |
+
greedy = run(1.0)
|
| 171 |
+
sampled = run(0.5)
|
| 172 |
+
assert not greedy.get("do_sample", False), "default temperature should greedy-decode"
|
| 173 |
+
assert sampled.get("do_sample") is True and sampled["temperature"] == 0.5
|
| 174 |
+
|
| 175 |
+
|
| 176 |
def test_requirements_excludes_platform_packages():
|
| 177 |
"""gradio and spaces are provided by the HF Spaces runtime on every tier; pinning them in
|
| 178 |
requirements.txt drifts the ZeroGPU runtime, so they must stay out (see requirements-dev.txt)."""
|
|
|
|
| 181 |
|
| 182 |
reqs = Path(__file__).resolve().parent.parent / "requirements.txt"
|
| 183 |
names = {
|
| 184 |
+
re.split(r"[<>=~!;\[\s,]", line.strip())[0].lower()
|
| 185 |
for line in reqs.read_text().splitlines()
|
| 186 |
if line.strip() and not line.strip().startswith("#")
|
| 187 |
}
|