Spaces: darylalim/madlad-400-translate

Daryl Lim, Claude Opus 4.8 (1M context) committed 2 days ago

Commit: fe1383b
1 Parent(s): e4538dd

test: cover temperature tolerance gate and token×beam cap through translate

Two behavioral changes from this session lacked direct tests:
- test_translate_tolerates_near_default_temperature: a temperature within ~1e-6
of 1.0 (float spinner drift) stays greedy, while a clearly different value
still samples.
- test_translate_applies_token_beam_cap: a high token×beam request reaches
model.generate with the capped token count (end-to-end through translate(),
not just _normalize_params in isolation).

71 fast tests pass. Docs synced to 81 total / 71 fast / 10 slow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
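
The two behaviors named in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the constant values and the helper names `is_default_temperature`, `cap_tokens`, and `build_generate_kwargs` are assumptions made for the example, and the sketch ignores the app's separate rule that beam search must not enable sampling.

```python
import math

# Hypothetical values; the real constants live in app.py and are not shown in this diff.
TEMPERATURE_TOLERANCE = 1e-6
MAX_TOKEN_BEAM_PRODUCT = 4096


def is_default_temperature(temperature: float) -> bool:
    """Treat any value within the absolute tolerance of 1.0 as the greedy default."""
    return math.isclose(temperature, 1.0, rel_tol=0.0, abs_tol=TEMPERATURE_TOLERANCE)


def cap_tokens(max_new_tokens: int, num_beams: int) -> int:
    """Shrink max_new_tokens so that max_new_tokens * num_beams stays within the cap.

    Assumes num_beams >= 1 and num_beams <= MAX_TOKEN_BEAM_PRODUCT.
    """
    return max(1, min(max_new_tokens, MAX_TOKEN_BEAM_PRODUCT // num_beams))


def build_generate_kwargs(max_new_tokens: int, num_beams: int, temperature: float) -> dict:
    """Assemble keyword arguments for a generate() call (simplified)."""
    kwargs = {
        "max_new_tokens": cap_tokens(max_new_tokens, num_beams),
        "num_beams": num_beams,
    }
    # Only enable sampling when the temperature clearly differs from 1.0.
    if not is_default_temperature(temperature):
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    return kwargs
```

Under these assumed constants, a request for 1024 tokens with 8 beams is reduced to 512 tokens, and a temperature of `1.0 - 1e-9` leaves `do_sample` unset.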

Files changed (3)

CLAUDE.md +3 -3
README.md +2 -2
tests/test_app.py +19 -0

CLAUDE.md

@@ -25,8 +25,8 @@ uv run ruff format .
 uv run ty check
 # Test
-uv run pytest                     # all 79 tests (slow require CUDA + model download)
-uv run pytest -m "not slow"       # 69 fast tests only
+uv run pytest                     # all 81 tests (slow require CUDA + model download)
+uv run pytest -m "not slow"       # 71 fast tests only
 uv run pytest -m slow             # 10 model tests only (CUDA only)
 # Generate language mapping (dev only)
@@ -41,7 +41,7 @@ uv run scripts/generate_langmap.py <path-to-paper.pdf>
 **`scripts/`** — `generate_langmap.py` parses the MADLAD-400 paper PDF (Table 9, pages 16-22) using pdfplumber and generates the static language mapping with region assignments. Dev-only tool; requires `requirements-dev.txt` dependencies.
-**`tests/`** — 79 tests (69 fast, 10 slow). `test_langmap.py` has 10 fast tests for mapping validation (dict shape, regions, spot-checks). `test_app.py` has 59 fast tests (signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, GPU duration estimator and its signature-mirror contract + `None`-safety, greedy-by-default decoding, param forwarding into `generate`, `_normalize_params` None/NaN/clamp coercion and token×beam product cap, empty/`None`-input short-circuit, RTL output direction on the button and swap paths, `RTL_CODES` ⊆ langmap invariant, `requirements.txt` excludes platform packages, UI layout with symmetric dropdowns, swap button, textbox config including toolbar buttons and input autofocus, `info=` captions on dropdowns and textboxes spot-checked by content, the Advanced accordion's `gr.Number` controls and their bounds, advanced params reaching the public endpoint by `api_visibility` with the `/translate` input order pinned by label, `show_progress="minimal"` on generation handlers, handler wiring, stable `translate` API endpoint carrying the advanced params with UI-only handlers kept private, no HTML elements, no sliders, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
+**`tests/`** — 81 tests (71 fast, 10 slow). `test_langmap.py` has 10 fast tests for mapping validation (dict shape, regions, spot-checks). `test_app.py` has 61 fast tests (signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, GPU duration estimator and its signature-mirror contract + `None`-safety, greedy-by-default decoding with near-1.0 temperature tolerance, param forwarding into `generate` and the token×beam cap applied through `translate()`, `_normalize_params` None/NaN/clamp coercion and product cap, empty/`None`-input short-circuit, RTL output direction on the button and swap paths, `RTL_CODES` ⊆ langmap invariant, `requirements.txt` excludes platform packages, UI layout with symmetric dropdowns, swap button, textbox config including toolbar buttons and input autofocus, `info=` captions on dropdowns and textboxes spot-checked by content, the Advanced accordion's `gr.Number` controls and their bounds, advanced params reaching the public endpoint by `api_visibility` with the `/translate` input order pinned by label, `show_progress="minimal"` on generation handlers, handler wiring, stable `translate` API endpoint carrying the advanced params with UI-only handlers kept private, no HTML elements, no sliders, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
 ## Tooling

README.md

@@ -39,6 +39,6 @@ The Gradio interface launches at `http://localhost:7860`.
 uv run ruff check .             # lint
 uv run ruff format .            # format
 uv run ty check                 # type check
-uv run pytest -m "not slow"     # 69 fast tests
-uv run pytest                   # all 79 tests (slow require CUDA + model download)
+uv run pytest -m "not slow"     # 71 fast tests
+uv run pytest                   # all 81 tests (slow require CUDA + model download)
 ```

tests/test_app.py

@@ -199,6 +199,25 @@ def test_translate_forwards_generation_params():
     assert "do_sample" not in kwargs, "beam search must not enable sampling"
+def test_translate_applies_token_beam_cap():
+    """A high token×beam request must reach model.generate with the capped token count (not the
+    raw value) so generation stays within its GPU reservation."""
+    import app
+
+    kwargs, _ = _run_translate("Hello", "French (fr)", max_new_tokens=1024, num_beams=8)
+    assert kwargs["num_beams"] == 8
+    assert kwargs["max_new_tokens"] * kwargs["num_beams"] <= app._MAX_TOKEN_BEAM_PRODUCT
+
+
+def test_translate_tolerates_near_default_temperature():
+    """A temperature within ~1e-6 of 1.0 (float spinner drift) stays greedy; a clearly different
+    value still samples."""
+    near_one, _ = _run_translate("Hello", "French (fr)", temperature=1.0 - 1e-9)
+    sampled, _ = _run_translate("Hello", "French (fr)", temperature=0.7)
+    assert "do_sample" not in near_one, "near-1.0 temperature should stay greedy"
+    assert sampled.get("do_sample") is True
+
+
 def test_normalize_params_clamps_and_defaults():
     """_normalize_params coerces None/NaN to defaults and clamps to the advertised ranges."""
     import app
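
The new tests rely on a `_run_translate` helper that is not part of this diff. As a rough idea of how such a helper might capture the keyword arguments passed to `model.generate`, here is a self-contained sketch using `unittest.mock`. The names `fake_translate` and `run_and_capture`, the cap value of 4096, and the return value are all hypothetical stand-ins, not the repository's code.

```python
from unittest.mock import MagicMock

HYPOTHETICAL_CAP = 4096  # assumed token×beam budget, for illustration only


def fake_translate(model, text, max_new_tokens=256, num_beams=1):
    """Stand-in for the app's translate(): cap the token count, then call model.generate."""
    capped = min(max_new_tokens, HYPOTHETICAL_CAP // num_beams)
    return model.generate(text, max_new_tokens=capped, num_beams=num_beams)


def run_and_capture(text, **params):
    """Run fake_translate against a mock model and return (generate kwargs, result)."""
    model = MagicMock()
    model.generate.return_value = "Bonjour"
    result = fake_translate(model, text, **params)
    # call_args unpacks into (positional args, keyword args) of the last call.
    _, kwargs = model.generate.call_args
    return kwargs, result
```

Asserting on the captured kwargs, rather than on a normalization helper in isolation, is what makes the check end-to-end: it verifies the value that actually reaches `generate`.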