Spaces: darylalim/madlad-400-translate (Running on Zero)

Daryl Lim, Claude Opus 4.8 (1M context) committed 4 days ago

Commit e45a74c (1 parent: 23bd770)

fix: harden generation params, fix swap RTL, polish from review

Address the adversarial self-review of the UI/API changes.

High — None/NaN/null safety on the now-public /translate:
- Add _normalize_params (None/NaN -> default, clamp to range) as the single
funnel; call it in translate() and in _estimate_duration (which ZeroGPU runs
*before* translate with the same uncast args). A cleared gr.Number arrives as
None and the public submit path passes values uncast, so without this a single
empty Advanced field crashed the endpoint and the duration callable.
- Make the empty-input guard None-safe: `not (text or "").strip()`.
- Drop the now-redundant int()/float() casts in _translate_with_loading.
- Type the numeric params as int|None / float|None to reflect Gradio's nulls.

Medium — swap-button stale RTL:
- _swap_languages now emits gr.update(rtl/text_align) for both textboxes so
direction follows the swapped text (rtl is sticky; a prior RTL flip must reset).
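
A sketch of the swap rule, assuming a hypothetical four-token `RTL_CODES` subset and a two-entry name-to-code map (the real set and `_build_language_mappings()` live in app.py). Plain dicts stand in for `gr.update(...)` so it runs without Gradio; each box takes the direction of the language whose text lands in it:

```python
# Hypothetical subset of RTL tokens; the real RTL_CODES set is defined in app.py.
RTL_CODES = frozenset({"<2ar>", "<2he>", "<2fa>", "<2ur>"})
# Hypothetical name -> token map standing in for _build_language_mappings().
NAME_TO_CODE = {"English (en)": "<2en>", "Arabic (ar)": "<2ar>"}


def direction(rtl: bool) -> dict:
    # Always emit both keys: rtl is sticky, so LTR must be set explicitly to clear a prior flip.
    return {"rtl": rtl, "text_align": "right" if rtl else "left"}


def swap_languages(
    source_lang: str, target_lang: str, source_text: str, target_text: str
) -> tuple[str, str, dict, dict]:
    # After the swap the input box holds the old target text and the output box the old source text.
    input_rtl = NAME_TO_CODE.get(target_lang) in RTL_CODES
    output_rtl = NAME_TO_CODE.get(source_lang) in RTL_CODES
    return (
        target_lang,
        source_lang,
        {"value": target_text, **direction(input_rtl)},
        {"value": source_text, **direction(output_rtl)},
    )


new_source, new_target, input_box, output_box = swap_languages(
    "English (en)", "Arabic (ar)", "Hello", "مرحبا"
)
print(input_box)   # {'value': 'مرحبا', 'rtl': True, 'text_align': 'right'}
print(output_box)  # {'value': 'Hello', 'rtl': False, 'text_align': 'left'}
```

Because every return carries both `rtl` and `text_align`, a box left RTL by an earlier Arabic translation is reset to LTR on the next swap rather than keeping the stale flip. An unknown language name maps to `None`, which is never in the set, so it defaults to LTR.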

Low polish:
- Gate sampling on abs(temperature - 1.0) > 1e-6 to absorb float spinner drift.
- Reword the temperature caption to describe both directions.
- Document the Ctrl+Enter-vs-button RTL divergence in the submit comment.
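
Why the tolerance matters, as a sketch: repeated 0.1 increments (assumed here to model a stepped float spinner) land just below 1.0, which an exact `!= 1.0` check would treat as a user-chosen temperature and so switch to sampling. `build_generate_kwargs` is a hypothetical helper mirroring the `generate_kwargs` branch in `translate()`, with `input_ids` omitted:

```python
TOLERANCE = 1e-6  # same threshold as the commit's gate


def sampling_enabled(temperature: float) -> bool:
    # Anything within TOLERANCE of 1.0 counts as "unchanged", i.e. stay greedy.
    return abs(temperature - 1.0) > TOLERANCE


def build_generate_kwargs(max_new_tokens: int, num_beams: int, temperature: float) -> dict:
    kwargs: dict = {"max_new_tokens": max_new_tokens, "num_beams": num_beams}
    # Greedy by default; only sample on a real temperature change, and never with beam search.
    if num_beams == 1 and sampling_enabled(temperature):
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    return kwargs


drifted = 0.0
for _ in range(10):
    drifted += 0.1  # each addition accumulates binary rounding error

print(drifted)                                 # 0.9999999999999999
print(drifted == 1.0)                          # False
print(build_generate_kwargs(512, 1, drifted))  # {'max_new_tokens': 512, 'num_beams': 1}
print(build_generate_kwargs(512, 1, 0.5))      # {'max_new_tokens': 512, 'num_beams': 1, 'do_sample': True, 'temperature': 0.5}
```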

Tests (68 fast, +10): param forwarding into generate, _normalize_params
clamp/None/NaN, translate None/NaN coercion, _estimate_duration None-safety,
empty/None parametrized guard, swap RTL, RTL_CODES ⊆ langmap, gr.Number bounds,
/translate input order pinned by label, public-path assertion keyed on
api_visibility, textbox caption content, LTR-branch value in the RTL test. Docs synced to
78 total / 68 fast / 10 slow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Files changed (4)

CLAUDE.md +4 -4
README.md +2 -2
app.py +68 -25
tests/test_app.py +117 -9

CLAUDE.md CHANGED

@@ -25,8 +25,8 @@ uv run ruff format .
 uv run ty check
 # Test
-uv run pytest                     # all 68 tests (slow require CUDA + model download)
-uv run pytest -m "not slow"       # 58 fast tests only
 uv run pytest -m slow             # 10 model tests only (CUDA only)
 # Generate language mapping (dev only)
@@ -35,13 +35,13 @@ uv run scripts/generate_langmap.py <path-to-paper.pdf>
 ## Architecture
-**`app.py`** — Single-file application with a Google Translate-style layout: top row has two symmetric, filterable, region-sorted language dropdowns (source defaults to "English (en)", target defaults to "French (fr)") with a swap button ("⇄") between them; below that, input textbox (autofocused) and output textbox with copy button side by side. The Translate button spans full width below both textboxes (shows "Translating..." during processing). Ctrl+Enter submits from the input. The model auto-detects source language; the source dropdown is for user reference and the swap button only, which an `info=` caption discloses. Each control carries an `info=` caption (caption text, not HTML/Markdown blocks): the target dropdown a quality-varies caveat, the input the Ctrl+Enter hint, the output model/arXiv/license provenance. Uses `@lru_cache` for lazy loading of the `google/madlad400-3b-mt` tokenizer and model. On ZeroGPU (`SPACES_ZERO_GPU=1`), `_maybe_eager_load()` places the model at module scope so the `spaces` hijack can pack weights and stream them into workers for fast cold starts; off-ZeroGPU (local, tests, cpu-basic) it stays lazy, so importing the app never downloads the model. Uses `bfloat16` on CUDA (T5/MADLAD is numerically unstable in `float16` — fp16's narrow range overflows to inf/NaN; bf16 is the format T5 was trained in), `float32` on CPU. MPS is not supported (produces garbage output with T5 models). Translation prepends a target language token with a space to the input text (e.g., `<2fr> Hello`) before tokenization and generation; whitespace-only input short-circuits to an empty string before the model loads. Decoding is greedy by default (deterministic); a non-default `temperature` enables sampling, and `num_beams > 1` uses beam search. A collapsed "Advanced" accordion exposes `max_new_tokens`/`num_beams`/`temperature` as `gr.Number` controls (no sliders; defaults mirror `translate()`, so the default surface stays greedy). Right-to-left target scripts (an explicit `RTL_CODES` token set — `region` is not a usable proxy) flip the output box to RTL via the Translate-button path. The `@spaces.GPU` decorator allocates GPU on HF Spaces infrastructure; its `duration` is a callable (`_estimate_duration`) that scales the GPU reservation with `max_new_tokens × num_beams` (capped at 120s). Both translate handlers (the private Translate-button click and the public submit) carry the advanced params, so Ctrl+Enter and the `/translate` API honor the accordion; the params keep defaults, so existing two-arg callers still work. The submit handler exposes a stable `/translate` API endpoint (returns a bare string); the swap and Translate-button handlers are `api_visibility="private"`, and both generation handlers use `show_progress="minimal"`. Only `/translate` is public.
 **`langmap/`** — Package with `langid_mapping.py`, mapping 418 language tokens to `{"name": ..., "region": ...}` dicts. Auto-generated by `scripts/generate_langmap.py` from Table 9 (Section A.1) of the MADLAD-400 paper. Available languages at runtime are the intersection of this mapping and the model's vocabulary.
 **`scripts/`** — `generate_langmap.py` parses the MADLAD-400 paper PDF (Table 9, pages 16-22) using pdfplumber and generates the static language mapping with region assignments. Dev-only tool; requires `requirements-dev.txt` dependencies.
-**`tests/`** — 68 tests (58 fast, 10 slow). `test_langmap.py` has 10 fast tests for mapping validation (dict shape, regions, spot-checks). `test_app.py` has 48 fast tests (signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, GPU duration estimator and its signature-mirror contract, greedy-by-default decoding, whitespace-input short-circuit, RTL output direction on the button path, `requirements.txt` excludes platform packages, UI layout with symmetric dropdowns, swap button, textbox config including toolbar buttons and input autofocus, `info=` captions on dropdowns and textboxes, the Advanced accordion's `gr.Number` controls wired into both translate handlers, `show_progress="minimal"` on generation handlers, handler wiring, stable `translate` API endpoint carrying the advanced params with UI-only handlers kept private, no HTML elements, no sliders, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
 ## Tooling

 uv run ty check
 # Test
+uv run pytest                     # all 78 tests (slow require CUDA + model download)
+uv run pytest -m "not slow"       # 68 fast tests only
 uv run pytest -m slow             # 10 model tests only (CUDA only)
 # Generate language mapping (dev only)
 ## Architecture
+**`app.py`** — Single-file application with a Google Translate-style layout: top row has two symmetric, filterable, region-sorted language dropdowns (source defaults to "English (en)", target defaults to "French (fr)") with a swap button ("⇄") between them; below that, input textbox (autofocused) and output textbox with copy button side by side. The Translate button spans full width below both textboxes (shows "Translating..." during processing). Ctrl+Enter submits from the input. The model auto-detects source language; the source dropdown is for user reference and the swap button only, which an `info=` caption discloses. Each control carries an `info=` caption (caption text, not HTML/Markdown blocks): the target dropdown a quality-varies caveat, the input the Ctrl+Enter hint, the output model/arXiv/license provenance. Uses `@lru_cache` for lazy loading of the `google/madlad400-3b-mt` tokenizer and model. On ZeroGPU (`SPACES_ZERO_GPU=1`), `_maybe_eager_load()` places the model at module scope so the `spaces` hijack can pack weights and stream them into workers for fast cold starts; off-ZeroGPU (local, tests, cpu-basic) it stays lazy, so importing the app never downloads the model. Uses `bfloat16` on CUDA (T5/MADLAD is numerically unstable in `float16` — fp16's narrow range overflows to inf/NaN; bf16 is the format T5 was trained in), `float32` on CPU. MPS is not supported (produces garbage output with T5 models). Translation prepends a target language token with a space to the input text (e.g., `<2fr> Hello`) before tokenization and generation; whitespace-only or `None` input short-circuits to an empty string before the model loads. The generation params are normalized in `translate()` via `_normalize_params` (`None`/`NaN` → default, then clamped to range) so the cast-less public path and the ZeroGPU duration callable can't crash on a cleared `gr.Number` field. Decoding is greedy by default (deterministic); a non-default `temperature` (tolerance-compared to absorb float spinner drift) enables sampling, and `num_beams > 1` uses beam search. A collapsed "Advanced" accordion exposes `max_new_tokens`/`num_beams`/`temperature` as `gr.Number` controls (no sliders; defaults mirror `translate()`, so the default surface stays greedy). Right-to-left target scripts (an explicit `RTL_CODES` token set — `region` is not a usable proxy) flip the output box to RTL via the Translate-button and swap paths; Ctrl+Enter/`/translate` return a bare string and stay LTR. The `@spaces.GPU` decorator allocates GPU on HF Spaces infrastructure; its `duration` is a callable (`_estimate_duration`) that scales the GPU reservation with `max_new_tokens × num_beams` (capped at 120s). Both translate handlers (the private Translate-button click and the public submit) carry the advanced params, so Ctrl+Enter and the `/translate` API honor the accordion; the params keep defaults, so existing two-arg callers still work. The submit handler exposes a stable `/translate` API endpoint (returns a bare string); the swap and Translate-button handlers are `api_visibility="private"`, and both generation handlers use `show_progress="minimal"`. Only `/translate` is public.
 **`langmap/`** — Package with `langid_mapping.py`, mapping 418 language tokens to `{"name": ..., "region": ...}` dicts. Auto-generated by `scripts/generate_langmap.py` from Table 9 (Section A.1) of the MADLAD-400 paper. Available languages at runtime are the intersection of this mapping and the model's vocabulary.
 **`scripts/`** — `generate_langmap.py` parses the MADLAD-400 paper PDF (Table 9, pages 16-22) using pdfplumber and generates the static language mapping with region assignments. Dev-only tool; requires `requirements-dev.txt` dependencies.
+**`tests/`** — 78 tests (68 fast, 10 slow). `test_langmap.py` has 10 fast tests for mapping validation (dict shape, regions, spot-checks). `test_app.py` has 58 fast tests (signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, GPU duration estimator and its signature-mirror contract + `None`-safety, greedy-by-default decoding, param forwarding into `generate`, `_normalize_params` None/NaN/clamp coercion, empty/`None`-input short-circuit, RTL output direction on the button and swap paths, `RTL_CODES` ⊆ langmap invariant, `requirements.txt` excludes platform packages, UI layout with symmetric dropdowns, swap button, textbox config including toolbar buttons and input autofocus, `info=` captions on dropdowns and textboxes spot-checked by content, the Advanced accordion's `gr.Number` controls and their bounds, advanced params reaching the public endpoint by `api_visibility` with the `/translate` input order pinned by label, `show_progress="minimal"` on generation handlers, handler wiring, stable `translate` API endpoint carrying the advanced params with UI-only handlers kept private, no HTML elements, no sliders, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
 ## Tooling

README.md CHANGED

@@ -39,6 +39,6 @@ The Gradio interface launches at `http://localhost:7860`.
 uv run ruff check .             # lint
 uv run ruff format .            # format
 uv run ty check                 # type check
-uv run pytest -m "not slow"     # 58 fast tests
-uv run pytest                   # all 68 tests (slow require CUDA + model download)
 ```

 uv run ruff check .             # lint
 uv run ruff format .            # format
 uv run ty check                 # type check
+uv run pytest -m "not slow"     # 68 fast tests
+uv run pytest                   # all 78 tests (slow require CUDA + model download)
 ```

app.py CHANGED

@@ -3,6 +3,7 @@ Translation interface using the MADLAD-400 3B model.
 Translates between 418 languages from the MADLAD-400 paper.
 """
 import os
 import time
 import warnings
@@ -105,19 +106,40 @@ def _maybe_eager_load() -> None:
         _load_model()
 def _estimate_duration(
     text: str,
     target_language_name: str,
-    max_new_tokens: int = 512,
-    num_beams: int = 1,
-    temperature: float = 1.0,
 ) -> int:
-    """Reserve GPU time scaled to the worst case: generation cost grows with the
-    number of tokens generated and the beam width. Mirrors translate()'s signature
-    (ZeroGPU calls the duration callable with the decorated function's args).
-    Conservative and capped at 120s; calibrate from the perf_counter log in
-    translate() (zerogpu.md 'Sizing duration')."""
-    del text, target_language_name, temperature  # only token/beam counts drive runtime
     return min(120, 30 + (max_new_tokens * num_beams) // 8)
@@ -125,16 +147,24 @@ def _estimate_duration(
 def translate(
     text: str,
     target_language_name: str,
-    max_new_tokens: int = 512,
-    num_beams: int = 1,
-    temperature: float = 1.0,
 ) -> str:
     # No-op on empty/whitespace input: skip the model entirely rather than feeding a bare
     # "<2xx> " prompt (which would burn generation time and emit a stray token). Guard lives
     # here, not in _translate_with_loading, so the public /translate and Ctrl+Enter paths
-    # (which call translate() directly) are covered too. Returns a str, so the contract holds.
-    if not text.strip():
         return ""
     tokenizer = _load_tokenizer()
     model = _load_model()
     device = model.device
@@ -144,7 +174,7 @@ def translate(
     if target_code is None:
         raise ValueError(f"Unsupported language: {target_language_name}")
-    if num_beams > 1 and temperature != 1.0:
         gr.Info("Temperature has no effect when beam search is enabled (num_beams > 1).")
     input_ids = tokenizer(target_code + " " + text, return_tensors="pt").input_ids.to(device)
@@ -152,7 +182,7 @@ def translate(
     generate_kwargs: dict = {"input_ids": input_ids, "max_new_tokens": max_new_tokens, "num_beams": num_beams}
     # Greedy by default (deterministic, higher-quality MT). Only sample when the user
     # explicitly sets a non-default temperature; beam search (num_beams > 1) ignores it.
-    if num_beams == 1 and temperature != 1.0:
         generate_kwargs["do_sample"] = True
         generate_kwargs["temperature"] = temperature
@@ -171,12 +201,13 @@ def translate(
 def _translate_with_loading(
     text: str,
     target_language_name: str,
-    max_new_tokens: int = 512,
-    num_beams: int = 1,
-    temperature: float = 1.0,
 ) -> Generator[tuple[object, object], None, None]:
     yield gr.update(value="Translating...", interactive=False), gr.update()
-    result = translate(text, target_language_name, int(max_new_tokens), int(num_beams), float(temperature))
     # Flip the output box to RTL for right-to-left target scripts so the translation reads
     # correctly; reset to LTR otherwise (rtl is sticky across reruns). Only the button path
     # carries this — the public /translate endpoint stays a bare str to keep its API stable.
@@ -188,9 +219,20 @@ def _translate_with_loading(
 def _swap_languages(
     source_lang: str, target_lang: str, source_text: str, target_text: str
-) -> tuple[str, str, str, str]:
-    """Swap source/target languages and their text."""
-    return target_lang, source_lang, target_text, source_text
 def _build_demo() -> gr.Blocks:
@@ -262,7 +304,7 @@ def _build_demo() -> gr.Blocks:
                 minimum=0.1,
                 maximum=2.0,
                 step=0.1,
-                info="No effect at 1.0 (greedy) or when Beams > 1; lower samples less randomly.",
             )
         # UI-only handlers: kept off the public API surface (private) so only /translate is exposed.
@@ -284,7 +326,8 @@ def _build_demo() -> gr.Blocks:
         # /translate exposes the advanced params too. They all have defaults, so existing
         # two-arg callers (text, target) keep working; wiring them here also makes Ctrl+Enter
         # honor the Advanced accordion, matching the Translate button. The endpoint returns a
-        # bare str (RTL direction is a UI-only concern handled on the button path).
         input_text.submit(
             fn=translate,
             inputs=[input_text, target_language, max_new_tokens, num_beams, temperature],

 Translates between 418 languages from the MADLAD-400 paper.
 """
+import math
 import os
 import time
 import warnings
         _load_model()
+def _normalize_params(
+    max_new_tokens: float | None, num_beams: float | None, temperature: float | None
+) -> tuple[int, int, float]:
+    """Coerce the advanced generation params to safe values. A cleared ``gr.Number`` arrives as
+    ``None`` (Gradio skips its bounds check for ``None``) and the public ``/translate`` path
+    passes values uncast, so this is the single funnel every caller — button, submit, API,
+    direct, and the ZeroGPU duration callable — goes through. ``None``/``NaN`` fall back to the
+    defaults; values are clamped to the ranges the Advanced ``gr.Number`` controls advertise."""
+    def _num(value: float | None, default: float) -> float:
+        return default if value is None or math.isnan(value) else value
+    return (
+        int(max(1, min(1024, _num(max_new_tokens, 512)))),
+        int(max(1, min(8, _num(num_beams, 1)))),
+        float(max(0.1, min(2.0, _num(temperature, 1.0)))),
+    )
 def _estimate_duration(
     text: str,
     target_language_name: str,
+    max_new_tokens: int | None = 512,
+    num_beams: int | None = 1,
+    temperature: float | None = 1.0,
 ) -> int:
+    """Reserve GPU time scaled to the worst case: generation cost grows with the number of
+    tokens generated and the beam width. Mirrors translate()'s signature (ZeroGPU calls the
+    duration callable with the decorated function's args, and runs it *before* translate(), so
+    it must tolerate the same cleared-field ``None`` values — normalize first). Conservative and
+    capped at 120s; calibrate from the perf_counter log in translate() (zerogpu.md 'Sizing
+    duration')."""
+    del text, target_language_name  # only token/beam counts drive runtime
+    max_new_tokens, num_beams, _ = _normalize_params(max_new_tokens, num_beams, temperature)
     return min(120, 30 + (max_new_tokens * num_beams) // 8)
 def translate(
     text: str,
     target_language_name: str,
+    max_new_tokens: int | None = 512,
+    num_beams: int | None = 1,
+    temperature: float | None = 1.0,
 ) -> str:
     # No-op on empty/whitespace input: skip the model entirely rather than feeding a bare
     # "<2xx> " prompt (which would burn generation time and emit a stray token). Guard lives
     # here, not in _translate_with_loading, so the public /translate and Ctrl+Enter paths
+    # (which call translate() directly) are covered too. (text or "") stays None-safe for an
+    # API caller that POSTs a null text field. Returns a str, so the contract holds.
+    if not (text or "").strip():
         return ""
+    # Normalize the generation params here — translate() is the single source of truth. The
+    # public submit path passes gr.Number values uncast, and a cleared field arrives as
+    # None/NaN, so coerce and clamp before use (the duration callable normalizes identically).
+    max_new_tokens, num_beams, temperature = _normalize_params(max_new_tokens, num_beams, temperature)
+    # Compare with a tolerance so float spinner drift (e.g. ten 0.1 steps summing to 0.999…) doesn't trip sampling.
+    sampling = abs(temperature - 1.0) > 1e-6
     tokenizer = _load_tokenizer()
     model = _load_model()
     device = model.device
     if target_code is None:
         raise ValueError(f"Unsupported language: {target_language_name}")
+    if num_beams > 1 and sampling:
         gr.Info("Temperature has no effect when beam search is enabled (num_beams > 1).")
     input_ids = tokenizer(target_code + " " + text, return_tensors="pt").input_ids.to(device)
     generate_kwargs: dict = {"input_ids": input_ids, "max_new_tokens": max_new_tokens, "num_beams": num_beams}
     # Greedy by default (deterministic, higher-quality MT). Only sample when the user
     # explicitly sets a non-default temperature; beam search (num_beams > 1) ignores it.
+    if num_beams == 1 and sampling:
         generate_kwargs["do_sample"] = True
         generate_kwargs["temperature"] = temperature
 def _translate_with_loading(
     text: str,
     target_language_name: str,
+    max_new_tokens: int | None = 512,
+    num_beams: int | None = 1,
+    temperature: float | None = 1.0,
 ) -> Generator[tuple[object, object], None, None]:
     yield gr.update(value="Translating...", interactive=False), gr.update()
+    # translate() normalizes the params (None/NaN/clamp), so forward them as-is.
+    result = translate(text, target_language_name, max_new_tokens, num_beams, temperature)
     # Flip the output box to RTL for right-to-left target scripts so the translation reads
     # correctly; reset to LTR otherwise (rtl is sticky across reruns). Only the button path
     # carries this — the public /translate endpoint stays a bare str to keep its API stable.
 def _swap_languages(
     source_lang: str, target_lang: str, source_text: str, target_text: str
+) -> tuple[str, str, object, object]:
+    """Swap source/target languages and their text, flipping each textbox's direction to follow
+    the text that lands in it. rtl is sticky across reruns, so a stale RTL flip left by a prior
+    translation must be reset. After the swap the input box holds the old target text and the
+    output box holds the old source text."""
+    name_to_code, _ = _build_language_mappings()
+    input_rtl = name_to_code.get(target_lang) in RTL_CODES  # old target text moves into the input box
+    output_rtl = name_to_code.get(source_lang) in RTL_CODES  # old source text moves into the output box
+    return (
+        target_lang,
+        source_lang,
+        gr.update(value=target_text, rtl=input_rtl, text_align="right" if input_rtl else "left"),
+        gr.update(value=source_text, rtl=output_rtl, text_align="right" if output_rtl else "left"),
+    )
 def _build_demo() -> gr.Blocks:
                 minimum=0.1,
                 maximum=2.0,
                 step=0.1,
+                info="No effect at 1.0 (greedy) or when Beams > 1; below 1.0 is more focused, above 1.0 more random.",
             )
         # UI-only handlers: kept off the public API surface (private) so only /translate is exposed.
         # /translate exposes the advanced params too. They all have defaults, so existing
         # two-arg callers (text, target) keep working; wiring them here also makes Ctrl+Enter
         # honor the Advanced accordion, matching the Translate button. The endpoint returns a
+        # bare str, so an RTL target submitted via Ctrl+Enter is NOT direction-flipped — that
+        # happens only on the Translate-button path (an accepted, documented UI divergence).
         input_text.submit(
             fn=translate,
             inputs=[input_text, target_language, max_new_tokens, num_beams, temperature],

tests/test_app.py CHANGED

@@ -173,20 +173,99 @@ def test_translate_greedy_by_default_samples_on_custom_temperature():
     assert sampled.get("do_sample") is True and sampled["temperature"] == 0.5
-def test_translate_skips_model_on_empty_input():
-    """Whitespace-only input should short-circuit to '' without loading or running the model."""
     import app
     with (
         patch("app._load_model") as load_model,
         patch("app._load_tokenizer") as load_tokenizer,
     ):
-        result = app.translate("   ", "French (fr)")
     assert result == ""
     load_model.assert_not_called()
     load_tokenizer.assert_not_called()
 def test_translate_with_loading_flips_rtl_for_rtl_target():
     """The private button path marks the output RTL for right-to-left target languages and
     resets to LTR otherwise (rtl is sticky across reruns)."""
@@ -204,7 +283,17 @@ def test_translate_with_loading_flips_rtl_for_rtl_target():
     ltr = final_output("French (fr)", "<2fr>")
     assert rtl["rtl"] is True and rtl["text_align"] == "right"
     assert ltr["rtl"] is False and ltr["text_align"] == "left"
-    assert rtl["value"] == "out"
 def test_requirements_excludes_platform_packages():
@@ -292,9 +381,13 @@ def test_dropdowns_have_info_captions(demo):
 def test_textboxes_have_info_captions(demo):
-    """Input and output textboxes carry info= captions (Ctrl+Enter hint and model provenance)."""
     textboxes = [b for b in demo.blocks.values() if type(b).__name__ == "Textbox"]
     assert all(t.info for t in textboxes), "input and output textboxes should carry info captions"
 def test_demo_has_two_textboxes(demo):
@@ -416,18 +509,30 @@ def test_advanced_params_are_numbers(demo):
     assert len(numbers) == 3, f"Expected 3 Number controls, found {len(numbers)}"
 def test_advanced_params_wired_to_both_translate_handlers(demo):
     """Both translate handlers (button click + public submit) carry the three advanced Number
-    params after text + language, so Ctrl+Enter and the /translate API honor the accordion too."""
     full_input_fns = [
         fn
         for fn in demo.fns.values()
         if [type(i).__name__ for i in fn.inputs] == ["Textbox", "Dropdown", "Number", "Number", "Number"]
     ]
     assert len(full_input_fns) == 2, "Expected both translate handlers to carry the 3 advanced params"
-    assert any(getattr(fn, "api_name", None) == "translate" for fn in full_input_fns), (
-        "the public /translate endpoint must carry the advanced params"
-    )
 def test_all_handlers_wired(demo):
@@ -450,6 +555,9 @@ def test_translate_endpoint_has_stable_api_name(demo):
     assert len(api_fns) == 1, "Expected exactly one handler with api_name='translate'"
     fn = api_fns[0]
     assert [type(i).__name__ for i in fn.inputs] == ["Textbox", "Dropdown", "Number", "Number", "Number"]
     assert [type(o).__name__ for o in fn.outputs] == ["Textbox"]

     assert sampled.get("do_sample") is True and sampled["temperature"] == 0.5
+def _run_translate(text, target, **kwargs):
+    """Call translate() against a mocked model/tokenizer and return generate()'s kwargs + result."""
+    import app
+    model = MagicMock()
+    model.device = torch.device("cpu")
+    model.generate.return_value = [[0]]
+    tokenizer = MagicMock()
+    tokenizer.decode.return_value = "out"
+    with (
+        patch("app._load_model", return_value=model),
+        patch("app._load_tokenizer", return_value=tokenizer),
+        patch("app._build_language_mappings", return_value=({"French (fr)": "<2fr>"}, ["French (fr)"])),
+    ):
+        result = app.translate(text, target, **kwargs)
+    return model.generate.call_args.kwargs if model.generate.called else None, result
+def test_translate_forwards_generation_params():
+    """Non-default max_new_tokens/num_beams must reach model.generate; beam search must not sample."""
+    kwargs, _ = _run_translate("Hello", "French (fr)", max_new_tokens=10, num_beams=4)
+    assert kwargs["max_new_tokens"] == 10
+    assert kwargs["num_beams"] == 4
+    assert "do_sample" not in kwargs, "beam search must not enable sampling"
+def test_normalize_params_clamps_and_defaults():
+    """_normalize_params coerces None/NaN to defaults and clamps to the advertised ranges."""
+    import app
+    assert app._normalize_params(None, None, None) == (512, 1, 1.0)
+    nan = float("nan")
+    assert app._normalize_params(nan, nan, nan) == (512, 1, 1.0)
+    assert app._normalize_params(99999, 99, 9.0) == (1024, 8, 2.0)  # clamp high
+    assert app._normalize_params(0, 0, 0.0) == (1, 1, 0.1)  # clamp low
+    mnt, beams, temp = app._normalize_params(10.0, 4.0, 0.5)
+    assert (mnt, beams, temp) == (10, 4, 0.5)
+    assert type(mnt) is int and type(beams) is int and type(temp) is float
+def test_translate_normalizes_invalid_params():
+    """A cleared gr.Number arrives as None (and temperature can be NaN) on the public path;
+    translate() must coerce to defaults instead of crashing or corrupting sampling."""
+    kwargs, result = _run_translate(
+        "Hello", "French (fr)", max_new_tokens=None, num_beams=None, temperature=float("nan")
+    )
+    assert result == "out"
+    assert kwargs["max_new_tokens"] == 512 and kwargs["num_beams"] == 1
+    assert "do_sample" not in kwargs, "NaN temperature must fall back to greedy"
+def test_estimate_duration_handles_none_params():
+    """The ZeroGPU duration callable runs before translate() with the same uncast args, so a
+    cleared gr.Number (None) must not crash it."""
+    import app
+    assert isinstance(app._estimate_duration("hi", "French (fr)", None, None, None), int)
+@pytest.mark.parametrize("blank", ["", "   ", "\n\t", None])
+def test_translate_skips_model_on_empty_input(blank):
+    """Empty/whitespace/None input short-circuits to '' without loading or running the model."""
     import app
     with (
         patch("app._load_model") as load_model,
         patch("app._load_tokenizer") as load_tokenizer,
     ):
+        result = app.translate(blank, "French (fr)")
     assert result == ""
     load_model.assert_not_called()
     load_tokenizer.assert_not_called()
+def test_swap_flips_rtl_to_follow_text():
+    """Swapping must move each textbox's direction with the text: after EN->Arabic then swap,
+    the input box (now holding the Arabic translation) goes RTL and the output box (now holding
+    the English source) resets to LTR."""
+    import gradio as gr
+    import app
+    name_to_code = {"English (en)": "<2en>", "Arabic (ar)": "<2ar>"}
+    with patch("app._build_language_mappings", return_value=(name_to_code, list(name_to_code))):
+        new_source, new_target, input_update, output_update = app._swap_languages(
+            "English (en)", "Arabic (ar)", "Hello", "RTL-text"
+        )
+    assert (new_source, new_target) == ("Arabic (ar)", "English (en)")
+    # input box now holds the Arabic translation -> RTL; output box holds the English source -> LTR
+    assert input_update == gr.update(value="RTL-text", rtl=True, text_align="right")
+    assert output_update == gr.update(value="Hello", rtl=False, text_align="left")
 def test_translate_with_loading_flips_rtl_for_rtl_target():
     """The private button path marks the output RTL for right-to-left target languages and
     resets to LTR otherwise (rtl is sticky across reruns)."""
     ltr = final_output("French (fr)", "<2fr>")
     assert rtl["rtl"] is True and rtl["text_align"] == "right"
     assert ltr["rtl"] is False and ltr["text_align"] == "left"
+    assert rtl["value"] == "out" and ltr["value"] == "out"  # both branches forward the result
+def test_rtl_codes_are_valid_langmap_tokens():
+    """Every RTL_CODES token must exist in the langmap, so a langmap regeneration that renames
+    or drops a token can't silently disable an RTL flip without failing this test."""
+    import app
+    from langmap.langid_mapping import langid_to_language
+    missing = app.RTL_CODES - set(langid_to_language)
+    assert not missing, f"RTL_CODES not in langmap: {missing}"
 def test_requirements_excludes_platform_packages():
 def test_textboxes_have_info_captions(demo):
+    """Input box carries the Ctrl+Enter hint; output box carries model/arXiv/license provenance."""
     textboxes = [b for b in demo.blocks.values() if type(b).__name__ == "Textbox"]
     assert all(t.info for t in textboxes), "input and output textboxes should carry info captions"
+    input_box = next(t for t in textboxes if t.interactive is not False)
+    output_box = next(t for t in textboxes if t.interactive is False)
+    assert "ctrl+enter" in input_box.info.lower()
+    assert "madlad400-3b-mt" in output_box.info
 def test_demo_has_two_textboxes(demo):
     assert len(numbers) == 3, f"Expected 3 Number controls, found {len(numbers)}"
+def test_advanced_params_have_safe_bounds(demo):
+    """The Number controls must keep their documented bounds — for the public /translate path,
+    Gradio's component preprocess is the server-side guard keeping params in range."""
+    numbers = {n.label: n for n in demo.blocks.values() if type(n).__name__ == "Number"}
+    assert numbers["Max new tokens"].minimum == 1 and numbers["Max new tokens"].maximum == 1024
+    assert numbers["Max new tokens"].precision == 0
+    assert numbers["Beams"].minimum == 1 and numbers["Beams"].maximum == 8
+    assert numbers["Beams"].precision == 0
+    assert numbers["Temperature"].minimum == 0.1 and numbers["Temperature"].maximum == 2.0
 def test_advanced_params_wired_to_both_translate_handlers(demo):
     """Both translate handlers (button click + public submit) carry the three advanced Number
+    params after text + language. Exactly one of them is the public /translate endpoint, so the
+    params demonstrably reach the public path (keyed on api_visibility, not a name coincidence)."""
     full_input_fns = [
         fn
         for fn in demo.fns.values()
         if [type(i).__name__ for i in fn.inputs] == ["Textbox", "Dropdown", "Number", "Number", "Number"]
     ]
     assert len(full_input_fns) == 2, "Expected both translate handlers to carry the 3 advanced params"
+    public = [fn for fn in full_input_fns if getattr(fn, "api_visibility", None) == "public"]
+    assert len(public) == 1, "exactly one full-input handler should be the public endpoint"
+    assert getattr(public[0], "api_name", None) == "translate", "the public one must be /translate"
 def test_all_handlers_wired(demo):
     assert len(api_fns) == 1, "Expected exactly one handler with api_name='translate'"
     fn = api_fns[0]
     assert [type(i).__name__ for i in fn.inputs] == ["Textbox", "Dropdown", "Number", "Number", "Number"]
+    # the three Number inputs are positionally indistinguishable by type, so pin their order by
+    # label — a num_beams/temperature swap in the inputs= list would otherwise pass silently.
+    assert [i.label for i in fn.inputs[2:]] == ["Max new tokens", "Beams", "Temperature"]
     assert [type(o).__name__ for o in fn.outputs] == ["Textbox"]