Daryl Lim Claude Opus 4.8 (1M context) committed on
Commit e45a74c · 1 Parent(s): 23bd770

fix: harden generation params, fix swap RTL, polish from review

Address the adversarial self-review of the UI/API changes.

High — None/NaN/null safety on the now-public /translate:
- Add _normalize_params (None/NaN -> default, clamp to range) as the single
funnel; call it in translate() and in _estimate_duration (which ZeroGPU runs
*before* translate with the same uncast args). A cleared gr.Number arrives as
None and the public submit path passes values uncast, so without this a single
empty Advanced field crashed the endpoint and the duration callable.
- Make the empty-input guard None-safe: `not (text or "").strip()`.
- Drop the now-redundant int()/float() casts in _translate_with_loading.
- Type the numeric params as int|None / float|None to reflect Gradio's nulls.
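
The funnel described above can be sketched in isolation. This is a minimal, Gradio-free illustration, assuming the defaults and ranges quoted in this commit (1 to 1024 tokens, 1 to 8 beams); `normalize` and `estimate_duration` are hypothetical stand-ins, not the functions in `app.py`.

```python
import math


def normalize(value, default, low, high):
    """Hypothetical one-parameter funnel: None/NaN -> default, then clamp."""
    if value is None or math.isnan(value):
        value = default
    return max(low, min(high, value))


def estimate_duration(max_new_tokens=512, num_beams=1):
    """Stand-in for the duration callable: normalize first, then size the reservation."""
    tokens = int(normalize(max_new_tokens, 512, 1, 1024))
    beams = int(normalize(num_beams, 1, 1, 8))
    return min(120, 30 + (tokens * beams) // 8)


# A cleared field (None) falls back to the default instead of raising TypeError.
print(estimate_duration(None, None))  # 30 + 512 // 8 = 94
# Out-of-range values are clamped, and the result is capped at 120 seconds.
print(estimate_duration(99999, 99))  # min(120, 30 + 1024) = 120
```

The explicit `math.isnan` check matters because clamping alone does not reject NaN: every ordering comparison against NaN is false, so `min`/`max` can pass it through or silently replace it depending on argument order.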

Medium — swap-button stale RTL:
- _swap_languages now emits gr.update(rtl/text_align) for both textboxes so
direction follows the swapped text (rtl is sticky; a prior RTL flip must reset).
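
A rough sketch of the direction-follows-text rule, with plain dicts standing in for `gr.update(...)` payloads so it runs without Gradio. The one-element `RTL_CODES` set and the `swap_with_direction` name are assumptions for illustration only.

```python
RTL_CODES = {"<2ar>"}  # illustrative subset; the real set lives in app.py


def swap_with_direction(source_lang, target_lang, source_text, target_text, name_to_code):
    """Swap languages and texts, recomputing each box's direction from the text it now holds."""
    input_rtl = name_to_code.get(target_lang) in RTL_CODES  # old target text -> input box
    output_rtl = name_to_code.get(source_lang) in RTL_CODES  # old source text -> output box

    def box(value, rtl):
        # Plain dict standing in for a gr.update(...) payload.
        return {"value": value, "rtl": rtl, "text_align": "right" if rtl else "left"}

    return target_lang, source_lang, box(target_text, input_rtl), box(source_text, output_rtl)


codes = {"English (en)": "<2en>", "Arabic (ar)": "<2ar>"}
new_source, new_target, input_box, output_box = swap_with_direction(
    "English (en)", "Arabic (ar)", "Hello", "RTL-text", codes
)
print(input_box["rtl"], output_box["rtl"])  # True False
```

Both boxes always receive an explicit direction, which is what clears a stale RTL flag: the LTR case is written out rather than left as "no change".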

Low polish:
- Gate sampling on abs(temperature - 1.0) > 1e-6 to absorb float spinner drift.
- Reword the temperature caption to describe both directions.
- Document the Ctrl+Enter-vs-button RTL divergence in the submit comment.
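
The drift the tolerance gate absorbs can be reproduced in a few lines. This sketch assumes the spinner value is built up by repeated floating-point additions of the 0.1 step, which the actual widget may or may not do; `is_sampling` is a hypothetical helper mirroring the gate described above.

```python
def is_sampling(temperature, tol=1e-6):
    """Treat temperature as non-default only when it differs from 1.0 by more than tol."""
    return abs(temperature - 1.0) > tol


# Step from the 0.1 minimum up to the nominal default in nine 0.1 increments.
drifted = 0.1
for _ in range(9):
    drifted += 0.1

print(drifted == 1.0)        # False: accumulated rounding error
print(is_sampling(drifted))  # False: the tolerance keeps decoding greedy
print(is_sampling(0.5))      # True: a deliberate change still enables sampling
```

An exact `!= 1.0` comparison would have switched this request to sampling even though the user landed on the default.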

Tests (68 fast, +10): param forwarding into generate, _normalize_params
clamp/None/NaN, translate None/NaN coercion, _estimate_duration None-safety,
empty/None parametrized guard, swap RTL, RTL_CODES ⊆ langmap, gr.Number bounds,
/translate input order pinned by label, public-path assertion keyed on
api_visibility, textbox caption content, RTL LTR-branch value. Docs synced to
78/68/10.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Files changed (4)
  1. CLAUDE.md +4 -4
  2. README.md +2 -2
  3. app.py +68 -25
  4. tests/test_app.py +117 -9
CLAUDE.md CHANGED
@@ -25,8 +25,8 @@ uv run ruff format .
25
  uv run ty check
26
 
27
  # Test
28
- uv run pytest # all 68 tests (slow require CUDA + model download)
29
- uv run pytest -m "not slow" # 58 fast tests only
30
  uv run pytest -m slow # 10 model tests only (CUDA only)
31
 
32
  # Generate language mapping (dev only)
@@ -35,13 +35,13 @@ uv run scripts/generate_langmap.py <path-to-paper.pdf>
35
 
36
  ## Architecture
37
 
38
- **`app.py`** — Single-file application with a Google Translate-style layout: top row has two symmetric, filterable, region-sorted language dropdowns (source defaults to "English (en)", target defaults to "French (fr)") with a swap button ("⇄") between them; below that, input textbox (autofocused) and output textbox with copy button side by side. The Translate button spans full width below both textboxes (shows "Translating..." during processing). Ctrl+Enter submits from the input. The model auto-detects source language; the source dropdown is for user reference and the swap button only, which an `info=` caption discloses. Each control carries an `info=` caption (caption text, not HTML/Markdown blocks): the target dropdown a quality-varies caveat, the input the Ctrl+Enter hint, the output model/arXiv/license provenance. Uses `@lru_cache` for lazy loading of the `google/madlad400-3b-mt` tokenizer and model. On ZeroGPU (`SPACES_ZERO_GPU=1`), `_maybe_eager_load()` places the model at module scope so the `spaces` hijack can pack weights and stream them into workers for fast cold starts; off-ZeroGPU (local, tests, cpu-basic) it stays lazy, so importing the app never downloads the model. Uses `bfloat16` on CUDA (T5/MADLAD is numerically unstable in `float16` — fp16's narrow range overflows to inf/NaN; bf16 is the format T5 was trained in), `float32` on CPU. MPS is not supported (produces garbage output with T5 models). Translation prepends a target language token with a space to the input text (e.g., `<2fr> Hello`) before tokenization and generation; whitespace-only input short-circuits to an empty string before the model loads. Decoding is greedy by default (deterministic); a non-default `temperature` enables sampling, and `num_beams > 1` uses beam search. A collapsed "Advanced" accordion exposes `max_new_tokens`/`num_beams`/`temperature` as `gr.Number` controls (no sliders; defaults mirror `translate()`, so the default surface stays greedy). 
Right-to-left target scripts (an explicit `RTL_CODES` token set — `region` is not a usable proxy) flip the output box to RTL via the Translate-button path. The `@spaces.GPU` decorator allocates GPU on HF Spaces infrastructure; its `duration` is a callable (`_estimate_duration`) that scales the GPU reservation with `max_new_tokens × num_beams` (capped at 120s). Both translate handlers (the private Translate-button click and the public submit) carry the advanced params, so Ctrl+Enter and the `/translate` API honor the accordion; the params keep defaults, so existing two-arg callers still work. The submit handler exposes a stable `/translate` API endpoint (returns a bare string); the swap and Translate-button handlers are `api_visibility="private"`, and both generation handlers use `show_progress="minimal"`. Only `/translate` is public.
39
 
40
  **`langmap/`** — Package with `langid_mapping.py`, mapping 418 language tokens to `{"name": ..., "region": ...}` dicts. Auto-generated by `scripts/generate_langmap.py` from Table 9 (Section A.1) of the MADLAD-400 paper. Available languages at runtime are the intersection of this mapping and the model's vocabulary.
41
 
42
  **`scripts/`** — `generate_langmap.py` parses the MADLAD-400 paper PDF (Table 9, pages 16-22) using pdfplumber and generates the static language mapping with region assignments. Dev-only tool; requires `requirements-dev.txt` dependencies.
43
 
44
- **`tests/`** — 68 tests (58 fast, 10 slow). `test_langmap.py` has 10 fast tests for mapping validation (dict shape, regions, spot-checks). `test_app.py` has 48 fast tests (signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, GPU duration estimator and its signature-mirror contract, greedy-by-default decoding, whitespace-input short-circuit, RTL output direction on the button path, `requirements.txt` excludes platform packages, UI layout with symmetric dropdowns, swap button, textbox config including toolbar buttons and input autofocus, `info=` captions on dropdowns and textboxes, the Advanced accordion's `gr.Number` controls wired into both translate handlers, `show_progress="minimal"` on generation handlers, handler wiring, stable `translate` API endpoint carrying the advanced params with UI-only handlers kept private, no HTML elements, no sliders, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
45
 
46
  ## Tooling
47
 
 
25
  uv run ty check
26
 
27
  # Test
28
+ uv run pytest # all 78 tests (slow require CUDA + model download)
29
+ uv run pytest -m "not slow" # 68 fast tests only
30
  uv run pytest -m slow # 10 model tests only (CUDA only)
31
 
32
  # Generate language mapping (dev only)
 
35
 
36
  ## Architecture
37
 
38
+ **`app.py`** — Single-file application with a Google Translate-style layout: top row has two symmetric, filterable, region-sorted language dropdowns (source defaults to "English (en)", target defaults to "French (fr)") with a swap button ("⇄") between them; below that, input textbox (autofocused) and output textbox with copy button side by side. The Translate button spans full width below both textboxes (shows "Translating..." during processing). Ctrl+Enter submits from the input. The model auto-detects source language; the source dropdown is for user reference and the swap button only, which an `info=` caption discloses. Each control carries an `info=` caption (caption text, not HTML/Markdown blocks): the target dropdown a quality-varies caveat, the input the Ctrl+Enter hint, the output model/arXiv/license provenance. Uses `@lru_cache` for lazy loading of the `google/madlad400-3b-mt` tokenizer and model. On ZeroGPU (`SPACES_ZERO_GPU=1`), `_maybe_eager_load()` places the model at module scope so the `spaces` hijack can pack weights and stream them into workers for fast cold starts; off-ZeroGPU (local, tests, cpu-basic) it stays lazy, so importing the app never downloads the model. Uses `bfloat16` on CUDA (T5/MADLAD is numerically unstable in `float16` — fp16's narrow range overflows to inf/NaN; bf16 is the format T5 was trained in), `float32` on CPU. MPS is not supported (produces garbage output with T5 models). Translation prepends a target language token with a space to the input text (e.g., `<2fr> Hello`) before tokenization and generation; whitespace-only or `None` input short-circuits to an empty string before the model loads. The generation params are normalized in `translate()` via `_normalize_params` (`None`/`NaN` → default, then clamped to range) so the cast-less public path and the ZeroGPU duration callable can't crash on a cleared `gr.Number` field. 
Decoding is greedy by default (deterministic); a non-default `temperature` (tolerance-compared to absorb float spinner drift) enables sampling, and `num_beams > 1` uses beam search. A collapsed "Advanced" accordion exposes `max_new_tokens`/`num_beams`/`temperature` as `gr.Number` controls (no sliders; defaults mirror `translate()`, so the default surface stays greedy). Right-to-left target scripts (an explicit `RTL_CODES` token set — `region` is not a usable proxy) flip the output box to RTL via the Translate-button and swap paths; Ctrl+Enter/`/translate` return a bare string and stay LTR. The `@spaces.GPU` decorator allocates GPU on HF Spaces infrastructure; its `duration` is a callable (`_estimate_duration`) that scales the GPU reservation with `max_new_tokens × num_beams` (capped at 120s). Both translate handlers (the private Translate-button click and the public submit) carry the advanced params, so Ctrl+Enter and the `/translate` API honor the accordion; the params keep defaults, so existing two-arg callers still work. The submit handler exposes a stable `/translate` API endpoint (returns a bare string); the swap and Translate-button handlers are `api_visibility="private"`, and both generation handlers use `show_progress="minimal"`. Only `/translate` is public.
39
 
40
  **`langmap/`** — Package with `langid_mapping.py`, mapping 418 language tokens to `{"name": ..., "region": ...}` dicts. Auto-generated by `scripts/generate_langmap.py` from Table 9 (Section A.1) of the MADLAD-400 paper. Available languages at runtime are the intersection of this mapping and the model's vocabulary.
41
 
42
  **`scripts/`** — `generate_langmap.py` parses the MADLAD-400 paper PDF (Table 9, pages 16-22) using pdfplumber and generates the static language mapping with region assignments. Dev-only tool; requires `requirements-dev.txt` dependencies.
43
 
44
+ **`tests/`** — 78 tests (68 fast, 10 slow). `test_langmap.py` has 10 fast tests for mapping validation (dict shape, regions, spot-checks). `test_app.py` has 58 fast tests (signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, GPU duration estimator and its signature-mirror contract + `None`-safety, greedy-by-default decoding, param forwarding into `generate`, `_normalize_params` None/NaN/clamp coercion, empty/`None`-input short-circuit, RTL output direction on the button and swap paths, `RTL_CODES` ⊆ langmap invariant, `requirements.txt` excludes platform packages, UI layout with symmetric dropdowns, swap button, textbox config including toolbar buttons and input autofocus, `info=` captions on dropdowns and textboxes spot-checked by content, the Advanced accordion's `gr.Number` controls and their bounds, advanced params reaching the public endpoint by `api_visibility` with the `/translate` input order pinned by label, `show_progress="minimal"` on generation handlers, handler wiring, stable `translate` API endpoint carrying the advanced params with UI-only handlers kept private, no HTML elements, no sliders, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
45
 
46
  ## Tooling
47
 
README.md CHANGED
@@ -39,6 +39,6 @@ The Gradio interface launches at `http://localhost:7860`.
39
  uv run ruff check . # lint
40
  uv run ruff format . # format
41
  uv run ty check # type check
42
- uv run pytest -m "not slow" # 58 fast tests
43
- uv run pytest # all 68 tests (slow require CUDA + model download)
44
  ```
 
39
  uv run ruff check . # lint
40
  uv run ruff format . # format
41
  uv run ty check # type check
42
+ uv run pytest -m "not slow" # 68 fast tests
43
+ uv run pytest # all 78 tests (slow require CUDA + model download)
44
  ```
app.py CHANGED
@@ -3,6 +3,7 @@ Translation interface using the MADLAD-400 3B model.
3
  Translates between 418 languages from the MADLAD-400 paper.
4
  """
5
 
 
6
  import os
7
  import time
8
  import warnings
@@ -105,19 +106,40 @@ def _maybe_eager_load() -> None:
105
  _load_model()
106
 
107
 
108
  def _estimate_duration(
109
  text: str,
110
  target_language_name: str,
111
- max_new_tokens: int = 512,
112
- num_beams: int = 1,
113
- temperature: float = 1.0,
114
  ) -> int:
115
- """Reserve GPU time scaled to the worst case: generation cost grows with the
116
- number of tokens generated and the beam width. Mirrors translate()'s signature
117
- (ZeroGPU calls the duration callable with the decorated function's args).
118
- Conservative and capped at 120s; calibrate from the perf_counter log in
119
- translate() (zerogpu.md 'Sizing duration')."""
120
- del text, target_language_name, temperature # only token/beam counts drive runtime
 
 
121
  return min(120, 30 + (max_new_tokens * num_beams) // 8)
122
 
123
 
@@ -125,16 +147,24 @@ def _estimate_duration(
125
  def translate(
126
  text: str,
127
  target_language_name: str,
128
- max_new_tokens: int = 512,
129
- num_beams: int = 1,
130
- temperature: float = 1.0,
131
  ) -> str:
132
  # No-op on empty/whitespace input: skip the model entirely rather than feeding a bare
133
  # "<2xx> " prompt (which would burn generation time and emit a stray token). Guard lives
134
  # here, not in _translate_with_loading, so the public /translate and Ctrl+Enter paths
135
- # (which call translate() directly) are covered too. Returns a str, so the contract holds.
136
- if not text.strip():
 
137
  return ""
 
 
 
 
 
 
 
138
  tokenizer = _load_tokenizer()
139
  model = _load_model()
140
  device = model.device
@@ -144,7 +174,7 @@ def translate(
144
  if target_code is None:
145
  raise ValueError(f"Unsupported language: {target_language_name}")
146
 
147
- if num_beams > 1 and temperature != 1.0:
148
  gr.Info("Temperature has no effect when beam search is enabled (num_beams > 1).")
149
 
150
  input_ids = tokenizer(target_code + " " + text, return_tensors="pt").input_ids.to(device)
@@ -152,7 +182,7 @@ def translate(
152
  generate_kwargs: dict = {"input_ids": input_ids, "max_new_tokens": max_new_tokens, "num_beams": num_beams}
153
  # Greedy by default (deterministic, higher-quality MT). Only sample when the user
154
  # explicitly sets a non-default temperature; beam search (num_beams > 1) ignores it.
155
- if num_beams == 1 and temperature != 1.0:
156
  generate_kwargs["do_sample"] = True
157
  generate_kwargs["temperature"] = temperature
158
 
@@ -171,12 +201,13 @@ def translate(
171
  def _translate_with_loading(
172
  text: str,
173
  target_language_name: str,
174
- max_new_tokens: int = 512,
175
- num_beams: int = 1,
176
- temperature: float = 1.0,
177
  ) -> Generator[tuple[object, object], None, None]:
178
  yield gr.update(value="Translating...", interactive=False), gr.update()
179
- result = translate(text, target_language_name, int(max_new_tokens), int(num_beams), float(temperature))
 
180
  # Flip the output box to RTL for right-to-left target scripts so the translation reads
181
  # correctly; reset to LTR otherwise (rtl is sticky across reruns). Only the button path
182
  # carries this — the public /translate endpoint stays a bare str to keep its API stable.
@@ -188,9 +219,20 @@ def _translate_with_loading(
188
 
189
  def _swap_languages(
190
  source_lang: str, target_lang: str, source_text: str, target_text: str
191
- ) -> tuple[str, str, str, str]:
192
- """Swap source/target languages and their text."""
193
- return target_lang, source_lang, target_text, source_text
194
 
195
 
196
  def _build_demo() -> gr.Blocks:
@@ -262,7 +304,7 @@ def _build_demo() -> gr.Blocks:
262
  minimum=0.1,
263
  maximum=2.0,
264
  step=0.1,
265
- info="No effect at 1.0 (greedy) or when Beams > 1; lower samples less randomly.",
266
  )
267
 
268
  # UI-only handlers: kept off the public API surface (private) so only /translate is exposed.
@@ -284,7 +326,8 @@ def _build_demo() -> gr.Blocks:
284
  # /translate exposes the advanced params too. They all have defaults, so existing
285
  # two-arg callers (text, target) keep working; wiring them here also makes Ctrl+Enter
286
  # honor the Advanced accordion, matching the Translate button. The endpoint returns a
287
- # bare str (RTL direction is a UI-only concern handled on the button path).
 
288
  input_text.submit(
289
  fn=translate,
290
  inputs=[input_text, target_language, max_new_tokens, num_beams, temperature],
 
3
  Translates between 418 languages from the MADLAD-400 paper.
4
  """
5
 
6
+ import math
7
  import os
8
  import time
9
  import warnings
 
106
  _load_model()
107
 
108
 
109
+ def _normalize_params(
110
+ max_new_tokens: float | None, num_beams: float | None, temperature: float | None
111
+ ) -> tuple[int, int, float]:
112
+ """Coerce the advanced generation params to safe values. A cleared ``gr.Number`` arrives as
113
+ ``None`` (Gradio skips its bounds check for ``None``) and the public ``/translate`` path
114
+ passes values uncast, so this is the single funnel every caller — button, submit, API,
115
+ direct, and the ZeroGPU duration callable — goes through. ``None``/``NaN`` fall back to the
116
+ defaults; values are clamped to the ranges the Advanced ``gr.Number`` controls advertise."""
117
+
118
+ def _num(value: float | None, default: float) -> float:
119
+ return default if value is None or math.isnan(value) else value
120
+
121
+ return (
122
+ int(max(1, min(1024, _num(max_new_tokens, 512)))),
123
+ int(max(1, min(8, _num(num_beams, 1)))),
124
+ float(max(0.1, min(2.0, _num(temperature, 1.0)))),
125
+ )
126
+
127
+
128
  def _estimate_duration(
129
  text: str,
130
  target_language_name: str,
131
+ max_new_tokens: int | None = 512,
132
+ num_beams: int | None = 1,
133
+ temperature: float | None = 1.0,
134
  ) -> int:
135
+ """Reserve GPU time scaled to the worst case: generation cost grows with the number of
136
+ tokens generated and the beam width. Mirrors translate()'s signature (ZeroGPU calls the
137
+ duration callable with the decorated function's args, and runs it *before* translate(), so
138
+ it must tolerate the same cleared-field ``None`` values; normalize first). Conservative and
139
+ capped at 120s; calibrate from the perf_counter log in translate() (zerogpu.md 'Sizing
140
+ duration')."""
141
+ del text, target_language_name # only token/beam counts drive runtime
142
+ max_new_tokens, num_beams, _ = _normalize_params(max_new_tokens, num_beams, temperature)
143
  return min(120, 30 + (max_new_tokens * num_beams) // 8)
144
 
145
 
 
147
  def translate(
148
  text: str,
149
  target_language_name: str,
150
+ max_new_tokens: int | None = 512,
151
+ num_beams: int | None = 1,
152
+ temperature: float | None = 1.0,
153
  ) -> str:
154
  # No-op on empty/whitespace input: skip the model entirely rather than feeding a bare
155
  # "<2xx> " prompt (which would burn generation time and emit a stray token). Guard lives
156
  # here, not in _translate_with_loading, so the public /translate and Ctrl+Enter paths
157
+ # (which call translate() directly) are covered too. (text or "") stays None-safe for an
158
+ # API caller that POSTs a null text field. Returns a str, so the contract holds.
159
+ if not (text or "").strip():
160
  return ""
161
+ # Normalize the generation params here — translate() is the single source of truth. The
162
+ # public submit path passes gr.Number values uncast, and a cleared field arrives as
163
+ # None/NaN, so coerce and clamp before use (the duration callable normalizes identically).
164
+ max_new_tokens, num_beams, temperature = _normalize_params(max_new_tokens, num_beams, temperature)
165
+ # Compare with a tolerance so float spinner drift (e.g. 0.1 plus nine 0.1 steps = 0.999…) doesn't trip sampling.
166
+ sampling = abs(temperature - 1.0) > 1e-6
167
+
168
  tokenizer = _load_tokenizer()
169
  model = _load_model()
170
  device = model.device
 
174
  if target_code is None:
175
  raise ValueError(f"Unsupported language: {target_language_name}")
176
 
177
+ if num_beams > 1 and sampling:
178
  gr.Info("Temperature has no effect when beam search is enabled (num_beams > 1).")
179
 
180
  input_ids = tokenizer(target_code + " " + text, return_tensors="pt").input_ids.to(device)
 
182
  generate_kwargs: dict = {"input_ids": input_ids, "max_new_tokens": max_new_tokens, "num_beams": num_beams}
183
  # Greedy by default (deterministic, higher-quality MT). Only sample when the user
184
  # explicitly sets a non-default temperature; beam search (num_beams > 1) ignores it.
185
+ if num_beams == 1 and sampling:
186
  generate_kwargs["do_sample"] = True
187
  generate_kwargs["temperature"] = temperature
188
 
 
201
  def _translate_with_loading(
202
  text: str,
203
  target_language_name: str,
204
+ max_new_tokens: int | None = 512,
205
+ num_beams: int | None = 1,
206
+ temperature: float | None = 1.0,
207
  ) -> Generator[tuple[object, object], None, None]:
208
  yield gr.update(value="Translating...", interactive=False), gr.update()
209
+ # translate() normalizes the params (None/NaN/clamp), so forward them as-is.
210
+ result = translate(text, target_language_name, max_new_tokens, num_beams, temperature)
211
  # Flip the output box to RTL for right-to-left target scripts so the translation reads
212
  # correctly; reset to LTR otherwise (rtl is sticky across reruns). Only the button path
213
  # carries this — the public /translate endpoint stays a bare str to keep its API stable.
 
219
 
220
  def _swap_languages(
221
  source_lang: str, target_lang: str, source_text: str, target_text: str
222
+ ) -> tuple[str, str, object, object]:
223
+ """Swap source/target languages and their text, flipping each textbox's direction to follow
224
+ the text that lands in it. rtl is sticky across reruns, so a stale RTL flip left by a prior
225
+ translation must be reset. After the swap the input box holds the old target text and the
226
+ output box holds the old source text."""
227
+ name_to_code, _ = _build_language_mappings()
228
+ input_rtl = name_to_code.get(target_lang) in RTL_CODES # old target text moves into the input box
229
+ output_rtl = name_to_code.get(source_lang) in RTL_CODES # old source text moves into the output box
230
+ return (
231
+ target_lang,
232
+ source_lang,
233
+ gr.update(value=target_text, rtl=input_rtl, text_align="right" if input_rtl else "left"),
234
+ gr.update(value=source_text, rtl=output_rtl, text_align="right" if output_rtl else "left"),
235
+ )
236
 
237
 
238
  def _build_demo() -> gr.Blocks:
 
304
  minimum=0.1,
305
  maximum=2.0,
306
  step=0.1,
307
+ info="No effect at 1.0 (greedy) or when Beams > 1; below 1.0 is more focused, above 1.0 more random.",
308
  )
309
 
310
  # UI-only handlers: kept off the public API surface (private) so only /translate is exposed.
 
326
  # /translate exposes the advanced params too. They all have defaults, so existing
327
  # two-arg callers (text, target) keep working; wiring them here also makes Ctrl+Enter
328
  # honor the Advanced accordion, matching the Translate button. The endpoint returns a
329
+ # bare str, so an RTL target submitted via Ctrl+Enter is NOT direction-flipped — that
330
+ # happens only on the Translate-button path (an accepted, documented UI divergence).
331
  input_text.submit(
332
  fn=translate,
333
  inputs=[input_text, target_language, max_new_tokens, num_beams, temperature],
tests/test_app.py CHANGED
@@ -173,20 +173,99 @@ def test_translate_greedy_by_default_samples_on_custom_temperature():
173
  assert sampled.get("do_sample") is True and sampled["temperature"] == 0.5
174
 
175
 
176
- def test_translate_skips_model_on_empty_input():
177
- """Whitespace-only input should short-circuit to '' without loading or running the model."""
178
  import app
179
 
180
  with (
181
  patch("app._load_model") as load_model,
182
  patch("app._load_tokenizer") as load_tokenizer,
183
  ):
184
- result = app.translate(" ", "French (fr)")
185
  assert result == ""
186
  load_model.assert_not_called()
187
  load_tokenizer.assert_not_called()
188
 
189
 
190
  def test_translate_with_loading_flips_rtl_for_rtl_target():
191
  """The private button path marks the output RTL for right-to-left target languages and
192
  resets to LTR otherwise (rtl is sticky across reruns)."""
@@ -204,7 +283,17 @@ def test_translate_with_loading_flips_rtl_for_rtl_target():
204
  ltr = final_output("French (fr)", "<2fr>")
205
  assert rtl["rtl"] is True and rtl["text_align"] == "right"
206
  assert ltr["rtl"] is False and ltr["text_align"] == "left"
207
- assert rtl["value"] == "out"
208
 
209
 
210
  def test_requirements_excludes_platform_packages():
@@ -292,9 +381,13 @@ def test_dropdowns_have_info_captions(demo):
292
 
293
 
294
  def test_textboxes_have_info_captions(demo):
295
- """Input and output textboxes carry info= captions (Ctrl+Enter hint and model provenance)."""
296
  textboxes = [b for b in demo.blocks.values() if type(b).__name__ == "Textbox"]
297
  assert all(t.info for t in textboxes), "input and output textboxes should carry info captions"
 
 
 
 
298
 
299
 
300
  def test_demo_has_two_textboxes(demo):
@@ -416,18 +509,30 @@ def test_advanced_params_are_numbers(demo):
416
  assert len(numbers) == 3, f"Expected 3 Number controls, found {len(numbers)}"
417
 
418
 
419
  def test_advanced_params_wired_to_both_translate_handlers(demo):
420
  """Both translate handlers (button click + public submit) carry the three advanced Number
421
- params after text + language, so Ctrl+Enter and the /translate API honor the accordion too."""
 
422
  full_input_fns = [
423
  fn
424
  for fn in demo.fns.values()
425
  if [type(i).__name__ for i in fn.inputs] == ["Textbox", "Dropdown", "Number", "Number", "Number"]
426
  ]
427
  assert len(full_input_fns) == 2, "Expected both translate handlers to carry the 3 advanced params"
428
- assert any(getattr(fn, "api_name", None) == "translate" for fn in full_input_fns), (
429
- "the public /translate endpoint must carry the advanced params"
430
- )
431
 
432
 
433
  def test_all_handlers_wired(demo):
@@ -450,6 +555,9 @@ def test_translate_endpoint_has_stable_api_name(demo):
450
  assert len(api_fns) == 1, "Expected exactly one handler with api_name='translate'"
451
  fn = api_fns[0]
452
  assert [type(i).__name__ for i in fn.inputs] == ["Textbox", "Dropdown", "Number", "Number", "Number"]
 
 
 
453
  assert [type(o).__name__ for o in fn.outputs] == ["Textbox"]
454
 
455
 
 
173
  assert sampled.get("do_sample") is True and sampled["temperature"] == 0.5
174
 
175
 
176
+ def _run_translate(text, target, **kwargs):
177
+ """Call translate() against a mocked model/tokenizer and return generate()'s kwargs + result."""
178
+ import app
179
+
180
+ model = MagicMock()
181
+ model.device = torch.device("cpu")
182
+ model.generate.return_value = [[0]]
183
+ tokenizer = MagicMock()
184
+ tokenizer.decode.return_value = "out"
185
+ with (
186
+ patch("app._load_model", return_value=model),
187
+ patch("app._load_tokenizer", return_value=tokenizer),
188
+ patch("app._build_language_mappings", return_value=({"French (fr)": "<2fr>"}, ["French (fr)"])),
189
+ ):
190
+ result = app.translate(text, target, **kwargs)
191
+ return model.generate.call_args.kwargs if model.generate.called else None, result
192
+
193
+
194
+ def test_translate_forwards_generation_params():
195
+ """Non-default max_new_tokens/num_beams must reach model.generate; beam search must not sample."""
196
+ kwargs, _ = _run_translate("Hello", "French (fr)", max_new_tokens=10, num_beams=4)
197
+ assert kwargs["max_new_tokens"] == 10
198
+ assert kwargs["num_beams"] == 4
199
+ assert "do_sample" not in kwargs, "beam search must not enable sampling"
200
+
201
+
202
+ def test_normalize_params_clamps_and_defaults():
203
+ """_normalize_params coerces None/NaN to defaults and clamps to the advertised ranges."""
204
+ import app
205
+
206
+ assert app._normalize_params(None, None, None) == (512, 1, 1.0)
207
+ nan = float("nan")
208
+ assert app._normalize_params(nan, nan, nan) == (512, 1, 1.0)
209
+ assert app._normalize_params(99999, 99, 9.0) == (1024, 8, 2.0) # clamp high
210
+ assert app._normalize_params(0, 0, 0.0) == (1, 1, 0.1) # clamp low
211
+ mnt, beams, temp = app._normalize_params(10.0, 4.0, 0.5)
212
+ assert (mnt, beams, temp) == (10, 4, 0.5)
213
+ assert type(mnt) is int and type(beams) is int and type(temp) is float
214
+
215
+
216
+ def test_translate_normalizes_invalid_params():
217
+ """A cleared gr.Number arrives as None (and temperature can be NaN) on the public path;
218
+ translate() must coerce to defaults instead of crashing or corrupting sampling."""
219
+ kwargs, result = _run_translate(
220
+ "Hello", "French (fr)", max_new_tokens=None, num_beams=None, temperature=float("nan")
221
+ )
222
+ assert result == "out"
223
+ assert kwargs["max_new_tokens"] == 512 and kwargs["num_beams"] == 1
224
+ assert "do_sample" not in kwargs, "NaN temperature must fall back to greedy"
225
+
226
+
227
+ def test_estimate_duration_handles_none_params():
228
+ """The ZeroGPU duration callable runs before translate() with the same uncast args, so a
229
+ cleared gr.Number (None) must not crash it."""
230
+ import app
231
+
232
+ assert isinstance(app._estimate_duration("hi", "French (fr)", None, None, None), int)
233
+
234
+
235
+ @pytest.mark.parametrize("blank", ["", " ", "\n\t", None])
236
+ def test_translate_skips_model_on_empty_input(blank):
237
+ """Empty/whitespace/None input short-circuits to '' without loading or running the model."""
238
  import app
239
 
240
  with (
241
  patch("app._load_model") as load_model,
242
  patch("app._load_tokenizer") as load_tokenizer,
243
  ):
244
+ result = app.translate(blank, "French (fr)")
245
  assert result == ""
246
  load_model.assert_not_called()
247
  load_tokenizer.assert_not_called()
248
 
249
 
+ def test_swap_flips_rtl_to_follow_text():
+     """Swapping must move each textbox's direction with the text: after EN->Arabic then swap,
+     the input box (now holding the Arabic translation) goes RTL and the output box (now holding
+     the English source) resets to LTR."""
+     import gradio as gr
+
+     import app
+
+     name_to_code = {"English (en)": "<2en>", "Arabic (ar)": "<2ar>"}
+     with patch("app._build_language_mappings", return_value=(name_to_code, list(name_to_code))):
+         new_source, new_target, input_update, output_update = app._swap_languages(
+             "English (en)", "Arabic (ar)", "Hello", "RTL-text"
+         )
+     assert (new_source, new_target) == ("Arabic (ar)", "English (en)")
+     # input box now holds the Arabic translation -> RTL; output box holds the English source -> LTR
+     assert input_update == gr.update(value="RTL-text", rtl=True, text_align="right")
+     assert output_update == gr.update(value="Hello", rtl=False, text_align="left")
+
+
  def test_translate_with_loading_flips_rtl_for_rtl_target():
      """The private button path marks the output RTL for right-to-left target languages and
      resets to LTR otherwise (rtl is sticky across reruns)."""
 
      ltr = final_output("French (fr)", "<2fr>")
      assert rtl["rtl"] is True and rtl["text_align"] == "right"
      assert ltr["rtl"] is False and ltr["text_align"] == "left"
+     assert rtl["value"] == "out" and ltr["value"] == "out"  # both branches forward the result
+
+
+ def test_rtl_codes_are_valid_langmap_tokens():
+     """Every RTL_CODES token must exist in the langmap, so a langmap regeneration that renames
+     or drops a token can't silently disable an RTL flip without failing this test."""
+     import app
+     from langmap.langid_mapping import langid_to_language
+
+     missing = app.RTL_CODES - set(langid_to_language)
+     assert not missing, f"RTL_CODES not in langmap: {missing}"


  def test_requirements_excludes_platform_packages():
 


  def test_textboxes_have_info_captions(demo):
+     """Input box carries the Ctrl+Enter hint; output box carries model/arXiv/license provenance."""
      textboxes = [b for b in demo.blocks.values() if type(b).__name__ == "Textbox"]
      assert all(t.info for t in textboxes), "input and output textboxes should carry info captions"
+     input_box = next(t for t in textboxes if t.interactive is not False)
+     output_box = next(t for t in textboxes if t.interactive is False)
+     assert "ctrl+enter" in input_box.info.lower()
+     assert "madlad400-3b-mt" in output_box.info


  def test_demo_has_two_textboxes(demo):
 
      assert len(numbers) == 3, f"Expected 3 Number controls, found {len(numbers)}"


+ def test_advanced_params_have_safe_bounds(demo):
+     """The Number controls must keep their documented bounds — for the public /translate path,
+     Gradio's component preprocess is the server-side guard keeping params in range."""
+     numbers = {n.label: n for n in demo.blocks.values() if type(n).__name__ == "Number"}
+     assert numbers["Max new tokens"].minimum == 1 and numbers["Max new tokens"].maximum == 1024
+     assert numbers["Max new tokens"].precision == 0
+     assert numbers["Beams"].minimum == 1 and numbers["Beams"].maximum == 8
+     assert numbers["Beams"].precision == 0
+     assert numbers["Temperature"].minimum == 0.1 and numbers["Temperature"].maximum == 2.0
+
+
  def test_advanced_params_wired_to_both_translate_handlers(demo):
524
  """Both translate handlers (button click + public submit) carry the three advanced Number
525
+ params after text + language. Exactly one of them is the public /translate endpoint, so the
526
+ params demonstrably reach the public path (keyed on api_visibility, not a name coincidence)."""
527
  full_input_fns = [
528
  fn
529
  for fn in demo.fns.values()
530
  if [type(i).__name__ for i in fn.inputs] == ["Textbox", "Dropdown", "Number", "Number", "Number"]
531
  ]
532
  assert len(full_input_fns) == 2, "Expected both translate handlers to carry the 3 advanced params"
533
+ public = [fn for fn in full_input_fns if getattr(fn, "api_visibility", None) == "public"]
534
+ assert len(public) == 1, "exactly one full-input handler should be the public endpoint"
535
+ assert getattr(public[0], "api_name", None) == "translate", "the public one must be /translate"
536
 
537
 
538
  def test_all_handlers_wired(demo):
 
      assert len(api_fns) == 1, "Expected exactly one handler with api_name='translate'"
      fn = api_fns[0]
      assert [type(i).__name__ for i in fn.inputs] == ["Textbox", "Dropdown", "Number", "Number", "Number"]
+     # the three Number inputs are positionally indistinguishable by type, so pin their order by
+     # label — a num_beams/temperature swap in the inputs= list would otherwise pass silently.
+     assert [i.label for i in fn.inputs[2:]] == ["Max new tokens", "Beams", "Temperature"]
      assert [type(o).__name__ for o in fn.outputs] == ["Textbox"]
