Daryl Lim and Claude Opus 4.8 (1M context) committed
Commit fe1383b · 1 Parent(s): e4538dd

test: cover temperature tolerance gate and token×beam cap through translate

Two behavioral changes from this session lacked direct tests:
- test_translate_tolerates_near_default_temperature: a temperature within ~1e-6
  of 1.0 (float spinner drift) stays greedy, while a clearly different value
  still samples.
- test_translate_applies_token_beam_cap: a high token×beam request reaches
  model.generate with the capped token count (end-to-end through translate(),
  not just _normalize_params in isolation).
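
For readers without app.py at hand, the two behaviors under test can be sketched roughly as below. This is an illustration under stated assumptions, not the app's actual code: the helper names, the 4096 product cap, and the way sampling is switched on are all hypothetical. Only the ~1e-6 tolerance and the existence of `app._MAX_TOKEN_BEAM_PRODUCT` come from this commit.

```python
import math

# Assumed values for illustration; the real constants live in app.py and are not shown here.
DEFAULT_TEMPERATURE = 1.0
TEMPERATURE_TOLERANCE = 1e-6   # taken from the "~1e-6" wording in the commit message
MAX_TOKEN_BEAM_PRODUCT = 4096  # hypothetical stand-in for app._MAX_TOKEN_BEAM_PRODUCT


def is_default_temperature(temperature: float) -> bool:
    """Treat any value within the absolute tolerance of 1.0 as the greedy default."""
    return math.isclose(
        temperature, DEFAULT_TEMPERATURE, rel_tol=0.0, abs_tol=TEMPERATURE_TOLERANCE
    )


def cap_tokens(max_new_tokens: int, num_beams: int) -> int:
    """Shrink max_new_tokens so that tokens * beams stays within the product cap."""
    return max(1, min(max_new_tokens, MAX_TOKEN_BEAM_PRODUCT // num_beams))


def build_generate_kwargs(max_new_tokens: int, num_beams: int, temperature: float) -> dict:
    """Assemble the kwargs a generate() call would receive (hypothetical helper)."""
    kwargs = {
        "max_new_tokens": cap_tokens(max_new_tokens, num_beams),
        "num_beams": num_beams,
    }
    # Only a clearly non-default temperature turns sampling on.
    if not is_default_temperature(temperature):
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    return kwargs
```

The sketch ignores any interaction between beam search and sampling that the real `translate()` may enforce; it only shows why a value such as `1.0 - 1e-9` leaves `do_sample` unset and why the forwarded token count shrinks as `num_beams` grows.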

71 fast tests pass. Docs synced to 81 total / 71 fast / 10 slow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Files changed (3)
  1. CLAUDE.md +3 -3
  2. README.md +2 -2
  3. tests/test_app.py +19 -0
CLAUDE.md CHANGED
@@ -25,8 +25,8 @@ uv run ruff format .
 uv run ty check
 
 # Test
-uv run pytest # all 79 tests (slow require CUDA + model download)
-uv run pytest -m "not slow" # 69 fast tests only
+uv run pytest # all 81 tests (slow require CUDA + model download)
+uv run pytest -m "not slow" # 71 fast tests only
 uv run pytest -m slow # 10 model tests only (CUDA only)
 
 # Generate language mapping (dev only)
@@ -41,7 +41,7 @@ uv run scripts/generate_langmap.py <path-to-paper.pdf>
 
 **`scripts/`** — `generate_langmap.py` parses the MADLAD-400 paper PDF (Table 9, pages 16-22) using pdfplumber and generates the static language mapping with region assignments. Dev-only tool; requires `requirements-dev.txt` dependencies.
 
-**`tests/`** — 79 tests (69 fast, 10 slow). `test_langmap.py` has 10 fast tests for mapping validation (dict shape, regions, spot-checks). `test_app.py` has 59 fast tests (signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, GPU duration estimator and its signature-mirror contract + `None`-safety, greedy-by-default decoding, param forwarding into `generate`, `_normalize_params` None/NaN/clamp coercion and token×beam product cap, empty/`None`-input short-circuit, RTL output direction on the button and swap paths, `RTL_CODES` ⊆ langmap invariant, `requirements.txt` excludes platform packages, UI layout with symmetric dropdowns, swap button, textbox config including toolbar buttons and input autofocus, `info=` captions on dropdowns and textboxes spot-checked by content, the Advanced accordion's `gr.Number` controls and their bounds, advanced params reaching the public endpoint by `api_visibility` with the `/translate` input order pinned by label, `show_progress="minimal"` on generation handlers, handler wiring, stable `translate` API endpoint carrying the advanced params with UI-only handlers kept private, no HTML elements, no sliders, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
+**`tests/`** — 81 tests (71 fast, 10 slow). `test_langmap.py` has 10 fast tests for mapping validation (dict shape, regions, spot-checks). `test_app.py` has 61 fast tests (signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, GPU duration estimator and its signature-mirror contract + `None`-safety, greedy-by-default decoding with near-1.0 temperature tolerance, param forwarding into `generate` and the token×beam cap applied through `translate()`, `_normalize_params` None/NaN/clamp coercion and product cap, empty/`None`-input short-circuit, RTL output direction on the button and swap paths, `RTL_CODES` ⊆ langmap invariant, `requirements.txt` excludes platform packages, UI layout with symmetric dropdowns, swap button, textbox config including toolbar buttons and input autofocus, `info=` captions on dropdowns and textboxes spot-checked by content, the Advanced accordion's `gr.Number` controls and their bounds, advanced params reaching the public endpoint by `api_visibility` with the `/translate` input order pinned by label, `show_progress="minimal"` on generation handlers, handler wiring, stable `translate` API endpoint carrying the advanced params with UI-only handlers kept private, no HTML elements, no sliders, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
 
 ## Tooling
 
README.md CHANGED
@@ -39,6 +39,6 @@ The Gradio interface launches at `http://localhost:7860`.
 uv run ruff check . # lint
 uv run ruff format . # format
 uv run ty check # type check
-uv run pytest -m "not slow" # 69 fast tests
-uv run pytest # all 79 tests (slow require CUDA + model download)
+uv run pytest -m "not slow" # 71 fast tests
+uv run pytest # all 81 tests (slow require CUDA + model download)
 ```
tests/test_app.py CHANGED
@@ -199,6 +199,25 @@ def test_translate_forwards_generation_params():
     assert "do_sample" not in kwargs, "beam search must not enable sampling"
 
 
+def test_translate_applies_token_beam_cap():
+    """A high token×beam request must reach model.generate with the capped token count (not the
+    raw value) so generation stays within its GPU reservation."""
+    import app
+
+    kwargs, _ = _run_translate("Hello", "French (fr)", max_new_tokens=1024, num_beams=8)
+    assert kwargs["num_beams"] == 8
+    assert kwargs["max_new_tokens"] * kwargs["num_beams"] <= app._MAX_TOKEN_BEAM_PRODUCT
+
+
+def test_translate_tolerates_near_default_temperature():
+    """A temperature within ~1e-6 of 1.0 (float spinner drift) stays greedy; a clearly different
+    value still samples."""
+    near_one, _ = _run_translate("Hello", "French (fr)", temperature=1.0 - 1e-9)
+    sampled, _ = _run_translate("Hello", "French (fr)", temperature=0.7)
+    assert "do_sample" not in near_one, "near-1.0 temperature should stay greedy"
+    assert sampled.get("do_sample") is True
+
+
 def test_normalize_params_clamps_and_defaults():
     """_normalize_params coerces None/NaN to defaults and clamps to the advertised ranges."""
     import app
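
Both new tests lean on a `_run_translate` helper whose body is not part of this diff. The sketch below shows one way such a helper could capture the kwargs that reach `generate`, using `unittest.mock` from the standard library. Everything here is hypothetical: the stand-in `translate()`, the `run_translate` name, and the mock model's return value are assumptions made so the example runs on its own, and the repo's real helper may work differently.

```python
from unittest.mock import MagicMock


def translate(model, text, **params):
    """Hypothetical stand-in for the app's translate(): forwards params to model.generate."""
    kwargs = {"max_new_tokens": params.get("max_new_tokens", 256)}
    temperature = params.get("temperature", 1.0)
    # Assumed gate: only a temperature clearly different from 1.0 enables sampling.
    if abs(temperature - 1.0) > 1e-6:
        kwargs.update(do_sample=True, temperature=temperature)
    return model.generate(text, **kwargs)


def run_translate(text, **params):
    """Call translate() against a mock model and return (generate kwargs, result)."""
    model = MagicMock()
    model.generate.return_value = "bonjour"
    result = translate(model, text, **params)
    # call_args unpacks into the positional args and keyword args of the last call.
    _, kwargs = model.generate.call_args
    return kwargs, result
```

Capturing kwargs at the `generate` boundary is what lets a test assert on the end-to-end effect of `translate()` without loading a model or needing CUDA.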