Daryl Lim and Claude Opus 4.8 (1M context) committed
Commit fe1383b · 1 Parent(s): e4538dd
test: cover temperature tolerance gate and token×beam cap through translate
Two behavioral changes from this session lacked direct tests:
- test_translate_tolerates_near_default_temperature: a temperature within ~1e-6
of 1.0 (float spinner drift) stays greedy, while a clearly different value
still samples.
- test_translate_applies_token_beam_cap: a high token×beam request reaches
model.generate with the capped token count (end-to-end through translate(),
not just _normalize_params in isolation).
All 71 fast tests pass. Docs synced to the new counts: 81 total, 71 fast, 10 slow.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
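The two behaviors named in the commit message can be illustrated with a minimal sketch. This is not the code from `app.py`, which is not shown in this commit: the function name `build_generate_kwargs`, the constant values, and the choice to shrink the token budget while keeping the beam count are all assumptions made for illustration. The interaction between beam search and sampling is also left out.

```python
import math

# Hypothetical values; the real constants live in app.py and are not shown here.
MAX_TOKEN_BEAM_PRODUCT = 2048
TEMPERATURE_TOLERANCE = 1e-6


def build_generate_kwargs(max_new_tokens: int, num_beams: int, temperature: float) -> dict:
    """Return kwargs for a generate() call, applying the cap and the temperature gate.

    Assumes 1 <= num_beams <= MAX_TOKEN_BEAM_PRODUCT (already clamped upstream).
    """
    # Keep the requested beam count and shrink the token budget so that
    # max_new_tokens * num_beams never exceeds the cap.
    capped_tokens = min(max_new_tokens, MAX_TOKEN_BEAM_PRODUCT // num_beams)
    kwargs = {"max_new_tokens": capped_tokens, "num_beams": num_beams}

    # A temperature within the tolerance of 1.0 counts as "default", so float
    # drift from a numeric spinner does not silently switch on sampling.
    if not math.isclose(temperature, 1.0, rel_tol=0.0, abs_tol=TEMPERATURE_TOLERANCE):
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    return kwargs
```

Using an absolute tolerance rather than `temperature == 1.0` is what lets a value such as `1.0 - 1e-9` stay on the greedy path while `0.7` still samples.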
- CLAUDE.md +3 -3
- README.md +2 -2
- tests/test_app.py +19 -0
CLAUDE.md CHANGED
@@ -25,8 +25,8 @@ uv run ruff format .
 uv run ty check
 
 # Test
-uv run pytest # all
-uv run pytest -m "not slow" #
+uv run pytest # all 81 tests (slow require CUDA + model download)
+uv run pytest -m "not slow" # 71 fast tests only
 uv run pytest -m slow # 10 model tests only (CUDA only)
 
 # Generate language mapping (dev only)
@@ -41,7 +41,7 @@ uv run scripts/generate_langmap.py <path-to-paper.pdf>
 
 **`scripts/`** — `generate_langmap.py` parses the MADLAD-400 paper PDF (Table 9, pages 16-22) using pdfplumber and generates the static language mapping with region assignments. Dev-only tool; requires `requirements-dev.txt` dependencies.
 
-**`tests/`** —
+**`tests/`** — 81 tests (71 fast, 10 slow). `test_langmap.py` has 10 fast tests for mapping validation (dict shape, regions, spot-checks). `test_app.py` has 61 fast tests (signatures, device fallback, bfloat16/float32 dtype selection, ZeroGPU eager-load gating, GPU duration estimator and its signature-mirror contract + `None`-safety, greedy-by-default decoding with near-1.0 temperature tolerance, param forwarding into `generate` and the token×beam cap applied through `translate()`, `_normalize_params` None/NaN/clamp coercion and product cap, empty/`None`-input short-circuit, RTL output direction on the button and swap paths, `RTL_CODES` ⊆ langmap invariant, `requirements.txt` excludes platform packages, UI layout with symmetric dropdowns, swap button, textbox config including toolbar buttons and input autofocus, `info=` captions on dropdowns and textboxes spot-checked by content, the Advanced accordion's `gr.Number` controls and their bounds, advanced params reaching the public endpoint by `api_visibility` with the `/translate` input order pinned by label, `show_progress="minimal"` on generation handlers, handler wiring, stable `translate` API endpoint carrying the advanced params with UI-only handlers kept private, no HTML elements, no sliders, locale codes, no title) and 10 slow tests (translation with various parameters, language mapping). Slow tests require CUDA and model download; auto-skipped without CUDA.
 
 ## Tooling
 
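The CLAUDE.md text above says slow tests are auto-skipped without CUDA, but the mechanism is not part of this diff. One common way to get that behavior is a collection hook in `conftest.py`; the sketch below is an assumption about how it could be done, not the repository's actual code, and the helper name `cuda_available` is hypothetical. It also assumes a `slow` marker is registered in the pytest configuration.

```python
import importlib.util


def cuda_available() -> bool:
    """Report whether torch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


def pytest_collection_modifyitems(config, items):
    """Mark every test tagged `slow` as skipped when no CUDA device is present."""
    import pytest  # imported here so the helper above stays usable without pytest

    if cuda_available():
        return
    skip_slow = pytest.mark.skip(reason="slow tests need CUDA and a model download")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)
```

With a hook like this, `uv run pytest` on a CPU-only machine would report the slow tests as skipped rather than failed, while `uv run pytest -m "not slow"` deselects them entirely.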
README.md CHANGED
@@ -39,6 +39,6 @@ The Gradio interface launches at `http://localhost:7860`.
 uv run ruff check . # lint
 uv run ruff format . # format
 uv run ty check # type check
-uv run pytest -m "not slow" #
-uv run pytest # all
+uv run pytest -m "not slow" # 71 fast tests
+uv run pytest # all 81 tests (slow require CUDA + model download)
 ```
tests/test_app.py CHANGED
@@ -199,6 +199,25 @@ def test_translate_forwards_generation_params():
     assert "do_sample" not in kwargs, "beam search must not enable sampling"
 
 
+def test_translate_applies_token_beam_cap():
+    """A high token×beam request must reach model.generate with the capped token count (not the
+    raw value) so generation stays within its GPU reservation."""
+    import app
+
+    kwargs, _ = _run_translate("Hello", "French (fr)", max_new_tokens=1024, num_beams=8)
+    assert kwargs["num_beams"] == 8
+    assert kwargs["max_new_tokens"] * kwargs["num_beams"] <= app._MAX_TOKEN_BEAM_PRODUCT
+
+
+def test_translate_tolerates_near_default_temperature():
+    """A temperature within ~1e-6 of 1.0 (float spinner drift) stays greedy; a clearly different
+    value still samples."""
+    near_one, _ = _run_translate("Hello", "French (fr)", temperature=1.0 - 1e-9)
+    sampled, _ = _run_translate("Hello", "French (fr)", temperature=0.7)
+    assert "do_sample" not in near_one, "near-1.0 temperature should stay greedy"
+    assert sampled.get("do_sample") is True
+
+
 def test_normalize_params_clamps_and_defaults():
     """_normalize_params coerces None/NaN to defaults and clamps to the advertised ranges."""
     import app
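Both new tests rely on a `_run_translate` helper whose body is outside this diff. The pattern it appears to follow, calling the translation function against a mocked model and returning the keyword arguments that reached `generate`, can be sketched as below. Everything here is a stand-in: `toy_translate` and `run_and_capture` are hypothetical names, and the real helper's second return value is unknown.

```python
from unittest.mock import MagicMock


def toy_translate(model, text: str, **params) -> str:
    """Stand-in for the app's translate(); forwards params to model.generate."""
    model.generate(text, **params)
    return "bonjour"


def run_and_capture(text: str, **params):
    """Call toy_translate with a mocked model and return (generate kwargs, result)."""
    model = MagicMock()
    result = toy_translate(model, text, **params)
    # call_args records the most recent call; .kwargs is its keyword arguments.
    return model.generate.call_args.kwargs, result


kwargs, result = run_and_capture("Hello", num_beams=8, max_new_tokens=128)
```

Asserting on the captured kwargs, rather than on a parameter-normalizing function in isolation, is what makes a test end-to-end in the sense the commit message describes: it checks what the model call actually receives.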