Spaces:
Running
Running
| # Stage 1 — Query Rewriting Contract | |
| ## Purpose | |
| Stage 1 (“Query Rewriting”) converts a free-form natural-language prompt into a | |
| comma-separated list of short, tag-shaped phrases suitable for downstream | |
| retrieval over a closed image-tag vocabulary. | |
| This stage is not tagging, not normalization, and not validation. | |
| Its sole role is to rewrite user intent into a retrieval-friendly surface form | |
| with high recall. | |
| --- | |
| ## Inputs | |
| - User prompt: an arbitrary string entered by the user. | |
| - The input may include: | |
| - natural language | |
| - comma-separated phrases | |
| - Stable-Diffusion-style parentheses and weights | |
| - punctuation and spacing artifacts | |
| No structural guarantees are assumed about the input. | |
| --- | |
| ## Pre-Rewrite Heuristics (Non-LLM) | |
| Before the LLM rewrite is invoked, the system performs a lightweight heuristic | |
| extraction: | |
| - The prompt is split on "." and "," | |
| - Segments with three or fewer whitespace-separated tokens are retained | |
| - Case-insensitive deduplication is applied | |
| This produces a small list of user-provided phrases that may later be appended | |
| to the rewrite output for retrieval support. | |
| This heuristic: | |
| - is lossy | |
| - is not authoritative | |
| - exists only to preserve short explicit phrases if the rewrite fails or omits them | |
| --- | |
| ## Rewrite Mechanism | |
| Stage 1 uses a single deterministic LLM call with: | |
| - temperature = 0.0 | |
| - no retries | |
| - no streaming | |
| - no structured output enforcement | |
| The system prompt instructs the model to: | |
| - output a comma-separated list | |
| - use short, literal, tag-shaped phrases | |
| - preserve coherent multi-word visual concepts | |
| - avoid inventing details | |
| - avoid demographic inference | |
| - avoid guessing identities | |
| The LLM output is treated as plain text. | |
| --- | |
| ## Output Format | |
| On success, Stage 1 returns: | |
| - a single string | |
| - containing comma-separated phrases | |
| - with arbitrary spacing normalized | |
| - truncated to a maximum of approximately 800 characters | |
| No further parsing, validation, or canonicalization is applied at this stage. | |
| The rewrite may: | |
| - reorder concepts | |
| - merge or split phrasing | |
| - introduce additional generic visual concepts (e.g. "white background") | |
| --- | |
| ## Failure and Fallback Behavior | |
| If the LLM call: | |
| - errors | |
| - produces a refusal-like response | |
| - returns empty output | |
| then Stage 1 returns an empty string. | |
| In downstream stages, this empty rewrite may be supplemented by the heuristic | |
| phrases extracted earlier, but Stage 1 itself does not attempt recovery. | |
| --- | |
| ## Explicit Non-Guarantees | |
| Stage 1 does not guarantee that: | |
| - output phrases correspond to known vocabulary tags | |
| - phrases are unique | |
| - phrases are canonicalized | |
| - phrases are mutually exclusive | |
| - all user concepts are preserved | |
| - added concepts reflect ground truth | |
| Stage 2 must not assume any of the above. | |
| --- | |
| ## Contract Boundary with Stage 2 | |
| Stage 1 guarantees only that: | |
| - output is a comma-separated list of short phrases | |
| - phrases are intended to be retrieval queries, not canonical tags | |
| - output is deterministic for a given input | |
| Stage 2 is responsible for: | |
| - normalization | |
| - deduplication | |
| - head-noun expansion | |
| - vocabulary grounding | |
| - alias handling | |
| - scoring and ranking | |
| --- | |
| ## Summary (Interview-Safe) | |
| Stage 1 is a deterministic query-rewriting step that reshapes free-form text into | |
| retrieval-friendly phrase queries. It intentionally favors recall and | |
| surface-form alignment over correctness or canonicalization, delegating all | |
| grounding and validation to later stages. | |