# Stage 1: Query Rewriting Contract
## Purpose
Stage 1 (“Query Rewriting”) converts a free-form natural-language prompt into a
comma-separated list of short, tag-shaped phrases suitable for downstream
retrieval over a closed image-tag vocabulary.
This stage is not tagging, not normalization, and not validation.
Its sole role is to rewrite user intent into a retrieval-friendly surface form
with high recall.
---
## Inputs
- User prompt: an arbitrary string entered by the user.
- The input may include:
  - natural language
  - comma-separated phrases
  - Stable-Diffusion-style parentheses and weights
  - punctuation and spacing artifacts
No structural guarantees are assumed about the input.
---
## Pre-Rewrite Heuristics (Non-LLM)
Before the LLM rewrite is invoked, the system performs a lightweight heuristic
extraction:
- The prompt is split on "." and ","
- Segments with three or fewer whitespace-separated tokens are retained
- Case-insensitive deduplication is applied
This produces a small list of user-provided phrases that may later be appended
to the rewrite output for retrieval support.
This heuristic:
- is lossy
- is not authoritative
- exists only to preserve short explicit phrases if the rewrite fails or omits them
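
The heuristic extraction described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `extract_short_phrases`, the `max_tokens` parameter, and the choice to drop empty segments are assumptions.

```python
import re


def extract_short_phrases(prompt: str, max_tokens: int = 3) -> list[str]:
    """Collect short, explicit phrases from a raw prompt (lossy by design)."""
    phrases: list[str] = []
    seen: set[str] = set()
    # Split on the two delimiters named in the contract: "." and ",".
    for segment in re.split(r"[.,]", prompt):
        tokens = segment.split()
        # Assumption: empty segments are dropped; longer segments are discarded.
        if not tokens or len(tokens) > max_tokens:
            continue
        phrase = " ".join(tokens)
        key = phrase.casefold()  # case-insensitive deduplication
        if key in seen:
            continue
        seen.add(key)
        phrases.append(phrase)
    return phrases


print(extract_short_phrases(
    "A girl with long hair standing in the rain, Blue Eyes, blue eyes. umbrella"
))
# prints ['Blue Eyes', 'umbrella']
```

Because the split also fires on the "." inside a weight such as `(red hair:1.2)`, that input would be broken into `(red hair:1` and `2)`, which illustrates why the result is treated as lossy and non-authoritative.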
---
## Rewrite Mechanism
Stage 1 uses a single deterministic LLM call with:
- temperature = 0.0
- no retries
- no streaming
- no structured output enforcement
The system prompt instructs the model to:
- output a comma-separated list
- use short, literal, tag-shaped phrases
- preserve coherent multi-word visual concepts
- avoid inventing details
- avoid demographic inference
- avoid guessing identities
The LLM output is treated as plain text.
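
A rough sketch of the single-call mechanism, under stated assumptions: the system prompt text below is a paraphrase of the instructions listed above rather than the real prompt, and `CompleteFn` is a hypothetical client signature standing in for whichever LLM API the system uses.

```python
from typing import Callable

# Hypothetical system prompt paraphrasing the instructions listed above.
SYSTEM_PROMPT = (
    "Rewrite the user's prompt as a comma-separated list of short, literal, "
    "tag-shaped phrases. Preserve coherent multi-word visual concepts. "
    "Do not invent details, infer demographics, or guess identities."
)

# Hypothetical client signature: (system_prompt, user_prompt, temperature) -> text.
CompleteFn = Callable[[str, str, float], str]


def call_rewrite(user_prompt: str, complete: CompleteFn) -> str:
    """Make one non-streaming call at temperature 0.0 and return the raw text."""
    # Single attempt: no retry loop and no schema or JSON enforcement.
    return complete(SYSTEM_PROMPT, user_prompt, 0.0)


# Stand-in client used only to exercise the sketch without a network call.
calls: list[float] = []


def fake_complete(system_prompt: str, user_prompt: str, temperature: float) -> str:
    calls.append(temperature)
    return "girl, long hair, rain"


print(call_rewrite("a girl with long hair standing in the rain", fake_complete))
```

Injecting the client as a callable keeps the sketch independent of any particular provider and makes the "one call, plain text out" behavior easy to check in isolation.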
---
## Output Format
On success, Stage 1 returns:
- a single string
- containing comma-separated phrases
- with irregular spacing normalized
- truncated to a maximum of approximately 800 characters
No further parsing, validation, or canonicalization is applied at this stage.
The rewrite may:
- reorder concepts
- merge or split phrasing
- introduce additional generic visual concepts (e.g. "white background")
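
The spacing normalization and length cap described in this section could look roughly like this. The exact limit and the truncation rule are not specified here, so `MAX_CHARS = 800` and the choice to cut at a phrase boundary are assumptions, as is the name `normalize_rewrite`.

```python
MAX_CHARS = 800  # assumption: the contract only says "approximately 800"


def normalize_rewrite(text: str, max_chars: int = MAX_CHARS) -> str:
    """Normalize spacing around phrases and cap the overall length."""
    # Collapse whitespace runs inside each phrase; empty segments are dropped.
    phrases = [" ".join(part.split()) for part in text.split(",")]
    joined = ", ".join(p for p in phrases if p)
    if len(joined) <= max_chars:
        return joined
    # Assumption: cut at the last comma within the limit so that no phrase
    # is split in half. The extra character lets a phrase that ends exactly
    # at the limit survive.
    head, sep, _ = joined[: max_chars + 1].rpartition(",")
    return head if sep else joined[:max_chars].rstrip()


print(normalize_rewrite("  red   hair ,blue eyes,, white background "))
# prints red hair, blue eyes, white background
```

Note that this only tidies the surface form. It does not deduplicate, canonicalize, or check phrases against the vocabulary, in keeping with the statement that no further validation happens at this stage.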
---
## Failure and Fallback Behavior
If the LLM call:
- errors
- produces a refusal-like response
- returns empty output
then Stage 1 returns an empty string.
In downstream stages, this empty rewrite may be supplemented by the heuristic
phrases extracted earlier, but Stage 1 itself does not attempt recovery.
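
The failure path above can be sketched as follows. The refusal markers are purely hypothetical, since the actual detection rule is not described, and `supplement` illustrates a downstream helper rather than anything Stage 1 does itself.

```python
from typing import Callable

# Hypothetical markers; the real refusal-detection rule is not specified.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "as an ai")


def looks_like_refusal(text: str) -> bool:
    lowered = text.strip().lower()
    return any(lowered.startswith(marker) for marker in REFUSAL_MARKERS)


def stage1_rewrite(prompt: str, rewrite: Callable[[str], str]) -> str:
    """Return the rewrite text, or an empty string on any failure."""
    try:
        text = rewrite(prompt)
    except Exception:
        return ""  # call errored: no retry, no recovery
    if not text or not text.strip() or looks_like_refusal(text):
        return ""  # empty or refusal-like output
    return text


def supplement(rewrite_text: str, heuristic_phrases: list[str]) -> str:
    """Downstream helper (not part of Stage 1): append heuristic phrases."""
    parts = [rewrite_text] if rewrite_text else []
    return ", ".join(parts + heuristic_phrases)


def failing_rewrite(prompt: str) -> str:
    raise TimeoutError("simulated LLM failure")


print(repr(stage1_rewrite("a girl in the rain", failing_rewrite)))
# prints ''
print(supplement("", ["blue eyes", "umbrella"]))
# prints blue eyes, umbrella
```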
---
## Explicit Non-Guarantees
Stage 1 does not guarantee that:
- output phrases correspond to known vocabulary tags
- phrases are unique
- phrases are canonicalized
- phrases are mutually exclusive
- all user concepts are preserved
- added concepts reflect ground truth
Stage 2 must not assume any of the above.
---
## Contract Boundary with Stage 2
Stage 1 guarantees only that:
- output is a comma-separated list of short phrases, or an empty string on failure
- phrases are intended to be retrieval queries, not canonical tags
- output is deterministic for a given input
Stage 2 is responsible for:
- normalization
- deduplication
- head-noun expansion
- vocabulary grounding
- alias handling
- scoring and ranking
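
As an illustration of this boundary, a Stage 2 consumer might treat Stage 1 output as untrusted query text along these lines. Everything here is hypothetical: the function names, the set-based `vocabulary`, and exact-match lookup as a stand-in for real vocabulary grounding. Head-noun expansion, alias handling, and scoring are not shown.

```python
def split_rewrite(rewrite_text: str) -> list[str]:
    """Split Stage 1 output into raw query phrases without trusting its shape."""
    phrases = []
    for part in rewrite_text.split(","):
        phrase = " ".join(part.split())  # normalize internal whitespace
        if phrase:  # tolerate empty segments and an empty rewrite
            phrases.append(phrase)
    return phrases


def partition_by_vocabulary(
    phrases: list[str], vocabulary: set[str]
) -> tuple[list[str], list[str]]:
    """Deduplicate case-insensitively, then separate known tags from the rest."""
    known: list[str] = []
    unknown: list[str] = []
    seen: set[str] = set()
    for phrase in phrases:
        key = phrase.casefold()
        if key in seen:
            continue  # Stage 1 does not guarantee uniqueness
        seen.add(key)
        # Assumption: `vocabulary` holds casefolded tag strings.
        (known if key in vocabulary else unknown).append(phrase)
    return known, unknown


phrases = split_rewrite("red hair, Red Hair,, white  background")
print(partition_by_vocabulary(phrases, {"red hair"}))
# prints (['red hair'], ['white background'])
```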
---
## Summary (Interview-Safe)
Stage 1 is a deterministic query-rewriting step that reshapes free-form text into
retrieval-friendly phrase queries. It intentionally favors recall and
surface-form alignment over correctness or canonicalization, delegating all
grounding and validation to later stages.