# Stage 1 — Query Rewriting Contract

## Purpose

Stage 1 (“Query Rewriting”) converts a free-form natural-language prompt into a
comma-separated list of short, tag-shaped phrases suitable for downstream
retrieval over a closed image-tag vocabulary.

This stage performs no tagging, normalization, or validation.
Its sole role is to rewrite user intent into a retrieval-friendly surface form,
favoring high recall.

---

## Inputs

- User prompt: an arbitrary string entered by the user.
- The input may include:
  - natural language
  - comma-separated phrases
  - Stable-Diffusion-style parentheses and weights
  - punctuation and spacing artifacts

No structural guarantees are assumed about the input.

---

## Pre-Rewrite Heuristics (Non-LLM)

Before the LLM rewrite is invoked, the system performs a lightweight heuristic
extraction:

- The prompt is split on "." and ","
- Segments with three or fewer whitespace-separated tokens are retained
- Case-insensitive deduplication is applied

This produces a small list of user-provided phrases that may later be appended
to the rewrite output for retrieval support.

This heuristic:
- is lossy
- is not authoritative
- exists only to preserve short explicit phrases if the rewrite fails or omits them
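
A minimal Python sketch of the heuristic extraction described above. The function name `extract_heuristic_phrases` is hypothetical. Trimming whitespace, dropping empty segments, and keeping the first-seen casing are assumptions that this contract does not specify.

```python
import re


def extract_heuristic_phrases(prompt: str, max_tokens: int = 3) -> list[str]:
    """Hypothetical sketch of the non-LLM pre-rewrite heuristic.

    Splits on "." and ",", keeps segments of at most `max_tokens`
    whitespace-separated tokens, and deduplicates case-insensitively.
    """
    phrases: list[str] = []
    seen: set[str] = set()
    for segment in re.split(r"[.,]", prompt):
        segment = segment.strip()  # assumption: surrounding whitespace is trimmed
        if not segment:
            continue  # assumption: empty segments are dropped
        if len(segment.split()) > max_tokens:
            continue  # longer segments are left to the LLM rewrite
        key = segment.lower()  # case-insensitive dedup key
        if key in seen:
            continue
        seen.add(key)
        phrases.append(segment)  # assumption: first-seen casing is kept
    return phrases


print(extract_heuristic_phrases(
    "A girl standing in a field at sunset, red dress, Red Dress. smiling"
))
# → ['red dress', 'smiling']
```

Splitting on "." also breaks Stable-Diffusion weights apart: `(red dress:1.2)` yields `(red dress:1` and `2)`, which is one concrete way the heuristic is lossy.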

---

## Rewrite Mechanism

Stage 1 uses a single LLM call, configured for deterministic decoding, with:

- temperature = 0.0
- no retries
- no streaming
- no structured output enforcement

The system prompt instructs the model to:

- output a comma-separated list
- use short, literal, tag-shaped phrases
- preserve coherent multi-word visual concepts
- avoid inventing details
- avoid demographic inference
- avoid guessing identities

The LLM output is treated as plain text.
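
The call described above can be sketched in Python as a thin wrapper around an injected completion function, so no particular LLM SDK is assumed. `CompletionFn`, `call_rewriter`, and the wording of `SYSTEM_PROMPT` are hypothetical; only the settings (temperature 0.0, one attempt, no streaming, plain-text output) come from this contract.

```python
from typing import Callable

# Hypothetical system prompt paraphrasing the instructions listed above.
SYSTEM_PROMPT = (
    "Rewrite the user's image request as a comma-separated list of short, "
    "literal, tag-shaped phrases. Keep coherent multi-word visual concepts "
    "together. Do not invent details, infer demographics, or guess identities. "
    "Output only the list."
)

# Hypothetical client signature: (system_prompt, user_prompt, temperature) -> text.
# Any non-streaming chat/completion API could be adapted to this shape.
CompletionFn = Callable[[str, str, float], str]


def call_rewriter(prompt: str, complete: CompletionFn) -> str:
    """Make exactly one non-streaming call at temperature 0.0.

    No retry loop and no structured-output schema: the model's text is
    returned unchanged and treated as plain text.
    """
    return complete(SYSTEM_PROMPT, prompt, 0.0)


# Usage with a stub in place of a real model client.
def stub_complete(system: str, user: str, temperature: float) -> str:
    return "red dress, smiling, white background"


print(call_rewriter("a smiling girl in a red dress", stub_complete))
# → red dress, smiling, white background
```

Injecting the client keeps Stage 1 testable without network access and makes the "no retries" property visible: there is exactly one call site.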

---

## Output Format

On success, Stage 1 returns:

- a single string
- containing comma-separated phrases
- with arbitrary spacing normalized
- truncated to a maximum of approximately 800 characters

No further parsing, validation, or canonicalization is applied at this stage.

The rewrite may:
- reorder concepts
- merge or split phrasing
- introduce additional generic visual concepts (e.g. "white background")
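
A possible Python sketch of the success-path post-processing. The contract does not say how spacing is normalized or whether truncation respects phrase boundaries, so the regexes, the `MAX_CHARS` constant, the function name `normalize_rewrite`, and backing off to the last comma are all assumptions.

```python
import re

MAX_CHARS = 800  # the contract says "approximately 800"; the exact value is an assumption


def normalize_rewrite(text: str, max_chars: int = MAX_CHARS) -> str:
    """Hypothetical post-processing: normalize spacing, then cap the length."""
    text = re.sub(r"\s+", " ", text)        # collapse runs of spaces/newlines
    text = re.sub(r"\s*,\s*", ", ", text)   # exactly one ", " between phrases
    text = text.strip()
    if len(text) > max_chars:
        cut = text[:max_chars]
        last_sep = cut.rfind(", ")
        # Back off to the last separator so no phrase is cut mid-word
        # (assumption; this is one reason the limit is only approximate).
        text = cut[:last_sep] if last_sep > 0 else cut
    return text


print(normalize_rewrite("  red   dress ,smiling,\n white background "))
# → red dress, smiling, white background
```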

---

## Failure and Fallback Behavior

If the LLM call:

- errors
- produces a refusal-like response
- returns empty output

then Stage 1 returns an empty string.

In downstream stages, this empty rewrite may be supplemented by the heuristic
phrases extracted earlier, but Stage 1 itself does not attempt recovery.
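
The failure path can be sketched in Python as below. `rewrite_or_empty`, `supplement`, and the `REFUSAL_MARKERS` prefixes are hypothetical: the contract does not define how a refusal-like response is detected, nor exactly how downstream stages merge the heuristic phrases. Only `rewrite_or_empty` corresponds to Stage 1; `supplement` illustrates the optional downstream step.

```python
from typing import Callable

# Hypothetical markers; the contract does not define "refusal-like".
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")


def rewrite_or_empty(prompt: str, call_llm: Callable[[str], str]) -> str:
    """Return the raw rewrite, or "" on error, refusal-like text, or empty output."""
    try:
        text = call_llm(prompt)  # single attempt, no retry
    except Exception:
        return ""  # any error yields an empty rewrite
    if not text or not text.strip():
        return ""  # empty output
    if text.strip().lower().startswith(REFUSAL_MARKERS):
        return ""  # refusal-like response
    return text


def supplement(rewrite: str, heuristic_phrases: list[str]) -> str:
    """Hypothetical downstream step: append heuristic phrases the rewrite lacks."""
    parts = [p.strip() for p in rewrite.split(",") if p.strip()]
    present = {p.lower() for p in parts}
    for phrase in heuristic_phrases:
        if phrase.lower() not in present:
            parts.append(phrase)
            present.add(phrase.lower())
    return ", ".join(parts)


# Usage: a failing model call still leaves the heuristic phrases available.
def failing_llm(_: str) -> str:
    raise TimeoutError("upstream timeout")


rewrite = rewrite_or_empty("red dress, smiling", failing_llm)
print(repr(rewrite))                                  # → ''
print(supplement(rewrite, ["red dress", "smiling"]))  # → red dress, smiling
```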

---

## Explicit Non-Guarantees

Stage 1 does not guarantee that:

- output phrases correspond to known vocabulary tags
- phrases are unique
- phrases are canonicalized
- phrases are mutually exclusive
- all user concepts are preserved
- added concepts reflect ground truth

Stage 2 must not assume any of the above.
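
Because none of these properties hold, a Stage 2 consumer might parse the rewrite defensively. The sketch below is hypothetical (the name `parse_stage1_output` and its policy are assumptions): it tolerates an empty string, stray whitespace, empty items, and case-only duplicates, and deliberately leaves vocabulary grounding to later steps.

```python
def parse_stage1_output(rewrite: str) -> list[str]:
    """Hypothetical Stage 2 entry point that assumes only the comma-separated format."""
    phrases: list[str] = []
    seen: set[str] = set()
    for raw in rewrite.split(","):
        phrase = " ".join(raw.split())  # tolerate stray or repeated whitespace
        if not phrase:
            continue  # tolerate empty items, e.g. from trailing commas
        key = phrase.casefold()  # duplicates may differ only in case
        if key in seen:
            continue
        seen.add(key)
        phrases.append(phrase)  # not checked against the vocabulary here
    return phrases


print(parse_stage1_output("red dress, , Red  Dress,smiling,"))
# → ['red dress', 'smiling']
```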

---

## Contract Boundary with Stage 2

Stage 1 guarantees only that:

- output is a comma-separated list of short phrases
- phrases are intended to be retrieval queries, not canonical tags
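- output is intended to be deterministic for a given input (temperature 0.0), although hosted model backends may still vary slightly between runs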

Stage 2 is responsible for:

- normalization
- deduplication
- head-noun expansion
- vocabulary grounding
- alias handling
- scoring and ranking

---

## Summary (Interview-Safe)

Stage 1 is a deterministic query-rewriting step that reshapes free-form text into
retrieval-friendly phrase queries. It intentionally favors recall and
surface-form alignment over correctness or canonicalization, delegating all
grounding and validation to later stages.