Prompt_Squirrel_RAG / docs /rewrite_contract.md

Food Desert

Add alias-based character tag filtering for Stage 3

c6be992 4 months ago

Stage 1 — Query Rewriting Contract

Purpose

Stage 1 (“Query Rewriting”) converts a free-form natural-language prompt into a comma-separated list of short, tag-shaped phrases suitable for downstream retrieval over a closed image-tag vocabulary.

This stage is not tagging, not normalization, and not validation. Its sole role is to rewrite user intent into a retrieval-friendly surface form with high recall.

Inputs

User prompt: an arbitrary string entered by the user.
The input may include:
- natural language
- comma-separated phrases
- Stable-Diffusion-style parentheses and weights
- punctuation and spacing artifacts

No structural guarantees are assumed about the input.

Pre-Rewrite Heuristics (Non-LLM)

Before the LLM rewrite is invoked, the system performs a lightweight heuristic extraction:

- The prompt is split on "." and ","
- Segments with three or fewer whitespace-separated tokens are retained
- Case-insensitive deduplication is applied

This produces a small list of user-provided phrases that may later be appended to the rewrite output for retrieval support.

This heuristic:

- is lossy
- is not authoritative
- exists only to preserve short explicit phrases if the rewrite fails or omits them
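
The three heuristic steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name `extract_heuristic_phrases` and the `max_tokens` parameter are hypothetical, and dropping empty segments is an assumption, since the text does not say how they are handled.

```python
import re


def extract_heuristic_phrases(prompt: str, max_tokens: int = 3) -> list[str]:
    """Collect short user-written phrases from a raw prompt (lossy by design)."""
    phrases: list[str] = []
    seen: set[str] = set()
    # Split on "." and "," as described above.
    for segment in re.split(r"[.,]", prompt):
        tokens = segment.split()
        # Assumption: empty segments are dropped; longer segments are discarded.
        if not tokens or len(tokens) > max_tokens:
            continue
        phrase = " ".join(tokens)
        key = phrase.casefold()  # case-insensitive deduplication key
        if key in seen:
            continue
        seen.add(key)
        phrases.append(phrase)
    return phrases
```

Because "." is a split character, a weighted term such as `(fox:1.2)` is broken into two fragments by this sketch, which is consistent with the heuristic being lossy and non-authoritative.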

Rewrite Mechanism

Stage 1 uses a single deterministic LLM call with:

- temperature = 0.0
- no retries
- no streaming
- no structured output enforcement

The system prompt instructs the model to:

- output a comma-separated list
- use short, literal, tag-shaped phrases
- preserve coherent multi-word visual concepts
- avoid inventing details
- avoid demographic inference
- avoid guessing identities

The LLM output is treated as plain text.
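
One way to picture the call shape described above is the sketch below. Everything in it is illustrative: `SYSTEM_PROMPT` paraphrases the listed instructions rather than quoting the real prompt, and `RewriteRequest`, `LLMClient`, and `rewrite_once` are hypothetical names standing in for whatever client the project actually uses.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical system prompt paraphrasing the instructions listed above.
SYSTEM_PROMPT = (
    "Rewrite the user's prompt as a comma-separated list of short, literal, "
    "tag-shaped phrases. Preserve coherent multi-word visual concepts. "
    "Do not invent details, infer demographics, or guess identities."
)


@dataclass(frozen=True)
class RewriteRequest:
    system: str
    user: str
    temperature: float = 0.0  # as specified above
    stream: bool = False      # no streaming


# Hypothetical client signature: takes a request, returns raw text.
LLMClient = Callable[[RewriteRequest], str]


def rewrite_once(prompt: str, client: LLMClient) -> str:
    """Invoke the client exactly once and return its output as plain text."""
    request = RewriteRequest(system=SYSTEM_PROMPT, user=prompt)
    return client(request)  # no retry loop, no schema or JSON parsing
```

Passing the client in as a callable keeps the sketch independent of any particular LLM provider and makes the single-call behavior easy to check with a stub.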

Output Format

On success, Stage 1 returns:

- a single string
- containing comma-separated phrases
- with irregular spacing normalized
- truncated to a maximum of approximately 800 characters

No further parsing, validation, or canonicalization is applied at this stage.
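
The spacing and length rules above might look like the following sketch. The exact truncation rule is not specified here ("approximately 800 characters"), so backing off to the last comma is an assumption, as are the names `format_rewrite` and `MAX_CHARS` and the choice to drop empty phrases.

```python
MAX_CHARS = 800  # the text says "approximately 800"; the exact rule is assumed


def format_rewrite(raw: str, max_chars: int = MAX_CHARS) -> str:
    """Normalize spacing in a comma-separated string and cap its length."""
    # Collapse whitespace runs inside each phrase.
    phrases = [" ".join(part.split()) for part in raw.split(",")]
    # Assumption: empty phrases (e.g. from ",,") are dropped when rejoining.
    text = ", ".join(p for p in phrases if p)
    if len(text) <= max_chars:
        return text
    # Assumption: cut at the limit, then drop the trailing, possibly partial phrase.
    cut = text[:max_chars]
    if "," in cut:
        cut = cut[: cut.rfind(",")]
    return cut.rstrip()
```

For example, `format_rewrite("  red   fox ,snow,,  white background ")` yields `"red fox, snow, white background"`.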

The rewrite may:

- reorder concepts
- merge or split phrasing
- introduce additional generic visual concepts (e.g. "white background")

Failure and Fallback Behavior

If the LLM call:

- errors
- produces a refusal-like response
- returns empty output

then Stage 1 returns an empty string.

In downstream stages, this empty rewrite may be supplemented by the heuristic phrases extracted earlier, but Stage 1 itself does not attempt recovery.
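
The failure path can be sketched as follows. How a "refusal-like response" is detected is not specified in this contract, so `REFUSAL_MARKERS` and `looks_like_refusal` are purely hypothetical, as are the function names `stage1_rewrite` and `supplement`; the latter only illustrates how a downstream stage might append the heuristic phrases.

```python
from typing import Callable

# Hypothetical markers; the real refusal check is not specified in this document.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "as an ai")


def looks_like_refusal(text: str) -> bool:
    lowered = text.casefold()
    return any(lowered.startswith(marker) for marker in REFUSAL_MARKERS)


def stage1_rewrite(prompt: str, call_llm: Callable[[str], str]) -> str:
    """Return the rewrite, or "" on error, refusal-like output, or empty output."""
    try:
        text = call_llm(prompt).strip()  # single attempt, no retry
    except Exception:
        return ""
    if not text or looks_like_refusal(text):
        return ""
    return text


def supplement(rewrite: str, heuristic_phrases: list[str]) -> str:
    """Downstream step: append heuristic phrases to a possibly empty rewrite."""
    parts = [rewrite] if rewrite else []
    parts.extend(heuristic_phrases)
    return ", ".join(parts)
```

Keeping `supplement` outside `stage1_rewrite` mirrors the division stated above: Stage 1 returns an empty string and leaves any recovery to later stages.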

Explicit Non-Guarantees

Stage 1 does not guarantee that:

- output phrases correspond to known vocabulary tags
- phrases are unique
- phrases are canonicalized
- phrases are mutually exclusive
- all user concepts are preserved
- added concepts reflect ground truth

Stage 2 must not assume any of the above.

Contract Boundary with Stage 2

Stage 1 guarantees only that:

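- on success, output is a comma-separated list of short phrases; on failure, it is an empty string
- phrases are intended to be retrieval queries, not canonical tags
- output is intended to be deterministic for a given input (single call, temperature = 0.0)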

Stage 2 is responsible for:

- normalization
- deduplication
- head-noun expansion
- vocabulary grounding
- alias handling
- scoring and ranking
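
To make the boundary concrete, here is a minimal sketch of how a Stage 2 entry point might consume Stage 1 output without assuming uniqueness or canonical form. It covers only the first two responsibilities (normalization and deduplication); the function name `parse_stage1_output` and the lowercase/whitespace normalization rule are assumptions, and expansion, grounding, alias handling, and scoring are deliberately left out.

```python
def parse_stage1_output(rewrite: str) -> list[str]:
    """Split Stage 1 output into normalized, deduplicated query phrases."""
    queries: list[str] = []
    seen: set[str] = set()
    for part in rewrite.split(","):
        # Assumed normalization rule: collapse whitespace and case-fold.
        phrase = " ".join(part.split()).casefold()
        if phrase and phrase not in seen:  # deduplication, order-preserving
            seen.add(phrase)
            queries.append(phrase)
    return queries
```

An empty Stage 1 result simply produces an empty list, so the fallback case needs no special handling at this point.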

Summary (Interview-Safe)

Stage 1 is a deterministic query-rewriting step that reshapes free-form text into retrieval-friendly phrase queries. It intentionally favors recall and surface-form alignment over correctness or canonicalization, delegating all grounding and validation to later stages.