Food Desert committed on
Commit d57f0a5 · 1 parent: 65b7582
Add tooltip override system and curate top-1000 tooltip fixes
- PROJECT_SUMMARY.md +27 -24
- SESSION_QUICKSTART.md +5 -1
- app.py +29 -1
- data/tag_tooltip_overrides.csv +49 -0
- docs/space_overview.md +11 -1
PROJECT_SUMMARY.md
CHANGED
|
@@ -45,11 +45,11 @@ The system implements a **three-stage pipeline** with strict contracts between s
|
|
| 45 |
- **Module**: `psq_rag.llm.select`
|
| 46 |
- **Output**: Selected tags with optional rationale codes (explicit, strong_implied, weak_implied, style_or_meta, other)
|
| 47 |
|
| 48 |
-
### Stage 3s: Structural Tag Inference (Optional)
|
| 49 |
-
- **Purpose**: Infer structural
|
| 50 |
-
- **Example**:
|
| 51 |
-
- **Implementation**: Group-based
|
| 52 |
-
- **Module**: `psq_rag.llm.select.llm_infer_structural_tags`
|
| 53 |
|
| 54 |
---
|
| 55 |
|
|
@@ -85,7 +85,7 @@ Implementation of a categorized tag suggestion system based on the e621 tagging
|
|
| 85 |
|
| 86 |
### Tag Implication System (Feb 10-11, 2026)
|
| 87 |
- Integrated `tag_implications-2023-07-20.csv`
|
| 88 |
-
- Automatic tag expansion (e.g., fox → canine → canid → mammal)
|
| 89 |
- Expanded ground truth annotations for evaluation
|
| 90 |
- Leaf-only metrics to avoid penalizing implied tags
|
| 91 |
|
|
@@ -114,11 +114,12 @@ Implementation of a categorized tag suggestion system based on the e621 tagging
|
|
| 114 |
### Artifacts (loaded lazily)
|
| 115 |
- **FastText embeddings**: Compressed format for semantic similarity
|
| 116 |
- **TF-IDF vectors + SVD**: For context-based tag similarity
|
| 117 |
-
- **Alias mappings**: Non-canonical → canonical tag projection
|
| 118 |
- **Tag counts**: Frequency information from corpus
|
| 119 |
-
- **Tag implications**: Hierarchical tag relationships (e.g., species → family)
|
| 120 |
-
- **Tag groups** (`data/tag_groups.json`): Structured tag families for inference
|
| 121 |
-
- **Tag wiki definitions** (`data/tag_wiki_defs.json`): E621 wiki data for tags
|
|
|
|
| 122 |
|
| 123 |
### Configuration
|
| 124 |
- **tagging_checklist.txt**: E621 tagging guidelines and categories
|
|
@@ -181,9 +182,10 @@ Implementation of a categorized tag suggestion system based on the e621 tagging
|
|
| 181 |
|
| 182 |
### Sample Scripts
|
| 183 |
- **scripts/rewrite_playground.py**: Stage 1 testing
|
| 184 |
-
- **scripts/stage3_debug.py**: Stage 3 debugging
|
| 185 |
-
- **scripts/test_categorized_suggestions.py**: Category suggestion testing
|
| 186 |
-
- **scripts/test_parser_only.py**: Parser validation
|
|
|
|
| 187 |
|
| 188 |
---
|
| 189 |
|
|
@@ -196,7 +198,7 @@ Implementation of a categorized tag suggestion system based on the e621 tagging
|
|
| 196 |
- Verbose retrieval reporting (optional)
|
| 197 |
- NSFW tag filtering (configurable)
|
| 198 |
- Final prompt composition with deduplication
|
| 199 |
-
- Mascot branding (🐿️ squirrel)
|
| 200 |
|
| 201 |
### Configuration
|
| 202 |
- `allow_nsfw_tags`: NSFW content filtering
|
|
@@ -211,7 +213,7 @@ Implementation of a categorized tag suggestion system based on the e621 tagging
|
|
| 211 |
|
| 212 |
1. **Alias-to-Canonical Projection**: Handles non-canonical tag variants and projects them to e621 canonical forms
|
| 213 |
|
| 214 |
-
2. **Head-Noun Expansion**: Automatically extracts head nouns from multi-word phrases (e.g., "big shirt" → also search "shirt")
|
| 215 |
|
| 216 |
3. **Dual Scoring**: FastText semantic similarity + TF-IDF/SVD context similarity with weighted fusion
|
| 217 |
|
|
@@ -264,14 +266,15 @@ Core packages (requirements.txt):
|
|
| 264 |
|
| 265 |
In this session (and recent sessions based on git history), we have:
|
| 266 |
|
| 267 |
-
1. ✅ **Built tag categorization infrastructure** based on e621 checklist
|
| 268 |
-
2. ✅ **Created category parser** with tier and constraint support
|
| 269 |
-
3. ✅ **Implemented TF-IDF-based categorized suggestions**
|
| 270 |
-
4. ✅ **Added comprehensive evaluation metrics** (per-category P/R/F1, ranking metrics)
|
| 271 |
-
5. ✅ **Fixed multi-select constraint handling** for body_type, species, gender
|
| 272 |
-
6. ✅ **Improved structural inference system** with group-based wiki data approach
|
| 273 |
-
7. ✅ **Enhanced evaluation pipeline** with parallel processing and implication expansion
|
| 274 |
-
8. ✅ **Added diagnostic and analysis tools** for debugging and quality assessment
|
| 275 |
-
9. ✅ **Cleaned up binary files** and moved to proper XET storage on Hugging Face
|
| 276 |
|
| 277 |
The project is now at a sophisticated stage with a full three-stage pipeline, comprehensive evaluation infrastructure, and category-based tag organization aligned with e621's tagging best practices.
|
|
|
|
|
|
| 45 |
- **Module**: `psq_rag.llm.select`
|
| 46 |
- **Output**: Selected tags with optional rationale codes (explicit, strong_implied, weak_implied, style_or_meta, other)
|
| 47 |
|
| 48 |
+
### Stage 3s: Structural Tag Inference (Optional)
|
| 49 |
+
- **Purpose**: Infer structural tags directly from prompt text using a fixed group definition set
|
| 50 |
+
- **Example**: character count, body type, gender, clothing state, visual elements
|
| 51 |
+
- **Implementation**: Group-based LLM inference with deterministic postprocessing (for example, trio implies group)
|
| 52 |
+
- **Module**: `psq_rag.llm.select.llm_infer_structural_tags`
|
| 53 |
|
| 54 |
---
|
| 55 |
|
|
|
|
| 85 |
|
| 86 |
### Tag Implication System (Feb 10-11, 2026)
|
| 87 |
- Integrated `tag_implications-2023-07-20.csv`
|
| 88 |
+
- Automatic tag expansion (e.g., fox → canine → canid → mammal)
|
| 89 |
- Expanded ground truth annotations for evaluation
|
| 90 |
- Leaf-only metrics to avoid penalizing implied tags
|
| 91 |
|
|
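The implication expansion and leaf-only metrics listed in the Tag Implication System hunk can be illustrated with a small sketch. It assumes implications are held as a mapping from a tag to the tags it directly implies; the function names and the toy edges below are hypothetical and not taken from `psq_rag`, and the real loader for `tag_implications-2023-07-20.csv` is not shown in this commit.

```python
from typing import Dict, Iterable, Set

# Toy implication edges (tag -> tags it directly implies). Only the
# fox -> canine -> canid -> mammal chain comes from the summary text.
TOY_IMPLICATIONS: Dict[str, Set[str]] = {
    "fox": {"canine"},
    "canine": {"canid"},
    "canid": {"mammal"},
}


def expand_with_implications(
    tags: Iterable[str], implications: Dict[str, Set[str]]
) -> Set[str]:
    """Return the input tags plus everything they transitively imply."""
    expanded: Set[str] = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in expanded:
            continue  # already visited; also guards against cycles
        expanded.add(tag)
        stack.extend(implications.get(tag, ()))
    return expanded


def leaf_tags(tags: Iterable[str], implications: Dict[str, Set[str]]) -> Set[str]:
    """Keep only tags that no other tag in the set implies."""
    tag_set = set(tags)
    implied: Set[str] = set()
    for tag in tag_set:
        implied |= expand_with_implications([tag], implications) - {tag}
    return tag_set - implied


expanded = expand_with_implications(["fox", "solo"], TOY_IMPLICATIONS)
leaves = leaf_tags(expanded, TOY_IMPLICATIONS)
```

Scoring on `leaves` rather than `expanded` is one way to avoid penalizing a prediction of `fox` for omitting `mammal`; whether the project computes leaves exactly this way is an assumption.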
|
|
| 114 |
### Artifacts (loaded lazily)
|
| 115 |
- **FastText embeddings**: Compressed format for semantic similarity
|
| 116 |
- **TF-IDF vectors + SVD**: For context-based tag similarity
|
| 117 |
+
- **Alias mappings**: Non-canonical → canonical tag projection
|
| 118 |
- **Tag counts**: Frequency information from corpus
|
| 119 |
+
- **Tag implications**: Hierarchical tag relationships (e.g., species → family)
|
| 120 |
+
- **Tag groups** (`data/tag_groups.json`): Structured tag families for inference
|
| 121 |
+
- **Tag wiki definitions** (`data/tag_wiki_defs.json`): E621 wiki data for tags
|
| 122 |
+
- **Tooltip overrides** (`data/tag_tooltip_overrides.csv`): Curated tooltip text fixes that override wiki definitions for specific tags
|
| 123 |
|
| 124 |
### Configuration
|
| 125 |
- **tagging_checklist.txt**: E621 tagging guidelines and categories
|
|
|
|
| 182 |
|
| 183 |
### Sample Scripts
|
| 184 |
- **scripts/rewrite_playground.py**: Stage 1 testing
|
| 185 |
+
- **scripts/stage3_debug.py**: Stage 3 debugging
|
| 186 |
+
- **scripts/test_categorized_suggestions.py**: Category suggestion testing
|
| 187 |
+
- **scripts/test_parser_only.py**: Parser validation
|
| 188 |
+
- **scripts/test_structural_trio_group_rule.py**: Regression test for `trio -> group` structural postprocess and eval sample invariant
|
| 189 |
|
| 190 |
---
|
| 191 |
|
|
|
|
| 198 |
- Verbose retrieval reporting (optional)
|
| 199 |
- NSFW tag filtering (configurable)
|
| 200 |
- Final prompt composition with deduplication
|
| 201 |
+
- Mascot branding (🐿️ squirrel)
|
| 202 |
|
| 203 |
### Configuration
|
| 204 |
- `allow_nsfw_tags`: NSFW content filtering
|
|
|
|
| 213 |
|
| 214 |
1. **Alias-to-Canonical Projection**: Handles non-canonical tag variants and projects them to e621 canonical forms
|
| 215 |
|
| 216 |
+
2. **Head-Noun Expansion**: Automatically extracts head nouns from multi-word phrases (e.g., "big shirt" → also search "shirt")
|
| 217 |
|
| 218 |
3. **Dual Scoring**: FastText semantic similarity + TF-IDF/SVD context similarity with weighted fusion
|
| 219 |
|
|
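Head-noun expansion, as described in item 2 above, can be sketched as follows. This is a deliberately naive version that assumes the head noun is the last whitespace-separated word of a phrase, which holds for simple phrases such as "big shirt"; the function name is hypothetical and the project's actual extraction logic is not shown in this diff.

```python
from typing import List


def expand_head_nouns(phrases: List[str]) -> List[str]:
    """Return each phrase plus, for multi-word phrases, its final word.

    Assumption: the head noun is the last token. Order is preserved and
    duplicate queries are dropped.
    """
    queries: List[str] = []
    seen = set()
    for phrase in phrases:
        words = phrase.split()
        candidates = [" ".join(words)]
        if len(words) > 1:
            candidates.append(words[-1])  # e.g. "big shirt" -> "shirt"
        for candidate in candidates:
            if candidate and candidate not in seen:
                seen.add(candidate)
                queries.append(candidate)
    return queries


queries = expand_head_nouns(["big shirt", "shirt", "red scarf"])
```

A real implementation would likely need a part-of-speech or dependency signal for phrases whose head is not the final word (for example "shirt with stripes").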
|
|
| 266 |
|
| 267 |
In this session (and recent sessions based on git history), we have:
|
| 268 |
|
| 269 |
+
1. ✅ **Built tag categorization infrastructure** based on e621 checklist
|
| 270 |
+
2. ✅ **Created category parser** with tier and constraint support
|
| 271 |
+
3. ✅ **Implemented TF-IDF-based categorized suggestions**
|
| 272 |
+
4. ✅ **Added comprehensive evaluation metrics** (per-category P/R/F1, ranking metrics)
|
| 273 |
+
5. ✅ **Fixed multi-select constraint handling** for body_type, species, gender
|
| 274 |
+
6. ✅ **Improved structural inference system** with group-based wiki data approach
|
| 275 |
+
7. ✅ **Enhanced evaluation pipeline** with parallel processing and implication expansion
|
| 276 |
+
8. ✅ **Added diagnostic and analysis tools** for debugging and quality assessment
|
| 277 |
+
9. ✅ **Cleaned up binary files** and moved to proper XET storage on Hugging Face
|
| 278 |
|
| 279 |
The project is now at a sophisticated stage with a full three-stage pipeline, comprehensive evaluation infrastructure, and category-based tag organization aligned with e621's tagging best practices.
|
| 280 |
+
|
SESSION_QUICKSTART.md
CHANGED
|
@@ -7,7 +7,7 @@ A RAG system that converts natural language prompts → e621-style tags for furr
|
|
| 7 |
1. **Stage 1 (Rewrite)**: Natural language → tag-shaped phrases (LLM)
|
| 8 |
2. **Stage 2 (Retrieval)**: Phrases → candidate tags (FastText + TF-IDF/SVD, closed vocab)
|
| 9 |
3. **Stage 3 (Selection)**: Candidates → final selected tags (LLM)
|
| 10 |
-
4. **Stage 3s (Structural)**:
|
| 11 |
|
| 12 |
## Latest Features (Feb 13-14, 2026)
|
| 13 |
- **Tag Categorization**: Organized suggestions by e621 checklist categories (species, clothing, posture, etc.)
|
|
@@ -23,9 +23,11 @@ A RAG system that converts natural language prompts → e621-style tags for furr
|
|
| 23 |
- `scripts/eval_categorized.py` - Per-category metrics
|
| 24 |
- `scripts/analyze_threshold_grid.py` - Threshold grid analysis (score/global rank/phrase rank)
|
| 25 |
- `scripts/analyze_caption_evident_audit.py` - Caption-evident audit vs retrieval
|
|
|
|
| 26 |
- `docs/retrieval_contract.md` - Stage 2 spec
|
| 27 |
- `docs/stage3_contract.md` - Stage 3 spec
|
| 28 |
- `tagging_checklist.txt` - E621 tagging guidelines
|
|
|
|
| 29 |
|
| 30 |
## Running Code
|
| 31 |
```bash
|
|
@@ -74,12 +76,14 @@ ls -la data/eval_results/
|
|
| 74 |
- **Caption-evident audit**: Run `scripts/analyze_caption_evident_audit.py`
|
| 75 |
- **Test retrieval**: Use `scripts/smoke_test.py`
|
| 76 |
- **Debug Stage 3**: Use `scripts/stage3_debug.py` (`--phrases` optional; omitted runs Stage 1 rewrite first, then Stage 2 retrieval from rewritten phrases)
|
|
|
|
| 77 |
|
| 78 |
## Data Artifacts (Lazy-loaded)
|
| 79 |
- FastText embeddings (semantic similarity)
|
| 80 |
- TF-IDF + SVD matrices (context similarity)
|
| 81 |
- Alias → canonical tag mappings
|
| 82 |
- Tag counts, implications, groups, wiki definitions
|
|
|
|
| 83 |
|
| 84 |
## Eval Datasets
|
| 85 |
- `data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl` - Base eval set (implication-expanded GT)
|
|
|
|
| 7 |
1. **Stage 1 (Rewrite)**: Natural language → tag-shaped phrases (LLM)
|
| 8 |
2. **Stage 2 (Retrieval)**: Phrases → candidate tags (FastText + TF-IDF/SVD, closed vocab)
|
| 9 |
3. **Stage 3 (Selection)**: Candidates → final selected tags (LLM)
|
| 10 |
+
4. **Stage 3s (Structural)**: Prompt text → structural tags (LLM over fixed groups) plus deterministic postprocess rules (for example, `trio` implies `group`)
|
| 11 |
|
| 12 |
## Latest Features (Feb 13-14, 2026)
|
| 13 |
- **Tag Categorization**: Organized suggestions by e621 checklist categories (species, clothing, posture, etc.)
|
|
|
|
| 23 |
- `scripts/eval_categorized.py` - Per-category metrics
|
| 24 |
- `scripts/analyze_threshold_grid.py` - Threshold grid analysis (score/global rank/phrase rank)
|
| 25 |
- `scripts/analyze_caption_evident_audit.py` - Caption-evident audit vs retrieval
|
| 26 |
+
- `scripts/test_structural_trio_group_rule.py` - Regression test for structural `trio -> group` mapping and eval sample invariant
|
| 27 |
- `docs/retrieval_contract.md` - Stage 2 spec
|
| 28 |
- `docs/stage3_contract.md` - Stage 3 spec
|
| 29 |
- `tagging_checklist.txt` - E621 tagging guidelines
|
| 30 |
+
- `data/tag_tooltip_overrides.csv` - Manual tooltip text overrides (`tag, tooltip_override`)
|
| 31 |
|
| 32 |
## Running Code
|
| 33 |
```bash
|
|
|
|
| 76 |
- **Caption-evident audit**: Run `scripts/analyze_caption_evident_audit.py`
|
| 77 |
- **Test retrieval**: Use `scripts/smoke_test.py`
|
| 78 |
- **Debug Stage 3**: Use `scripts/stage3_debug.py` (`--phrases` optional; omitted runs Stage 1 rewrite first, then Stage 2 retrieval from rewritten phrases)
|
| 79 |
+
- **Fix tooltip definitions**: Add/adjust rows in `data/tag_tooltip_overrides.csv` instead of editing extracted `data/tag_wiki_defs.json`
|
| 80 |
|
| 81 |
## Data Artifacts (Lazy-loaded)
|
| 82 |
- FastText embeddings (semantic similarity)
|
| 83 |
- TF-IDF + SVD matrices (context similarity)
|
| 84 |
- Alias → canonical tag mappings
|
| 85 |
- Tag counts, implications, groups, wiki definitions
|
| 86 |
+
- Tooltip override CSV for curated definition fixes
|
| 87 |
|
| 88 |
## Eval Datasets
|
| 89 |
- `data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl` - Base eval set (implication-expanded GT)
|
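The deterministic postprocess step mentioned for Stage 3s (`trio` implies `group`) could look roughly like the sketch below. The rule table and function name are hypothetical; only the `trio` to `group` pair comes from the text, and the real logic lives in `psq_rag.llm.select`, which this commit does not show.

```python
from typing import Dict, Iterable, List

# Hypothetical rule table: tag -> tags to add whenever it is present.
POSTPROCESS_RULES: Dict[str, List[str]] = {"trio": ["group"]}


def apply_structural_postprocess(
    tags: Iterable[str], rules: Dict[str, List[str]] = POSTPROCESS_RULES
) -> List[str]:
    """Deduplicate tags in order, then append rule-implied tags not yet present."""
    result: List[str] = []
    seen = set()
    for tag in tags:
        if tag not in seen:
            seen.add(tag)
            result.append(tag)
    for tag in list(result):  # iterate over a copy while appending
        for implied in rules.get(tag, ()):
            if implied not in seen:
                seen.add(implied)
                result.append(implied)
    return result


structural = apply_structural_postprocess(["trio", "male"])
```

Keeping the rule as data rather than as LLM output makes it straightforward to cover with a regression test such as `scripts/test_structural_trio_group_rule.py`.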
app.py
CHANGED
|
@@ -118,6 +118,32 @@ def _load_tag_wiki_defs() -> Dict[str, str]:
|
|
| 118 |
return {}
|
| 119 |
|
| 120 |
|
| 121 |
@lru_cache(maxsize=1)
|
| 122 |
def _load_about_docs_markdown() -> str:
|
| 123 |
candidates = [
|
|
@@ -156,7 +182,9 @@ def _tooltip_text_for_tag(tag: str) -> str:
|
|
| 156 |
count = None
|
| 157 |
if isinstance(count, int):
|
| 158 |
parts.append(f"Count: {count:,}")
|
| 159 |
-
d =
|
|
|
|
|
|
|
| 160 |
if d:
|
| 161 |
parts.append(d)
|
| 162 |
return "\n".join(parts).strip()
|
|
|
|
| 118 |
return {}
|
| 119 |
|
| 120 |
|
| 121 |
+
@lru_cache(maxsize=1)
|
| 122 |
+
def _load_tag_tooltip_overrides() -> Dict[str, str]:
|
| 123 |
+
    """Load optional per-tag tooltip text overrides.
|
| 124 |
+
|
| 125 |
+
    File format:
|
| 126 |
+
      data/tag_tooltip_overrides.csv
|
| 127 |
+
      columns: tag, tooltip_override
|
| 128 |
+
    """
|
| 129 |
+
    p = Path("data/tag_tooltip_overrides.csv")
|
| 130 |
+
    if not p.exists():
|
| 131 |
+
        return {}
|
| 132 |
+
|
| 133 |
+
    try:
|
| 134 |
+
        out: Dict[str, str] = {}
|
| 135 |
+
        with p.open("r", encoding="utf-8", newline="") as f:
|
| 136 |
+
            reader = csv.DictReader(f)
|
| 137 |
+
            for row in reader:
|
| 138 |
+
                tag = _norm_tag_for_lookup(str(row.get("tag", "")))
|
| 139 |
+
                text = " ".join(str(row.get("tooltip_override", "")).split())
|
| 140 |
+
                if tag and text:
|
| 141 |
+
                    out[tag] = text
|
| 142 |
+
        return out
|
| 143 |
+
    except Exception:
|
| 144 |
+
        return {}
|
| 145 |
+
|
| 146 |
+
|
| 147 |
@lru_cache(maxsize=1)
|
| 148 |
def _load_about_docs_markdown() -> str:
|
| 149 |
candidates = [
|
|
|
|
| 182 |
count = None
|
| 183 |
    if isinstance(count, int):
|
| 184 |
        parts.append(f"Count: {count:,}")
|
| 185 |
+
    d = _load_tag_tooltip_overrides().get(t, "")
|
| 186 |
+
    if not d:
|
| 187 |
+
        d = _load_tag_wiki_defs().get(t, "")
|
| 188 |
    if d:
|
| 189 |
        parts.append(d)
|
| 190 |
    return "\n".join(parts).strip()
|
data/tag_tooltip_overrides.csv
ADDED
|
@@ -0,0 +1,49 @@
|
| 1 |
+
tag,tooltip_override
|
| 2 |
+
trio,"Exactly three characters are visible in the image. In e621 practice, trio is often co-tagged with group."
|
| 3 |
+
nintendo,"Nintendo is a Japanese video game company and publisher."
|
| 4 |
+
pokemon,"Pokemon is a media franchise centered on collecting, training, and battling Pokemon species."
|
| 5 |
+
vaginal,"Used for posts depicting vaginal penetration or other explicitly vaginal sexual focus."
|
| 6 |
+
piercing,"Posts depicting visible body piercings (ear, facial, nipple, genital, etc.)."
|
| 7 |
+
animal_penis,"Tag for penises with clearly animal (non-humanoid) anatomy."
|
| 8 |
+
plant,"Tag for plants or flora (realistic or stylized) depicted in the image."
|
| 9 |
+
furniture,"Movable household or interior objects such as chairs, tables, beds, sofas, and cabinets."
|
| 10 |
+
ear_piercing,"A visible piercing on the ear (including lobe or cartilage)."
|
| 11 |
+
solo_focus,"In a multi-character image, one character is clearly emphasized as the main subject."
|
| 12 |
+
bird,"A member of class Aves (avian species), including realistic or anthro birds."
|
| 13 |
+
by_conditional_dnp,"Used for posts by artists with conditional do-not-post (DNP) restrictions."
|
| 14 |
+
bedroom_eyes,"Half-lidded or narrowed eyes used in a seductive or flirtatious context."
|
| 15 |
+
pink_nipples,"Nipples that are visibly pink."
|
| 16 |
+
by_avoid_posting,"Artist/admin tag for creators on the avoid-posting list (do-not-post status)."
|
| 17 |
+
grey_background,"An image with a predominantly grey background."
|
| 18 |
+
pokemorph,"A character depicted as a human-like version of a Pokemon species."
|
| 19 |
+
after_sex,"Depicts a moment immediately after sexual activity."
|
| 20 |
+
by_unknown_artist,"Artist tag used when the original artist is unknown."
|
| 21 |
+
generation_6_pokemon,"Pokemon species introduced in Generation 6 (X/Y)."
|
| 22 |
+
crossgender,"A character depicted as a different gender than their usual depiction."
|
| 23 |
+
<3_eyes,"Eyes that are heart-shaped or contain heart symbols."
|
| 24 |
+
animal_crossing,"The Animal Crossing video game franchise."
|
| 25 |
+
cum_while_penetrated,"Ejaculation while the character is being penetrated."
|
| 26 |
+
animal_pussy,"Genitals with clearly animal (non-humanoid) vulva/pussy anatomy."
|
| 27 |
+
generation_7_pokemon,"Pokemon species introduced in Generation 7 (Sun/Moon)."
|
| 28 |
+
pokephilia,"Sexual or romantic activity involving a Pokemon and a non-Pokemon character."
|
| 29 |
+
breast_squish,"Breasts visibly pressed or squished against a surface, object, or character."
|
| 30 |
+
bow_ribbon,"A bow made from ribbon."
|
| 31 |
+
featureless_crotch,"Crotch area shown without visible genital detail where it would normally be expected."
|
| 32 |
+
blue_background,"An image with a predominantly blue background."
|
| 33 |
+
fluttershy_(mlp),"Fluttershy, a main character from My Little Pony: Friendship is Magic."
|
| 34 |
+
by_third-party_edit,"Artist/admin tag used when a post was edited by someone other than the original artist."
|
| 35 |
+
rainbow_dash_(mlp),"Rainbow Dash, a main character from My Little Pony: Friendship is Magic."
|
| 36 |
+
saliva_string,"A visible strand of saliva stretching between a mouth/tongue and another surface."
|
| 37 |
+
male/ambiguous,"Sexual or romantic activity between a male character and an ambiguous-gender character."
|
| 38 |
+
hair_bow,"A bow accessory worn on or in the hair."
|
| 39 |
+
bow_tie,"A neck accessory tied in a bow shape."
|
| 40 |
+
rarity_(mlp),"Rarity, a main character from My Little Pony: Friendship is Magic."
|
| 41 |
+
bow_accessory,"An accessory that features a bow."
|
| 42 |
+
patreon,"Patreon, a membership and crowdfunding platform."
|
| 43 |
+
pinkie_pie_(mlp),"Pinkie Pie, a main character from My Little Pony: Friendship is Magic."
|
| 44 |
+
warcraft,"The Warcraft video game franchise by Blizzard Entertainment."
|
| 45 |
+
generation_8_pokemon,"Pokemon species introduced in Generation 8 (Sword/Shield)."
|
| 46 |
+
sexual_barrier_device,"A barrier contraceptive used during sexual activity (for example condoms or dental dams)."
|
| 47 |
+
applejack_(mlp),"Applejack, a main character from My Little Pony: Friendship is Magic."
|
| 48 |
+
princess_celestia_(mlp),"Princess Celestia, an alicorn ruler character from My Little Pony."
|
| 49 |
+
princess_luna_(mlp),"Princess Luna, an alicorn ruler character from My Little Pony."
|
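To show how rows in this file are read, here is a minimal, self-contained sketch that parses the same two-column layout from an in-memory string. It mirrors the loader added to `app.py` in this commit, except that `.strip().lower()` is only a stand-in for `_norm_tag_for_lookup`, whose body is not shown; `parse_overrides` and the `empty_row` line are hypothetical, and extra spaces were added to the `bow_tie` row to show whitespace collapsing.

```python
import csv
import io
from typing import Dict

SAMPLE_CSV = '''tag,tooltip_override
bow_tie,"A neck accessory   tied in a bow shape."
furniture,"Movable household or interior objects such as chairs, tables, beds, sofas, and cabinets."
empty_row,
'''


def parse_overrides(text: str) -> Dict[str, str]:
    """Parse tag/tooltip rows, collapsing whitespace and skipping empty cells."""
    overrides: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(text, newline="")):
        # Stand-in normalizer (assumption): trim and lowercase the tag.
        tag = (row.get("tag") or "").strip().lower()
        tooltip = " ".join((row.get("tooltip_override") or "").split())
        if tag and tooltip:
            overrides[tag] = tooltip
    return overrides


overrides = parse_overrides(SAMPLE_CSV)
```

Because tooltip text frequently contains commas, each `tooltip_override` value is quoted; `csv.DictReader` handles the quoting, so splitting lines on `,` by hand would break rows like `furniture`.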
docs/space_overview.md
CHANGED
|
@@ -18,7 +18,7 @@ Design goals:
|
|
| 18 |
- `Rewrite`:
|
| 19 |
Turns the user prompt into short, tag-like pseudo-phrases that are easier to match in vector retrieval. These phrases are optimized as search queries for candidate lookup.
|
| 20 |
- `Structural Inference`:
|
| 21 |
-
Runs an LLM call over a fixed set of high-level structure tags (for example character count, body type, gender, clothing state, gaze/text). It outputs only the structural tags it believes are supported.
|
| 22 |
- `Probe Inference`:
|
| 23 |
Runs a separate LLM call over a small, curated set of informative tags. This is a targeted check for tags that are often useful for reranking and final selection.
|
| 24 |
- `Retrieval Candidates`:
|
|
@@ -67,6 +67,16 @@ The current Space does not claim to solve these other domains directly. Porting
|
|
| 67 |
- Tag implications graph
|
| 68 |
- Group/category mappings for row display
|
| 69 |
- Optional wiki definitions (used for hover help)
|
| 70 |
|
| 71 |
## Technologies Used
|
| 72 |
|
|
|
|
| 18 |
- `Rewrite`:
|
| 19 |
Turns the user prompt into short, tag-like pseudo-phrases that are easier to match in vector retrieval. These phrases are optimized as search queries for candidate lookup.
|
| 20 |
- `Structural Inference`:
|
| 21 |
+
Runs an LLM call over a fixed set of high-level structure tags (for example character count, body type, gender, clothing state, gaze/text). It outputs only the structural tags it believes are supported, then applies deterministic postprocessing rules (for example `trio` implies `group`).
|
| 22 |
- `Probe Inference`:
|
| 23 |
Runs a separate LLM call over a small, curated set of informative tags. This is a targeted check for tags that are often useful for reranking and final selection.
|
| 24 |
- `Retrieval Candidates`:
|
|
|
|
| 67 |
- Tag implications graph
|
| 68 |
- Group/category mappings for row display
|
| 69 |
- Optional wiki definitions (used for hover help)
|
| 70 |
+
- Optional tooltip override table (`data/tag_tooltip_overrides.csv`) for manual corrections when wiki extraction is noisy
|
| 71 |
+
|
| 72 |
+
## Tooltip Text Resolution
|
| 73 |
+
|
| 74 |
+
Tooltip text for tags resolves in this order:
|
| 75 |
+
|
| 76 |
+
1. `data/tag_tooltip_overrides.csv` (`tag, tooltip_override`) when a tag has a curated override
|
| 77 |
+
2. `data/tag_wiki_defs.json` as fallback
|
| 78 |
+
|
| 79 |
+
This keeps the extracted wiki file immutable while allowing targeted manual fixes for frequent/high-impact tags.
|
| 80 |
|
| 81 |
## Technologies Used
|
| 82 |
|
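The resolution order documented under "Tooltip Text Resolution" amounts to an override-then-fallback lookup. The sketch below illustrates it with plain dictionaries standing in for the two loaded files; `resolve_tooltip`, the `tree` entry, and the placeholder wiki strings are hypothetical, while the `trio` override text is taken from the CSV added in this commit.

```python
from typing import Mapping


def resolve_tooltip(
    tag: str, overrides: Mapping[str, str], wiki_defs: Mapping[str, str]
) -> str:
    """Return the curated override if present, else the wiki text, else ''."""
    return overrides.get(tag) or wiki_defs.get(tag, "")


# Placeholder data standing in for the CSV overrides and the wiki JSON.
overrides = {
    "trio": "Exactly three characters are visible in the image. "
    "In e621 practice, trio is often co-tagged with group."
}
wiki_defs = {
    "trio": "(noisy extracted wiki text)",
    "tree": "(wiki text for tree)",
}

trio_text = resolve_tooltip("trio", overrides, wiki_defs)
tree_text = resolve_tooltip("tree", overrides, wiki_defs)
missing_text = resolve_tooltip("no_such_tag", overrides, wiki_defs)
```

Because the override is consulted first and the wiki mapping is never written to, `data/tag_wiki_defs.json` can be regenerated from a fresh extraction without losing manual fixes.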