Food Desert committed on
Commit 73f56cf · 1 parent: 41dd600

Add eval audit tools, caption-evident set, and logging

PROJECT_SUMMARY.md CHANGED
@@ -89,18 +89,23 @@ Implementation of a categorized tag suggestion system based on the e621 tagging
  - Expanded ground truth annotations for evaluation
  - Leaf-only metrics to avoid penalizing implied tags
 
- ### Evaluation Enhancements (Feb 10-14, 2026)
- - Added `--min-why` threshold filtering (explicit, strong_implied, weak_implied)
- - Per-tag evidence tracking
- - Compact eval output format
- - Retrieval gap analysis scripts
- - Multiple eval runs with different configurations
- - Stored eval results in `data/eval_results/`
-
- ### Code Quality Improvements
- - Removed binary PNG files (migrated to Hugging Face XET storage)
- - Fixed eval_categorized.py compatibility with eval_pipeline.py output
- - Enhanced diagnostic and analysis scripts
+ ### Evaluation Enhancements (Feb 10-14, 2026)
+ - Added `--min-why` threshold filtering (explicit, strong_implied, weak_implied)
+ - Per-tag evidence tracking
+ - Compact eval output format
+ - Retrieval gap analysis scripts
+ - Multiple eval runs with different configurations
+ - Stored eval results in `data/eval_results/`
+ - Added per-phrase retrieval cap flag: `--per-phrase-final-k`
+ - Added Stage 3 selection score/rank logging for post-hoc threshold analysis
+ - Added score/global-rank/phrase-rank grid analysis script
+
+ ### Code Quality Improvements
+ - Removed binary PNG files (migrated to Hugging Face XET storage)
+ - Fixed eval_categorized.py compatibility with eval_pipeline.py output
+ - Enhanced diagnostic and analysis scripts
+ - Ensured tagging checklist loads from repo root if present
+ - Forced UTF-8 stdout/stderr in eval pipeline to avoid Windows encoding crashes
 
  ---
 
@@ -122,10 +127,11 @@ Implementation of a categorized tag suggestion system based on the e621 tagging
  - **SamplePrompts.csv**: Test prompts for development
  - **TagDocumentation.txt**: E621 tag documentation
 
- ### Evaluation
- - **data/eval_samples/**: Test images with ground truth annotations
- - **data/eval_results/**: Stored evaluation results (JSONL format)
- - **eval_analysis.txt**: Latest per-category performance metrics
+ ### Evaluation
+ - **data/eval_samples/**: Test images with ground truth annotations
+ - **data/eval_results/**: Stored evaluation results (JSONL format)
+ - **eval_analysis.txt**: Latest per-category performance metrics
+ - **data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_caption_evident.jsonl**: Caption-evident GT subset (10 samples) for retrieval-ceiling audits
 
  ---
 
@@ -152,20 +158,24 @@ Implementation of a categorized tag suggestion system based on the e621 tagging
 
  ## Testing & Evaluation
 
- ### Scripts
- - **scripts/eval_pipeline.py**: Main evaluation harness
- - Parallel processing support
- - Multiple min_why thresholds
- - Ground truth comparison with implications expansion
+ ### Scripts
+ - **scripts/eval_pipeline.py**: Main evaluation harness
+ - Parallel processing support
+ - Multiple min_why thresholds
+ - Ground truth comparison with implications expansion
+ - `--per-phrase-final-k` retrieval cap control
+ - Logs `stage3_selected_scores`, `stage3_selected_ranks`, `stage3_selected_phrase_ranks`
 
  - **scripts/eval_categorized.py**: Per-category evaluation
  - Precision, recall, F1 per category
  - Constraint validation (exactly_one, multi, etc.)
  - Tier-based aggregation (CRITICAL, IMPORTANT, etc.)
 
- - **scripts/analyze_compact_eval.py**: Compact evaluation analysis
- - **scripts/analyze_retrieval_gaps.py**: Retrieval gap identification
- - **scripts/diagnose_structural_clothing.py**: Clothing inference diagnostics
+ - **scripts/analyze_compact_eval.py**: Compact evaluation analysis
+ - **scripts/analyze_retrieval_gaps.py**: Retrieval gap identification
+ - **scripts/analyze_threshold_grid.py**: Post-hoc threshold grids (score/global rank/phrase rank)
+ - **scripts/analyze_caption_evident_audit.py**: Caption-evident audit vs retrieval (optional implication expansion)
+ - **scripts/diagnose_structural_clothing.py**: Clothing inference diagnostics
  - **scripts/extract_wiki_data.py**: E621 wiki data extraction
  - **scripts/smoke_test.py**: Quick pipeline validation
 
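The Stage 3 score/rank logging and the threshold-grid script listed in PROJECT_SUMMARY.md support a post-hoc sweep. For each sample, you filter the logged Stage 3 selections by score, global rank, or per-phrase rank, then recompute precision and recall at each cutoff. The Python sketch below only illustrates that idea. It is not the code in `scripts/analyze_threshold_grid.py`, and it rests on several assumptions:

- It uses a hypothetical record layout in which `stage3_selected_scores`, `stage3_selected_ranks`, and `stage3_selected_phrase_ranks` are lists aligned with an assumed `stage3_selected_tags` list.
- Ground-truth tags sit under an assumed `gt` key.
- Ranks are 1-based, and higher scores are better.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical shape of one eval log record. Only the three stage3_selected_*
# field names come from the docs; "stage3_selected_tags" (the tags those lists
# describe, in the same order) and "gt" (ground-truth tags) are assumptions.
Record = Dict[str, list]


def keep_mask(rec: Record, mode: str, threshold: float) -> List[bool]:
    """Return one keep/drop flag per selected tag for the given sweep mode."""
    if mode == "score":
        # Assumes a higher retrieval score is better: keep score >= threshold.
        return [s >= threshold for s in rec["stage3_selected_scores"]]
    if mode == "rank":
        # Assumes 1-based global ranks (1 = best): keep rank <= threshold.
        return [r <= threshold for r in rec["stage3_selected_ranks"]]
    if mode == "phrase_rank":
        # Same convention, but ranks are counted within each rewritten phrase.
        return [r <= threshold for r in rec["stage3_selected_phrase_ranks"]]
    raise ValueError(f"unknown mode: {mode!r}")


def threshold_grid(
    records: Sequence[Record], mode: str, thresholds: Iterable[float]
) -> List[Tuple[float, float, float]]:
    """Micro-averaged (threshold, precision, recall) rows over all records."""
    rows = []
    for t in thresholds:
        tp = n_pred = n_gt = 0
        for rec in records:
            gt = set(rec["gt"])
            kept = {
                tag
                for tag, keep in zip(rec["stage3_selected_tags"], keep_mask(rec, mode, t))
                if keep
            }
            tp += len(kept & gt)
            n_pred += len(kept)
            n_gt += len(gt)
        # Report 0.0 instead of dividing by zero when nothing is kept or no GT exists.
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gt if n_gt else 0.0
        rows.append((t, precision, recall))
    return rows
```

Take a single record that selects `solo` (score 0.9), `male` (0.8), and `gun` (0.3), with ground truth `solo`, `male`, `anthro`. A 0.5 score cutoff keeps `solo` and `male`, which gives precision 1.0 and recall 2/3. Filtering by phrase rank only approximates what a smaller `--per-phrase-final-k` would produce, because Stage 3 might have selected differently from a smaller candidate pool.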
SESSION_QUICKSTART.md CHANGED
@@ -15,15 +15,17 @@ A RAG system that converts natural language prompts → e621-style tags for furr
  - **Evaluation Metrics**: Per-category P/R/F1, ranking metrics (MRR, P@K, nDCG)
  - **Multi-select Constraints**: Fixed body_type, species, gender to allow multiple tags
 
- ## Key Files
- - `app.py` - Gradio web interface
- - `psq_rag/tagging/categorized_suggestions.py` - Category-based tag suggestions
- - `psq_rag/tagging/category_parser.py` - Parse e621 checklist
- - `scripts/eval_pipeline.py` - Main evaluation harness
- - `scripts/eval_categorized.py` - Per-category metrics
- - `docs/retrieval_contract.md` - Stage 2 spec
- - `docs/stage3_contract.md` - Stage 3 spec
- - `tagging_checklist.txt` - E621 tagging guidelines
+ ## Key Files
+ - `app.py` - Gradio web interface
+ - `psq_rag/tagging/categorized_suggestions.py` - Category-based tag suggestions
+ - `psq_rag/tagging/category_parser.py` - Parse e621 checklist
+ - `scripts/eval_pipeline.py` - Main evaluation harness
+ - `scripts/eval_categorized.py` - Per-category metrics
+ - `scripts/analyze_threshold_grid.py` - Threshold grid analysis (score/global rank/phrase rank)
+ - `scripts/analyze_caption_evident_audit.py` - Caption-evident audit vs retrieval
+ - `docs/retrieval_contract.md` - Stage 2 spec
+ - `docs/stage3_contract.md` - Stage 3 spec
+ - `tagging_checklist.txt` - E621 tagging guidelines
 
  ## Running Code
  ```bash
@@ -65,17 +67,30 @@ ls -la psq_rag/
  ls -la data/eval_results/
  ```
 
- ## Common Tasks
- - **Add category**: Edit `tagging_checklist.txt`, update parser
- - **Eval changes**: Run `scripts/eval_pipeline.py`, then `scripts/eval_categorized.py`
- - **Test retrieval**: Use `scripts/smoke_test.py`
- - **Debug Stage 3**: Use `scripts/stage3_debug.py` (`--phrases` optional; omitted runs Stage 1 rewrite first, then Stage 2 retrieval from rewritten phrases)
+ ## Common Tasks
+ - **Add category**: Edit `tagging_checklist.txt`, update parser
+ - **Eval changes**: Run `scripts/eval_pipeline.py`, then `scripts/eval_categorized.py`
+ - **Threshold sweeps**: Run `scripts/analyze_threshold_grid.py` (see `--mode score|rank|phrase_rank`)
+ - **Caption-evident audit**: Run `scripts/analyze_caption_evident_audit.py`
+ - **Test retrieval**: Use `scripts/smoke_test.py`
+ - **Debug Stage 3**: Use `scripts/stage3_debug.py` (`--phrases` optional; omitted runs Stage 1 rewrite first, then Stage 2 retrieval from rewritten phrases)
 
- ## Data Artifacts (Lazy-loaded)
+ ## Data Artifacts (Lazy-loaded)
  - FastText embeddings (semantic similarity)
  - TF-IDF + SVD matrices (context similarity)
  - Alias → canonical tag mappings
- - Tag counts, implications, groups, wiki definitions
+ - Tag counts, implications, groups, wiki definitions
+
+ ## Eval Datasets
+ - `data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl` - Base eval set (implication-expanded GT)
+ - `data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_caption_evident.jsonl` - Caption-evident GT subset (10 samples); used to estimate retrieval ceiling from text
+
+ ## New Eval Features (Feb 2026)
+ - `eval_pipeline.py` now logs Stage 3 selection scores and ranks:
+ - `stage3_selected_scores` (retrieval score)
+ - `stage3_selected_ranks` (global rank)
+ - `stage3_selected_phrase_ranks` (per-phrase rank)
+ - New CLI flag: `--per-phrase-final-k` to control per-phrase retrieval cap
 
  ## NSFW Handling
  - Filtered via `word_rating_probabilities.csv` (threshold 0.95)
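The caption-evident subset added in this commit begins with a `_meta` row. Each sample stores only its caption-evident tags, in `tags_ground_truth_expanded`. That makes a retrieval-ceiling audit simple to sketch: measure the fraction of those tags that a retrieval run recovers, optionally after expanding the retrieved tags through implications.

This is a minimal Python sketch, not `scripts/analyze_caption_evident_audit.py`. The function names, the retrieved-tag input, and the implication mapping (each tag maps to the tags it directly implies) are all hypothetical.

```python
import json
from typing import Iterable, List, Mapping, Optional, Set, Tuple


def load_caption_evident(path: str) -> Tuple[dict, List[dict]]:
    """Split a JSONL eval file into its "_meta" header row and the sample rows."""
    meta: dict = {}
    samples: List[dict] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            row = json.loads(line)
            if row.get("_meta"):
                meta = row
            else:
                samples.append(row)
    return meta, samples


def expand_implications(tags: Iterable[str],
                        implications: Mapping[str, Iterable[str]]) -> Set[str]:
    """Transitively add implied tags (hypothetical tag -> implied-tags map); cycle-safe."""
    out: Set[str] = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in out:
            continue
        out.add(tag)
        stack.extend(implications.get(tag, ()))
    return out


def evident_recall(sample: dict, retrieved: Iterable[str],
                   implications: Optional[Mapping[str, Iterable[str]]] = None) -> float:
    """Fraction of a sample's caption-evident GT tags present in the retrieved set."""
    evident = set(sample["tags_ground_truth_expanded"])
    got = set(retrieved)
    if implications is not None:
        got = expand_implications(got, implications)
    # An empty evident set has nothing to recover; treat it as fully covered.
    return len(evident & got) / len(evident) if evident else 1.0
```

Passing a toy mapping such as `{"felid": ["mammal"]}` lets a retrieved `felid` also count toward an evident `mammal`. This is one way to read the "optional implication expansion" described for the audit script.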
data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_caption_evident.jsonl ADDED
@@ -0,0 +1,11 @@
+ {"_meta": true, "note": "Caption-evident audit subset (10 samples). tags_ground_truth_expanded contains only tags judged evident from caption_cogvlm. Use for estimating retrieval ceiling from text. Generated Feb 2026 from data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl.", "source_file": "data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl", "caption_field": "caption_cogvlm", "n_samples": 10}
+ {"id": 3285630, "md5": "7e499711a05d48093608fb3a9b140fdc", "caption_cogvlm": "The image showcases an anthropomorphic feline character dressed in a formal attire. The character has a unique hairstyle with a large bun on top and black fur. The feline is wearing a teal shirt, a white tie, and a beige vest. He is holding a white mug in his right hand. The background is simple and transparent, allowing the character to be the main focus.", "caption_llm_0": "a solo male anthropomorphic feline character, clothed in fur and wearing eyewear. He has brown hair and black eyelashes, with 5 fingers on each hand. His body is brown with black fur, and he stands smiling while looking at the viewer. He holds a cup in one hand and a gun in the other. The text label is present but its content is unknown.", "caption_llm_1": "A solo male feline character, depicted in a simple background. the furry artwork features the character clothed in brown fur and black body, with brown hair and black eyelashes. the character is standing and looking at the viewer while holding a gun with one hand and a cup with the other. there's also text present in the image, but its content is not specified.", "caption_llm_2": "a solo male anthro felid, standing and looking at the viewer. He has brown fur and brown eyes, with black eyebrows and eyelashes. His hair is black, styled in a messy manner. He is holding a cup in his 5 fingers while smiling at the viewer. The background is simple and transparent, with no additional elements present.", "caption_llm_3": "A solo male anthropomorphic feline character, clothed in fur with brown and black body and fur colors. he has brown and black hair, as well as eyelashes. the character is standing while looking at the viewer, holding a gun with his 5 fingers. he also holds a cup containing a beverage. 
the text on the image reads \"text.\"", "caption_llm_4": "A solo male anthropomorphic feline with brown fur and black hair, set against a simple or transparent background.", "caption_llm_5": "A solo male feline, with brown fur and five fingers. the background is simple and transparent. the animal is clothed in fur clothing, has black hair, and possesses brown body coloration.", "caption_llm_6": "A solo male feline, with brown fur and a simple background. the animal is clothed in fur clothing, has black hair, and is depicted against a transparent background.", "caption_llm_7": "A solo male anthropomorphic feline, with brown fur and black hair. the feline has five fingers and is depicted in a solo pose.", "tags_synthetic_categorized": "{\"number_of_characters\": [\"solo\"], \"clothing_and_accessories\": [\"clothing\", \"clothed\", \"fur\", \"eyewear\", \"topwear\"], \"animals_and_anthropomorphic_features\": [\"anthro\"], \"characters_and_gender\": [\"male\"], \"hairstyle\": [\"hair\", \"brown_hair\", \"black_hair\", \"eyelashes\"], \"background_and_setting\": [\"simple_background\", \"transparent_background\", \"white_background\"], \"body_and_body_parts\": [\"fingers\", \"5_fingers\", \"breasts\", \"feet\", \"eyebrows\", \"toes\", \"teeth\", \"ear_piercing\"], \"furniture_and_objects\": [\"weapon\", \"ranged_weapon\", \"gun\", \"container\", \"beverage\", \"cup\", \"armor\"], \"colors\": [\"brown_body\", \"brown_fur\", \"black_body\", \"black_fur\"], \"emotions_and_expressions\": [\"smile\"], \"actions_and_poses\": [\"standing\", \"looking_at_viewer\", \"holding_object\", \"holding_weapon\", \"holding_gun\", \"holding_cup\", \"holding_container\"], \"miscellaneous\": [\"text\"], \"species_or_animal_type\": [\"feline\", \"mammal\", \"felid\"]}", "tags_ground_truth_categorized": "{\"body_and_body_parts\": [\"5_fingers\", \"fingers\"], \"animals_and_anthropomorphic_features\": [\"anthro\"], \"hairstyle\": [\"black_hair\", \"hair\"], \"colors\": [\"brown_body\", 
\"brown_fur\"], \"clothing_and_accessories\": [\"clothed\", \"clothing\", \"fur\"], \"characters_and_gender\": [\"male\"], \"background_and_setting\": [\"simple_background\", \"transparent_background\"], \"number_of_characters\": [\"solo\"], \"species_or_animal_type\": [\"felid\", \"feline\", \"mammal\"]}", "tags_ground_truth_expanded": ["alpha_channel", "anthro", "clothed", "clothing", "felid", "feline", "fingers", "fur", "hair", "male", "mammal", "simple_background", "solo", "transparent_background"]}
+ {"id": 260449, "md5": "5c21e7ccf1bdaa67e396df8a5bb90dc8", "caption_cogvlm": "The image showcases a group of animated characters. On the left, there's a large, jovial ape with a wide grin, raised arms, and a playful expression. In the center, a large, jovial bear is seen laughing and playfully interacting with a young boy, who is dancing with his arms raised. The boy has a cheerful expression and is wearing a loincloth. On the right, there's a smaller, mischievous-looking primate with a tuft of hair on its head, looking directly at the viewer with a cheeky grin. The background is simple, emphasizing the characters.", "caption_llm_0": "a solo male anthropomorphic bear, standing and looking at the viewer with long black hair. The background is simple and white. The bear has claws, a tongue, and is bipedal. It wears clothing that includes fur on top of its body while being topless from the waist up and wearing bottomwear. There's also a text label present in the image.", "caption_llm_1": "artwork of baloo and mowgli. a solo male character, depicted in furry artwork style. The background is simple and white. The character is clothed, with a top layer of fur and bottomwear. He has a smile on his face and his mouth open, as if gesturing or speaking to the viewer. His body parts include feet, fingers, teeth, toes, breasts (for an overweight character), tufts of hair on his head or body (furry), and young age. This human-like primate species includes elements from apes and bears in its appearance.", "caption_llm_2": "a solo male character, depicted as a human-like primate with long black hair. He is clothed in fur and wears bottomwear, while his upper body is topless. The background is simple and white. The character stands with a smile on his face, looking at the viewer while making a gesture with one hand. His feet and toes are visible, along with his teeth and tufts of hair on his head.", "caption_llm_3": "artwork of baloo and mowgli. 
a solo male anthropomorphic bear, standing and looking at the viewer with a smile. The bear has claws, a tongue, and is bipedal. It is wearing clothing that includes fur and bottomwear while being topless. The background is simple with a white backdrop. The bear has feet, fingers, teeth, toes, tufts of hair on its head and body parts such as breasts (for an overweight appearance). There are gestures present in the image as well as text labels included in the scene.", "caption_llm_4": "A group of male characters, each clothed in clothing made of fur. they are depicted as various species including apes, bears, and primates. the characters are engaged in a lively dance while looking at the viewer with their claws visible. their hair is styled naturally, and the background is simple.", "caption_llm_5": "Artwork of baloo and mowgli. a group of male, slightly chubby primates in various haplorhine species, including apes and bears. they are clothed in simple clothing adorned with fur. the primates are engaged in a lively dance while looking at the viewer. their hair is visible, and the background is kept simple to emphasize their actions and poses.", "caption_llm_6": "A group of male characters, each clothed in clothing made of fur. they are depicted as various species including apes, bears, and primates. the characters are engaged in a lively dance while looking at the viewer with their claws visible. their hair is styled naturally, and the background is simple.", "caption_llm_7": "Artwork of baloo and mowgli. a group of male characters, each clothed in clothing made of fur. they are depicted as various species including apes, bears, and primates. the characters are engaged in a lively dance while looking at the viewer with their claws visible. 
their hair is styled naturally, and the background is simple.", "tags_synthetic_categorized": "{\"characters_and_gender\": [\"male\"], \"number_of_characters\": [\"solo\"], \"animals_and_anthropomorphic_features\": [\"anthro\", \"size_difference\", \"claws\", \"tongue\", \"feral\", \"biped\"], \"clothing_and_accessories\": [\"clothing\", \"clothed\", \"fur\", \"topless\", \"bottomwear\", \"nude\"], \"hairstyle\": [\"hair\", \"long_hair\", \"black_hair\"], \"emotions_and_expressions\": [\"smile\", \"open_mouth\"], \"background_and_setting\": [\"simple_background\", \"white_background\"], \"body_and_body_parts\": [\"feet\", \"fingers\", \"teeth\", \"toes\", \"breasts\", \"overweight\", \"tuft\", \"young\", \"5_fingers\", \"belly\", \"navel\", \"big_breasts\", \"muscular\", \"slightly_chubby\", \"markings\"], \"miscellaneous\": [\"text\"], \"actions_and_poses\": [\"standing\", \"looking_at_viewer\", \"gesture\", \"transformation\", \"looking_at_another\", \"eyes_closed\", \"lying\", \"front_view\"], \"species_or_animal_type\": [\"mammal\", \"haplorhine\", \"canine\", \"bear\", \"canid\", \"monkey\", \"pokemon_(species)\", \"scalie\", \"human\", \"ape\", \"reptile\", \"primate\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"claws\"], \"clothing_and_accessories\": [\"clothed\", \"clothing\", \"fur\", \"topless\"], \"actions_and_poses\": [\"dancing\", \"looking_at_viewer\"], \"number_of_characters\": [\"group\"], \"hairstyle\": [\"hair\"], \"characters_and_gender\": [\"male\"], \"background_and_setting\": [\"simple_background\"], \"body_and_body_parts\": [\"slightly_chubby\"], \"species_or_animal_type\": [\"ape\", \"bear\", \"haplorhine\", \"human\", \"mammal\", \"primate\"]}", "tags_ground_truth_expanded": ["ape", "bear", "clothed", "clothing", "dancing", "fur", "group", "hair", "haplorhine", "human", "looking_at_viewer", "male", "mammal", "primate", "simple_background"]}
+ {"id": 1078019, "md5": "fc858593b7b9fbe82ce728778841e0cf", "caption_cogvlm": "The image showcases two anthropomorphic rabbits. The one on the left has a confident and slightly playful expression, with teal eyes and a blush on its cheeks. It's wearing a coat and holding a small plushie. The rabbit on the right appears to be more surprised or taken aback, with wide open blue eyes. Both rabbits seem to be in a close and intimate setting, suggesting a romantic or close relationship between them.", "caption_llm_0": "a male and female anthropomorphic rabbit, both clothed, standing close to each other in a simple white background. They are smiling and blushing while embracing with half-closed eyes. The male rabbit has buckteeth, and they are holding an object while looking at the viewer.", "caption_llm_1": "artwork of clancy (inkyfrog) and percy vison. a young, clothed male and female mustelid in a simple white background setting. They are embracing each other with half-closed eyes and open smiles, while looking at the viewer. The male weasel is holding an object, possibly a gift or toy for their partner. The background has dialogue text that adds to the scene's context.", "caption_llm_2": "a young, clothed male and female mustelid in a simple white background setting. They are embracing each other with half-closed eyes and open smiles, while looking at the viewer. The male weasel is holding an object, possibly a gift or toy for their partner. The background has dialogue text that adds to the scene's context.", "caption_llm_3": "artwork of clancy (inkyfrog) and percy vison. a male and female anthropomorphic rabbit, both clothed, standing close to each other in a simple white background. They are smiling and blushing while embracing with half-closed eyes. The male rabbit has buckteeth, and they are holding an object while looking at the viewer.", "caption_llm_4": "a romantic couple of alternate species rabbits, each with their own unique plushie clothing. 
They stand close to each other, blushing and open-mouthed in affectionate expressions. The simple white background allows the focus to be on the adorable rabbit duo.", "caption_llm_5": "Artwork of clancy (inkyfrog) and percy vison. a romantic couple of alternate species anthropomorphic rabbits, each with teal eyes and blushing. they are clothed in simple outfits against a white background, holding a plushie between them.", "caption_llm_6": "a romantic couple of alternate species rabbits, each with their own unique plushie clothing. They stand close to each other, blushing and open-mouthed in affectionate expressions. The simple white background allows the focus to be on the adorable rabbit duo.", "caption_llm_7": "artwork of clancy (inkyfrog) and percy vison. a romantic couple of rabbits, one with blue eyes and the other with teal eyes. They are both clothed in simple outfits, holding a plushie between them. The background is white and uncomplicated.", "tags_synthetic_categorized": "{\"animals_and_anthropomorphic_features\": [\"anthro\", \"buckteeth\"], \"background_and_setting\": [\"simple_background\", \"white_background\", \"dialogue\"], \"clothing_and_accessories\": [\"clothing\", \"clothed\"], \"characters_and_gender\": [\"male\", \"female\"], \"number_of_characters\": [\"duo\", \"solo\", \"group\"], \"emotions_and_expressions\": [\"blush\", \"open_mouth\", \"smile\", \"open_smile\", \"half-closed_eyes\", \"embrace\", \"narrowed_eyes\"], \"body_and_body_parts\": [\"teeth\", \"bodily_fluids\", \"young\"], \"colors\": [\"blue_eyes\"], \"miscellaneous\": [\"text\"], \"actions_and_poses\": [\"holding_object\", \"looking_at_viewer\", \"hug\"], \"species_or_animal_type\": [\"weasel\", \"true_musteline\", \"mammal\", \"mustelid\", \"rabbit\", \"lagomorph\", \"musteline\", \"leporid\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"alternate_species\", \"anthro\"], \"colors\": [\"blue_eyes\", \"teal_eyes\"], 
\"emotions_and_expressions\": [\"blush\", \"open_mouth\", \"romantic\"], \"clothing_and_accessories\": [\"clothed\", \"clothing\", \"plushie\"], \"number_of_characters\": [\"duo\"], \"characters_and_gender\": [\"male\", \"male/male\", \"romantic_couple\"], \"background_and_setting\": [\"simple_background\", \"white_background\"], \"species_or_animal_type\": [\"lagomorph\", \"leporid\", \"mammal\", \"rabbit\"]}", "tags_ground_truth_expanded": ["anthro", "blue_eyes", "blush", "clothed", "clothing", "duo", "lagomorph", "leporid", "mammal", "plushie", "rabbit", "romantic", "romantic_couple", "teal_eyes"]}
+ {"id": 1624724, "md5": "febfe277847481ae546525d1ccf4baff", "caption_cogvlm": "The image showcases a cartoonish, smiling creature with large, round eyes and a prominent red nose. It has a tan body with spots and possesses a unique, crosshaped mouth. The creature appears to be floating or hovering against a simple white background.", "caption_llm_0": "a solo character with ambiguous gender, standing in front view while looking at the viewer. The background is simple and white or transparent. The character has fur clothing and hair accessories, as well as anthropomorphic features such as scales and toony appearance. They have a brown body color with black eyes, or yellow body with green body color. This artwork may represent an alien experiment from Lilo & Stitch or a Generation 3 Pokémon species.", "caption_llm_1": "A solo, ambiguously gendered character with anthropomorphic features such as scales, toony appearance, and a long tongue. the character is wearing fur clothing and has spots on its body. it has teeth and displays an open-mouth smile while blushing. the species or animal type includes aliens, lilo & stitch experiments, generation 3 pokémon hybrids, and various pokémon species. the colors present in the image are brown body, yellow body, tan body, green body with black eyes.", "caption_llm_2": "a solo character with ambiguous gender, standing in front view while looking at the viewer. The background is simple and white or transparent. The character has fur clothing and hair accessories, as well as anthropomorphic features such as scales and toony appearance. They have a brown body color with black eyes, or yellow body with green body color. This artwork may represent an alien experiment from Lilo & Stitch or a Generation 3 Pokémon species.", "caption_llm_3": "A solo character with ambiguous gender, standing in front view while looking at the viewer. the background is simple and white or transparent. 
the character has fur clothing and hair accessories, along with a brown body color. it also features yellow or green body colors, black eyes, and may be an alien experiment from lilo & stitch or a generation 3 pokémon species.", "caption_llm_4": "A solo alien character with ambiguous gender, displaying a smile. the creature has brown eyes, a red nose, and a tan body. set against a simple white background, the alien is depicted as an experiment from lilo & stitch and is also part of generation 3 pokémon species.", "caption_llm_5": "A solo alien experiment from lilo and stitch, with an ambiguous gender. the background is simple and white. the character is smiling, and it's a generation 3 pokémon hybrid species.", "caption_llm_6": "A solo alien experiment from lilo and stitch, with an ambiguous gender. the background is simple and white. the character is smiling, and it's a generation 3 pokémon hybrid species.", "caption_llm_7": "A solo alien experiment, likely from the lilo & stitch universe, depicted as a hybrid pokémon from generation 3. 
the creature is shown with an ambiguous gender and has a cheerful smile on its face.", "tags_synthetic_categorized": "{\"number_of_characters\": [\"solo\"], \"background_and_setting\": [\"simple_background\", \"white_background\", \"transparent_background\", \"food\", \"dialogue\", \"countershading\"], \"characters_and_gender\": [\"ambiguous_gender\"], \"animals_and_anthropomorphic_features\": [\"anthro\", \"feral\", \"scales\", \"toony\", \"tongue\"], \"clothing_and_accessories\": [\"fur\", \"clothing\"], \"actions_and_poses\": [\"front_view\", \"looking_at_viewer\", \"standing\"], \"body_and_body_parts\": [\"spots\", \"teeth\", \"tuft\", \"muscular\", \"markings\", \"bodily_fluids\", \"feet\", \"toes\", \"breasts\", \"fingers\", \"glistening\", \"huge_deltoids\", \"big_deltoids\"], \"emotions_and_expressions\": [\"open_mouth\", \"smile\", \"blush\"], \"miscellaneous\": [\"text\"], \"hairstyle\": [\"hair\"], \"colors\": [\"brown_body\", \"yellow_body\", \"tan_body\", \"green_body\", \"black_eyes\"], \"species_or_animal_type\": [\"generation_3_pokemon\", \"mammal\", \"pokemon_(species)\", \"alien\", \"experiment_(lilo_and_stitch)\", \"hybrid\"]}", "tags_ground_truth_categorized": "{\"characters_and_gender\": [\"ambiguous_gender\"], \"colors\": [\"brown_eyes\", \"red_nose\", \"tan_body\"], \"miscellaneous\": [\"crossover\"], \"background_and_setting\": [\"simple_background\", \"white_background\"], \"emotions_and_expressions\": [\"smile\"], \"number_of_characters\": [\"solo\"], \"species_or_animal_type\": [\"alien\", \"experiment_(lilo_and_stitch)\", \"generation_3_pokemon\", \"hybrid\", \"pokemon_(species)\"]}", "tags_ground_truth_expanded": ["red_nose", "simple_background", "smile", "solo", "tan_body", "white_background"]}
+ {"id": 1325009, "md5": "929bdced281bd135ed1ca76df24332d4", "caption_cogvlm": "The image showcases an anthropomorphic tiger with striking blue eyes. He is depicted in a muscular and confident pose, with one hand raised to his head in a thoughtful or playful gesture. The tiger has a white chest with a tuft of fur, and his fur is striped in the traditional tiger pattern. He is wearing dark blue shorts, and his muscular physique is accentuated by the lighting in the background, which creates a countershading effect. The overall mood of the image is one of confidence and playfulness.", "caption_llm_0": "a muscular anthropomorphic male tiger with stripes, humanoid hands, and claws. He has a chest tuft and a striped body. The background is simple with countershading, white and orange colors are present on his furry body. His eyes are green while his nose is black. He's standing in an outside setting near food, possibly hunting or observing it from afar.", "caption_llm_1": "artwork of tiger dancer (zootopia). a muscular anthropomorphic male tiger, with stripes and humanoid hands. He has claws, a chest tuft, and a striped body. The tiger is bipedal and has a tail. His fur is white with orange stripes, while his body is brown. He wears clothing that covers his top half but leaves his bottom half exposed. The tiger has abs, pecs, biceps, big muscles on his arms and chest area as well as overweight belly which makes him look more muscular than the average anthro tiger in this style of artwork. 
His fingers are visible due to the humanoid hands feature he possesses along with claws on each finger tip for better grip or attack purposes if needed in the scene depicted in this artwork piece .He also has navel showing through the clothing he's wearing which adds to its realism factor making it look like an actual person rather than just an animal character .The text might be present somewhere within or around this image possibly indicating some sort of context or storyline associated with it but not much can be said about that without further information about what exactly it says .", "caption_llm_2": "a solo male muscular character, depicted in fur clothing and bottomwear. The background is simple with countershading, featuring a white background and elements like food, sky, and clouds. The character has a white body with orange fur accents on its face and tail. Its nose is pink while its eyes are blue. It's shown smiling with one eye closed as it looks at the viewer from a front view while sitting or lying down on the ground.", "caption_llm_3": "artwork of tiger dancer (zootopia). a muscular male anthro tiger, with stripes and humanoid hands. He has claws, a chest tuft, and a striped body. The tiger is depicted in a solo scene wearing clothing that covers his top half while exposing his belly. His fur color is white with orange stripes on the body and brown fur on the face. He has blue eyes, black nose, pink nose markings on his cheeks, and green eyes as well. The tiger's pose includes him standing or sitting in various positions such as looking at the viewer or winking while showing off his muscular physique including pecs, biceps, abs, overweight belly area along with fingers and navel details visible through the clothing.", "caption_llm_4": "A solo, muscular male pantherine tiger with blue eyes. the tiger is depicted topless and clothed in fur shorts. he has a tuft on his head and is smiling while looking at the viewer with one hand on his head. 
the overall color scheme of the image is dominated by the blue eyes of the tiger, set against a background that may or may not be present in this description.", "caption_llm_5": "artwork of tiger dancer (zootopia). a solo, muscular male pantherine tiger with blue eyes and a tuft of chest hair. He is clothed in fur shorts and topless, with his hand on his head as he looks directly at the viewer.", "caption_llm_6": "a solo, muscular male tiger with blue eyes, wearing clothing and fur shorts. The tiger has chest tufts and stripes, displaying a pantherine appearance. It stands on its hind legs at the countershaded background, smiling confidently.", "caption_llm_7": "artwork of tiger dancer (zootopia). a solo, muscular male anthro tiger with a chest tuft and stripes, standing on its hind legs with one hand on its head and the other holding a blue-eyed pantherine felid. The background features countershading to create depth. The tiger has a smile on its face as it looks directly at the viewer.", "tags_synthetic_categorized": "{\"characters_and_gender\": [\"male\", \"muscular_male\", \"overweight_male\"], \"animals_and_anthropomorphic_features\": [\"anthro\", \"muscular_anthro\", \"stripes\", \"feral\", \"humanoid_hands\", \"claws\", \"chest_tuft\", \"striped_body\", \"biped\", \"striped_fur\", \"tail\", \"overweight_anthro\"], \"number_of_characters\": [\"solo\"], \"clothing_and_accessories\": [\"fur\", \"clothing\", \"topless\", \"clothed\", \"kemono\", \"bottomwear\"], \"body_and_body_parts\": [\"muscular\", \"pecs\", \"biceps\", \"tuft\", \"abs\", \"overweight\", \"belly\", \"fingers\", \"navel\", \"big_muscles\", \"breasts\", \"teeth\", \"5_fingers\", \"nipples\", \"markings\", \"toes\", \"feet\", \"moobs\", \"eyebrows\", \"young\"], \"background_and_setting\": [\"simple_background\", \"countershading\", \"white_background\", \"outside\", \"food\", \"sky\", \"cloud\"], \"colors\": [\"white_body\", \"orange_body\", \"white_fur\", \"orange_fur\", \"brown_fur\", 
\"brown_body\", \"black_body\", \"pink_nose\", \"blue_eyes\", \"black_nose\", \"green_eyes\", \"black_fur\"], \"emotions_and_expressions\": [\"smile\", \"open_mouth\", \"blush\", \"grin\"], \"actions_and_poses\": [\"looking_at_viewer\", \"standing\", \"one_eye_closed\", \"lying\", \"wink\", \"sitting\", \"eyes_closed\", \"front_view\", \"pose\"], \"hairstyle\": [\"hair\", \"white_hair\", \"black_hair\", \"short_hair\"], \"miscellaneous\": [\"text\"], \"species_or_animal_type\": [\"tiger\", \"mammal\", \"canid\", \"pantherine\", \"felid\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"anthro\", \"chest_tuft\", \"muscular_anthro\", \"stripes\"], \"colors\": [\"blue_eyes\"], \"clothing_and_accessories\": [\"bottomwear\", \"clothed\", \"clothing\", \"fur\", \"shorts\", \"topless\"], \"background_and_setting\": [\"countershading\"], \"actions_and_poses\": [\"hand_on_head\", \"looking_at_viewer\"], \"characters_and_gender\": [\"male\", \"muscular_male\"], \"body_and_body_parts\": [\"muscular\", \"tuft\"], \"emotions_and_expressions\": [\"smile\"], \"number_of_characters\": [\"solo\"], \"species_or_animal_type\": [\"felid\", \"mammal\", \"pantherine\", \"tiger\"]}", "tags_ground_truth_expanded": ["anthro", "blue_eyes", "bottomwear", "chest_tuft", "clothed", "clothing", "countershading", "felid", "fur", "hand_on_head", "male", "mammal", "muscular", "muscular_anthro", "muscular_male", "pantherine", "shorts", "solo", "stripes", "tiger", "topless", "tuft"]}
+ {"id": 1023509, "md5": "04151411520ae750887cbd79cda9239d", "caption_cogvlm": "The image is a multi-panel comic strip. The first panel shows a character lying on the ground, surrounded by darkness, with a speech bubble saying 'I'm done for...'. The next panel depicts a hooded figure standing over the character, with a speech bubble saying 'You're not done for, you're just beginning.'. The following panels show a conversation between the hooded figure and another character, where the hooded figure mentions 'I'm the guardian of the realm of darkness'. The dialogue continues with the hooded figure expressing that the character has been chosen for a task. The final panels depict a group of characters, including a white-furred creature, a goat, a human, and a lizard, discussing a plan to 'defeat the darkness'. The comic ends with a textual note saying 'there is light'.", "caption_llm_0": "a duo of anthropomorphic goats, one male and one female, wearing clothing with fur. The male goat is clothed in a shirt and headwear, while the female goat is clothed in topwear and headgear. They are holding melee weapons and have closed eyes. Their bodies are white with red eyes, while their fur is also white.", "caption_llm_1": "artwork of asriel dreemurr, chara (undertale), frisk (undertale), mettaton, mettaton ex, and monster kid. a duo of anthropomorphic goats, one male and one female, standing in a simple background. The male goat has white fur and red eyes, while the female goat has brown hair. They are both clothed in shirts and headwear, with the male wearing a hat. The scene takes place outside during rainy weather.", "caption_llm_2": "a duo of caprine goats, one with white fur and red eyes, standing outside in the rain. They are holding melee weapons and have a simple background. The male goat has brown hair while the female has blonde hair. 
Both animals are young and have glowing eyes.", "caption_llm_3": "artwork of asriel dreemurr, chara (undertale), frisk (undertale), mettaton, mettaton ex, and monster kid. a duo of anthropomorphic goats, one male and one female, standing in a simple background. The male goat is wearing a shirt and hat while the female goat has brown hair. They are engaged in dialogue with each other, possibly discussing something humorous as they both have open mouths and are smiling. Their bodies are covered in white fur while their eyes have red irises.", "caption_llm_4": "a humanoid goat-like creature with red eyes and a white body, standing in front of a dialogue background. The creature is holding two dice in its armless body. A text label is present, but its content is unknown.", "caption_llm_5": "artwork of asriel dreemurr, chara (undertale), frisk (undertale), mettaton, mettaton ex, and monster kid. a caprine goat with long ears, human-like features, and red eyes. It is standing on a white body with white fur. The background shows a dialogue taking place. The animal is wearing fur clothing and holding dice in its armless body. A text label can be seen in the image, but its content is not specified.", "caption_llm_6": "A humanoid goat with long ears, wearing fur clothing. the scene is set against a background of dialogue. a lizard and a mammal are also present, along with a caprine creature resembling a boss monster. the characters engage in an interaction while holding dice and text labels are visible nearby.", "caption_llm_7": "artwork of asriel dreemurr, chara (undertale), frisk (undertale), mettaton, mettaton ex, and monster kid. a humanoid goat with long ears, red eyes, and white fur. It is set against a dialogue background. The character wears fur clothing and has no arms. 
A text label is present in the scene, but its content is not specified.", "tags_synthetic_categorized": "{\"miscellaneous\": [\"text\", \"speech_bubble\", \"profanity\"], \"clothing_and_accessories\": [\"clothing\", \"fur\", \"topwear\", \"clothed\", \"headgear\", \"shirt\", \"headwear\", \"hat\"], \"characters_and_gender\": [\"male\", \"female\", \"ambiguous_gender\"], \"background_and_setting\": [\"dialogue\", \"raining\", \"outside\", \"food\", \"snow\", \"simple_background\", \"inside\", \"tree\"], \"furniture_and_objects\": [\"weapon\", \"melee_weapon\", \"armor\", \"furniture\"], \"animals_and_anthropomorphic_features\": [\"anthro\", \"not_furry\", \"horn\"], \"hairstyle\": [\"hair\", \"brown_hair\", \"blonde_hair\", \"black_hair\", \"white_hair\"], \"emotions_and_expressions\": [\"open_mouth\", \"smile\", \"humor\", \"blush\", \"tears\", \"angry\", \"crying\"], \"colors\": [\"white_body\", \"white_fur\", \"red_eyes\"], \"number_of_characters\": [\"duo\", \"group\", \"solo\"], \"body_and_body_parts\": [\"teeth\", \"bodily_fluids\", \"young\", \"bone\", \"breasts\"], \"actions_and_poses\": [\"holding_weapon\", \"holding_object\", \"eyes_closed\", \"glowing\"], \"species_or_animal_type\": [\"human\", \"goat\", \"mammal\", \"caprine\", \"bovid\", \"lagomorph\", \"scalie\", \"boss_monster\", \"reptile\"]}", "tags_ground_truth_categorized": "{\"body_and_body_parts\": [\"armless\"], \"furniture_and_objects\": [\"d6\", \"dice\"], \"background_and_setting\": [\"dialogue\"], \"clothing_and_accessories\": [\"fur\"], \"animals_and_anthropomorphic_features\": [\"long_ears\"], \"colors\": [\"red_eyes\", \"white_body\", \"white_fur\"], \"miscellaneous\": [\"text\"], \"species_or_animal_type\": [\"boss_monster\", \"bovid\", \"caprine\", \"goat\", \"human\", \"lizard\", \"mammal\", \"reptile\", \"scalie\"]}", "tags_ground_truth_expanded": ["bovid", "caprine", "dialogue", "fur", "goat", "human", "lizard", "mammal", "reptile", "scalie", "text", "white_body", "white_fur"]}
+ {"id": 335343, "md5": "780f7c3acd520cc2ce5bbb8a91e99937", "caption_cogvlm": "The image showcases two animated characters lying on a bed, seemingly in a resting state. The character on the left has blonde hair, green eyes, and is wearing makeup, with a slightly annoyed or disgruntled expression. The character on the right has purple hair, blue eyes, and a more relaxed or sleeping expression. Between them, there's a text that reads 'Look Before You Sleep', written in a playful font. The image also has a watermark at the bottom left corner that says 'SkyPony'. The overall color palette is dominated by shades of blue and purple, creating a serene and calming ambiance.", "caption_llm_0": "a solo female earth pony with rainbow hair, lying down on a bed with blue body and white fur. The background is simple, featuring snow and nighttime. The pony has blue eyes and is smiling while blushing. A plant can be seen in the detailed background.", "caption_llm_1": "artwork of applejack (mlp) and rarity (mlp). a solo female earth pony with a blue body, blue feathers, and blonde hair. She has long eyelashes and is smiling. The background features snowy surroundings with trees in the distance.", "caption_llm_2": "a solo female earth pony with purple hair, rainbow eyes, and a cutie mark on her flank. She is wearing clothing and has fur covering her body. The background is simple with snow falling outside during the nighttime. The pony is lying down, eyes closed, and smiling while blushing slightly.", "caption_llm_3": "artwork of applejack (mlp) and rarity (mlp). a solo female character with long, rainbow-colored hair and eyelashes. She is lying on a bed, wearing clothing and fur. The animal type is an earth pony or horse with blue body and white fur, along with blue feathers. The scene includes furniture such as a bed, pillow, bedding, and plant. 
The character has blush on her cheeks while smiling open-mouthed in her eyes-closed position.", "caption_llm_4": "A pair of earth ponies, one with blonde hair and the other with purple hair. they are lying down, eyes closed, and sleeping peacefully. the ponies have green eyes and white bodies or fur. one of them has freckles on their face. they are wearing makeup and eyeshadow, as well as a text label that is not specified in the description.", "caption_llm_5": "Artwork of applejack (mlp) and rarity (mlp). a duo of female earth ponies, both with white bodies and white fur. they have green eyes and freckles on their faces. one pony has a unicorn horn, while the other has feathers adorning its body. the ponies are lying down on a bed, resting with their eyes closed. a pillow is also present in the scene. the ponies wear makeup, including eyeshadow to enhance their appearance.", "caption_llm_6": "A duo of female earth ponies, one with blonde hair and the other with purple hair. they are both adorned with makeup, including eyeshadow. the ponies have white bodies and fur, as well as green eyes. one pony has freckles on its face. they are lying on a bed surrounded by furniture and pillows while displaying expressions of anger and fear. a text label is also present in the image.", "caption_llm_7": "Artwork of applejack (mlp) and rarity (mlp). a duo of female characters, one with blonde hair and the other with purple hair. they are lying on a bed, which is adorned with furniture and pillows. the characters have freckles on their faces and are wearing eyeshadow, fur clothing, and makeup. one character has green eyes while the other has white fur covering their body. \nthe scene also includes an earth pony, a unicorn, and feral animals such as horns present in the artwork. 
the characters appear to be sleeping or resting peacefully in this furry artwork style setting.", "tags_synthetic_categorized": "{\"characters_and_gender\": [\"female\"], \"animals_and_anthropomorphic_features\": [\"feral\", \"horn\", \"wings\", \"cutie_mark\", \"feathered_wings\", \"feathers\", \"anthro\"], \"miscellaneous\": [\"text\", \"magic\"], \"number_of_characters\": [\"solo\"], \"hairstyle\": [\"hair\", \"purple_hair\", \"two_tone_hair\", \"rainbow_hair\", \"blue_hair\", \"blonde_hair\", \"long_hair\", \"pink_hair\", \"eyelashes\"], \"actions_and_poses\": [\"eyes_closed\", \"water\", \"sleeping\", \"lying\", \"looking_at_viewer\"], \"clothing_and_accessories\": [\"fur\", \"clothing\"], \"background_and_setting\": [\"snow\", \"outside\", \"simple_background\", \"food\", \"night\", \"underwater\", \"dialogue\", \"snowing\", \"moon\", \"winter\", \"holidays\", \"sky\", \"inside\", \"detailed_background\", \"tree\"], \"colors\": [\"blue_body\", \"blue_feathers\", \"blue_fur\", \"white_body\", \"white_fur\", \"blue_eyes\"], \"furniture_and_objects\": [\"furniture\", \"bed\", \"pillow\", \"bedding\", \"plant\"], \"emotions_and_expressions\": [\"smile\", \"open_mouth\", \"blush\"], \"body_and_body_parts\": [\"young\", \"breasts\", \"teeth\", \"bodily_fluids\"], \"species_or_animal_type\": [\"pony\", \"earth_pony\", \"mammal\", \"marine\", \"horse\", \"equid\", \"unicorn\", \"equine\"]}", "tags_ground_truth_categorized": "{\"emotions_and_expressions\": [\"angry\", \"scared\"], \"furniture_and_objects\": [\"bed\", \"furniture\", \"pillow\"], \"hairstyle\": [\"blonde_hair\", \"hair\", \"purple_hair\"], \"number_of_characters\": [\"duo\"], \"actions_and_poses\": [\"eyes_closed\", \"lying\", \"sleeping\"], \"clothing_and_accessories\": [\"eyeshadow\", \"fur\", \"makeup\"], \"animals_and_anthropomorphic_features\": [\"feathers\", \"feral\", \"horn\"], \"characters_and_gender\": [\"female\"], \"body_and_body_parts\": [\"freckles\"], \"colors\": [\"green_eyes\", 
\"white_body\", \"white_fur\"], \"miscellaneous\": [\"text\"], \"species_or_animal_type\": [\"earth_pony\", \"equid\", \"equine\", \"horse\", \"mammal\", \"pony\", \"unicorn\"]}", "tags_ground_truth_expanded": ["angry", "bed", "blonde_hair", "blue_eyes", "duo", "eyes_closed", "eyeshadow", "green_eyes", "hair", "lying", "makeup", "purple_hair", "sleeping", "text"]}
+ {"id": 17482, "md5": "4f41d96bf3912080e56aec95973baee3", "caption_cogvlm": "The image showcases an anthropomorphic creature, possibly a wolf or a dog, with a spade tail and claws, playing a bass guitar. The creature is depicted in a dynamic pose, with its hair flowing and fingers poised on the guitar strings. The background is a blend of pastel colors, giving the artwork a dreamy and ethereal feel. The creature's attire appears torn, and it holds the guitar with a sense of passion and dedication.", "caption_llm_0": "a bipedal, anthropomorphic canine with claws and a tail, wearing clothing made of fur. The character is depicted playing the electric guitar while standing with eyes closed, holding the instrument in its membranous wings. The background features a smiling demonic figure holding an object.", "caption_llm_1": "A female canid, likely a demon or other mythical creature, clad in fur clothing and wearing pants. she stands with her eyes closed, playing the electric guitar while holding it with her fingers. her feet rest on the simple background as she smiles while performing. the membrane between her fingers allows for better control over the strings of the guitar.", "caption_llm_2": "a solo female canid, likely a demon or other type of canine mammal, playing music. She is clothed in fur and wears bottomwear in the form of pants. The background is simple, and she stands with her eyes closed while holding a musical instrument and an object. Her hair is visible, as are her feet and fingers with their membrane (anatomy).", "caption_llm_3": "A solo female anthropomorphic canine character, depicted as a bipedal creature with claws, tail, and membranous wings. she is wearing clothing made of fur and has bottomwear in the form of pants. her feet have toes that are covered by membranes. 
the background is simple, and she has a smile on her face while playing music.", "caption_llm_4": "A solo female canid character, likely a demon or other mythical creature, holding a plucked string instrument (such as a bass guitar or guitar) while displaying anthro features like 4 fingers and claws. the character has hair and is posed with the instrument in her hands. the scene is set against the backdrop of music, with various musical elements present.", "caption_llm_5": "A slender, anthropomorphic canine with four fingers and claws, holding a bass guitar. the canine has long hair and is depicted in a solo pose while playing the musical instrument.", "caption_llm_6": "A slender, anthropomorphic canine with four fingers and claws, holding a bass guitar. the canine has a spade-like tail and is depicted as female. the background consists of musical elements such as plucked string instruments and other string instruments.", "caption_llm_7": "A solo female canid character, likely a demon or other mythical creature, holding a bass guitar and plucked string instrument. she is clothed in torn clothing made of fur, with slim fingers. 
the scene depicts her playing music while holding the musical instruments in an intense pose.", "tags_synthetic_categorized": "{\"furniture_and_objects\": [\"musical_instrument\", \"string_instrument\", \"plucked_string_instrument\", \"guitar\", \"electric_guitar\", \"bass_guitar\"], \"number_of_characters\": [\"solo\"], \"animals_and_anthropomorphic_features\": [\"anthro\", \"claws\", \"biped\", \"tail\", \"membranous_wings\", \"wings\"], \"characters_and_gender\": [\"male\"], \"clothing_and_accessories\": [\"clothing\", \"fur\", \"clothed\", \"bottomwear\", \"pants\"], \"actions_and_poses\": [\"playing_music\", \"playing_guitar\", \"holding_musical_instrument\", \"standing\", \"holding_object\", \"eyes_closed\"], \"hairstyle\": [\"hair\"], \"body_and_body_parts\": [\"feet\", \"fingers\", \"toes\", \"membrane_(anatomy)\", \"5_fingers\", \"tuft\"], \"miscellaneous\": [\"music\"], \"background_and_setting\": [\"simple_background\"], \"emotions_and_expressions\": [\"smile\"], \"species_or_animal_type\": [\"canid\", \"canine\", \"mammal\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"4_fingers\", \"anthro\", \"claws\", \"spade_tail\"], \"furniture_and_objects\": [\"bass_guitar\", \"guitar\", \"musical_instrument\", \"plucked_string_instrument\", \"string_instrument\"], \"clothing_and_accessories\": [\"clothed\", \"clothing\", \"fur\", \"torn_clothing\"], \"characters_and_gender\": [\"female\"], \"body_and_body_parts\": [\"fingers\", \"slim\"], \"hairstyle\": [\"hair\"], \"actions_and_poses\": [\"holding_musical_instrument\", \"holding_object\"], \"miscellaneous\": [\"music\"], \"number_of_characters\": [\"solo\"], \"species_or_animal_type\": [\"canid\", \"canine\", \"demon\", \"mammal\"]}", "tags_ground_truth_expanded": ["anthro", "bass_guitar", "canid", "canine", "claws", "clothed", "clothing", "fingers", "fur", "guitar", "hair", "holding_musical_instrument", "holding_object", "mammal", "music", "musical_instrument", 
"plucked_string_instrument", "solo", "spade_tail", "string_instrument", "tail", "torn_clothing"]}
+ {"id": 2021552, "md5": "795c62b5d529f3758fe8f7de6062d2e6", "caption_cogvlm": "The image showcases two anthropomorphic characters. On the left is a rabbit-like creature dressed in a white shirt and black pants, standing with crossed arms. On the right is a fox-like character wearing blue overalls and a white shirt, looking towards the rabbit with a slightly open mouth. The background is a simple grey, and both characters have distinct features such as fur, facial markings, and claws.", "caption_llm_0": "a solo male character, clothed in a shirt and overalls, standing with crossed arms. He has white fur and grey body markings, with blue eyes. The background is simple, featuring either white or grey tones. The character is an arctic fox or canid species, displaying a half-closed eye expression while making eye contact with the viewer.", "caption_llm_1": "artwork of jack savage and skye (zootopia). a solo male character, clothed in a shirt and pants, standing with crossed arms. He has white fur and a grey body, with blue eyes. The background is simple, featuring either white or grey tones. The character is an anthropomorphic arctic fox with facial markings such as cheek tufts and head markings. He has fluffy tail tufts and pawpads on his feet. His ears are dipstick-shaped, and he has claws on his hands and feet. The character's tail is fluffy as well, with some tail markings present. In the scene, he appears to be looking at another person or object while holding something in his hand or pocket.", "caption_llm_2": "a solo male character, an anthropomorphic arctic fox with facial markings and a fluffy tail. The background is simple, featuring white and grey hues. The fox is standing with crossed arms, looking at another character while holding an object. It has blue eyes and grey fur on its body.", "caption_llm_3": "artwork of jack savage and skye (zootopia). a solo male character, standing with crossed arms and looking at another. 
He is fully clothed in a shirt and pants, with furry pawpads on his feet. The background is simple, white or grey. The character has a fluffy tail and facial tufts, as well as dipstick ears and head markings. He is holding an object while smiling with open mouth and half-closed eyes, making eye contact with the viewer. His species is an arctic fox or rabbit within the canid family.", "caption_llm_4": "Two anthropomorphic animals, one with a grey body and white fur, the other with a white body and grey fur. both have fluffy tails, cheek tufts, head markings, and facial tufts. they are clothed in overalls made of fur and wear shirts. one has crossed arms while the other looks away from another character. the background is simple with a grey color scheme.", "caption_llm_5": "Artwork of jack savage and skye (zootopia). a pair of anthropomorphic animals, one with a fluffy tail and cheek tufts, standing against a simple grey background. the other animal has facial markings and head tufts. both creatures have pawpads and toe claws, as well as fluffy fur in various shades of grey or white. they are depicted in crossed arms poses, looking away from each other while standing on their hind legs.", "caption_llm_6": "A duo of anthropomorphic animals, one an arctic fox and the other a rabbit. both are clothed in simple outfits - the fox in overalls and the rabbit in pants and a shirt. they stand with crossed arms, looking away from each other against a grey background. the fox has fluffy fur, cheek tufts, head markings, facial tufts, pawpads, claws on its toes and tail markings. the rabbit also has fluffy fur with head tufts and toe claws.", "caption_llm_7": "Artwork of jack savage and skye (zootopia). a pair of anthropomorphic animals, one with a fluffy tail and cheek tufts, standing against a simple grey background. the other animal has facial markings and head tufts. both creatures have pawpads and toe claws, as well as fluffy fur in various shades of grey or white. 
they are depicted in crossed arms poses, looking away from each other while standing on their hind legs.", "tags_synthetic_categorized": "{\"animals_and_anthropomorphic_features\": [\"anthro\", \"facial_markings\", \"cheek_tuft\", \"facial_tuft\", \"biped\", \"head_markings\", \"tail\", \"fluffy_tail\", \"fluffy\", \"head_tuft\", \"claws\", \"dipstick_ears\", \"size_difference\", \"dipstick_tail\", \"pawpads\", \"3_toes\", \"tail_markings\", \"neck_tuft\"], \"clothing_and_accessories\": [\"clothing\", \"clothed\", \"fur\", \"barefoot\", \"topwear\", \"shirt\", \"pants\", \"fully_clothed\", \"bottomwear\", \"overalls\", \"dress\"], \"number_of_characters\": [\"solo\"], \"characters_and_gender\": [\"male\"], \"background_and_setting\": [\"simple_background\", \"white_background\", \"grey_background\"], \"body_and_body_parts\": [\"feet\", \"tuft\", \"markings\", \"toes\", \"ear_markings\", \"toe_claws\", \"teeth\", \"butt\", \"breasts\", \"fingers\"], \"actions_and_poses\": [\"standing\", \"crossed_arms\", \"looking_at_another\", \"holding_object\", \"looking_at_viewer\", \"hand_on_hip\", \"hand_in_pocket\", \"sitting\", \"looking_back\", \"side_view\", \"pose\"], \"emotions_and_expressions\": [\"smile\", \"open_mouth\", \"narrowed_eyes\", \"half-closed_eyes\", \"eye_contact\"], \"colors\": [\"white_fur\", \"white_body\", \"blue_eyes\", \"grey_body\", \"grey_fur\"], \"miscellaneous\": [\"text\"], \"hairstyle\": [\"hair\"], \"species_or_animal_type\": [\"mammal\", \"fox\", \"rabbit\", \"lagomorph\", \"canine\", \"arctic_fox\", \"felid\", \"leporid\", \"canid\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"anthro\", \"cheek_tuft\", \"claws\", \"facial_markings\", \"facial_tuft\", \"fluffy\", \"fluffy_tail\", \"head_markings\", \"head_tuft\", \"pawpads\", \"tail\"], \"clothing_and_accessories\": [\"bottomwear\", \"clothed\", \"clothing\", \"fur\", \"overalls\", \"pants\", \"shirt\", \"topwear\"], \"actions_and_poses\": 
[\"crossed_arms\", \"looking_at_another\", \"looking_away\", \"standing\"], \"number_of_characters\": [\"duo\"], \"background_and_setting\": [\"grey_background\", \"simple_background\"], \"colors\": [\"grey_body\", \"grey_fur\", \"white_body\", \"white_fur\"], \"body_and_body_parts\": [\"markings\", \"toe_claws\", \"tuft\"], \"species_or_animal_type\": [\"arctic_fox\", \"canid\", \"canine\", \"fox\", \"lagomorph\", \"leporid\", \"mammal\", \"rabbit\"]}", "tags_ground_truth_expanded": ["anthro", "bottomwear", "canid", "canine", "claws", "clothed", "clothing", "crossed_arms", "duo", "facial_markings", "fox", "fur", "grey_background", "lagomorph", "leporid", "looking_at_another", "mammal", "overalls", "pants", "rabbit", "shirt", "simple_background", "standing", "topwear"]}
+ {"id": 2034167, "md5": "00dc19bfce60793cedbd2bbcd8e01c44", "caption_cogvlm": "The image showcases a vibrant, animated character that appears to be a fusion of a canine and a humanoid. The character has striking blue eyes, a blue nose, and a purple body with white fur. The character's fur is adorned with vivid pink and blue stripes, and it has a playful, open-mouthed expression. The character's tail is long and curved, with a mix of blue and pink hues. The background is simple, allowing the character to be the focal point. The character appears to be in a dynamic pose, possibly mid-stride or jump.", "caption_llm_0": "a solo wolf character with ambiguous gender, standing and looking at the viewer. The wolf has white fur on its body and tail, while its face is covered in purple fur. Its eyes are blue, and it has red hair on its ears. The background is simple and white.", "caption_llm_1": "A solo feral wolf with paws, tongue, and stripes on its body. the wolf has 4 toes and is shown in a quadruped pose. its fur is white, while its body is red or orange. the animal has either purple or blue hair with eyelashes and red eyes. it's smiling with an open mouth and tongue out against a simple white background.", "caption_llm_2": "A solo character with ambiguous gender, standing and looking at the viewer. the character has fur clothing and accessories, along with various hairstyles such as purple, blue, red hair or eyelashes. the background is simple and white. \nthe character has a white body with either white fur or red fur depending on the species depicted - wolf, mammal, canid or canine (canis). they have either blue eyes or orange body/fur coloration. their feet show toes and they display an open mouth expression with tongue out pose.", "caption_llm_3": "A solo feral wolf with paws, tongue, and stripes. the wolf has 4 toes and a striped body with fur in various colors such as white, blue eyes, red body, purple fur, orange body and fur. 
it also has teeth and is standing while looking at the viewer. the wolf's hair is in different shades of purple or blue with eyelashes.", "caption_llm_4": "A solo female canine, likely a wolf, with 4 toes and feral characteristics. the background is simple, allowing the focus to be on the animal's open mouth and teeth. the artwork style is furry, with attention to detail in the animal's fur and expressions.", "caption_llm_5": "A solo, female canine with 4 toes and a purple body. the wolf has white fur and blue eyes, as well as a blue nose. she is depicted in a simple background with her fur serving as clothing or an accessory. her tongue is visible, and she has feet and teeth.", "caption_llm_6": "A solo female canine, likely a wolf, in a simple background. the animal is depicted with its mouth open and fur covering its body and accessories. its feet and teeth are also visible, as well as her toes.", "caption_llm_7": "A solo, female canine with a purple body and fur, white fur on her face and body, blue eyes, and a blue nose. she has 4 toes on each foot and displays an open-mouth expression. the background is simple. 
the character wears fur as clothing or accessories.", "tags_synthetic_categorized": "{\"number_of_characters\": [\"solo\"], \"animals_and_anthropomorphic_features\": [\"feral\", \"paws\", \"tongue\", \"4_toes\", \"stripes\", \"striped_body\", \"striped_fur\", \"quadruped\"], \"clothing_and_accessories\": [\"fur\"], \"background_and_setting\": [\"simple_background\", \"white_background\"], \"hairstyle\": [\"hair\", \"purple_hair\", \"eyelashes\", \"blue_hair\", \"red_hair\"], \"colors\": [\"white_fur\", \"white_body\", \"blue_eyes\", \"red_body\", \"red_fur\", \"purple_fur\", \"orange_body\", \"orange_fur\", \"purple_body\"], \"body_and_body_parts\": [\"toes\", \"feet\", \"teeth\", \"fingers\", \"ear_piercing\", \"eyebrows\"], \"emotions_and_expressions\": [\"smile\", \"open_mouth\", \"tongue_out\"], \"characters_and_gender\": [\"ambiguous_gender\"], \"actions_and_poses\": [\"standing\", \"looking_at_viewer\"], \"species_or_animal_type\": [\"mammal\", \"wolf\", \"canis\", \"canine\", \"canid\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"4_toes\", \"feral\", \"tongue\"], \"colors\": [\"blue_eyes\", \"blue_nose\", \"purple_body\", \"purple_fur\", \"white_body\", \"white_fur\"], \"body_and_body_parts\": [\"feet\", \"teeth\", \"toes\"], \"characters_and_gender\": [\"female\"], \"clothing_and_accessories\": [\"fur\"], \"emotions_and_expressions\": [\"open_mouth\"], \"background_and_setting\": [\"simple_background\"], \"number_of_characters\": [\"solo\"], \"species_or_animal_type\": [\"canid\", \"canine\", \"canis\", \"mammal\", \"wolf\"]}", "tags_ground_truth_expanded": ["blue_eyes", "blue_nose", "canid", "canine", "fur", "mammal", "open_mouth", "purple_body", "simple_background", "solo", "white_body", "white_fur"]}
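Each record above stores its categorized tags as a JSON-encoded string inside the JSON object (`tags_synthetic_categorized`, `tags_ground_truth_categorized`), while `tags_ground_truth_expanded` is already a plain list. The sketch below shows one way to decode a record and score a flat set of predicted tags against that list. It is an illustration only: the helper names `load_record`, `flatten`, and `prf` are hypothetical, and this plain set-overlap metric is an assumption, not the leaf-only scoring used by `scripts/eval_pipeline.py` or `scripts/eval_categorized.py`.

```python
import json
from typing import Dict, List, Set


def load_record(line: str) -> dict:
    """Parse one JSONL eval record (hypothetical helper).

    The *_categorized fields hold a JSON string nested inside the JSON
    object, so they need a second json.loads() to become a dict.
    """
    rec = json.loads(line)
    for key in ("tags_synthetic_categorized", "tags_ground_truth_categorized"):
        if isinstance(rec.get(key), str):
            rec[key] = json.loads(rec[key])
    return rec


def flatten(categorized: Dict[str, List[str]]) -> Set[str]:
    """Collapse {category: [tags]} into a single set of tags."""
    return {tag for tags in categorized.values() for tag in tags}


def prf(predicted: Set[str], truth: Set[str]) -> Dict[str, float]:
    """Plain set-overlap precision, recall, and F1 (assumed metric)."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy record shaped like the samples above, heavily trimmed.
    line = json.dumps({
        "id": 1,
        "tags_synthetic_categorized": json.dumps(
            {"colors": ["blue_eyes", "green_eyes"], "number_of_characters": ["solo"]}
        ),
        "tags_ground_truth_categorized": json.dumps({"colors": ["blue_eyes"]}),
        "tags_ground_truth_expanded": ["blue_eyes", "solo", "smile"],
    })
    rec = load_record(line)
    scores = prf(flatten(rec["tags_synthetic_categorized"]),
                 set(rec["tags_ground_truth_expanded"]))
    print(scores)
```

In the toy record, two of the three predicted tags (`blue_eyes`, `solo`) appear in the ground-truth list, so precision, recall, and F1 all come out to 2/3.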
psq_rag/retrieval/psq_retrieval.py CHANGED
@@ -164,6 +164,7 @@ def psq_candidates_from_rewrite_phrases(
  per_phrase_k: int = 50,
  per_phrase_final_k: int = 10,
  global_k: int = 300,
  verbose: bool = False,
  ) -> Union[List[Candidate], Tuple[List[Candidate], List[Dict[str, Any]]]]:
  head_stopwords = {
@@ -249,6 +250,7 @@ def psq_candidates_from_rewrite_phrases(
249
  phrase_best_tokens: Dict[str, Dict[str, str]] = {}
250
  phrase_context_imputed: Dict[str, Dict[str, bool]] = {}
251
  phrase_reports: List[Dict[str, Any]] = []
 
252
 
253
  for phrase in final_phrases:
254
  lookup = phrase.replace(" ", "_")
@@ -414,6 +416,11 @@ def psq_candidates_from_rewrite_phrases(
414
  scored_rows = scored_rows[:per_phrase_final_k]
415
  per_phrase_scored[phrase] = scored_rows
416
  phrase_context_imputed[phrase] = context_imputed_by_tag
 
 
 
 
 
417
 
418
  for tag, score_fasttext, score_context, score_combined in scored_rows:
419
  existing = merged_by_tag.get(tag)
@@ -475,6 +482,10 @@ def psq_candidates_from_rewrite_phrases(
475
  merged_candidates.sort(key=lambda c: c.score_combined, reverse=True)
476
  merged_candidates = merged_candidates[:global_k]
477
 
 
 
 
 
478
  return (merged_candidates, phrase_reports) if verbose else merged_candidates
479
 
480
 
 
164
  per_phrase_k: int = 50,
165
  per_phrase_final_k: int = 10,
166
  global_k: int = 300,
167
+ return_phrase_ranks: bool = False,
168
  verbose: bool = False,
169
  ) -> Union[List[Candidate], Tuple[List[Candidate], List[Dict[str, Any]]]]:
170
  head_stopwords = {
 
250
  phrase_best_tokens: Dict[str, Dict[str, str]] = {}
251
  phrase_context_imputed: Dict[str, Dict[str, bool]] = {}
252
  phrase_reports: List[Dict[str, Any]] = []
253
+ phrase_rank_by_tag: Dict[str, int] = {}
254
 
255
  for phrase in final_phrases:
256
  lookup = phrase.replace(" ", "_")
 
416
  scored_rows = scored_rows[:per_phrase_final_k]
417
  per_phrase_scored[phrase] = scored_rows
418
  phrase_context_imputed[phrase] = context_imputed_by_tag
419
+ if return_phrase_ranks:
420
+ for rank, (tag, _score_fasttext, _score_context, _score_combined) in enumerate(scored_rows, start=1):
421
+ prev = phrase_rank_by_tag.get(tag)
422
+ if prev is None or rank < prev:
423
+ phrase_rank_by_tag[tag] = rank
424
 
425
  for tag, score_fasttext, score_context, score_combined in scored_rows:
426
  existing = merged_by_tag.get(tag)
 
482
  merged_candidates.sort(key=lambda c: c.score_combined, reverse=True)
483
  merged_candidates = merged_candidates[:global_k]
484
 
485
+ if return_phrase_ranks:
486
+ if verbose:
487
+ return (merged_candidates, phrase_reports, phrase_rank_by_tag)
488
+ return (merged_candidates, phrase_rank_by_tag)
489
  return (merged_candidates, phrase_reports) if verbose else merged_candidates
490
 
491
 
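With `return_phrase_ranks=True`, `psq_candidates_from_rewrite_phrases` records, for each tag, the best (smallest) 1-based position it reached in any single phrase's truncated `scored_rows`, and returns that map as an extra tuple element. The standalone sketch below reproduces only that min-rank bookkeeping; `best_phrase_ranks` and its input shape (each phrase mapped to its tags, already sorted best-first) are hypothetical stand-ins, not part of the module.

```python
from typing import Dict, List


def best_phrase_ranks(per_phrase_tags: Dict[str, List[str]]) -> Dict[str, int]:
    """Return the best (lowest) 1-based rank each tag reaches in any phrase list.

    Hypothetical helper: `per_phrase_tags` maps a rewrite phrase to its tags,
    already sorted best-first and truncated (like `scored_rows` after
    `[:per_phrase_final_k]`).
    """
    best: Dict[str, int] = {}
    for tags in per_phrase_tags.values():
        for rank, tag in enumerate(tags, start=1):
            prev = best.get(tag)
            # Keep the smallest rank seen across all phrases.
            if prev is None or rank < prev:
                best[tag] = rank
    return best


if __name__ == "__main__":
    ranks = best_phrase_ranks({
        "purple wolf": ["wolf", "purple_fur", "canine"],
        "open mouth": ["open_mouth", "teeth", "wolf"],
    })
    print(ranks)  # {'wolf': 1, 'purple_fur': 2, 'canine': 3, 'open_mouth': 1, 'teeth': 2}
```

Because the rank is taken within each phrase's own list, a tag that tops a narrow phrase keeps rank 1 regardless of where its `score_combined` lands in the merged, globally sorted list. This is presumably the signal that `analyze_threshold_grid.py --mode phrase_rank` reads back from the `stage3_selected_phrase_ranks` field.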
psq_rag/tagging/categorized_suggestions.py CHANGED
@@ -37,21 +37,27 @@ class CategorizedTagSuggestions:
     categories: Dict[str, TagCategory] # All category definitions


-def load_categories(checklist_path: Optional[Path] = None) -> Dict[str, TagCategory]:
+def load_categories(checklist_path: Optional[Path] = None) -> Dict[str, TagCategory]:
     """
     Load and parse category definitions from checklist.

     Args:
         checklist_path: Path to checklist file. If None, uses default location.

-    Returns:
-        Dict mapping category_name -> TagCategory
-    """
-    if checklist_path is None:
-        # Try to find it in the git repo from the other branch
-        import subprocess
-        try:
-            result = subprocess.run(
+    Returns:
+        Dict mapping category_name -> TagCategory
+    """
+    if checklist_path is None:
+        repo_root = Path(__file__).resolve().parents[2]
+        local_checklist = repo_root / "tagging_checklist.txt"
+        if local_checklist.exists():
+            checklist_path = local_checklist
+
+    if checklist_path is None:
+        # Try to find it in the git repo from the other branch
+        import subprocess
+        try:
+            result = subprocess.run(
                 ['git', 'show', 'origin/claude/prompt-squirrel-rag-3PZn7:tagging_checklist.txt'],
                 capture_output=True,
                 text=True,
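The revised `load_categories` now looks for `tagging_checklist.txt` at the repository root, resolved as `Path(__file__).resolve().parents[2]` (two directories above the module's own directory, i.e. above `psq_rag/tagging/`), before falling back to `git show` against the other branch. A minimal sketch of just that lookup order, assuming a hypothetical `resolve_checklist_path` helper whose `None` result means "try the git fallback next":

```python
from pathlib import Path
from typing import Optional


def resolve_checklist_path(checklist_path: Optional[Path], module_file: str) -> Optional[Path]:
    """Hypothetical helper mirroring the lookup order in the diff.

    1. An explicit path wins.
    2. Otherwise use <repo_root>/tagging_checklist.txt if it exists, where
       repo_root = Path(module_file).resolve().parents[2]
       (psq_rag/tagging/<module>.py -> repository root).
    3. Otherwise return None; the caller would then try `git show`.
    """
    if checklist_path is not None:
        return checklist_path
    repo_root = Path(module_file).resolve().parents[2]
    local_checklist = repo_root / "tagging_checklist.txt"
    if local_checklist.exists():
        return local_checklist
    return None
```

Checking the working tree first keeps the loader usable offline and in checkouts where the `origin/claude/...` ref is not fetched; the `git show` path then only matters when no local copy exists.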
scripts/analyze_caption_evident_audit.py ADDED
@@ -0,0 +1,130 @@
+"""
+Analyze caption-evident tag recall against retrieved tags.
+
+Compares tags marked caption-evident to retrieved tags (optionally + structural),
+with optional implication expansion on both sets.
+"""
+from __future__ import annotations
+
+import argparse
+import json
+from collections import Counter
+from pathlib import Path
+from typing import Dict, Iterable, Set
+import sys
+
+_REPO_ROOT = Path(__file__).resolve().parents[1]
+if str(_REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(_REPO_ROOT))
+
+from psq_rag.retrieval.state import expand_tags_via_implications
+
+
+def _load_evident(path: Path) -> Dict[int, Set[str]]:
+    by_id: Dict[int, Set[str]] = {}
+    with path.open("r", encoding="utf-8") as f:
+        for line in f:
+            row = json.loads(line)
+            sid = row.get("id")
+            if sid is None:
+                continue
+            tags = set(row.get("tags_ground_truth_expanded") or [])
+            if tags:
+                by_id[int(sid)] = tags
+    return by_id
+
+
+def _load_eval_detail(path: Path) -> Dict[int, dict]:
+    rows = {}
+    with path.open("r", encoding="utf-8") as f:
+        for line in f:
+            row = json.loads(line)
+            if row.get("_meta"):
+                continue
+            rows[int(row["sample_id"])] = row
+    return rows
+
+
+def _expand(tags: Iterable[str]) -> Set[str]:
+    expanded, _ = expand_tags_via_implications(set(tags))
+    return expanded
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser(description="Caption-evident audit vs retrieval.")
+    ap.add_argument("--evident", type=str, required=True,
+                    help="Caption-evident JSONL (tags_ground_truth_expanded set to evident tags).")
+    ap.add_argument("--detail", type=str, required=True,
+                    help="Eval detail JSONL (from eval_pipeline.py).")
+    ap.add_argument("--no-structural", action="store_true",
+                    help="Do not count structural tags as retrieved.")
+    ap.add_argument("--expand-implications", action="store_true",
+                    help="Expand both evident and retrieved tags via implications.")
+    args = ap.parse_args()
+
+    evident_by_id = _load_evident(Path(args.evident))
+    detail_by_id = _load_eval_detail(Path(args.detail))
+
+    hit_counter = Counter()
+    miss_counter = Counter()
+    present_counter = Counter()
+
+    print("ID,evident,retrieved,overlap,recall_evident,precision_evident,missing_evident,extra_not_evident,complete_overlap")
+
+    total_evident = total_retrieved = total_overlap = 0
+
+    for sid in sorted(evident_by_id):
+        ev = set(evident_by_id[sid])
+        detail = detail_by_id.get(sid)
+        if detail is None:
+            continue
+        retrieved = set(detail.get("retrieved_tags", []))
+        if not args.no_structural:
+            retrieved |= set(detail.get("structural_tags", []))
+
+        if args.expand_implications:
+            ev = _expand(ev)
+            retrieved = _expand(retrieved)
+
+        overlap = ev & retrieved
+        missing = ev - retrieved
+        extra = retrieved - ev
+
+        for t in ev:
+            present_counter[t] += 1
+            if t in retrieved:
+                hit_counter[t] += 1
+            else:
+                miss_counter[t] += 1
+
+        recall = len(overlap) / len(ev) if ev else 0.0
+        precision = len(overlap) / len(retrieved) if retrieved else 0.0
+        total_evident += len(ev)
+        total_retrieved += len(retrieved)
+        total_overlap += len(overlap)
+        complete = len(missing) == 0
+        print(f"{sid},{len(ev)},{len(retrieved)},{len(overlap)},{recall:.3f},{precision:.3f},{len(missing)},{len(extra)},{complete}")
+
+    print(f"TOTAL,{total_evident},{total_retrieved},{total_overlap},{(total_overlap/total_evident):.3f},{(total_overlap/total_retrieved):.3f},{total_evident-total_overlap},{total_retrieved-total_overlap},N/A")
+
+    print("\nMOST MISSED (caption-evident tags not retrieved):")
+    for tag, cnt in miss_counter.most_common(20):
+        present = present_counter[tag]
+        print(f" {tag:25s} missed {cnt}/{present} (present {present}/10)")
+
+    print("\nMOST FOUND (caption-evident tags retrieved):")
+    for tag, cnt in hit_counter.most_common(20):
+        present = present_counter[tag]
+        print(f" {tag:25s} found {cnt}/{present} (present {present}/10)")
+
+    always_found = [t for t, c in hit_counter.items() if c == present_counter[t]]
+    if always_found:
+        print("\nALWAYS FOUND WHEN EVIDENT:")
+        for t in sorted(always_found):
+            print(f" {t}")
+
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
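Per sample, the audit treats the caption-evident tags as the reference set: recall is |evident ∩ retrieved| / |evident| and precision is |evident ∩ retrieved| / |retrieved|. The TOTAL row pools the raw counts across samples (micro-averaging) instead of averaging the per-sample ratios. A minimal sketch of that arithmetic follows; `audit` is a hypothetical name, and unlike the script's TOTAL line (which divides directly and raises `ZeroDivisionError` when no sample IDs match or nothing is retrieved), it returns 0.0 for an empty pooled denominator.

```python
from typing import Dict, Set, Tuple


def audit(evident: Dict[int, Set[str]],
          retrieved: Dict[int, Set[str]]) -> Tuple[Dict[int, Tuple[float, float]], float, float]:
    """Per-sample (recall, precision) plus pooled micro recall/precision.

    Hypothetical helper: samples missing from `retrieved` are skipped,
    as the script skips IDs absent from the eval detail file.
    """
    per_sample: Dict[int, Tuple[float, float]] = {}
    tot_ev = tot_ret = tot_ov = 0
    for sid in sorted(evident):
        if sid not in retrieved:
            continue
        ev, ret = evident[sid], retrieved[sid]
        ov = len(ev & ret)
        recall = ov / len(ev) if ev else 0.0
        precision = ov / len(ret) if ret else 0.0
        per_sample[sid] = (recall, precision)
        tot_ev += len(ev)
        tot_ret += len(ret)
        tot_ov += ov
    # Micro-averaged totals; 0.0 instead of raising when a denominator is empty.
    micro_recall = tot_ov / tot_ev if tot_ev else 0.0
    micro_precision = tot_ov / tot_ret if tot_ret else 0.0
    return per_sample, micro_recall, micro_precision
```

The MOST MISSED and MOST FOUND lines in the script print the per-tag denominator as a literal `/10`, which equals the number of audited samples only when the evident file holds ten of them.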
scripts/analyze_threshold_grid.py ADDED
@@ -0,0 +1,407 @@
+"""
+Analyze post-hoc retrieval score thresholds on Stage 3 selections.
+
+This script re-scores evaluation outputs by removing Stage 3 selections
+with retrieval score <= threshold, then recomputing metrics. This is an
+approximation that avoids re-running the LLMs.
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import sys
+from pathlib import Path
+from typing import Dict, Iterable, List, Set, Tuple
+
+_REPO_ROOT = Path(__file__).resolve().parents[1]
+if str(_REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(_REPO_ROOT))
+
+import csv
+from collections import defaultdict
+
+from psq_rag.retrieval.state import expand_tags_via_implications, get_leaf_tags
+from scripts.eval_pipeline import _EVAL_EXCLUDED_TAGS # reuse eval exclusions
+
+
+def _compute_metrics(predicted: Set[str], ground_truth: Set[str]) -> Tuple[float, float, float]:
+    if not predicted and not ground_truth:
+        return 1.0, 1.0, 1.0
+    if not predicted:
+        return 0.0, 0.0, 0.0
+    if not ground_truth:
+        return 0.0, 0.0, 0.0
+    tp = len(predicted & ground_truth)
+    precision = tp / len(predicted)
+    recall = tp / len(ground_truth)
+    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
+    return precision, recall, f1
+
+
+def _load_rows(path: Path) -> Tuple[dict, List[dict]]:
+    meta = None
+    rows = []
+    with path.open("r", encoding="utf-8") as f:
+        for line in f:
+            row = json.loads(line)
+            if row.get("_meta"):
+                meta = row
+                continue
+            rows.append(row)
+    if meta is None:
+        meta = {}
+    return meta, rows
+
+
+def _load_tag_db(repo_root: Path) -> Dict[str, int]:
+    tag_type: Dict[str, int] = {}
+    db_path = repo_root / "fluffyrock_3m.csv"
+    if not db_path.exists():
+        return tag_type
+    with db_path.open("r", encoding="utf-8") as f:
+        for row in csv.reader(f):
+            if len(row) < 2:
+                continue
+            tag = row[0].strip()
+            try:
+                tid = int(row[1]) if row[1].strip() else -1
+            except ValueError:
+                tid = -1
+            tag_type[tag] = tid
+    return tag_type
+
+
+TYPE_ID_NAMES = {
+    0: "general",
+    1: "artist",
+    3: "copyright",
+    4: "character",
+    5: "species",
+    7: "meta",
+}
+
+_TAXONOMY = frozenset({
+    "mammal","canid","canine","canis","felid","feline","felis","ursine","cervid","bovid","equid","equine",
+    "mustelid","procyonid","reptile","scalie","avian","bird","fish","marine","arthropod","insect","arachnid",
+    "amphibian","primate","rodent","lagomorph","leporid","galliform","gallus_(genus)","phasianid","passerine",
+    "oscine","dinosaur","theropod","cetacean","pinniped","chiroptera","marsupial","monotreme","mephitid",
+    "suid","suina"
+})
+_BODY_PLAN = frozenset({"anthro","feral","biped","quadruped","taur","humanoid","semi-anthro","animatronic","robot","machine","plushie","kemono"})
+_POSE = frozenset({
+    "solo","duo","group","trio","standing","sitting","lying","running","walking","flying","swimming","crouching",
+    "kneeling","jumping","looking_at_viewer","looking_away","looking_back","looking_up","looking_down",
+    "looking_aside","front_view","side_view","back_view","three-quarter_view","from_above","from_below","close-up",
+    "portrait","full-length_portrait","hand_on_hip","arms_crossed","all_fours","on_back","on_side","crossed_arms"
+})
+
+
+def _categorize(tag: str, tag_type: Dict[str, int]) -> str:
+    tid = tag_type.get(tag, -1)
+    tn = TYPE_ID_NAMES.get(tid, "unknown")
+    if tn == "species":
+        return "species"
+    if tn in ("artist", "copyright", "character", "meta"):
+        return tn
+    if tag in _TAXONOMY:
+        return "taxonomy"
+    if tag in _BODY_PLAN:
+        return "body_plan"
+    if tag in _POSE:
+        return "pose/composition"
+    if tag.startswith(tuple(str(i) + "_" for i in range(10))) and any(
+        tag.endswith(s) for s in ("fingers","toes","horns","arms","legs","eyes","ears","wings","tails")
+    ):
+        return "count/anatomy"
+    if tag in ("male","female","intersex","ambiguous_gender","andromorph","gynomorph"):
+        return "gender"
+    if any(k in tag for k in (
+        "clothing","clothed","topwear","bottomwear","legwear","handwear","headwear","footwear","shirt","pants",
+        "shorts","dress","skirt","jacket","coat","hat","boots","shoes","gloves","socks","stockings","belt",
+        "collar","scarf","cape","armor","suit","uniform","costume","outfit"
+    )):
+        return "clothing"
+    if any(tag.startswith(c + "_") for c in (
+        "red","blue","green","yellow","orange","purple","pink","black","white","grey","gray","brown","tan","cream",
+        "gold","silver","teal","cyan","magenta"
+    )):
+        return "color/marking"
+    if tag.endswith("_coloring") or tag.endswith("_markings") or tag == "markings":
+        return "color/marking"
+    if "hair" in tag:
+        return "hair"
+    if any(k in tag for k in (
+        "muscle","belly","chest","abs","breast","butt","tail","wing","horn","ear","eye","teeth","fang","claw",
+        "paw","hoof","snout","muzzle","tongue","fur","scales","feather","tuft","fluff","mane"
+    )):
+        return "body/anatomy"
+    if any(k in tag for k in (
+        "smile","grin","frown","expression","blush","angry","happy","sad","crying","laughing","open_mouth",
+        "closed_eyes","wink"
+    )):
+        return "expression"
+    return "other_general"
+
+
+def _iter_thresholds(values: Iterable[float], min_v: float, max_v: float, step: float) -> List[float]:
+    if values:
+        return sorted(set(values))
+    thresholds = []
+    v = min_v
+    while v <= max_v + 1e-9:
+        thresholds.append(round(v, 4))
+        v += step
+    return thresholds
+
+
+def _sparkline(values: List[float], width: int = 50) -> str:
+    if not values:
+        return ""
+    charset = " .:-=+*#%@"
+    vmin = min(values)
+    vmax = max(values)
+    if vmax == vmin:
+        return charset[0] * min(width, len(values))
+    out = []
+    for v in values:
+        norm = (v - vmin) / (vmax - vmin)
+        idx = int(round(norm * (len(charset) - 1)))
+        out.append(charset[idx])
+    return "".join(out)
+
+
+def analyze(
+    path: Path,
+    thresholds: List[float],
+    expand_implications: bool,
+    category_curves: bool,
+    mode: str,
+) -> Tuple[List[dict], List[dict]]:
+    meta, rows = _load_rows(path)
+    expand = expand_implications or bool(meta.get("expand_implications"))
+    tag_type = _load_tag_db(_REPO_ROOT) if category_curves else {}
+
+    results = []
+    category_rows = []
+    for thr in thresholds:
+        total_p = total_r = total_f1 = 0.0
+        total_lp = total_lr = total_lf1 = 0.0
+        total_sel = 0
+        total_gt = 0
+        total_oracle_r = 0.0
+        total_oracle_f1 = 0.0
+        n = 0
+
+        if category_curves:
+            cat_totals = defaultdict(lambda: {"p": 0.0, "r": 0.0, "f1": 0.0, "n": 0})
+
+        for row in rows:
+            gt = set(row.get("ground_truth_tags", []))
+            gt -= _EVAL_EXCLUDED_TAGS
+
+            stage3_selected = set(row.get("stage3_selected", []))
+            stage3_scores: Dict[str, float] = row.get("stage3_selected_scores", {}) or {}
+            stage3_ranks: Dict[str, int] = row.get("stage3_selected_ranks", {}) or {}
+            stage3_phrase_ranks: Dict[str, int] = row.get("stage3_selected_phrase_ranks", {}) or {}
+            structural = set(row.get("structural", []))
+
+            # Remove low-scoring Stage 3 selections.
+            filtered_stage3 = set()
+            for t in stage3_selected:
+                if mode == "rank":
+                    rank = stage3_ranks.get(t)
+                    if rank is None:
+                        filtered_stage3.add(t)
+                    elif rank <= int(thr):
+                        filtered_stage3.add(t)
+                elif mode == "phrase_rank":
+                    rank = stage3_phrase_ranks.get(t)
+                    if rank is None:
+                        filtered_stage3.add(t)
+                    elif rank <= int(thr):
+                        filtered_stage3.add(t)
+                else:
+                    score = stage3_scores.get(t)
+                    if score is None:
+                        filtered_stage3.add(t)
+                    elif score > thr:
+                        filtered_stage3.add(t)
+
+            available = filtered_stage3 | structural
+
+            if expand and available:
+                available, _ = expand_tags_via_implications(available)
+
+            selected = available
+
+            selected -= _EVAL_EXCLUDED_TAGS
+
+            p, r, f1 = _compute_metrics(selected, gt)
+            total_p += p
+            total_r += r
+            total_f1 += f1
+
+            leaf_sel = get_leaf_tags(selected)
+            leaf_gt = get_leaf_tags(gt)
+            lp, lr, lf1 = _compute_metrics(leaf_sel, leaf_gt)
+            total_lp += lp
+            total_lr += lr
+            total_lf1 += lf1
+
+            # Oracle max: perfect selection from available tags.
+            if gt:
+                oracle_r = len(gt & available) / len(gt)
+                oracle_f1 = (2 * oracle_r / (1 + oracle_r)) if oracle_r > 0 else 0.0
+            else:
+                oracle_r = 1.0
+                oracle_f1 = 1.0
+            total_oracle_r += oracle_r
+            total_oracle_f1 += oracle_f1
+
+            if category_curves:
+                cat_gt: Dict[str, Set[str]] = defaultdict(set)
+                cat_sel: Dict[str, Set[str]] = defaultdict(set)
+                for t in gt:
+                    cat_gt[_categorize(t, tag_type)].add(t)
+                for t in selected:
+                    cat_sel[_categorize(t, tag_type)].add(t)
+                for cat in set(cat_gt.keys()) | set(cat_sel.keys()):
+                    cp, cr, cf1 = _compute_metrics(cat_sel.get(cat, set()), cat_gt.get(cat, set()))
+                    cat_totals[cat]["p"] += cp
+                    cat_totals[cat]["r"] += cr
+                    cat_totals[cat]["f1"] += cf1
+                    cat_totals[cat]["n"] += 1
+
+            total_sel += len(selected)
+            total_gt += len(gt)
+            n += 1
+
+        if n == 0:
+            continue
+
+        results.append({
+            "threshold": thr,
+            "P": total_p / n,
+            "R": total_r / n,
+            "F1": total_f1 / n,
+            "leaf_P": total_lp / n,
+            "leaf_R": total_lr / n,
+            "leaf_F1": total_lf1 / n,
+            "avg_selected": total_sel / n,
+            "avg_gt": total_gt / n,
+            "oracle_R": total_oracle_r / n,
+            "oracle_F1": total_oracle_f1 / n,
+        })
+
+        if category_curves:
+            for cat, stats in sorted(cat_totals.items()):
+                if stats["n"] == 0:
+                    continue
+                category_rows.append({
+                    "threshold": thr,
+                    "category": cat,
+                    "P": stats["p"] / stats["n"],
+                    "R": stats["r"] / stats["n"],
+                    "F1": stats["f1"] / stats["n"],
+                })
+
+    return results, category_rows
+
+
+def main() -> int:
+    ap = argparse.ArgumentParser(description="Analyze post-hoc Stage3 score thresholds.")
+    ap.add_argument("path", nargs="?", type=str, default=None,
+                    help="Path to compact eval JSONL (default: latest in data/eval_results)")
+    ap.add_argument("--min", dest="min_v", type=float, default=0.0, help="Min threshold")
+    ap.add_argument("--max", dest="max_v", type=float, default=1.0, help="Max threshold")
+    ap.add_argument("--step", type=float, default=0.05, help="Threshold step size")
+    ap.add_argument("--values", type=str, default="",
+                    help="Comma-separated explicit thresholds (overrides min/max/step)")
+    ap.add_argument("--mode", choices=["score", "rank", "phrase_rank"], default="score",
+                    help="Threshold mode: score (default), rank (global), or phrase_rank (per-phrase)")
+    ap.add_argument("--rank-min", type=int, default=1, help="Min rank threshold (rank mode)")
+    ap.add_argument("--rank-max", type=int, default=300, help="Max rank threshold (rank mode)")
+    ap.add_argument("--rank-step", type=int, default=10, help="Rank threshold step (rank mode)")
+    ap.add_argument("--no-expand-implications", action="store_true",
+                    help="Do not re-expand tags via implications")
+    ap.add_argument("--category-curves", action="store_true",
+                    help="Emit category-level precision/recall/F1 curves")
+    args = ap.parse_args()
+
+    if args.path:
+        path = Path(args.path)
+    else:
+        path = sorted((_REPO_ROOT / "data" / "eval_results").glob("eval_*.jsonl"))[-1]
+
+    values = []
+    if args.values.strip():
+        values = [float(v.strip()) for v in args.values.split(",") if v.strip()]
+
+    if args.mode in ("rank", "phrase_rank"):
+        if values:
+            thresholds = sorted(set(int(v) for v in values))
+        else:
+            thresholds = list(range(args.rank_min, args.rank_max + 1, args.rank_step))
+    else:
+        thresholds = _iter_thresholds(values, args.min_v, args.max_v, args.step)
+
+    results, category_rows = analyze(
+        path,
+        thresholds,
+        expand_implications=not args.no_expand_implications,
+        category_curves=args.category_curves,
+        mode=args.mode,
+    )
+
+    # Write CSV to stdout
+    if args.mode in ("rank", "phrase_rank"):
+        print("rank_max,P,R,F1,leaf_P,leaf_R,leaf_F1,avg_selected,avg_gt,oracle_R,oracle_F1")
+    else:
+        print("threshold,P,R,F1,leaf_P,leaf_R,leaf_F1,avg_selected,avg_gt,oracle_R,oracle_F1")
+    for row in results:
+        if args.mode in ("rank", "phrase_rank"):
+            print(
+                f"{int(row['threshold'])},{row['P']:.4f},{row['R']:.4f},{row['F1']:.4f},"
+                f"{row['leaf_P']:.4f},{row['leaf_R']:.4f},{row['leaf_F1']:.4f},"
+                f"{row['avg_selected']:.2f},{row['avg_gt']:.2f},"
+                f"{row['oracle_R']:.4f},{row['oracle_F1']:.4f}"
+            )
+        else:
+            print(
+                f"{row['threshold']:.4f},{row['P']:.4f},{row['R']:.4f},{row['F1']:.4f},"
+                f"{row['leaf_P']:.4f},{row['leaf_R']:.4f},{row['leaf_F1']:.4f},"
+                f"{row['avg_selected']:.2f},{row['avg_gt']:.2f},"
+                f"{row['oracle_R']:.4f},{row['oracle_F1']:.4f}"
+            )
+
+    # ASCII sparkline graph for core metrics
+    p_vals = [r["P"] for r in results]
+    r_vals = [r["R"] for r in results]
+    f1_vals = [r["F1"] for r in results]
+    print("\nP  " + _sparkline(p_vals))
+    print("R  " + _sparkline(r_vals))
+    print("F1 " + _sparkline(f1_vals))
+
+    if args.category_curves and category_rows:
+        print("\nCATEGORY_CURVES")
+        if args.mode in ("rank", "phrase_rank"):
+            print("rank_max,category,P,R,F1")
+        else:
+            print("threshold,category,P,R,F1")
+        for row in category_rows:
+            if args.mode in ("rank", "phrase_rank"):
+                print(
+                    f"{int(row['threshold'])},{row['category']},"
+                    f"{row['P']:.4f},{row['R']:.4f},{row['F1']:.4f}"
+                )
+            else:
+                print(
+                    f"{row['threshold']:.4f},{row['category']},"
+                    f"{row['P']:.4f},{row['R']:.4f},{row['F1']:.4f}"
+                )
+
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
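The grid script keeps a Stage 3 tag when its score is strictly above the threshold (`score` mode) or when its rank is at most the threshold (`rank` and `phrase_rank` modes), and it always keeps tags with no recorded score or rank. Its oracle columns assume a perfect selector over the available tags: precision is 1 and recall is R = |gt ∩ available| / |gt|, so F1 = 2·1·R / (1 + R) = 2R / (1 + R), with both oracle values set to 1.0 when the ground truth is empty. A small sketch of both rules under those assumptions; `filter_stage3` and `oracle_f1` are hypothetical names, and the single `values` dict stands in for whichever of the script's score, rank, or phrase-rank maps the mode selects.

```python
from typing import Dict, Set


def filter_stage3(selected: Set[str], values: Dict[str, float], thr: float, mode: str) -> Set[str]:
    """Hypothetical mirror of the script's per-tag filter.

    mode "score": keep tags whose score is strictly greater than thr.
    mode "rank" / "phrase_rank": keep tags whose rank is <= int(thr).
    Tags without a recorded value are always kept, as in the script.
    """
    kept = set()
    for tag in selected:
        v = values.get(tag)
        if v is None:
            kept.add(tag)
        elif mode == "score" and v > thr:
            kept.add(tag)
        elif mode != "score" and v <= int(thr):
            kept.add(tag)
    return kept


def oracle_f1(gt: Set[str], available: Set[str]) -> float:
    """F1 of a perfect selector: precision 1, recall = |gt & available| / |gt|."""
    if not gt:
        return 1.0
    r = len(gt & available) / len(gt)
    return 2 * r / (1 + r) if r > 0 else 0.0
```

Since unscored tags always survive and structural tags are unioned in afterwards, the stricter end of the grid does not drive recall to zero; the oracle columns show how much of the ground truth is still reachable at each cut.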
scripts/eval_pipeline.py CHANGED
@@ -39,23 +39,33 @@ Requires:

 from __future__ import annotations

-import argparse
-import json
-import os
-import random
-import sys
-import threading
-import time
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from dataclasses import dataclass, field
 from datetime import datetime
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Set, Tuple

-_REPO_ROOT = Path(__file__).resolve().parents[1]
-if str(_REPO_ROOT) not in sys.path:
-    sys.path.insert(0, str(_REPO_ROOT))
-os.chdir(_REPO_ROOT)

 EVAL_DATA_PATH = _REPO_ROOT / "data" / "eval_samples" / "e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl"
 EVAL_DATA_PATH_RAW = _REPO_ROOT / "data" / "eval_samples" / "e621_sfw_sample_1000_seed123_buffer10000.jsonl"
@@ -124,11 +134,15 @@ class SampleResult:
     # Stage 2
     retrieved_tags: Set[str] = field(default_factory=set)
     retrieval_recall: float = 0.0
-    # Stage 3 — overall
-    selected_tags: Set[str] = field(default_factory=set)
-    selection_precision: float = 0.0
-    selection_recall: float = 0.0
-    selection_f1: float = 0.0
     # Stage 3 — character tags only
     gt_character_tags: Set[str] = field(default_factory=set)
     selected_character_tags: Set[str] = field(default_factory=set)
@@ -190,16 +204,17 @@ def _compute_metrics(predicted: Set[str], ground_truth: Set[str]) -> Tuple[float
     return precision, recall, f1


-def _process_one_sample(
     sample: Dict[str, Any],
     index: int,
     total: int,
     skip_rewrite: bool,
     allow_nsfw: bool,
-    mode: str,
-    chunk_size: int,
-    per_phrase_k: int,
-    temperature: float,
     max_tokens: int,
     verbose: bool,
     print_lock: threading.Lock,
@@ -258,18 +273,24 @@ def _process_one_sample(

     # --- Stage 2: Retrieval ---
     t0 = time.time()
-    retrieval_result = psq_candidates_from_rewrite_phrases(
-        rewrite_phrases=result.rewrite_phrases,
-        allow_nsfw_tags=allow_nsfw,
-        global_k=300,
-        verbose=False,
-    )
     result.stage2_time = time.time() - t0

-    if isinstance(retrieval_result, tuple):
-        candidates, _ = retrieval_result
-    else:
-        candidates = retrieval_result

     result.retrieved_tags = {c.tag for c in candidates}
     if gt_tags:
@@ -294,16 +315,22 @@ def _process_one_sample(
     )
     result.stage3_time = time.time() - t0

-    result.selected_tags = {candidates[idx].tag for idx in picked_indices} if picked_indices else set()
-
-    # Build per-tag evidence from Stage 3 selection
-    for idx in picked_indices:
-        tag = candidates[idx].tag
-        result.tag_evidence[tag] = {
-            "source": "stage3",
-            "why": tag_why.get(tag, "unknown"),
-            "retrieval_score": round(candidates[idx].score_combined, 4),
-        }

     # Why distribution
     why_counts: Dict[str, int] = {}
@@ -457,15 +484,16 @@ def _prewarm_retrieval_assets() -> None:
     print(f" Assets loaded in {time.time() - t0:.1f}s")


-def run_eval(
     n_samples: int = 20,
     caption_field: str = "caption_cogvlm",
     skip_rewrite: bool = False,
     allow_nsfw: bool = False,
     mode: str = "chunked_map_union",
-    chunk_size: int = 60,
-    per_phrase_k: int = 2,
-    temperature: float = 0.0,
     max_tokens: int = 512,
     verbose: bool = False,
     shuffle: bool = True,
@@ -473,11 +501,14 @@ def run_eval(
     workers: int = 1,
     min_why: Optional[str] = "strong_implied",
     expand_implications: bool = False,
-    infer_structural: bool = False,
-) -> List[SampleResult]:
-
-    # Load eval samples — prefer expanded file, fall back to raw
-    eval_path = EVAL_DATA_PATH
     if not eval_path.is_file():
         eval_path = EVAL_DATA_PATH_RAW
     if not eval_path.is_file():
@@ -500,14 +531,17 @@ def run_eval(
                 using_expanded = True
             else:
                 gt_tags = _flatten_ground_truth_tags(row.get("tags_ground_truth_categorized", ""))
-            if not gt_tags:
-                continue
-            # Remove eval-excluded tags from GT
-            gt_tags -= _EVAL_EXCLUDED_TAGS
-            all_samples.append({
-                "id": row.get("id", row.get("row_id", len(all_samples))),
-                "caption": caption.strip(),
-                "gt_tags": gt_tags,
             })
     if using_expanded:
         print("Using implication-expanded ground truth")
@@ -534,13 +568,13 @@ def run_eval(
         # Sequential mode (original behavior)
         results: List[SampleResult] = []
         for i, sample in enumerate(samples):
-            result = _process_one_sample(
-                sample, i, total,
-                skip_rewrite, allow_nsfw, mode, chunk_size,
-                per_phrase_k, temperature, max_tokens, verbose,
-                print_lock, min_why, expand_implications,
-                infer_structural,
-            )
             results.append(result)
     else:
         # Parallel mode
@@ -551,13 +585,13 @@ def run_eval(
         with ThreadPoolExecutor(max_workers=workers) as executor:
             futures = {
                 executor.submit(
-                    _process_one_sample,
-                    sample, i, total,
-                    skip_rewrite, allow_nsfw, mode, chunk_size,
-                    per_phrase_k, temperature, max_tokens, verbose,
-                    print_lock, min_why, expand_implications,
-                    infer_structural,
-                ): i
                 for i, sample in enumerate(samples)
             }
             for future in as_completed(futures):
@@ -784,8 +818,9 @@ def print_summary(results: List[SampleResult]) -> None:
     print("=" * 70)


-def main(argv=None) -> int:
-    ap = argparse.ArgumentParser(description="End-to-end pipeline evaluation")
     ap.add_argument("--n", type=int, default=20, help="Number of samples to evaluate")
     ap.add_argument("--caption-field", default="caption_cogvlm",
                     choices=["caption_cogvlm", "caption_llm_0", "caption_llm_1",
@@ -797,8 +832,10 @@ def main(argv=None) -> int:
     ap.add_argument("--allow-nsfw", action="store_true", help="Allow NSFW tags")
     ap.add_argument("--mode", default="chunked_map_union",
                     choices=["single_shot", "chunked_map_union"])
-    ap.add_argument("--chunk-size", type=int, default=60)
-    ap.add_argument("--per-phrase-k", type=int, default=2)
     ap.add_argument("--temperature", type=float, default=0.0)
     ap.add_argument("--max-tokens", type=int, default=512)
     ap.add_argument("--verbose", "-v", action="store_true", help="Show per-call Stage 3 logs")
@@ -830,10 +867,11 @@ def main(argv=None) -> int:
         caption_field=args.caption_field,
         skip_rewrite=args.skip_rewrite,
         allow_nsfw=args.allow_nsfw,
-        mode=args.mode,
-        chunk_size=args.chunk_size,
-        per_phrase_k=args.per_phrase_k,
-        temperature=args.temperature,
         max_tokens=args.max_tokens,
         verbose=args.verbose,
         shuffle=args.shuffle,
@@ -870,10 +908,11 @@ def main(argv=None) -> int:
             "caption_field": args.caption_field,
             "skip_rewrite": args.skip_rewrite,
             "allow_nsfw": args.allow_nsfw,
-            "mode": args.mode,
-            "chunk_size": args.chunk_size,
-            "per_phrase_k": args.per_phrase_k,
-            "temperature": args.temperature,
             "shuffle": args.shuffle,
             "seed": args.seed,
             "workers": args.workers,
@@ -926,13 +965,17 @@ def main(argv=None) -> int:
  # Diff sets (small — only the errors, not the full lists)
  "missed": missed_tags,
  "extra": extra_tags,
- # Full tag lists (needed for categorized evaluation)
- "ground_truth_tags": sorted(r.ground_truth_tags),
- "selected_tags": sorted(r.selected_tags),
- # Evidence for extra tags (why did these false positives get through?)
- "extra_evidence": {t: r.tag_evidence.get(t, {}) for t in extra_tags},
- # Structural tags inferred
- "structural": r.structural_tags,
  # Timing
  "t1": round(r.stage1_time, 2),
  "t2": round(r.stage2_time, 2),
@@ -953,9 +996,13 @@ def main(argv=None) -> int:
  "caption": r.caption,
  "ground_truth_tags": sorted(r.ground_truth_tags),
  "rewrite_phrases": r.rewrite_phrases,
- "retrieved_tags": sorted(r.retrieved_tags),
- "selected_tags": sorted(r.selected_tags),
- "implied_tags": sorted(r.implied_tags),
  "structural_tags": r.structural_tags,
  "categorized_suggestions": r.categorized_suggestions,
  "why_counts": r.why_counts,
 
 
  from __future__ import annotations

+ import argparse
+ import json
+ import os
+ import random
+ import sys
+ import threading
+ import time
  from concurrent.futures import ThreadPoolExecutor, as_completed
  from dataclasses import dataclass, field
  from datetime import datetime
  from pathlib import Path
  from typing import Any, Dict, List, Optional, Set, Tuple

+ _REPO_ROOT = Path(__file__).resolve().parents[1]
+ if str(_REPO_ROOT) not in sys.path:
+ sys.path.insert(0, str(_REPO_ROOT))
+ os.chdir(_REPO_ROOT)
+
+
+ def _ensure_utf8_stdio() -> None:
+ try:
+ if hasattr(sys.stdout, "reconfigure"):
+ sys.stdout.reconfigure(encoding="utf-8", errors="replace")
+ if hasattr(sys.stderr, "reconfigure"):
+ sys.stderr.reconfigure(encoding="utf-8", errors="replace")
+ except Exception:
+ pass

  EVAL_DATA_PATH = _REPO_ROOT / "data" / "eval_samples" / "e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl"
  EVAL_DATA_PATH_RAW = _REPO_ROOT / "data" / "eval_samples" / "e621_sfw_sample_1000_seed123_buffer10000.jsonl"
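The new `_ensure_utf8_stdio()` helper switches `sys.stdout` and `sys.stderr` to UTF-8 with `errors="replace"`, so text the stream cannot encode is written as `?` instead of raising `UnicodeEncodeError` (that this guards against non-ASCII tag names or captions on a narrow console encoding is an assumption; the diff does not say). The standalone sketch below applies the same `reconfigure()` call to an in-memory `io.TextIOWrapper` so the effect can be checked without touching the real console; `make_stream` and `ensure_utf8` are hypothetical names.

```python
import io
from typing import Any, Tuple


def make_stream(encoding: str) -> Tuple[io.BytesIO, io.TextIOWrapper]:
    """Return a byte buffer and a text stream writing into it (console stand-in)."""
    raw = io.BytesIO()
    return raw, io.TextIOWrapper(raw, encoding=encoding)


def ensure_utf8(stream: Any) -> None:
    """Same guard as _ensure_utf8_stdio: only streams exposing reconfigure() change."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8", errors="replace")


raw, stream = make_stream("ascii")
ensure_utf8(stream)
stream.write("café ✓")   # under the original strict ASCII this raises UnicodeEncodeError
stream.write(" \ud800")  # lone surrogate: not encodable even in UTF-8, written as "?"
stream.flush()
print(raw.getvalue())
```

The `hasattr` guard matters because `sys.stdout` can be replaced by objects (for example some IDE or test-runner wrappers) that do not implement `reconfigure()`.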
 
  # Stage 2
  retrieved_tags: Set[str] = field(default_factory=set)
  retrieval_recall: float = 0.0
+ # Stage 3 — overall
+ selected_tags: Set[str] = field(default_factory=set)
+ stage3_selected_tags: Set[str] = field(default_factory=set)
+ stage3_selected_scores: Dict[str, float] = field(default_factory=dict)
+ stage3_selected_ranks: Dict[str, int] = field(default_factory=dict)
+ stage3_selected_phrase_ranks: Dict[str, int] = field(default_factory=dict)
+ selection_precision: float = 0.0
+ selection_recall: float = 0.0
+ selection_f1: float = 0.0
  # Stage 3 — character tags only
  gt_character_tags: Set[str] = field(default_factory=set)
  selected_character_tags: Set[str] = field(default_factory=set)
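The new `selection_precision`, `selection_recall` and `selection_f1` fields hold scores of the selected tag set against the ground-truth set; this diff only shows the scoring helper's final `return precision, recall, f1`. A minimal sketch of the standard set-based definitions, assuming empty-set cases score 0.0 (the helper's actual zero-denominator handling is not visible here); `set_prf` is a hypothetical name.

```python
from typing import Set, Tuple


def set_prf(predicted: Set[str], truth: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of a predicted tag set against a ground-truth set."""
    tp = len(predicted & truth)  # tags both selected and present in ground truth
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean of P and R
    return precision, recall, f1


print(set_prf({"fox", "canine", "snow", "tree"}, {"fox", "canine", "solo"}))
```

Here two of four selected tags are correct (precision 0.5) and two of three ground-truth tags are found (recall 2/3), giving F1 = 4/7.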
 
  return precision, recall, f1


+ def _process_one_sample(
  sample: Dict[str, Any],
  index: int,
  total: int,
  skip_rewrite: bool,
  allow_nsfw: bool,
+ mode: str,
+ chunk_size: int,
+ per_phrase_k: int,
+ per_phrase_final_k: int,
+ temperature: float,
  max_tokens: int,
  verbose: bool,
  print_lock: threading.Lock,
 
 
  # --- Stage 2: Retrieval ---
  t0 = time.time()
+ retrieval_result = psq_candidates_from_rewrite_phrases(
+ rewrite_phrases=result.rewrite_phrases,
+ allow_nsfw_tags=allow_nsfw,
+ per_phrase_final_k=per_phrase_final_k,
+ global_k=300,
+ return_phrase_ranks=True,
+ verbose=False,
+ )
  result.stage2_time = time.time() - t0

+ phrase_rank_by_tag = {}
+ if isinstance(retrieval_result, tuple):
+ if len(retrieval_result) == 2:
+ candidates, phrase_rank_by_tag = retrieval_result
+ else:
+ candidates = retrieval_result[0]
+ else:
+ candidates = retrieval_result

  result.retrieved_tags = {c.tag for c in candidates}
  if gt_tags:
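The retrieval call now passes `return_phrase_ranks=True`, and the code after it accepts either a bare candidate list or a tuple, taking per-phrase ranks only from a 2-tuple. The same normalization could be factored into one helper that always returns a `(candidates, phrase_rank_by_tag)` pair. This is a hypothetical refactor (`unpack_retrieval` is not in the codebase), sketched under the assumption that the second tuple element is a tag-to-rank mapping or `None`.

```python
from typing import Any, Dict, Tuple


def unpack_retrieval(result: Any) -> Tuple[Any, Dict[str, int]]:
    """Normalize a retrieval return value to (candidates, phrase_rank_by_tag)."""
    if isinstance(result, tuple):
        if len(result) == 2:
            candidates, phrase_ranks = result
            return candidates, dict(phrase_ranks or {})  # tolerate a None mapping
        # Any other tuple shape: keep only the candidate list, no phrase ranks
        return result[0], {}
    # Older call style: a bare candidate list
    return result, {}


candidates, phrase_rank_by_tag = unpack_retrieval((["fox", "snow"], {"fox": 1, "snow": 2}))
print(candidates, phrase_rank_by_tag)
```

Returning an empty dict rather than `None` keeps later `if phrase_rank_by_tag:` checks working unchanged.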
 
  )
  result.stage3_time = time.time() - t0

+ result.selected_tags = {candidates[idx].tag for idx in picked_indices} if picked_indices else set()
+ result.stage3_selected_tags = set(result.selected_tags)
+
+ # Build per-tag evidence from Stage 3 selection
+ rank_by_tag = {c.tag: i + 1 for i, c in enumerate(candidates)}
+ for idx in picked_indices:
+ tag = candidates[idx].tag
+ result.stage3_selected_scores[tag] = round(candidates[idx].score_combined, 4)
+ result.stage3_selected_ranks[tag] = rank_by_tag.get(tag, len(candidates) + 1)
+ if phrase_rank_by_tag:
+ result.stage3_selected_phrase_ranks[tag] = phrase_rank_by_tag.get(tag, len(candidates) + 1)
+ result.tag_evidence[tag] = {
+ "source": "stage3",
+ "why": tag_why.get(tag, "unknown"),
+ "retrieval_score": round(candidates[idx].score_combined, 4),
+ }

  # Why distribution
  why_counts: Dict[str, int] = {}
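For each Stage 3 pick, the new code records the candidate's 1-based position in the retrieval list, its per-phrase rank when one is available, its `why` label, and its `score_combined` rounded to four places, using `len(candidates) + 1` as the rank for any tag missing from a lookup. A self-contained sketch of that bookkeeping, with a hypothetical `Candidate` dataclass standing in for the real candidate type and the fields flattened into one dict per tag (`selection_evidence` is also a hypothetical name):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:  # hypothetical stand-in exposing only the attributes the diff reads
    tag: str
    score_combined: float


def selection_evidence(
    candidates: List[Candidate],
    picked_indices: List[int],
    phrase_rank_by_tag: Dict[str, int],
    tag_why: Dict[str, str],
) -> Dict[str, dict]:
    rank_by_tag = {c.tag: i + 1 for i, c in enumerate(candidates)}  # 1-based rank
    missing_rank = len(candidates) + 1  # sentinel that sorts after every real rank
    evidence: Dict[str, dict] = {}
    for idx in picked_indices:
        c = candidates[idx]
        phrase_rank: Optional[int] = (
            phrase_rank_by_tag.get(c.tag, missing_rank) if phrase_rank_by_tag else None
        )
        evidence[c.tag] = {
            "rank": rank_by_tag.get(c.tag, missing_rank),
            "phrase_rank": phrase_rank,
            "why": tag_why.get(c.tag, "unknown"),
            "score": round(c.score_combined, 4),
        }
    return evidence


cands = [Candidate("fox", 0.91234), Candidate("snow", 0.8), Candidate("tree", 0.5)]
print(selection_evidence(cands, [0, 2], {"fox": 1}, {"fox": "explicit"}))
```

Keeping the sentinel an integer, rather than `None`, lets the recorded ranks be sorted or averaged directly when eval results are aggregated.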
 
  print(f" Assets loaded in {time.time() - t0:.1f}s")


+ def run_eval(
  n_samples: int = 20,
  caption_field: str = "caption_cogvlm",
  skip_rewrite: bool = False,
  allow_nsfw: bool = False,
  mode: str = "chunked_map_union",
+ chunk_size: int = 60,
+ per_phrase_k: int = 2,
+ per_phrase_final_k: int = 10,
+ temperature: float = 0.0,
  max_tokens: int = 512,
  verbose: bool = False,
  shuffle: bool = True,
 
  workers: int = 1,
  min_why: Optional[str] = "strong_implied",
  expand_implications: bool = False,
+ infer_structural: bool = False,
+ ) -> List[SampleResult]:
+ expand_gt = expand_implications
+ if expand_gt:
+ from psq_rag.retrieval.state import expand_tags_via_implications as _expand_gt_tags
+
+ # Load eval samples — prefer expanded file, fall back to raw
+ eval_path = EVAL_DATA_PATH
  if not eval_path.is_file():
  eval_path = EVAL_DATA_PATH_RAW
  if not eval_path.is_file():
 
  using_expanded = True
  else:
  gt_tags = _flatten_ground_truth_tags(row.get("tags_ground_truth_categorized", ""))
+ if not gt_tags:
+ continue
+ # Remove eval-excluded tags from GT
+ gt_tags -= _EVAL_EXCLUDED_TAGS
+ if expand_gt:
+ gt_tags, _ = _expand_gt_tags(gt_tags)
+ gt_tags -= _EVAL_EXCLUDED_TAGS
+ all_samples.append({
+ "id": row.get("id", row.get("row_id", len(all_samples))),
+ "caption": caption.strip(),
+ "gt_tags": gt_tags,
  })
  if using_expanded:
  print("Using implication-expanded ground truth")
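Ground-truth preparation now skips samples with no tags, removes `_EVAL_EXCLUDED_TAGS`, and, when implication expansion is enabled, subtracts the excluded set a second time, because expansion can re-introduce an excluded tag as an implied parent. The sketch below shows why that second subtraction matters. The transitive-closure expansion, the implication map, and the choice of `mammal` as an excluded tag are all hypothetical stand-ins for `expand_tags_via_implications` and the real exclusion list; the second value that function returns (discarded as `_` in the diff) is not modeled.

```python
from typing import Dict, Iterable, Optional, Set


def expand_via_implications(tags: Iterable[str], implications: Dict[str, Iterable[str]]) -> Set[str]:
    """Add every tag reachable through the implication map (transitive closure)."""
    out = set(tags)
    stack = list(out)
    while stack:
        for parent in implications.get(stack.pop(), ()):
            if parent not in out:
                out.add(parent)
                stack.append(parent)
    return out


def clean_ground_truth(
    tags: Set[str],
    excluded: Set[str],
    implications: Optional[Dict[str, Iterable[str]]] = None,
) -> Set[str]:
    gt = tags - excluded
    if implications is not None:
        # Expansion may add excluded tags back as implied parents, so subtract again
        gt = expand_via_implications(gt, implications) - excluded
    return gt


imps = {"fox": ["canid"], "canid": ["mammal"]}
print(clean_ground_truth({"fox", "mammal"}, {"mammal"}, imps))
```

Without the final `- excluded`, `mammal` would reappear through `fox -> canid -> mammal` even though it was removed from the raw tags.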
 
  # Sequential mode (original behavior)
  results: List[SampleResult] = []
  for i, sample in enumerate(samples):
+ result = _process_one_sample(
+ sample, i, total,
+ skip_rewrite, allow_nsfw, mode, chunk_size,
+ per_phrase_k, per_phrase_final_k, temperature, max_tokens, verbose,
+ print_lock, min_why, expand_implications,
+ infer_structural,
+ )
  results.append(result)
  else:
  # Parallel mode
 
  with ThreadPoolExecutor(max_workers=workers) as executor:
  futures = {
  executor.submit(
+ _process_one_sample,
+ sample, i, total,
+ skip_rewrite, allow_nsfw, mode, chunk_size,
+ per_phrase_k, per_phrase_final_k, temperature, max_tokens, verbose,
+ print_lock, min_why, expand_implications,
+ infer_structural,
+ ): i
  for i, sample in enumerate(samples)
  }
  for future in as_completed(futures):
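In parallel mode each sample is submitted to a `ThreadPoolExecutor`, and the `futures` dict maps every future back to its sample index so results can be matched up as `as_completed()` yields them in completion order; the shared `print_lock` keeps per-sample log lines from interleaving. The loop body is cut off in this hunk, so whether results are re-sorted by index is not shown. This minimal sketch (hypothetical `run_parallel` and `work` names) writes each result into its input slot so the output order matches the sequential path.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, List, Optional, Sequence


def run_parallel(samples: Sequence[Any], work: Callable[..., Any], workers: int = 4) -> List[Any]:
    print_lock = threading.Lock()  # shared by workers to serialize console output
    total = len(samples)
    results: List[Optional[Any]] = [None] * total
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = {
            executor.submit(work, sample, i, total, print_lock): i
            for i, sample in enumerate(samples)
        }
        for future in as_completed(futures):
            i = futures[future]  # recover which sample this future belongs to
            results[i] = future.result()  # re-raises any exception from the worker
    return results


def work(sample: int, index: int, total: int, lock: threading.Lock) -> int:
    with lock:  # one log line at a time
        print(f"[{index + 1}/{total}] sample={sample}")
    return sample * 10


print(run_parallel([1, 2, 3], work, workers=2))
```

Threads fit this workload if, as seems likely, most per-sample time is spent waiting on LLM or retrieval calls rather than in Python code.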
 
  print("=" * 70)


+ def main(argv=None) -> int:
+ _ensure_utf8_stdio()
+ ap = argparse.ArgumentParser(description="End-to-end pipeline evaluation")
  ap.add_argument("--n", type=int, default=20, help="Number of samples to evaluate")
  ap.add_argument("--caption-field", default="caption_cogvlm",
  choices=["caption_cogvlm", "caption_llm_0", "caption_llm_1",
 
  ap.add_argument("--allow-nsfw", action="store_true", help="Allow NSFW tags")
  ap.add_argument("--mode", default="chunked_map_union",
  choices=["single_shot", "chunked_map_union"])
+ ap.add_argument("--chunk-size", type=int, default=60)
+ ap.add_argument("--per-phrase-k", type=int, default=2)
+ ap.add_argument("--per-phrase-final-k", type=int, default=10,
+ help="Top-K candidates per phrase after scoring (retrieval cap)")
  ap.add_argument("--temperature", type=float, default=0.0)
  ap.add_argument("--max-tokens", type=int, default=512)
  ap.add_argument("--verbose", "-v", action="store_true", help="Show per-call Stage 3 logs")
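The new `--per-phrase-final-k` flag (default 10) is forwarded to retrieval as `per_phrase_final_k`, next to a fixed `global_k=300`; its help text describes it as a per-phrase top-K cap applied after scoring. The internals of `psq_candidates_from_rewrite_phrases` are not part of this diff, so the following is only one plausible reading of the two caps: keep each phrase's top K tags, merge across phrases by best score (an assumption), then truncate the union to the global cap. `cap_candidates` and its input shape are hypothetical.

```python
import heapq
from typing import Dict, List


def cap_candidates(
    scores_by_phrase: Dict[str, Dict[str, float]],
    per_phrase_final_k: int = 10,
    global_k: int = 300,
) -> List[str]:
    best: Dict[str, float] = {}
    for tag_scores in scores_by_phrase.values():
        # Per-phrase cap: only this phrase's top-K tags survive
        top = heapq.nlargest(per_phrase_final_k, tag_scores.items(), key=lambda kv: kv[1])
        for tag, score in top:
            best[tag] = max(score, best.get(tag, float("-inf")))  # assumed merge rule
    # Global cap over the union, highest score first (ties broken by tag name)
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:global_k]]


scores = {
    "red fox": {"fox": 0.9, "red_fur": 0.8, "wolf": 0.3},
    "in the snow": {"snow": 0.95, "winter": 0.7, "fox": 0.2},
}
print(cap_candidates(scores, per_phrase_final_k=2, global_k=3))
```

Under this reading, raising `--per-phrase-final-k` widens the pool handed to Stage 3 while `global_k` still bounds its total size.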
 
  caption_field=args.caption_field,
  skip_rewrite=args.skip_rewrite,
  allow_nsfw=args.allow_nsfw,
+ mode=args.mode,
+ chunk_size=args.chunk_size,
+ per_phrase_k=args.per_phrase_k,
+ per_phrase_final_k=args.per_phrase_final_k,
+ temperature=args.temperature,
  max_tokens=args.max_tokens,
  verbose=args.verbose,
  shuffle=args.shuffle,
 
  "caption_field": args.caption_field,
  "skip_rewrite": args.skip_rewrite,
  "allow_nsfw": args.allow_nsfw,
+ "mode": args.mode,
+ "chunk_size": args.chunk_size,
+ "per_phrase_k": args.per_phrase_k,
+ "per_phrase_final_k": args.per_phrase_final_k,
+ "temperature": args.temperature,
  "shuffle": args.shuffle,
  "seed": args.seed,
  "workers": args.workers,
 
  # Diff sets (small — only the errors, not the full lists)
  "missed": missed_tags,
  "extra": extra_tags,
+ # Full tag lists (needed for categorized evaluation)
+ "ground_truth_tags": sorted(r.ground_truth_tags),
+ "selected_tags": sorted(r.selected_tags),
+ "stage3_selected": sorted(r.stage3_selected_tags),
+ "stage3_selected_scores": r.stage3_selected_scores,
+ "stage3_selected_ranks": r.stage3_selected_ranks,
+ "stage3_selected_phrase_ranks": r.stage3_selected_phrase_ranks,
+ # Evidence for extra tags (why did these false positives get through?)
+ "extra_evidence": {t: r.tag_evidence.get(t, {}) for t in extra_tags},
+ # Structural tags inferred
+ "structural": r.structural_tags,
  # Timing
  "t1": round(r.stage1_time, 2),
  "t2": round(r.stage2_time, 2),
 
  "caption": r.caption,
  "ground_truth_tags": sorted(r.ground_truth_tags),
  "rewrite_phrases": r.rewrite_phrases,
+ "retrieved_tags": sorted(r.retrieved_tags),
+ "selected_tags": sorted(r.selected_tags),
+ "stage3_selected": sorted(r.stage3_selected_tags),
+ "stage3_selected_scores": r.stage3_selected_scores,
+ "stage3_selected_ranks": r.stage3_selected_ranks,
+ "stage3_selected_phrase_ranks": r.stage3_selected_phrase_ranks,
+ "implied_tags": sorted(r.implied_tags),
  "structural_tags": r.structural_tags,
  "categorized_suggestions": r.categorized_suggestions,
  "why_counts": r.why_counts,
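The output records store every tag set as `sorted(...)` and the new score and rank maps as plain dicts. Sorting is required because the `json` module cannot serialize `set` objects, and it also makes records deterministic across runs (string hashing, and therefore set iteration order, varies between Python processes), which helps when comparing files in `data/eval_results/`. A minimal sketch of appending one such record as a JSONL line; `append_jsonl` and the example fields are hypothetical, since the real writer is not shown in this hunk.

```python
import json
import os
import tempfile
from typing import Any, Dict


def append_jsonl(path: str, record: Dict[str, Any]) -> None:
    """Append one record as a single JSON line (UTF-8, non-ASCII kept readable)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


selected = {"snow", "fox"}
record = {
    "selected_tags": sorted(selected),  # json.dumps on the raw set would raise TypeError
    "stage3_selected_ranks": {"fox": 1, "snow": 4},
}
path = os.path.join(tempfile.mkdtemp(), "eval.jsonl")
append_jsonl(path, record)
with open(path, encoding="utf-8") as f:
    print(f.read())
```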