Spaces: FoodDesert/Prompt_Squirrel_RAG


Food Desert committed on Feb 22

Commit 73f56cf (1 parent: 41dd600)

Add eval audit tools, caption-evident set, and logging


Files changed (8)

PROJECT_SUMMARY.md +34 -24
SESSION_QUICKSTART.md +31 -16
data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_caption_evident.jsonl +11 -0
psq_rag/retrieval/psq_retrieval.py +11 -0
psq_rag/tagging/categorized_suggestions.py +15 -9
scripts/analyze_caption_evident_audit.py +130 -0
scripts/analyze_threshold_grid.py +407 -0
scripts/eval_pipeline.py +141 -94

PROJECT_SUMMARY.md CHANGED

@@ -89,18 +89,23 @@ Implementation of a categorized tag suggestion system based on the e621 tagging
 - Expanded ground truth annotations for evaluation
 - Leaf-only metrics to avoid penalizing implied tags
-### Evaluation Enhancements (Feb 10-14, 2026)
-- Added `--min-why` threshold filtering (explicit, strong_implied, weak_implied)
-- Per-tag evidence tracking
-- Compact eval output format
-- Retrieval gap analysis scripts
-- Multiple eval runs with different configurations
-- Stored eval results in `data/eval_results/`
-### Code Quality Improvements
-- Removed binary PNG files (migrated to Hugging Face XET storage)
-- Fixed eval_categorized.py compatibility with eval_pipeline.py output
-- Enhanced diagnostic and analysis scripts
 ---
@@ -122,10 +127,11 @@ Implementation of a categorized tag suggestion system based on the e621 tagging
 - **SamplePrompts.csv**: Test prompts for development
 - **TagDocumentation.txt**: E621 tag documentation
-### Evaluation
-- **data/eval_samples/**: Test images with ground truth annotations
-- **data/eval_results/**: Stored evaluation results (JSONL format)
-- **eval_analysis.txt**: Latest per-category performance metrics
 ---
@@ -152,20 +158,24 @@ Implementation of a categorized tag suggestion system based on the e621 tagging
 ## Testing & Evaluation
-### Scripts
-- **scripts/eval_pipeline.py**: Main evaluation harness
-  - Parallel processing support
-  - Multiple min_why thresholds
-  - Ground truth comparison with implications expansion
 - **scripts/eval_categorized.py**: Per-category evaluation
   - Precision, recall, F1 per category
   - Constraint validation (exactly_one, multi, etc.)
   - Tier-based aggregation (CRITICAL, IMPORTANT, etc.)
-- **scripts/analyze_compact_eval.py**: Compact evaluation analysis
-- **scripts/analyze_retrieval_gaps.py**: Retrieval gap identification
-- **scripts/diagnose_structural_clothing.py**: Clothing inference diagnostics
 - **scripts/extract_wiki_data.py**: E621 wiki data extraction
 - **scripts/smoke_test.py**: Quick pipeline validation

 - Expanded ground truth annotations for evaluation
 - Leaf-only metrics to avoid penalizing implied tags
+### Evaluation Enhancements (Feb 10-14, 2026)
+- Added `--min-why` threshold filtering (explicit, strong_implied, weak_implied)
+- Per-tag evidence tracking
+- Compact eval output format
+- Retrieval gap analysis scripts
+- Multiple eval runs with different configurations
+- Stored eval results in `data/eval_results/`
+ - Added per-phrase retrieval cap flag: `--per-phrase-final-k`
+ - Added Stage 3 selection score/rank logging for post-hoc threshold analysis
+ - Added score/global-rank/phrase-rank grid analysis script
+### Code Quality Improvements
+- Removed binary PNG files (migrated to Hugging Face XET storage)
+- Fixed eval_categorized.py compatibility with eval_pipeline.py output
+- Enhanced diagnostic and analysis scripts
+ - Ensured tagging checklist loads from repo root if present
+ - Forced UTF-8 stdout/stderr in eval pipeline to avoid Windows encoding crashes
 ---
 - **SamplePrompts.csv**: Test prompts for development
 - **TagDocumentation.txt**: E621 tag documentation
+### Evaluation
+- **data/eval_samples/**: Test images with ground truth annotations
+- **data/eval_results/**: Stored evaluation results (JSONL format)
+- **eval_analysis.txt**: Latest per-category performance metrics
+- **data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_caption_evident.jsonl**: Caption-evident GT subset (10 samples) for retrieval-ceiling audits
 ---
 ## Testing & Evaluation
+### Scripts
+- **scripts/eval_pipeline.py**: Main evaluation harness
+  - Parallel processing support
+  - Multiple min_why thresholds
+  - Ground truth comparison with implications expansion
+  - `--per-phrase-final-k` retrieval cap control
+  - Logs `stage3_selected_scores`, `stage3_selected_ranks`, `stage3_selected_phrase_ranks`
 - **scripts/eval_categorized.py**: Per-category evaluation
   - Precision, recall, F1 per category
   - Constraint validation (exactly_one, multi, etc.)
   - Tier-based aggregation (CRITICAL, IMPORTANT, etc.)
+- **scripts/analyze_compact_eval.py**: Compact evaluation analysis
+- **scripts/analyze_retrieval_gaps.py**: Retrieval gap identification
+- **scripts/analyze_threshold_grid.py**: Post-hoc threshold grids (score/global rank/phrase rank)
+- **scripts/analyze_caption_evident_audit.py**: Caption-evident audit vs retrieval (optional implication expansion)
+- **scripts/diagnose_structural_clothing.py**: Clothing inference diagnostics
 - **scripts/extract_wiki_data.py**: E621 wiki data extraction
 - **scripts/smoke_test.py**: Quick pipeline validation
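
The post-hoc threshold analysis described for `scripts/analyze_threshold_grid.py` can be illustrated with a minimal sketch. The script body is not shown in this part of the diff, so this is not its implementation. It assumes the logged `stage3_selected_scores` list is index-aligned with a list of selected tag names and that a higher score means a stronger match; the function name and the `selected_tags` argument are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def sweep_score_thresholds(
    selected_tags: List[str],        # hypothetical: tag names chosen by Stage 3
    selected_scores: List[float],    # assumed index-aligned with selected_tags
    ground_truth: Set[str],
    thresholds: Iterable[float],
) -> List[Dict[str, float]]:
    """Recompute precision/recall as if only tags scoring >= threshold were kept."""
    rows = []
    for threshold in thresholds:
        kept = {
            tag
            for tag, score in zip(selected_tags, selected_scores)
            if score >= threshold
        }
        true_positives = len(kept & ground_truth)
        precision = true_positives / len(kept) if kept else 0.0
        recall = true_positives / len(ground_truth) if ground_truth else 0.0
        rows.append(
            {
                "threshold": threshold,
                "kept": len(kept),
                "precision": precision,
                "recall": recall,
            }
        )
    return rows


# Toy data for illustration only; not taken from any eval run.
grid = sweep_score_thresholds(
    selected_tags=["solo", "male", "gun", "anthro"],
    selected_scores=[0.9, 0.8, 0.3, 0.6],
    ground_truth={"solo", "male", "anthro", "felid"},
    thresholds=[0.5, 0.85],
)
for row in grid:
    print(row)
```

Because the scores are logged once per run, a sweep like this can be repeated over many thresholds without re-running the pipeline. The same loop would presumably work for the rank fields by replacing `score >= threshold` with `rank <= cutoff`.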

SESSION_QUICKSTART.md CHANGED

@@ -15,15 +15,17 @@ A RAG system that converts natural language prompts → e621-style tags for furr
 - **Evaluation Metrics**: Per-category P/R/F1, ranking metrics (MRR, P@K, nDCG)
 - **Multi-select Constraints**: Fixed body_type, species, gender to allow multiple tags
-## Key Files
-- `app.py` - Gradio web interface
-- `psq_rag/tagging/categorized_suggestions.py` - Category-based tag suggestions
-- `psq_rag/tagging/category_parser.py` - Parse e621 checklist
-- `scripts/eval_pipeline.py` - Main evaluation harness
-- `scripts/eval_categorized.py` - Per-category metrics
-- `docs/retrieval_contract.md` - Stage 2 spec
-- `docs/stage3_contract.md` - Stage 3 spec
-- `tagging_checklist.txt` - E621 tagging guidelines
 ## Running Code
 ```bash
@@ -65,17 +67,30 @@ ls -la psq_rag/
 ls -la data/eval_results/
 ```
-## Common Tasks
-- **Add category**: Edit `tagging_checklist.txt`, update parser
-- **Eval changes**: Run `scripts/eval_pipeline.py`, then `scripts/eval_categorized.py`
-- **Test retrieval**: Use `scripts/smoke_test.py`
-- **Debug Stage 3**: Use `scripts/stage3_debug.py` (`--phrases` optional; omitted runs Stage 1 rewrite first, then Stage 2 retrieval from rewritten phrases)
-## Data Artifacts (Lazy-loaded)
 - FastText embeddings (semantic similarity)
 - TF-IDF + SVD matrices (context similarity)
 - Alias → canonical tag mappings
-- Tag counts, implications, groups, wiki definitions
 ## NSFW Handling
 - Filtered via `word_rating_probabilities.csv` (threshold 0.95)

 - **Evaluation Metrics**: Per-category P/R/F1, ranking metrics (MRR, P@K, nDCG)
 - **Multi-select Constraints**: Fixed body_type, species, gender to allow multiple tags
+## Key Files
+- `app.py` - Gradio web interface
+- `psq_rag/tagging/categorized_suggestions.py` - Category-based tag suggestions
+- `psq_rag/tagging/category_parser.py` - Parse e621 checklist
+- `scripts/eval_pipeline.py` - Main evaluation harness
+- `scripts/eval_categorized.py` - Per-category metrics
+- `scripts/analyze_threshold_grid.py` - Threshold grid analysis (score/global rank/phrase rank)
+- `scripts/analyze_caption_evident_audit.py` - Caption-evident audit vs retrieval
+- `docs/retrieval_contract.md` - Stage 2 spec
+- `docs/stage3_contract.md` - Stage 3 spec
+- `tagging_checklist.txt` - E621 tagging guidelines
 ## Running Code
 ```bash
 ls -la data/eval_results/
 ```
+## Common Tasks
+- **Add category**: Edit `tagging_checklist.txt`, update parser
+- **Eval changes**: Run `scripts/eval_pipeline.py`, then `scripts/eval_categorized.py`
+- **Threshold sweeps**: Run `scripts/analyze_threshold_grid.py` (see `--mode score|rank|phrase_rank`)
+- **Caption-evident audit**: Run `scripts/analyze_caption_evident_audit.py`
+- **Test retrieval**: Use `scripts/smoke_test.py`
+- **Debug Stage 3**: Use `scripts/stage3_debug.py` (`--phrases` optional; omitted runs Stage 1 rewrite first, then Stage 2 retrieval from rewritten phrases)
+## Data Artifacts (Lazy-loaded)
 - FastText embeddings (semantic similarity)
 - TF-IDF + SVD matrices (context similarity)
 - Alias → canonical tag mappings
+- Tag counts, implications, groups, wiki definitions
+## Eval Datasets
+- `data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl` - Base eval set (implication-expanded GT)
+- `data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_caption_evident.jsonl` - Caption-evident GT subset (10 samples); used to estimate retrieval ceiling from text
+## New Eval Features (Feb 2026)
+- `eval_pipeline.py` now logs Stage 3 selection scores and ranks:
+  - `stage3_selected_scores` (retrieval score)
+  - `stage3_selected_ranks` (global rank)
+  - `stage3_selected_phrase_ranks` (per-phrase rank)
+- New CLI flag: `--per-phrase-final-k` to control per-phrase retrieval cap
 ## NSFW Handling
 - Filtered via `word_rating_probabilities.csv` (threshold 0.95)
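
As a rough illustration of how the caption-evident subset listed under Eval Datasets might be used to estimate a retrieval ceiling, the sketch below loads JSONL records, skips the leading `_meta` header row, and computes micro-averaged recall against `tags_ground_truth_expanded`. The field names `_meta`, `id`, and `tags_ground_truth_expanded` come from the data file in this commit; the function names and the `retrieved` mapping (sample id to a set of candidate tags) are hypothetical, and this is not the logic of `scripts/analyze_caption_evident_audit.py`.

```python
import json
from typing import Dict, Iterable, List, Set


def load_samples(lines: Iterable[str]) -> List[dict]:
    """Parse JSONL text, skipping blank lines and any `_meta` header record."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("_meta"):
            continue  # header row describes the file, not a sample
        samples.append(record)
    return samples


def evident_recall(samples: List[dict], retrieved: Dict[int, Set[str]]) -> float:
    """Micro-averaged recall of retrieved tags against caption-evident ground truth."""
    hits = 0
    total = 0
    for sample in samples:
        ground_truth = set(sample["tags_ground_truth_expanded"])
        hits += len(ground_truth & retrieved.get(sample["id"], set()))
        total += len(ground_truth)
    return hits / total if total else 0.0


# Toy records for illustration; real use would iterate over the opened file.
demo_lines = [
    '{"_meta": true, "n_samples": 1}',
    '{"id": 1, "tags_ground_truth_expanded": ["solo", "smile", "red_nose", "tan_body"]}',
]
demo_samples = load_samples(demo_lines)
demo_recall = evident_recall(demo_samples, {1: {"solo", "smile", "alien"}})
print(len(demo_samples), demo_recall)
```

With a real file, `load_samples(open(path, encoding="utf-8"))` would work the same way, since iterating a text file yields one line per record. Note that `tags_ground_truth_categorized` and `tags_synthetic_categorized` are stored as JSON strings inside each record, so they would need a second `json.loads` call before use.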

data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_caption_evident.jsonl ADDED

	@@ -0,0 +1,11 @@

+{"_meta": true, "note": "Caption-evident audit subset (10 samples). tags_ground_truth_expanded contains only tags judged evident from caption_cogvlm. Use for estimating retrieval ceiling from text. Generated Feb 2026 from data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl.", "source_file": "data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl", "caption_field": "caption_cogvlm", "n_samples": 10}
+{"id": 3285630, "md5": "7e499711a05d48093608fb3a9b140fdc", "caption_cogvlm": "The image showcases an anthropomorphic feline character dressed in a formal attire. The character has a unique hairstyle with a large bun on top and black fur. The feline is wearing a teal shirt, a white tie, and a beige vest. He is holding a white mug in his right hand. The background is simple and transparent, allowing the character to be the main focus.", "caption_llm_0": "a solo male anthropomorphic feline character, clothed in fur and wearing eyewear. He has brown hair and black eyelashes, with 5 fingers on each hand. His body is brown with black fur, and he stands smiling while looking at the viewer. He holds a cup in one hand and a gun in the other. The text label is present but its content is unknown.", "caption_llm_1": "A solo male feline character, depicted in a simple background. the furry artwork features the character clothed in brown fur and black body, with brown hair and black eyelashes. the character is standing and looking at the viewer while holding a gun with one hand and a cup with the other. there's also text present in the image, but its content is not specified.", "caption_llm_2": "a solo male anthro felid, standing and looking at the viewer. He has brown fur and brown eyes, with black eyebrows and eyelashes. His hair is black, styled in a messy manner. He is holding a cup in his 5 fingers while smiling at the viewer. The background is simple and transparent, with no additional elements present.", "caption_llm_3": "A solo male anthropomorphic feline character, clothed in fur with brown and black body and fur colors. he has brown and black hair, as well as eyelashes. the character is standing while looking at the viewer, holding a gun with his 5 fingers. he also holds a cup containing a beverage. 
the text on the image reads \"text.\"", "caption_llm_4": "A solo male anthropomorphic feline with brown fur and black hair, set against a simple or transparent background.", "caption_llm_5": "A solo male feline, with brown fur and five fingers. the background is simple and transparent. the animal is clothed in fur clothing, has black hair, and possesses brown body coloration.", "caption_llm_6": "A solo male feline, with brown fur and a simple background. the animal is clothed in fur clothing, has black hair, and is depicted against a transparent background.", "caption_llm_7": "A solo male anthropomorphic feline, with brown fur and black hair. the feline has five fingers and is depicted in a solo pose.", "tags_synthetic_categorized": "{\"number_of_characters\": [\"solo\"], \"clothing_and_accessories\": [\"clothing\", \"clothed\", \"fur\", \"eyewear\", \"topwear\"], \"animals_and_anthropomorphic_features\": [\"anthro\"], \"characters_and_gender\": [\"male\"], \"hairstyle\": [\"hair\", \"brown_hair\", \"black_hair\", \"eyelashes\"], \"background_and_setting\": [\"simple_background\", \"transparent_background\", \"white_background\"], \"body_and_body_parts\": [\"fingers\", \"5_fingers\", \"breasts\", \"feet\", \"eyebrows\", \"toes\", \"teeth\", \"ear_piercing\"], \"furniture_and_objects\": [\"weapon\", \"ranged_weapon\", \"gun\", \"container\", \"beverage\", \"cup\", \"armor\"], \"colors\": [\"brown_body\", \"brown_fur\", \"black_body\", \"black_fur\"], \"emotions_and_expressions\": [\"smile\"], \"actions_and_poses\": [\"standing\", \"looking_at_viewer\", \"holding_object\", \"holding_weapon\", \"holding_gun\", \"holding_cup\", \"holding_container\"], \"miscellaneous\": [\"text\"], \"species_or_animal_type\": [\"feline\", \"mammal\", \"felid\"]}", "tags_ground_truth_categorized": "{\"body_and_body_parts\": [\"5_fingers\", \"fingers\"], \"animals_and_anthropomorphic_features\": [\"anthro\"], \"hairstyle\": [\"black_hair\", \"hair\"], \"colors\": [\"brown_body\", 
\"brown_fur\"], \"clothing_and_accessories\": [\"clothed\", \"clothing\", \"fur\"], \"characters_and_gender\": [\"male\"], \"background_and_setting\": [\"simple_background\", \"transparent_background\"], \"number_of_characters\": [\"solo\"], \"species_or_animal_type\": [\"felid\", \"feline\", \"mammal\"]}", "tags_ground_truth_expanded": ["alpha_channel", "anthro", "clothed", "clothing", "felid", "feline", "fingers", "fur", "hair", "male", "mammal", "simple_background", "solo", "transparent_background"]}
+{"id": 260449, "md5": "5c21e7ccf1bdaa67e396df8a5bb90dc8", "caption_cogvlm": "The image showcases a group of animated characters. On the left, there's a large, jovial ape with a wide grin, raised arms, and a playful expression. In the center, a large, jovial bear is seen laughing and playfully interacting with a young boy, who is dancing with his arms raised. The boy has a cheerful expression and is wearing a loincloth. On the right, there's a smaller, mischievous-looking primate with a tuft of hair on its head, looking directly at the viewer with a cheeky grin. The background is simple, emphasizing the characters.", "caption_llm_0": "a solo male anthropomorphic bear, standing and looking at the viewer with long black hair. The background is simple and white. The bear has claws, a tongue, and is bipedal. It wears clothing that includes fur on top of its body while being topless from the waist up and wearing bottomwear. There's also a text label present in the image.", "caption_llm_1": "artwork of baloo and mowgli. a solo male character, depicted in furry artwork style. The background is simple and white. The character is clothed, with a top layer of fur and bottomwear. He has a smile on his face and his mouth open, as if gesturing or speaking to the viewer. His body parts include feet, fingers, teeth, toes, breasts (for an overweight character), tufts of hair on his head or body (furry), and young age. This human-like primate species includes elements from apes and bears in its appearance.", "caption_llm_2": "a solo male character, depicted as a human-like primate with long black hair. He is clothed in fur and wears bottomwear, while his upper body is topless. The background is simple and white. The character stands with a smile on his face, looking at the viewer while making a gesture with one hand. His feet and toes are visible, along with his teeth and tufts of hair on his head.", "caption_llm_3": "artwork of baloo and mowgli. 
a solo male anthropomorphic bear, standing and looking at the viewer with a smile. The bear has claws, a tongue, and is bipedal. It is wearing clothing that includes fur and bottomwear while being topless. The background is simple with a white backdrop. The bear has feet, fingers, teeth, toes, tufts of hair on its head and body parts such as breasts (for an overweight appearance). There are gestures present in the image as well as text labels included in the scene.", "caption_llm_4": "A group of male characters, each clothed in clothing made of fur. they are depicted as various species including apes, bears, and primates. the characters are engaged in a lively dance while looking at the viewer with their claws visible. their hair is styled naturally, and the background is simple.", "caption_llm_5": "Artwork of baloo and mowgli. a group of male, slightly chubby primates in various haplorhine species, including apes and bears. they are clothed in simple clothing adorned with fur. the primates are engaged in a lively dance while looking at the viewer. their hair is visible, and the background is kept simple to emphasize their actions and poses.", "caption_llm_6": "A group of male characters, each clothed in clothing made of fur. they are depicted as various species including apes, bears, and primates. the characters are engaged in a lively dance while looking at the viewer with their claws visible. their hair is styled naturally, and the background is simple.", "caption_llm_7": "Artwork of baloo and mowgli. a group of male characters, each clothed in clothing made of fur. they are depicted as various species including apes, bears, and primates. the characters are engaged in a lively dance while looking at the viewer with their claws visible. 
their hair is styled naturally, and the background is simple.", "tags_synthetic_categorized": "{\"characters_and_gender\": [\"male\"], \"number_of_characters\": [\"solo\"], \"animals_and_anthropomorphic_features\": [\"anthro\", \"size_difference\", \"claws\", \"tongue\", \"feral\", \"biped\"], \"clothing_and_accessories\": [\"clothing\", \"clothed\", \"fur\", \"topless\", \"bottomwear\", \"nude\"], \"hairstyle\": [\"hair\", \"long_hair\", \"black_hair\"], \"emotions_and_expressions\": [\"smile\", \"open_mouth\"], \"background_and_setting\": [\"simple_background\", \"white_background\"], \"body_and_body_parts\": [\"feet\", \"fingers\", \"teeth\", \"toes\", \"breasts\", \"overweight\", \"tuft\", \"young\", \"5_fingers\", \"belly\", \"navel\", \"big_breasts\", \"muscular\", \"slightly_chubby\", \"markings\"], \"miscellaneous\": [\"text\"], \"actions_and_poses\": [\"standing\", \"looking_at_viewer\", \"gesture\", \"transformation\", \"looking_at_another\", \"eyes_closed\", \"lying\", \"front_view\"], \"species_or_animal_type\": [\"mammal\", \"haplorhine\", \"canine\", \"bear\", \"canid\", \"monkey\", \"pokemon_(species)\", \"scalie\", \"human\", \"ape\", \"reptile\", \"primate\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"claws\"], \"clothing_and_accessories\": [\"clothed\", \"clothing\", \"fur\", \"topless\"], \"actions_and_poses\": [\"dancing\", \"looking_at_viewer\"], \"number_of_characters\": [\"group\"], \"hairstyle\": [\"hair\"], \"characters_and_gender\": [\"male\"], \"background_and_setting\": [\"simple_background\"], \"body_and_body_parts\": [\"slightly_chubby\"], \"species_or_animal_type\": [\"ape\", \"bear\", \"haplorhine\", \"human\", \"mammal\", \"primate\"]}", "tags_ground_truth_expanded": ["ape", "bear", "clothed", "clothing", "dancing", "fur", "group", "hair", "haplorhine", "human", "looking_at_viewer", "male", "mammal", "primate", "simple_background"]}
+{"id": 1078019, "md5": "fc858593b7b9fbe82ce728778841e0cf", "caption_cogvlm": "The image showcases two anthropomorphic rabbits. The one on the left has a confident and slightly playful expression, with teal eyes and a blush on its cheeks. It's wearing a coat and holding a small plushie. The rabbit on the right appears to be more surprised or taken aback, with wide open blue eyes. Both rabbits seem to be in a close and intimate setting, suggesting a romantic or close relationship between them.", "caption_llm_0": "a male and female anthropomorphic rabbit, both clothed, standing close to each other in a simple white background. They are smiling and blushing while embracing with half-closed eyes. The male rabbit has buckteeth, and they are holding an object while looking at the viewer.", "caption_llm_1": "artwork of clancy (inkyfrog) and percy vison. a young, clothed male and female mustelid in a simple white background setting. They are embracing each other with half-closed eyes and open smiles, while looking at the viewer. The male weasel is holding an object, possibly a gift or toy for their partner. The background has dialogue text that adds to the scene's context.", "caption_llm_2": "a young, clothed male and female mustelid in a simple white background setting. They are embracing each other with half-closed eyes and open smiles, while looking at the viewer. The male weasel is holding an object, possibly a gift or toy for their partner. The background has dialogue text that adds to the scene's context.", "caption_llm_3": "artwork of clancy (inkyfrog) and percy vison. a male and female anthropomorphic rabbit, both clothed, standing close to each other in a simple white background. They are smiling and blushing while embracing with half-closed eyes. The male rabbit has buckteeth, and they are holding an object while looking at the viewer.", "caption_llm_4": "a romantic couple of alternate species rabbits, each with their own unique plushie clothing. 
They stand close to each other, blushing and open-mouthed in affectionate expressions. The simple white background allows the focus to be on the adorable rabbit duo.", "caption_llm_5": "Artwork of clancy (inkyfrog) and percy vison. a romantic couple of alternate species anthropomorphic rabbits, each with teal eyes and blushing. they are clothed in simple outfits against a white background, holding a plushie between them.", "caption_llm_6": "a romantic couple of alternate species rabbits, each with their own unique plushie clothing. They stand close to each other, blushing and open-mouthed in affectionate expressions. The simple white background allows the focus to be on the adorable rabbit duo.", "caption_llm_7": "artwork of clancy (inkyfrog) and percy vison. a romantic couple of rabbits, one with blue eyes and the other with teal eyes. They are both clothed in simple outfits, holding a plushie between them. The background is white and uncomplicated.", "tags_synthetic_categorized": "{\"animals_and_anthropomorphic_features\": [\"anthro\", \"buckteeth\"], \"background_and_setting\": [\"simple_background\", \"white_background\", \"dialogue\"], \"clothing_and_accessories\": [\"clothing\", \"clothed\"], \"characters_and_gender\": [\"male\", \"female\"], \"number_of_characters\": [\"duo\", \"solo\", \"group\"], \"emotions_and_expressions\": [\"blush\", \"open_mouth\", \"smile\", \"open_smile\", \"half-closed_eyes\", \"embrace\", \"narrowed_eyes\"], \"body_and_body_parts\": [\"teeth\", \"bodily_fluids\", \"young\"], \"colors\": [\"blue_eyes\"], \"miscellaneous\": [\"text\"], \"actions_and_poses\": [\"holding_object\", \"looking_at_viewer\", \"hug\"], \"species_or_animal_type\": [\"weasel\", \"true_musteline\", \"mammal\", \"mustelid\", \"rabbit\", \"lagomorph\", \"musteline\", \"leporid\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"alternate_species\", \"anthro\"], \"colors\": [\"blue_eyes\", \"teal_eyes\"], 
\"emotions_and_expressions\": [\"blush\", \"open_mouth\", \"romantic\"], \"clothing_and_accessories\": [\"clothed\", \"clothing\", \"plushie\"], \"number_of_characters\": [\"duo\"], \"characters_and_gender\": [\"male\", \"male/male\", \"romantic_couple\"], \"background_and_setting\": [\"simple_background\", \"white_background\"], \"species_or_animal_type\": [\"lagomorph\", \"leporid\", \"mammal\", \"rabbit\"]}", "tags_ground_truth_expanded": ["anthro", "blue_eyes", "blush", "clothed", "clothing", "duo", "lagomorph", "leporid", "mammal", "plushie", "rabbit", "romantic", "romantic_couple", "teal_eyes"]}
+{"id": 1624724, "md5": "febfe277847481ae546525d1ccf4baff", "caption_cogvlm": "The image showcases a cartoonish, smiling creature with large, round eyes and a prominent red nose. It has a tan body with spots and possesses a unique, crosshaped mouth. The creature appears to be floating or hovering against a simple white background.", "caption_llm_0": "a solo character with ambiguous gender, standing in front view while looking at the viewer. The background is simple and white or transparent. The character has fur clothing and hair accessories, as well as anthropomorphic features such as scales and toony appearance. They have a brown body color with black eyes, or yellow body with green body color. This artwork may represent an alien experiment from Lilo & Stitch or a Generation 3 Pokémon species.", "caption_llm_1": "A solo, ambiguously gendered character with anthropomorphic features such as scales, toony appearance, and a long tongue. the character is wearing fur clothing and has spots on its body. it has teeth and displays an open-mouth smile while blushing. the species or animal type includes aliens, lilo & stitch experiments, generation 3 pokémon hybrids, and various pokémon species. the colors present in the image are brown body, yellow body, tan body, green body with black eyes.", "caption_llm_2": "a solo character with ambiguous gender, standing in front view while looking at the viewer. The background is simple and white or transparent. The character has fur clothing and hair accessories, as well as anthropomorphic features such as scales and toony appearance. They have a brown body color with black eyes, or yellow body with green body color. This artwork may represent an alien experiment from Lilo & Stitch or a Generation 3 Pokémon species.", "caption_llm_3": "A solo character with ambiguous gender, standing in front view while looking at the viewer. the background is simple and white or transparent. 
the character has fur clothing and hair accessories, along with a brown body color. it also features yellow or green body colors, black eyes, and may be an alien experiment from lilo & stitch or a generation 3 pokémon species.", "caption_llm_4": "A solo alien character with ambiguous gender, displaying a smile. the creature has brown eyes, a red nose, and a tan body. set against a simple white background, the alien is depicted as an experiment from lilo & stitch and is also part of generation 3 pokémon species.", "caption_llm_5": "A solo alien experiment from lilo and stitch, with an ambiguous gender. the background is simple and white. the character is smiling, and it's a generation 3 pokémon hybrid species.", "caption_llm_6": "A solo alien experiment from lilo and stitch, with an ambiguous gender. the background is simple and white. the character is smiling, and it's a generation 3 pokémon hybrid species.", "caption_llm_7": "A solo alien experiment, likely from the lilo & stitch universe, depicted as a hybrid pokémon from generation 3. 
the creature is shown with an ambiguous gender and has a cheerful smile on its face.", "tags_synthetic_categorized": "{\"number_of_characters\": [\"solo\"], \"background_and_setting\": [\"simple_background\", \"white_background\", \"transparent_background\", \"food\", \"dialogue\", \"countershading\"], \"characters_and_gender\": [\"ambiguous_gender\"], \"animals_and_anthropomorphic_features\": [\"anthro\", \"feral\", \"scales\", \"toony\", \"tongue\"], \"clothing_and_accessories\": [\"fur\", \"clothing\"], \"actions_and_poses\": [\"front_view\", \"looking_at_viewer\", \"standing\"], \"body_and_body_parts\": [\"spots\", \"teeth\", \"tuft\", \"muscular\", \"markings\", \"bodily_fluids\", \"feet\", \"toes\", \"breasts\", \"fingers\", \"glistening\", \"huge_deltoids\", \"big_deltoids\"], \"emotions_and_expressions\": [\"open_mouth\", \"smile\", \"blush\"], \"miscellaneous\": [\"text\"], \"hairstyle\": [\"hair\"], \"colors\": [\"brown_body\", \"yellow_body\", \"tan_body\", \"green_body\", \"black_eyes\"], \"species_or_animal_type\": [\"generation_3_pokemon\", \"mammal\", \"pokemon_(species)\", \"alien\", \"experiment_(lilo_and_stitch)\", \"hybrid\"]}", "tags_ground_truth_categorized": "{\"characters_and_gender\": [\"ambiguous_gender\"], \"colors\": [\"brown_eyes\", \"red_nose\", \"tan_body\"], \"miscellaneous\": [\"crossover\"], \"background_and_setting\": [\"simple_background\", \"white_background\"], \"emotions_and_expressions\": [\"smile\"], \"number_of_characters\": [\"solo\"], \"species_or_animal_type\": [\"alien\", \"experiment_(lilo_and_stitch)\", \"generation_3_pokemon\", \"hybrid\", \"pokemon_(species)\"]}", "tags_ground_truth_expanded": ["red_nose", "simple_background", "smile", "solo", "tan_body", "white_background"]}
+{"id": 1325009, "md5": "929bdced281bd135ed1ca76df24332d4", "caption_cogvlm": "The image showcases an anthropomorphic tiger with striking blue eyes. He is depicted in a muscular and confident pose, with one hand raised to his head in a thoughtful or playful gesture. The tiger has a white chest with a tuft of fur, and his fur is striped in the traditional tiger pattern. He is wearing dark blue shorts, and his muscular physique is accentuated by the lighting in the background, which creates a countershading effect. The overall mood of the image is one of confidence and playfulness.", "caption_llm_0": "a muscular anthropomorphic male tiger with stripes, humanoid hands, and claws. He has a chest tuft and a striped body. The background is simple with countershading, white and orange colors are present on his furry body. His eyes are green while his nose is black. He's standing in an outside setting near food, possibly hunting or observing it from afar.", "caption_llm_1": "artwork of tiger dancer (zootopia). a muscular anthropomorphic male tiger, with stripes and humanoid hands. He has claws, a chest tuft, and a striped body. The tiger is bipedal and has a tail. His fur is white with orange stripes, while his body is brown. He wears clothing that covers his top half but leaves his bottom half exposed. The tiger has abs, pecs, biceps, big muscles on his arms and chest area as well as overweight belly which makes him look more muscular than the average anthro tiger in this style of artwork. 
His fingers are visible due to the humanoid hands feature he possesses along with claws on each finger tip for better grip or attack purposes if needed in the scene depicted in this artwork piece .He also has navel showing through the clothing he's wearing which adds to its realism factor making it look like an actual person rather than just an animal character .The text might be present somewhere within or around this image possibly indicating some sort of context or storyline associated with it but not much can be said about that without further information about what exactly it says .", "caption_llm_2": "a solo male muscular character, depicted in fur clothing and bottomwear. The background is simple with countershading, featuring a white background and elements like food, sky, and clouds. The character has a white body with orange fur accents on its face and tail. Its nose is pink while its eyes are blue. It's shown smiling with one eye closed as it looks at the viewer from a front view while sitting or lying down on the ground.", "caption_llm_3": "artwork of tiger dancer (zootopia). a muscular male anthro tiger, with stripes and humanoid hands. He has claws, a chest tuft, and a striped body. The tiger is depicted in a solo scene wearing clothing that covers his top half while exposing his belly. His fur color is white with orange stripes on the body and brown fur on the face. He has blue eyes, black nose, pink nose markings on his cheeks, and green eyes as well. The tiger's pose includes him standing or sitting in various positions such as looking at the viewer or winking while showing off his muscular physique including pecs, biceps, abs, overweight belly area along with fingers and navel details visible through the clothing.", "caption_llm_4": "A solo, muscular male pantherine tiger with blue eyes. the tiger is depicted topless and clothed in fur shorts. he has a tuft on his head and is smiling while looking at the viewer with one hand on his head. 
the overall color scheme of the image is dominated by the blue eyes of the tiger, set against a background that may or may not be present in this description.", "caption_llm_5": "artwork of tiger dancer (zootopia). a solo, muscular male pantherine tiger with blue eyes and a tuft of chest hair. He is clothed in fur shorts and topless, with his hand on his head as he looks directly at the viewer.", "caption_llm_6": "a solo, muscular male tiger with blue eyes, wearing clothing and fur shorts. The tiger has chest tufts and stripes, displaying a pantherine appearance. It stands on its hind legs at the countershaded background, smiling confidently.", "caption_llm_7": "artwork of tiger dancer (zootopia). a solo, muscular male anthro tiger with a chest tuft and stripes, standing on its hind legs with one hand on its head and the other holding a blue-eyed pantherine felid. The background features countershading to create depth. The tiger has a smile on its face as it looks directly at the viewer.", "tags_synthetic_categorized": "{\"characters_and_gender\": [\"male\", \"muscular_male\", \"overweight_male\"], \"animals_and_anthropomorphic_features\": [\"anthro\", \"muscular_anthro\", \"stripes\", \"feral\", \"humanoid_hands\", \"claws\", \"chest_tuft\", \"striped_body\", \"biped\", \"striped_fur\", \"tail\", \"overweight_anthro\"], \"number_of_characters\": [\"solo\"], \"clothing_and_accessories\": [\"fur\", \"clothing\", \"topless\", \"clothed\", \"kemono\", \"bottomwear\"], \"body_and_body_parts\": [\"muscular\", \"pecs\", \"biceps\", \"tuft\", \"abs\", \"overweight\", \"belly\", \"fingers\", \"navel\", \"big_muscles\", \"breasts\", \"teeth\", \"5_fingers\", \"nipples\", \"markings\", \"toes\", \"feet\", \"moobs\", \"eyebrows\", \"young\"], \"background_and_setting\": [\"simple_background\", \"countershading\", \"white_background\", \"outside\", \"food\", \"sky\", \"cloud\"], \"colors\": [\"white_body\", \"orange_body\", \"white_fur\", \"orange_fur\", \"brown_fur\", 
\"brown_body\", \"black_body\", \"pink_nose\", \"blue_eyes\", \"black_nose\", \"green_eyes\", \"black_fur\"], \"emotions_and_expressions\": [\"smile\", \"open_mouth\", \"blush\", \"grin\"], \"actions_and_poses\": [\"looking_at_viewer\", \"standing\", \"one_eye_closed\", \"lying\", \"wink\", \"sitting\", \"eyes_closed\", \"front_view\", \"pose\"], \"hairstyle\": [\"hair\", \"white_hair\", \"black_hair\", \"short_hair\"], \"miscellaneous\": [\"text\"], \"species_or_animal_type\": [\"tiger\", \"mammal\", \"canid\", \"pantherine\", \"felid\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"anthro\", \"chest_tuft\", \"muscular_anthro\", \"stripes\"], \"colors\": [\"blue_eyes\"], \"clothing_and_accessories\": [\"bottomwear\", \"clothed\", \"clothing\", \"fur\", \"shorts\", \"topless\"], \"background_and_setting\": [\"countershading\"], \"actions_and_poses\": [\"hand_on_head\", \"looking_at_viewer\"], \"characters_and_gender\": [\"male\", \"muscular_male\"], \"body_and_body_parts\": [\"muscular\", \"tuft\"], \"emotions_and_expressions\": [\"smile\"], \"number_of_characters\": [\"solo\"], \"species_or_animal_type\": [\"felid\", \"mammal\", \"pantherine\", \"tiger\"]}", "tags_ground_truth_expanded": ["anthro", "blue_eyes", "bottomwear", "chest_tuft", "clothed", "clothing", "countershading", "felid", "fur", "hand_on_head", "male", "mammal", "muscular", "muscular_anthro", "muscular_male", "pantherine", "shorts", "solo", "stripes", "tiger", "topless", "tuft"]}
+{"id": 1023509, "md5": "04151411520ae750887cbd79cda9239d", "caption_cogvlm": "The image is a multi-panel comic strip. The first panel shows a character lying on the ground, surrounded by darkness, with a speech bubble saying 'I'm done for...'. The next panel depicts a hooded figure standing over the character, with a speech bubble saying 'You're not done for, you're just beginning.'. The following panels show a conversation between the hooded figure and another character, where the hooded figure mentions 'I'm the guardian of the realm of darkness'. The dialogue continues with the hooded figure expressing that the character has been chosen for a task. The final panels depict a group of characters, including a white-furred creature, a goat, a human, and a lizard, discussing a plan to 'defeat the darkness'. The comic ends with a textual note saying 'there is light'.", "caption_llm_0": "a duo of anthropomorphic goats, one male and one female, wearing clothing with fur. The male goat is clothed in a shirt and headwear, while the female goat is clothed in topwear and headgear. They are holding melee weapons and have closed eyes. Their bodies are white with red eyes, while their fur is also white.", "caption_llm_1": "artwork of asriel dreemurr, chara (undertale), frisk (undertale), mettaton, mettaton ex, and monster kid. a duo of anthropomorphic goats, one male and one female, standing in a simple background. The male goat has white fur and red eyes, while the female goat has brown hair. They are both clothed in shirts and headwear, with the male wearing a hat. The scene takes place outside during rainy weather.", "caption_llm_2": "a duo of caprine goats, one with white fur and red eyes, standing outside in the rain. They are holding melee weapons and have a simple background. The male goat has brown hair while the female has blonde hair. 
Both animals are young and have glowing eyes.", "caption_llm_3": "artwork of asriel dreemurr, chara (undertale), frisk (undertale), mettaton, mettaton ex, and monster kid. a duo of anthropomorphic goats, one male and one female, standing in a simple background. The male goat is wearing a shirt and hat while the female goat has brown hair. They are engaged in dialogue with each other, possibly discussing something humorous as they both have open mouths and are smiling. Their bodies are covered in white fur while their eyes have red irises.", "caption_llm_4": "a humanoid goat-like creature with red eyes and a white body, standing in front of a dialogue background. The creature is holding two dice in its armless body. A text label is present, but its content is unknown.", "caption_llm_5": "artwork of asriel dreemurr, chara (undertale), frisk (undertale), mettaton, mettaton ex, and monster kid. a caprine goat with long ears, human-like features, and red eyes. It is standing on a white body with white fur. The background shows a dialogue taking place. The animal is wearing fur clothing and holding dice in its armless body. A text label can be seen in the image, but its content is not specified.", "caption_llm_6": "A humanoid goat with long ears, wearing fur clothing. the scene is set against a background of dialogue. a lizard and a mammal are also present, along with a caprine creature resembling a boss monster. the characters engage in an interaction while holding dice and text labels are visible nearby.", "caption_llm_7": "artwork of asriel dreemurr, chara (undertale), frisk (undertale), mettaton, mettaton ex, and monster kid. a humanoid goat with long ears, red eyes, and white fur. It is set against a dialogue background. The character wears fur clothing and has no arms. 
A text label is present in the scene, but its content is not specified.", "tags_synthetic_categorized": "{\"miscellaneous\": [\"text\", \"speech_bubble\", \"profanity\"], \"clothing_and_accessories\": [\"clothing\", \"fur\", \"topwear\", \"clothed\", \"headgear\", \"shirt\", \"headwear\", \"hat\"], \"characters_and_gender\": [\"male\", \"female\", \"ambiguous_gender\"], \"background_and_setting\": [\"dialogue\", \"raining\", \"outside\", \"food\", \"snow\", \"simple_background\", \"inside\", \"tree\"], \"furniture_and_objects\": [\"weapon\", \"melee_weapon\", \"armor\", \"furniture\"], \"animals_and_anthropomorphic_features\": [\"anthro\", \"not_furry\", \"horn\"], \"hairstyle\": [\"hair\", \"brown_hair\", \"blonde_hair\", \"black_hair\", \"white_hair\"], \"emotions_and_expressions\": [\"open_mouth\", \"smile\", \"humor\", \"blush\", \"tears\", \"angry\", \"crying\"], \"colors\": [\"white_body\", \"white_fur\", \"red_eyes\"], \"number_of_characters\": [\"duo\", \"group\", \"solo\"], \"body_and_body_parts\": [\"teeth\", \"bodily_fluids\", \"young\", \"bone\", \"breasts\"], \"actions_and_poses\": [\"holding_weapon\", \"holding_object\", \"eyes_closed\", \"glowing\"], \"species_or_animal_type\": [\"human\", \"goat\", \"mammal\", \"caprine\", \"bovid\", \"lagomorph\", \"scalie\", \"boss_monster\", \"reptile\"]}", "tags_ground_truth_categorized": "{\"body_and_body_parts\": [\"armless\"], \"furniture_and_objects\": [\"d6\", \"dice\"], \"background_and_setting\": [\"dialogue\"], \"clothing_and_accessories\": [\"fur\"], \"animals_and_anthropomorphic_features\": [\"long_ears\"], \"colors\": [\"red_eyes\", \"white_body\", \"white_fur\"], \"miscellaneous\": [\"text\"], \"species_or_animal_type\": [\"boss_monster\", \"bovid\", \"caprine\", \"goat\", \"human\", \"lizard\", \"mammal\", \"reptile\", \"scalie\"]}", "tags_ground_truth_expanded": ["bovid", "caprine", "dialogue", "fur", "goat", "human", "lizard", "mammal", "reptile", "scalie", "text", "white_body", "white_fur"]}
+{"id": 335343, "md5": "780f7c3acd520cc2ce5bbb8a91e99937", "caption_cogvlm": "The image showcases two animated characters lying on a bed, seemingly in a resting state. The character on the left has blonde hair, green eyes, and is wearing makeup, with a slightly annoyed or disgruntled expression. The character on the right has purple hair, blue eyes, and a more relaxed or sleeping expression. Between them, there's a text that reads 'Look Before You Sleep', written in a playful font. The image also has a watermark at the bottom left corner that says 'SkyPony'. The overall color palette is dominated by shades of blue and purple, creating a serene and calming ambiance.", "caption_llm_0": "a solo female earth pony with rainbow hair, lying down on a bed with blue body and white fur. The background is simple, featuring snow and nighttime. The pony has blue eyes and is smiling while blushing. A plant can be seen in the detailed background.", "caption_llm_1": "artwork of applejack (mlp) and rarity (mlp). a solo female earth pony with a blue body, blue feathers, and blonde hair. She has long eyelashes and is smiling. The background features snowy surroundings with trees in the distance.", "caption_llm_2": "a solo female earth pony with purple hair, rainbow eyes, and a cutie mark on her flank. She is wearing clothing and has fur covering her body. The background is simple with snow falling outside during the nighttime. The pony is lying down, eyes closed, and smiling while blushing slightly.", "caption_llm_3": "artwork of applejack (mlp) and rarity (mlp). a solo female character with long, rainbow-colored hair and eyelashes. She is lying on a bed, wearing clothing and fur. The animal type is an earth pony or horse with blue body and white fur, along with blue feathers. The scene includes furniture such as a bed, pillow, bedding, and plant. 
The character has blush on her cheeks while smiling open-mouthed in her eyes-closed position.", "caption_llm_4": "A pair of earth ponies, one with blonde hair and the other with purple hair. they are lying down, eyes closed, and sleeping peacefully. the ponies have green eyes and white bodies or fur. one of them has freckles on their face. they are wearing makeup and eyeshadow, as well as a text label that is not specified in the description.", "caption_llm_5": "Artwork of applejack (mlp) and rarity (mlp). a duo of female earth ponies, both with white bodies and white fur. they have green eyes and freckles on their faces. one pony has a unicorn horn, while the other has feathers adorning its body. the ponies are lying down on a bed, resting with their eyes closed. a pillow is also present in the scene. the ponies wear makeup, including eyeshadow to enhance their appearance.", "caption_llm_6": "A duo of female earth ponies, one with blonde hair and the other with purple hair. they are both adorned with makeup, including eyeshadow. the ponies have white bodies and fur, as well as green eyes. one pony has freckles on its face. they are lying on a bed surrounded by furniture and pillows while displaying expressions of anger and fear. a text label is also present in the image.", "caption_llm_7": "Artwork of applejack (mlp) and rarity (mlp). a duo of female characters, one with blonde hair and the other with purple hair. they are lying on a bed, which is adorned with furniture and pillows. the characters have freckles on their faces and are wearing eyeshadow, fur clothing, and makeup. one character has green eyes while the other has white fur covering their body. \nthe scene also includes an earth pony, a unicorn, and feral animals such as horns present in the artwork. 
the characters appear to be sleeping or resting peacefully in this furry artwork style setting.", "tags_synthetic_categorized": "{\"characters_and_gender\": [\"female\"], \"animals_and_anthropomorphic_features\": [\"feral\", \"horn\", \"wings\", \"cutie_mark\", \"feathered_wings\", \"feathers\", \"anthro\"], \"miscellaneous\": [\"text\", \"magic\"], \"number_of_characters\": [\"solo\"], \"hairstyle\": [\"hair\", \"purple_hair\", \"two_tone_hair\", \"rainbow_hair\", \"blue_hair\", \"blonde_hair\", \"long_hair\", \"pink_hair\", \"eyelashes\"], \"actions_and_poses\": [\"eyes_closed\", \"water\", \"sleeping\", \"lying\", \"looking_at_viewer\"], \"clothing_and_accessories\": [\"fur\", \"clothing\"], \"background_and_setting\": [\"snow\", \"outside\", \"simple_background\", \"food\", \"night\", \"underwater\", \"dialogue\", \"snowing\", \"moon\", \"winter\", \"holidays\", \"sky\", \"inside\", \"detailed_background\", \"tree\"], \"colors\": [\"blue_body\", \"blue_feathers\", \"blue_fur\", \"white_body\", \"white_fur\", \"blue_eyes\"], \"furniture_and_objects\": [\"furniture\", \"bed\", \"pillow\", \"bedding\", \"plant\"], \"emotions_and_expressions\": [\"smile\", \"open_mouth\", \"blush\"], \"body_and_body_parts\": [\"young\", \"breasts\", \"teeth\", \"bodily_fluids\"], \"species_or_animal_type\": [\"pony\", \"earth_pony\", \"mammal\", \"marine\", \"horse\", \"equid\", \"unicorn\", \"equine\"]}", "tags_ground_truth_categorized": "{\"emotions_and_expressions\": [\"angry\", \"scared\"], \"furniture_and_objects\": [\"bed\", \"furniture\", \"pillow\"], \"hairstyle\": [\"blonde_hair\", \"hair\", \"purple_hair\"], \"number_of_characters\": [\"duo\"], \"actions_and_poses\": [\"eyes_closed\", \"lying\", \"sleeping\"], \"clothing_and_accessories\": [\"eyeshadow\", \"fur\", \"makeup\"], \"animals_and_anthropomorphic_features\": [\"feathers\", \"feral\", \"horn\"], \"characters_and_gender\": [\"female\"], \"body_and_body_parts\": [\"freckles\"], \"colors\": [\"green_eyes\", 
\"white_body\", \"white_fur\"], \"miscellaneous\": [\"text\"], \"species_or_animal_type\": [\"earth_pony\", \"equid\", \"equine\", \"horse\", \"mammal\", \"pony\", \"unicorn\"]}", "tags_ground_truth_expanded": ["angry", "bed", "blonde_hair", "blue_eyes", "duo", "eyes_closed", "eyeshadow", "green_eyes", "hair", "lying", "makeup", "purple_hair", "sleeping", "text"]}
+{"id": 17482, "md5": "4f41d96bf3912080e56aec95973baee3", "caption_cogvlm": "The image showcases an anthropomorphic creature, possibly a wolf or a dog, with a spade tail and claws, playing a bass guitar. The creature is depicted in a dynamic pose, with its hair flowing and fingers poised on the guitar strings. The background is a blend of pastel colors, giving the artwork a dreamy and ethereal feel. The creature's attire appears torn, and it holds the guitar with a sense of passion and dedication.", "caption_llm_0": "a bipedal, anthropomorphic canine with claws and a tail, wearing clothing made of fur. The character is depicted playing the electric guitar while standing with eyes closed, holding the instrument in its membranous wings. The background features a smiling demonic figure holding an object.", "caption_llm_1": "A female canid, likely a demon or other mythical creature, clad in fur clothing and wearing pants. she stands with her eyes closed, playing the electric guitar while holding it with her fingers. her feet rest on the simple background as she smiles while performing. the membrane between her fingers allows for better control over the strings of the guitar.", "caption_llm_2": "a solo female canid, likely a demon or other type of canine mammal, playing music. She is clothed in fur and wears bottomwear in the form of pants. The background is simple, and she stands with her eyes closed while holding a musical instrument and an object. Her hair is visible, as are her feet and fingers with their membrane (anatomy).", "caption_llm_3": "A solo female anthropomorphic canine character, depicted as a bipedal creature with claws, tail, and membranous wings. she is wearing clothing made of fur and has bottomwear in the form of pants. her feet have toes that are covered by membranes. 
the background is simple, and she has a smile on her face while playing music.", "caption_llm_4": "A solo female canid character, likely a demon or other mythical creature, holding a plucked string instrument (such as a bass guitar or guitar) while displaying anthro features like 4 fingers and claws. the character has hair and is posed with the instrument in her hands. the scene is set against the backdrop of music, with various musical elements present.", "caption_llm_5": "A slender, anthropomorphic canine with four fingers and claws, holding a bass guitar. the canine has long hair and is depicted in a solo pose while playing the musical instrument.", "caption_llm_6": "A slender, anthropomorphic canine with four fingers and claws, holding a bass guitar. the canine has a spade-like tail and is depicted as female. the background consists of musical elements such as plucked string instruments and other string instruments.", "caption_llm_7": "A solo female canid character, likely a demon or other mythical creature, holding a bass guitar and plucked string instrument. she is clothed in torn clothing made of fur, with slim fingers. 
the scene depicts her playing music while holding the musical instruments in an intense pose.", "tags_synthetic_categorized": "{\"furniture_and_objects\": [\"musical_instrument\", \"string_instrument\", \"plucked_string_instrument\", \"guitar\", \"electric_guitar\", \"bass_guitar\"], \"number_of_characters\": [\"solo\"], \"animals_and_anthropomorphic_features\": [\"anthro\", \"claws\", \"biped\", \"tail\", \"membranous_wings\", \"wings\"], \"characters_and_gender\": [\"male\"], \"clothing_and_accessories\": [\"clothing\", \"fur\", \"clothed\", \"bottomwear\", \"pants\"], \"actions_and_poses\": [\"playing_music\", \"playing_guitar\", \"holding_musical_instrument\", \"standing\", \"holding_object\", \"eyes_closed\"], \"hairstyle\": [\"hair\"], \"body_and_body_parts\": [\"feet\", \"fingers\", \"toes\", \"membrane_(anatomy)\", \"5_fingers\", \"tuft\"], \"miscellaneous\": [\"music\"], \"background_and_setting\": [\"simple_background\"], \"emotions_and_expressions\": [\"smile\"], \"species_or_animal_type\": [\"canid\", \"canine\", \"mammal\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"4_fingers\", \"anthro\", \"claws\", \"spade_tail\"], \"furniture_and_objects\": [\"bass_guitar\", \"guitar\", \"musical_instrument\", \"plucked_string_instrument\", \"string_instrument\"], \"clothing_and_accessories\": [\"clothed\", \"clothing\", \"fur\", \"torn_clothing\"], \"characters_and_gender\": [\"female\"], \"body_and_body_parts\": [\"fingers\", \"slim\"], \"hairstyle\": [\"hair\"], \"actions_and_poses\": [\"holding_musical_instrument\", \"holding_object\"], \"miscellaneous\": [\"music\"], \"number_of_characters\": [\"solo\"], \"species_or_animal_type\": [\"canid\", \"canine\", \"demon\", \"mammal\"]}", "tags_ground_truth_expanded": ["anthro", "bass_guitar", "canid", "canine", "claws", "clothed", "clothing", "fingers", "fur", "guitar", "hair", "holding_musical_instrument", "holding_object", "mammal", "music", "musical_instrument", 
"plucked_string_instrument", "solo", "spade_tail", "string_instrument", "tail", "torn_clothing"]}
+{"id": 2021552, "md5": "795c62b5d529f3758fe8f7de6062d2e6", "caption_cogvlm": "The image showcases two anthropomorphic characters. On the left is a rabbit-like creature dressed in a white shirt and black pants, standing with crossed arms. On the right is a fox-like character wearing blue overalls and a white shirt, looking towards the rabbit with a slightly open mouth. The background is a simple grey, and both characters have distinct features such as fur, facial markings, and claws.", "caption_llm_0": "a solo male character, clothed in a shirt and overalls, standing with crossed arms. He has white fur and grey body markings, with blue eyes. The background is simple, featuring either white or grey tones. The character is an arctic fox or canid species, displaying a half-closed eye expression while making eye contact with the viewer.", "caption_llm_1": "artwork of jack savage and skye (zootopia). a solo male character, clothed in a shirt and pants, standing with crossed arms. He has white fur and a grey body, with blue eyes. The background is simple, featuring either white or grey tones. The character is an anthropomorphic arctic fox with facial markings such as cheek tufts and head markings. He has fluffy tail tufts and pawpads on his feet. His ears are dipstick-shaped, and he has claws on his hands and feet. The character's tail is fluffy as well, with some tail markings present. In the scene, he appears to be looking at another person or object while holding something in his hand or pocket.", "caption_llm_2": "a solo male character, an anthropomorphic arctic fox with facial markings and a fluffy tail. The background is simple, featuring white and grey hues. The fox is standing with crossed arms, looking at another character while holding an object. It has blue eyes and grey fur on its body.", "caption_llm_3": "artwork of jack savage and skye (zootopia). a solo male character, standing with crossed arms and looking at another. 
He is fully clothed in a shirt and pants, with furry pawpads on his feet. The background is simple, white or grey. The character has a fluffy tail and facial tufts, as well as dipstick ears and head markings. He is holding an object while smiling with open mouth and half-closed eyes, making eye contact with the viewer. His species is an arctic fox or rabbit within the canid family.", "caption_llm_4": "Two anthropomorphic animals, one with a grey body and white fur, the other with a white body and grey fur. both have fluffy tails, cheek tufts, head markings, and facial tufts. they are clothed in overalls made of fur and wear shirts. one has crossed arms while the other looks away from another character. the background is simple with a grey color scheme.", "caption_llm_5": "Artwork of jack savage and skye (zootopia). a pair of anthropomorphic animals, one with a fluffy tail and cheek tufts, standing against a simple grey background. the other animal has facial markings and head tufts. both creatures have pawpads and toe claws, as well as fluffy fur in various shades of grey or white. they are depicted in crossed arms poses, looking away from each other while standing on their hind legs.", "caption_llm_6": "A duo of anthropomorphic animals, one an arctic fox and the other a rabbit. both are clothed in simple outfits - the fox in overalls and the rabbit in pants and a shirt. they stand with crossed arms, looking away from each other against a grey background. the fox has fluffy fur, cheek tufts, head markings, facial tufts, pawpads, claws on its toes and tail markings. the rabbit also has fluffy fur with head tufts and toe claws.", "caption_llm_7": "Artwork of jack savage and skye (zootopia). a pair of anthropomorphic animals, one with a fluffy tail and cheek tufts, standing against a simple grey background. the other animal has facial markings and head tufts. both creatures have pawpads and toe claws, as well as fluffy fur in various shades of grey or white. 
they are depicted in crossed arms poses, looking away from each other while standing on their hind legs.", "tags_synthetic_categorized": "{\"animals_and_anthropomorphic_features\": [\"anthro\", \"facial_markings\", \"cheek_tuft\", \"facial_tuft\", \"biped\", \"head_markings\", \"tail\", \"fluffy_tail\", \"fluffy\", \"head_tuft\", \"claws\", \"dipstick_ears\", \"size_difference\", \"dipstick_tail\", \"pawpads\", \"3_toes\", \"tail_markings\", \"neck_tuft\"], \"clothing_and_accessories\": [\"clothing\", \"clothed\", \"fur\", \"barefoot\", \"topwear\", \"shirt\", \"pants\", \"fully_clothed\", \"bottomwear\", \"overalls\", \"dress\"], \"number_of_characters\": [\"solo\"], \"characters_and_gender\": [\"male\"], \"background_and_setting\": [\"simple_background\", \"white_background\", \"grey_background\"], \"body_and_body_parts\": [\"feet\", \"tuft\", \"markings\", \"toes\", \"ear_markings\", \"toe_claws\", \"teeth\", \"butt\", \"breasts\", \"fingers\"], \"actions_and_poses\": [\"standing\", \"crossed_arms\", \"looking_at_another\", \"holding_object\", \"looking_at_viewer\", \"hand_on_hip\", \"hand_in_pocket\", \"sitting\", \"looking_back\", \"side_view\", \"pose\"], \"emotions_and_expressions\": [\"smile\", \"open_mouth\", \"narrowed_eyes\", \"half-closed_eyes\", \"eye_contact\"], \"colors\": [\"white_fur\", \"white_body\", \"blue_eyes\", \"grey_body\", \"grey_fur\"], \"miscellaneous\": [\"text\"], \"hairstyle\": [\"hair\"], \"species_or_animal_type\": [\"mammal\", \"fox\", \"rabbit\", \"lagomorph\", \"canine\", \"arctic_fox\", \"felid\", \"leporid\", \"canid\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"anthro\", \"cheek_tuft\", \"claws\", \"facial_markings\", \"facial_tuft\", \"fluffy\", \"fluffy_tail\", \"head_markings\", \"head_tuft\", \"pawpads\", \"tail\"], \"clothing_and_accessories\": [\"bottomwear\", \"clothed\", \"clothing\", \"fur\", \"overalls\", \"pants\", \"shirt\", \"topwear\"], \"actions_and_poses\": 
[\"crossed_arms\", \"looking_at_another\", \"looking_away\", \"standing\"], \"number_of_characters\": [\"duo\"], \"background_and_setting\": [\"grey_background\", \"simple_background\"], \"colors\": [\"grey_body\", \"grey_fur\", \"white_body\", \"white_fur\"], \"body_and_body_parts\": [\"markings\", \"toe_claws\", \"tuft\"], \"species_or_animal_type\": [\"arctic_fox\", \"canid\", \"canine\", \"fox\", \"lagomorph\", \"leporid\", \"mammal\", \"rabbit\"]}", "tags_ground_truth_expanded": ["anthro", "bottomwear", "canid", "canine", "claws", "clothed", "clothing", "crossed_arms", "duo", "facial_markings", "fox", "fur", "grey_background", "lagomorph", "leporid", "looking_at_another", "mammal", "overalls", "pants", "rabbit", "shirt", "simple_background", "standing", "topwear"]}
+{"id": 2034167, "md5": "00dc19bfce60793cedbd2bbcd8e01c44", "caption_cogvlm": "The image showcases a vibrant, animated character that appears to be a fusion of a canine and a humanoid. The character has striking blue eyes, a blue nose, and a purple body with white fur. The character's fur is adorned with vivid pink and blue stripes, and it has a playful, open-mouthed expression. The character's tail is long and curved, with a mix of blue and pink hues. The background is simple, allowing the character to be the focal point. The character appears to be in a dynamic pose, possibly mid-stride or jump.", "caption_llm_0": "a solo wolf character with ambiguous gender, standing and looking at the viewer. The wolf has white fur on its body and tail, while its face is covered in purple fur. Its eyes are blue, and it has red hair on its ears. The background is simple and white.", "caption_llm_1": "A solo feral wolf with paws, tongue, and stripes on its body. the wolf has 4 toes and is shown in a quadruped pose. its fur is white, while its body is red or orange. the animal has either purple or blue hair with eyelashes and red eyes. it's smiling with an open mouth and tongue out against a simple white background.", "caption_llm_2": "A solo character with ambiguous gender, standing and looking at the viewer. the character has fur clothing and accessories, along with various hairstyles such as purple, blue, red hair or eyelashes. the background is simple and white. \nthe character has a white body with either white fur or red fur depending on the species depicted - wolf, mammal, canid or canine (canis). they have either blue eyes or orange body/fur coloration. their feet show toes and they display an open mouth expression with tongue out pose.", "caption_llm_3": "A solo feral wolf with paws, tongue, and stripes. the wolf has 4 toes and a striped body with fur in various colors such as white, blue eyes, red body, purple fur, orange body and fur. 
it also has teeth and is standing while looking at the viewer. the wolf's hair is in different shades of purple or blue with eyelashes.", "caption_llm_4": "A solo female canine, likely a wolf, with 4 toes and feral characteristics. the background is simple, allowing the focus to be on the animal's open mouth and teeth. the artwork style is furry, with attention to detail in the animal's fur and expressions.", "caption_llm_5": "A solo, female canine with 4 toes and a purple body. the wolf has white fur and blue eyes, as well as a blue nose. she is depicted in a simple background with her fur serving as clothing or an accessory. her tongue is visible, and she has feet and teeth.", "caption_llm_6": "A solo female canine, likely a wolf, in a simple background. the animal is depicted with its mouth open and fur covering its body and accessories. its feet and teeth are also visible, as well as her toes.", "caption_llm_7": "A solo, female canine with a purple body and fur, white fur on her face and body, blue eyes, and a blue nose. she has 4 toes on each foot and displays an open-mouth expression. the background is simple. 
the character wears fur as clothing or accessories.", "tags_synthetic_categorized": "{\"number_of_characters\": [\"solo\"], \"animals_and_anthropomorphic_features\": [\"feral\", \"paws\", \"tongue\", \"4_toes\", \"stripes\", \"striped_body\", \"striped_fur\", \"quadruped\"], \"clothing_and_accessories\": [\"fur\"], \"background_and_setting\": [\"simple_background\", \"white_background\"], \"hairstyle\": [\"hair\", \"purple_hair\", \"eyelashes\", \"blue_hair\", \"red_hair\"], \"colors\": [\"white_fur\", \"white_body\", \"blue_eyes\", \"red_body\", \"red_fur\", \"purple_fur\", \"orange_body\", \"orange_fur\", \"purple_body\"], \"body_and_body_parts\": [\"toes\", \"feet\", \"teeth\", \"fingers\", \"ear_piercing\", \"eyebrows\"], \"emotions_and_expressions\": [\"smile\", \"open_mouth\", \"tongue_out\"], \"characters_and_gender\": [\"ambiguous_gender\"], \"actions_and_poses\": [\"standing\", \"looking_at_viewer\"], \"species_or_animal_type\": [\"mammal\", \"wolf\", \"canis\", \"canine\", \"canid\"]}", "tags_ground_truth_categorized": "{\"animals_and_anthropomorphic_features\": [\"4_toes\", \"feral\", \"tongue\"], \"colors\": [\"blue_eyes\", \"blue_nose\", \"purple_body\", \"purple_fur\", \"white_body\", \"white_fur\"], \"body_and_body_parts\": [\"feet\", \"teeth\", \"toes\"], \"characters_and_gender\": [\"female\"], \"clothing_and_accessories\": [\"fur\"], \"emotions_and_expressions\": [\"open_mouth\"], \"background_and_setting\": [\"simple_background\"], \"number_of_characters\": [\"solo\"], \"species_or_animal_type\": [\"canid\", \"canine\", \"canis\", \"mammal\", \"wolf\"]}", "tags_ground_truth_expanded": ["blue_eyes", "blue_nose", "canid", "canine", "fur", "mammal", "open_mouth", "purple_body", "simple_background", "solo", "white_body", "white_fur"]}
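
Note on reading the records above: in each added line, `tags_synthetic_categorized` and `tags_ground_truth_categorized` are JSON objects serialized as strings, while `tags_ground_truth_expanded` is a plain list, so the categorized fields need a second decoding pass. The sketch below shows one way to do that and to compare the two categorized tag sets. The helper names (`decode_record`, `flatten_tags`, `precision_recall`) and the abbreviated toy record are assumptions made for illustration; this is not the logic in `scripts/eval_pipeline.py`.

```python
import json

# Fields that the records above store as JSON-encoded strings.
CATEGORIZED_FIELDS = ("tags_synthetic_categorized", "tags_ground_truth_categorized")


def decode_record(line: str) -> dict:
    """Parse one JSONL line and decode the nested, string-encoded fields."""
    record = json.loads(line)
    for field in CATEGORIZED_FIELDS:
        value = record.get(field)
        if isinstance(value, str):
            record[field] = json.loads(value)
    return record


def flatten_tags(categorized: dict) -> set:
    """Collapse a {category: [tags]} mapping into one set of tags."""
    return {tag for tags in categorized.values() for tag in tags}


def precision_recall(predicted: set, truth: set) -> tuple:
    """Set-based precision and recall; returns 0.0 when a side is empty."""
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical, heavily abbreviated record in the same shape as the data above.
toy_line = json.dumps({
    "id": 0,
    "tags_synthetic_categorized": json.dumps({
        "number_of_characters": ["solo"],
        "characters_and_gender": ["ambiguous_gender"],
        "species_or_animal_type": ["mammal", "wolf"],
    }),
    "tags_ground_truth_categorized": json.dumps({
        "number_of_characters": ["solo"],
        "characters_and_gender": ["female"],
        "species_or_animal_type": ["mammal", "wolf"],
    }),
    "tags_ground_truth_expanded": ["mammal", "solo", "wolf"],
})

record = decode_record(toy_line)
predicted = flatten_tags(record["tags_synthetic_categorized"])
truth = flatten_tags(record["tags_ground_truth_categorized"])
precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

To apply this to the real file, call `decode_record` on each non-empty line of the `.jsonl` file instead of on the toy record.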

psq_rag/retrieval/psq_retrieval.py CHANGED

@@ -164,6 +164,7 @@ def psq_candidates_from_rewrite_phrases(
     per_phrase_k: int = 50,
     per_phrase_final_k: int = 10,
     global_k: int = 300,
     verbose: bool = False,
 ) -> Union[List[Candidate], Tuple[List[Candidate], List[Dict[str, Any]]]]:
     head_stopwords = {
@@ -249,6 +250,7 @@ def psq_candidates_from_rewrite_phrases(
     phrase_best_tokens: Dict[str, Dict[str, str]] = {}
     phrase_context_imputed: Dict[str, Dict[str, bool]] = {}
     phrase_reports: List[Dict[str, Any]] = []
     for phrase in final_phrases:
         lookup = phrase.replace(" ", "_")
@@ -414,6 +416,11 @@ def psq_candidates_from_rewrite_phrases(
             scored_rows = scored_rows[:per_phrase_final_k]
         per_phrase_scored[phrase] = scored_rows
         phrase_context_imputed[phrase] = context_imputed_by_tag
         for tag, score_fasttext, score_context, score_combined in scored_rows:
             existing = merged_by_tag.get(tag)
@@ -475,6 +482,10 @@ def psq_candidates_from_rewrite_phrases(
     merged_candidates.sort(key=lambda c: c.score_combined, reverse=True)
     merged_candidates = merged_candidates[:global_k]
     return (merged_candidates, phrase_reports) if verbose else merged_candidates

     per_phrase_k: int = 50,
     per_phrase_final_k: int = 10,
     global_k: int = 300,
+    return_phrase_ranks: bool = False,
     verbose: bool = False,
 ) -> Union[List[Candidate], Tuple[List[Candidate], List[Dict[str, Any]]]]:
     head_stopwords = {
     phrase_best_tokens: Dict[str, Dict[str, str]] = {}
     phrase_context_imputed: Dict[str, Dict[str, bool]] = {}
     phrase_reports: List[Dict[str, Any]] = []
+    phrase_rank_by_tag: Dict[str, int] = {}
     for phrase in final_phrases:
         lookup = phrase.replace(" ", "_")
             scored_rows = scored_rows[:per_phrase_final_k]
         per_phrase_scored[phrase] = scored_rows
         phrase_context_imputed[phrase] = context_imputed_by_tag
+        if return_phrase_ranks:
+            for rank, (tag, _score_fasttext, _score_context, _score_combined) in enumerate(scored_rows, start=1):
+                prev = phrase_rank_by_tag.get(tag)
+                if prev is None or rank < prev:
+                    phrase_rank_by_tag[tag] = rank
         for tag, score_fasttext, score_context, score_combined in scored_rows:
             existing = merged_by_tag.get(tag)
     merged_candidates.sort(key=lambda c: c.score_combined, reverse=True)
     merged_candidates = merged_candidates[:global_k]
+    if return_phrase_ranks:
+        if verbose:
+            return (merged_candidates, phrase_reports, phrase_rank_by_tag)
+        return (merged_candidates, phrase_rank_by_tag)
     return (merged_candidates, phrase_reports) if verbose else merged_candidates
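With `return_phrase_ranks` added, the function can return four different shapes depending on `verbose` and `return_phrase_ranks`, which the unchanged `Union[...]` annotation no longer fully describes. The sketch below reproduces the best-rank bookkeeping from the hunk and shows one way a caller might normalize the return value. `best_phrase_ranks` and `unpack_retrieval` are hypothetical names used only for illustration; they are not part of this commit.

```python
from typing import Any, Dict, List, Sequence, Tuple

# One scored row, as iterated in the diff: (tag, fasttext, context, combined).
ScoredRow = Tuple[str, float, float, float]


def best_phrase_ranks(per_phrase_scored: Dict[str, Sequence[ScoredRow]]) -> Dict[str, int]:
    """Return the best (lowest) 1-based rank each tag reaches in any phrase's list."""
    best: Dict[str, int] = {}
    for rows in per_phrase_scored.values():
        for rank, (tag, *_scores) in enumerate(rows, start=1):
            prev = best.get(tag)
            if prev is None or rank < prev:
                best[tag] = rank
    return best


def unpack_retrieval(
    result: Any, verbose: bool, return_phrase_ranks: bool
) -> Tuple[List[Any], List[Dict[str, Any]], Dict[str, int]]:
    """Normalize the four possible return shapes to (candidates, reports, ranks)."""
    if return_phrase_ranks and verbose:
        candidates, reports, ranks = result
    elif return_phrase_ranks:
        candidates, ranks = result
        reports = []
    elif verbose:
        candidates, reports = result
        ranks = {}
    else:
        candidates, reports, ranks = result, [], {}
    return candidates, reports, ranks
```

Branching on the flags rather than on `isinstance(result, tuple)` avoids confusing the two 2-tuple shapes, `(candidates, phrase_reports)` and `(candidates, phrase_rank_by_tag)`.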

psq_rag/tagging/categorized_suggestions.py CHANGED

@@ -37,21 +37,27 @@ class CategorizedTagSuggestions:
     categories: Dict[str, TagCategory]            # All category definitions
-def load_categories(checklist_path: Optional[Path] = None) -> Dict[str, TagCategory]:
     """
     Load and parse category definitions from checklist.
     Args:
         checklist_path: Path to checklist file. If None, uses default location.
-    Returns:
-        Dict mapping category_name -> TagCategory
-    """
-    if checklist_path is None:
-        # Try to find it in the git repo from the other branch
-        import subprocess
-        try:
-            result = subprocess.run(
                 ['git', 'show', 'origin/claude/prompt-squirrel-rag-3PZn7:tagging_checklist.txt'],
                 capture_output=True,
                 text=True,

     categories: Dict[str, TagCategory]            # All category definitions
+def load_categories(checklist_path: Optional[Path] = None) -> Dict[str, TagCategory]:
     """
     Load and parse category definitions from checklist.
     Args:
         checklist_path: Path to checklist file. If None, uses default location.
+    Returns:
+        Dict mapping category_name -> TagCategory
+    """
+    if checklist_path is None:
+        repo_root = Path(__file__).resolve().parents[2]
+        local_checklist = repo_root / "tagging_checklist.txt"
+        if local_checklist.exists():
+            checklist_path = local_checklist
+    if checklist_path is None:
+        # Try to find it in the git repo from the other branch
+        import subprocess
+        try:
+            result = subprocess.run(
                 ['git', 'show', 'origin/claude/prompt-squirrel-rag-3PZn7:tagging_checklist.txt'],
                 capture_output=True,
                 text=True,
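The new lookup order in `load_categories` is: an explicit `checklist_path` wins, then a `tagging_checklist.txt` at the repository root, and only then the existing `git show` fallback. A minimal sketch of the first two steps, assuming a hypothetical helper `resolve_checklist` that takes the repository root as a parameter so it can be exercised in isolation:

```python
from pathlib import Path
from typing import Optional


def resolve_checklist(checklist_path: Optional[Path], repo_root: Path) -> Optional[Path]:
    """Return the checklist file to read, or None if the caller must fall back to git."""
    if checklist_path is not None:
        return checklist_path  # an explicit argument always wins
    local = repo_root / "tagging_checklist.txt"
    return local if local.exists() else None
```

Returning `None` rather than raising leaves the decision about the `git show` fallback with the caller, which matches the second `if checklist_path is None:` check in the hunk.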

scripts/analyze_caption_evident_audit.py ADDED

	@@ -0,0 +1,130 @@

+"""
+Analyze caption-evident tag recall against retrieved tags.
+Compares tags marked caption-evident to retrieved tags (optionally + structural),
+with optional implication expansion on both sets.
+"""
+from __future__ import annotations
+import argparse
+import json
+from collections import Counter
+from pathlib import Path
+from typing import Dict, Iterable, Set
+import sys
+_REPO_ROOT = Path(__file__).resolve().parents[1]
+if str(_REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(_REPO_ROOT))
+from psq_rag.retrieval.state import expand_tags_via_implications
+def _load_evident(path: Path) -> Dict[int, Set[str]]:
+    by_id: Dict[int, Set[str]] = {}
+    with path.open("r", encoding="utf-8") as f:
+        for line in f:
+            row = json.loads(line)
+            sid = row.get("id")
+            if sid is None:
+                continue
+            tags = set(row.get("tags_ground_truth_expanded") or [])
+            if tags:
+                by_id[int(sid)] = tags
+    return by_id
+def _load_eval_detail(path: Path) -> Dict[int, dict]:
+    rows = {}
+    with path.open("r", encoding="utf-8") as f:
+        for line in f:
+            row = json.loads(line)
+            if row.get("_meta"):
+                continue
+            rows[int(row["sample_id"])] = row
+    return rows
+def _expand(tags: Iterable[str]) -> Set[str]:
+    expanded, _ = expand_tags_via_implications(set(tags))
+    return expanded
+def main() -> int:
+    ap = argparse.ArgumentParser(description="Caption-evident audit vs retrieval.")
+    ap.add_argument("--evident", type=str, required=True,
+                    help="Caption-evident JSONL (tags_ground_truth_expanded set to evident tags).")
+    ap.add_argument("--detail", type=str, required=True,
+                    help="Eval detail JSONL (from eval_pipeline.py).")
+    ap.add_argument("--no-structural", action="store_true",
+                    help="Do not count structural tags as retrieved.")
+    ap.add_argument("--expand-implications", action="store_true",
+                    help="Expand both evident and retrieved tags via implications.")
+    args = ap.parse_args()
+    evident_by_id = _load_evident(Path(args.evident))
+    detail_by_id = _load_eval_detail(Path(args.detail))
+    hit_counter = Counter()
+    miss_counter = Counter()
+    present_counter = Counter()
+    print("ID,evident,retrieved,overlap,recall_evident,precision_evident,missing_evident,extra_not_evident,complete_overlap")
+    total_evident = total_retrieved = total_overlap = 0
+    for sid in sorted(evident_by_id):
+        ev = set(evident_by_id[sid])
+        detail = detail_by_id.get(sid)
+        if detail is None:
+            continue
+        retrieved = set(detail.get("retrieved_tags", []))
+        if not args.no_structural:
+            retrieved |= set(detail.get("structural_tags", []))
+        if args.expand_implications:
+            ev = _expand(ev)
+            retrieved = _expand(retrieved)
+        overlap = ev & retrieved
+        missing = ev - retrieved
+        extra = retrieved - ev
+        for t in ev:
+            present_counter[t] += 1
+            if t in retrieved:
+                hit_counter[t] += 1
+            else:
+                miss_counter[t] += 1
+        recall = len(overlap) / len(ev) if ev else 0.0
+        precision = len(overlap) / len(retrieved) if retrieved else 0.0
+        total_evident += len(ev)
+        total_retrieved += len(retrieved)
+        total_overlap += len(overlap)
+        complete = len(missing) == 0
+        print(f"{sid},{len(ev)},{len(retrieved)},{len(overlap)},{recall:.3f},{precision:.3f},{len(missing)},{len(extra)},{complete}")
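+    # Guard the aggregate ratios so an empty run does not raise ZeroDivisionError.
+    total_recall = total_overlap / total_evident if total_evident else 0.0
+    total_precision = total_overlap / total_retrieved if total_retrieved else 0.0
+    print(f"TOTAL,{total_evident},{total_retrieved},{total_overlap},{total_recall:.3f},{total_precision:.3f},{total_evident-total_overlap},{total_retrieved-total_overlap},N/A")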
+    print("\nMOST MISSED (caption-evident tags not retrieved):")
+    for tag, cnt in miss_counter.most_common(20):
+        present = present_counter[tag]
+        print(f"  {tag:25s} missed {cnt}/{present} (present {present}/{len(evident_by_id)})")
+    print("\nMOST FOUND (caption-evident tags retrieved):")
+    for tag, cnt in hit_counter.most_common(20):
+        present = present_counter[tag]
+        print(f"  {tag:25s} found {cnt}/{present} (present {present}/{len(evident_by_id)})")
+    always_found = [t for t, c in hit_counter.items() if c == present_counter[t]]
+    if always_found:
+        print("\nALWAYS FOUND WHEN EVIDENT:")
+        for t in sorted(always_found):
+            print(f"  {t}")
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
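The per-sample columns printed by this script reduce to plain set arithmetic. As a worked example, the sketch below recomputes `recall_evident`, `precision_evident`, and the missing/extra sets for one made-up sample; `audit_sample` is a hypothetical helper name and the tag sets are illustrative, not taken from the eval data.

```python
from typing import Set, Tuple


def audit_sample(evident: Set[str], retrieved: Set[str]) -> Tuple[float, float, Set[str], Set[str]]:
    """Return (recall, precision, missing, extra) for one sample."""
    overlap = evident & retrieved
    recall = len(overlap) / len(evident) if evident else 0.0
    precision = len(overlap) / len(retrieved) if retrieved else 0.0
    return recall, precision, evident - retrieved, retrieved - evident


# Illustrative sample: 4 caption-evident tags, 5 retrieved tags, 3 in common.
evident = {"solo", "wolf", "blue_eyes", "open_mouth"}
retrieved = {"solo", "wolf", "open_mouth", "standing", "smile"}
recall, precision, missing, extra = audit_sample(evident, retrieved)
print(f"{recall:.3f},{precision:.3f},{len(missing)},{len(extra)}")  # 0.750,0.600,1,2
```

The script itself takes the flags defined in `main()` above, for example `--evident <evident.jsonl> --detail <detail.jsonl> --expand-implications`, where the two paths are placeholders.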

scripts/analyze_threshold_grid.py ADDED

	@@ -0,0 +1,407 @@

+"""
+Analyze post-hoc retrieval score thresholds on Stage 3 selections.
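+This script re-scores evaluation outputs by removing Stage 3 selections
+with retrieval score <= threshold (or, in the rank modes, with rank >
+threshold), then recomputing metrics. Selections with no recorded score
+or rank are always kept. This is an approximation that avoids re-running
+the LLMs.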
+"""
+from __future__ import annotations
+import argparse
+import json
+import sys
+from pathlib import Path
+from typing import Dict, Iterable, List, Set, Tuple
+_REPO_ROOT = Path(__file__).resolve().parents[1]
+if str(_REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(_REPO_ROOT))
+import csv
+from collections import defaultdict
+from psq_rag.retrieval.state import expand_tags_via_implications, get_leaf_tags
+from scripts.eval_pipeline import _EVAL_EXCLUDED_TAGS  # reuse eval exclusions
+def _compute_metrics(predicted: Set[str], ground_truth: Set[str]) -> Tuple[float, float, float]:
+    if not predicted and not ground_truth:
+        return 1.0, 1.0, 1.0
+    if not predicted:
+        return 0.0, 0.0, 0.0
+    if not ground_truth:
+        return 0.0, 0.0, 0.0
+    tp = len(predicted & ground_truth)
+    precision = tp / len(predicted)
+    recall = tp / len(ground_truth)
+    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
+    return precision, recall, f1
+def _load_rows(path: Path) -> Tuple[dict, List[dict]]:
+    meta = None
+    rows = []
+    with path.open("r", encoding="utf-8") as f:
+        for line in f:
+            row = json.loads(line)
+            if row.get("_meta"):
+                meta = row
+                continue
+            rows.append(row)
+    if meta is None:
+        meta = {}
+    return meta, rows
+def _load_tag_db(repo_root: Path) -> Dict[str, int]:
+    tag_type: Dict[str, int] = {}
+    db_path = repo_root / "fluffyrock_3m.csv"
+    if not db_path.exists():
+        return tag_type
+    with db_path.open("r", encoding="utf-8") as f:
+        for row in csv.reader(f):
+            if len(row) < 2:
+                continue
+            tag = row[0].strip()
+            try:
+                tid = int(row[1]) if row[1].strip() else -1
+            except ValueError:
+                tid = -1
+            tag_type[tag] = tid
+    return tag_type
+TYPE_ID_NAMES = {
+    0: "general",
+    1: "artist",
+    3: "copyright",
+    4: "character",
+    5: "species",
+    7: "meta",
+}
+_TAXONOMY = frozenset({
+    "mammal","canid","canine","canis","felid","feline","felis","ursine","cervid","bovid","equid","equine",
+    "mustelid","procyonid","reptile","scalie","avian","bird","fish","marine","arthropod","insect","arachnid",
+    "amphibian","primate","rodent","lagomorph","leporid","galliform","gallus_(genus)","phasianid","passerine",
+    "oscine","dinosaur","theropod","cetacean","pinniped","chiroptera","marsupial","monotreme","mephitid",
+    "suid","suina"
+})
+_BODY_PLAN = frozenset({"anthro","feral","biped","quadruped","taur","humanoid","semi-anthro","animatronic","robot","machine","plushie","kemono"})
+_POSE = frozenset({
+    "solo","duo","group","trio","standing","sitting","lying","running","walking","flying","swimming","crouching",
+    "kneeling","jumping","looking_at_viewer","looking_away","looking_back","looking_up","looking_down",
+    "looking_aside","front_view","side_view","back_view","three-quarter_view","from_above","from_below","close-up",
+    "portrait","full-length_portrait","hand_on_hip","arms_crossed","all_fours","on_back","on_side","crossed_arms"
+})
+def _categorize(tag: str, tag_type: Dict[str, int]) -> str:
+    tid = tag_type.get(tag, -1)
+    tn = TYPE_ID_NAMES.get(tid, "unknown")
+    if tn == "species":
+        return "species"
+    if tn in ("artist", "copyright", "character", "meta"):
+        return tn
+    if tag in _TAXONOMY:
+        return "taxonomy"
+    if tag in _BODY_PLAN:
+        return "body_plan"
+    if tag in _POSE:
+        return "pose/composition"
+    if tag.startswith(tuple(str(i) + "_" for i in range(10))) and any(
+        tag.endswith(s) for s in ("fingers","toes","horns","arms","legs","eyes","ears","wings","tails")
+    ):
+        return "count/anatomy"
+    if tag in ("male","female","intersex","ambiguous_gender","andromorph","gynomorph"):
+        return "gender"
+    if any(k in tag for k in (
+        "clothing","clothed","topwear","bottomwear","legwear","handwear","headwear","footwear","shirt","pants",
+        "shorts","dress","skirt","jacket","coat","hat","boots","shoes","gloves","socks","stockings","belt",
+        "collar","scarf","cape","armor","suit","uniform","costume","outfit"
+    )):
+        return "clothing"
+    if any(tag.startswith(c + "_") for c in (
+        "red","blue","green","yellow","orange","purple","pink","black","white","grey","gray","brown","tan","cream",
+        "gold","silver","teal","cyan","magenta"
+    )):
+        return "color/marking"
+    if tag.endswith("_coloring") or tag.endswith("_markings") or tag == "markings":
+        return "color/marking"
+    if "hair" in tag:
+        return "hair"
+    if any(k in tag for k in (
+        "muscle","belly","chest","abs","breast","butt","tail","wing","horn","ear","eye","teeth","fang","claw",
+        "paw","hoof","snout","muzzle","tongue","fur","scales","feather","tuft","fluff","mane"
+    )):
+        return "body/anatomy"
+    if any(k in tag for k in (
+        "smile","grin","frown","expression","blush","angry","happy","sad","crying","laughing","open_mouth",
+        "closed_eyes","wink"
+    )):
+        return "expression"
+    return "other_general"
+def _iter_thresholds(values: Iterable[float], min_v: float, max_v: float, step: float) -> List[float]:
+    if values:
+        return sorted(set(values))
+    thresholds = []
+    v = min_v
+    while v <= max_v + 1e-9:
+        thresholds.append(round(v, 4))
+        v += step
+    return thresholds
+def _sparkline(values: List[float], width: int = 50) -> str:
+    if not values:
+        return ""
+    charset = " .:-=+*#%@"
+    vmin = min(values)
+    vmax = max(values)
+    if vmax == vmin:
+        return charset[0] * min(width, len(values))
+    out = []
+    for v in values:
+        norm = (v - vmin) / (vmax - vmin)
+        idx = int(round(norm * (len(charset) - 1)))
+        out.append(charset[idx])
+    return "".join(out)
+def analyze(
+    path: Path,
+    thresholds: List[float],
+    expand_implications: bool,
+    category_curves: bool,
+    mode: str,
+) -> Tuple[List[dict], List[dict]]:
+    meta, rows = _load_rows(path)
+    expand = expand_implications or bool(meta.get("expand_implications"))
+    tag_type = _load_tag_db(_REPO_ROOT) if category_curves else {}
+    results = []
+    category_rows = []
+    for thr in thresholds:
+        total_p = total_r = total_f1 = 0.0
+        total_lp = total_lr = total_lf1 = 0.0
+        total_sel = 0
+        total_gt = 0
+        total_oracle_r = 0.0
+        total_oracle_f1 = 0.0
+        n = 0
+        if category_curves:
+            cat_totals = defaultdict(lambda: {"p": 0.0, "r": 0.0, "f1": 0.0, "n": 0})
+        for row in rows:
+            gt = set(row.get("ground_truth_tags", []))
+            gt -= _EVAL_EXCLUDED_TAGS
+            stage3_selected = set(row.get("stage3_selected", []))
+            stage3_scores: Dict[str, float] = row.get("stage3_selected_scores", {}) or {}
+            stage3_ranks: Dict[str, int] = row.get("stage3_selected_ranks", {}) or {}
+            stage3_phrase_ranks: Dict[str, int] = row.get("stage3_selected_phrase_ranks", {}) or {}
+            structural = set(row.get("structural", []))
+            # Remove low-scoring Stage 3 selections.
+            filtered_stage3 = set()
+            for t in stage3_selected:
+                if mode == "rank":
+                    rank = stage3_ranks.get(t)
+                    if rank is None:
+                        filtered_stage3.add(t)
+                    elif rank <= int(thr):
+                        filtered_stage3.add(t)
+                elif mode == "phrase_rank":
+                    rank = stage3_phrase_ranks.get(t)
+                    if rank is None:
+                        filtered_stage3.add(t)
+                    elif rank <= int(thr):
+                        filtered_stage3.add(t)
+                else:
+                    score = stage3_scores.get(t)
+                    if score is None:
+                        filtered_stage3.add(t)
+                    elif score > thr:
+                        filtered_stage3.add(t)
+            available = filtered_stage3 | structural
+            if expand and available:
+                available, _ = expand_tags_via_implications(available)
+            # Build a new set so `available` is not mutated through an alias.
+            selected = available - _EVAL_EXCLUDED_TAGS
+            p, r, f1 = _compute_metrics(selected, gt)
+            total_p += p
+            total_r += r
+            total_f1 += f1
+            leaf_sel = get_leaf_tags(selected)
+            leaf_gt = get_leaf_tags(gt)
+            lp, lr, lf1 = _compute_metrics(leaf_sel, leaf_gt)
+            total_lp += lp
+            total_lr += lr
+            total_lf1 += lf1
+            # Oracle max: perfect selection from available tags.
+            if gt:
+                oracle_r = len(gt & available) / len(gt)
+                oracle_f1 = (2 * oracle_r / (1 + oracle_r)) if oracle_r > 0 else 0.0
+            else:
+                oracle_r = 1.0
+                oracle_f1 = 1.0
+            total_oracle_r += oracle_r
+            total_oracle_f1 += oracle_f1
+            if category_curves:
+                cat_gt: Dict[str, Set[str]] = defaultdict(set)
+                cat_sel: Dict[str, Set[str]] = defaultdict(set)
+                for t in gt:
+                    cat_gt[_categorize(t, tag_type)].add(t)
+                for t in selected:
+                    cat_sel[_categorize(t, tag_type)].add(t)
+                for cat in set(cat_gt.keys()) | set(cat_sel.keys()):
+                    cp, cr, cf1 = _compute_metrics(cat_sel.get(cat, set()), cat_gt.get(cat, set()))
+                    cat_totals[cat]["p"] += cp
+                    cat_totals[cat]["r"] += cr
+                    cat_totals[cat]["f1"] += cf1
+                    cat_totals[cat]["n"] += 1
+            total_sel += len(selected)
+            total_gt += len(gt)
+            n += 1
+        if n == 0:
+            continue
+        results.append({
+            "threshold": thr,
+            "P": total_p / n,
+            "R": total_r / n,
+            "F1": total_f1 / n,
+            "leaf_P": total_lp / n,
+            "leaf_R": total_lr / n,
+            "leaf_F1": total_lf1 / n,
+            "avg_selected": total_sel / n,
+            "avg_gt": total_gt / n,
+            "oracle_R": total_oracle_r / n,
+            "oracle_F1": total_oracle_f1 / n,
+        })
+        if category_curves:
+            for cat, stats in sorted(cat_totals.items()):
+                if stats["n"] == 0:
+                    continue
+                category_rows.append({
+                    "threshold": thr,
+                    "category": cat,
+                    "P": stats["p"] / stats["n"],
+                    "R": stats["r"] / stats["n"],
+                    "F1": stats["f1"] / stats["n"],
+                })
+    return results, category_rows
+def main() -> int:
+    ap = argparse.ArgumentParser(description="Analyze post-hoc Stage3 score thresholds.")
+    ap.add_argument("path", nargs="?", type=str, default=None,
+                    help="Path to compact eval JSONL (default: latest in data/eval_results)")
+    ap.add_argument("--min", dest="min_v", type=float, default=0.0, help="Min threshold")
+    ap.add_argument("--max", dest="max_v", type=float, default=1.0, help="Max threshold")
+    ap.add_argument("--step", type=float, default=0.05, help="Threshold step size")
+    ap.add_argument("--values", type=str, default="",
+                    help="Comma-separated explicit thresholds (overrides min/max/step)")
+    ap.add_argument("--mode", choices=["score", "rank", "phrase_rank"], default="score",
+                    help="Threshold mode: score (default), rank (global), or phrase_rank (per-phrase)")
+    ap.add_argument("--rank-min", type=int, default=1, help="Min rank threshold (rank mode)")
+    ap.add_argument("--rank-max", type=int, default=300, help="Max rank threshold (rank mode)")
+    ap.add_argument("--rank-step", type=int, default=10, help="Rank threshold step (rank mode)")
+    ap.add_argument("--no-expand-implications", action="store_true",
+                    help="Do not re-expand tags via implications")
+    ap.add_argument("--category-curves", action="store_true",
+                    help="Emit category-level precision/recall/F1 curves")
+    args = ap.parse_args()
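+    if args.path:
+        path = Path(args.path)
+    else:
+        # "Latest" means last in lexicographic filename order.
+        candidates = sorted((_REPO_ROOT / "data" / "eval_results").glob("eval_*.jsonl"))
+        if not candidates:
+            ap.error("no eval_*.jsonl found in data/eval_results; pass a path explicitly")
+        path = candidates[-1]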
+    values = []
+    if args.values.strip():
+        values = [float(v.strip()) for v in args.values.split(",") if v.strip()]
+    if args.mode in ("rank", "phrase_rank"):
+        if values:
+            thresholds = sorted(set(int(v) for v in values))
+        else:
+            thresholds = list(range(args.rank_min, args.rank_max + 1, args.rank_step))
+    else:
+        thresholds = _iter_thresholds(values, args.min_v, args.max_v, args.step)
+    results, category_rows = analyze(
+        path,
+        thresholds,
+        expand_implications=not args.no_expand_implications,
+        category_curves=args.category_curves,
+        mode=args.mode,
+    )
+    # Write CSV to stdout
+    if args.mode in ("rank", "phrase_rank"):
+        print("rank_max,P,R,F1,leaf_P,leaf_R,leaf_F1,avg_selected,avg_gt,oracle_R,oracle_F1")
+    else:
+        print("threshold,P,R,F1,leaf_P,leaf_R,leaf_F1,avg_selected,avg_gt,oracle_R,oracle_F1")
+    for row in results:
+        if args.mode in ("rank", "phrase_rank"):
+            print(
+                f"{int(row['threshold'])},{row['P']:.4f},{row['R']:.4f},{row['F1']:.4f},"
+                f"{row['leaf_P']:.4f},{row['leaf_R']:.4f},{row['leaf_F1']:.4f},"
+                f"{row['avg_selected']:.2f},{row['avg_gt']:.2f},"
+                f"{row['oracle_R']:.4f},{row['oracle_F1']:.4f}"
+            )
+        else:
+            print(
+                f"{row['threshold']:.4f},{row['P']:.4f},{row['R']:.4f},{row['F1']:.4f},"
+                f"{row['leaf_P']:.4f},{row['leaf_R']:.4f},{row['leaf_F1']:.4f},"
+                f"{row['avg_selected']:.2f},{row['avg_gt']:.2f},"
+                f"{row['oracle_R']:.4f},{row['oracle_F1']:.4f}"
+            )
+    # ASCII sparkline graph for core metrics
+    p_vals = [r["P"] for r in results]
+    r_vals = [r["R"] for r in results]
+    f1_vals = [r["F1"] for r in results]
+    print("\nP  " + _sparkline(p_vals))
+    print("R  " + _sparkline(r_vals))
+    print("F1 " + _sparkline(f1_vals))
+    if args.category_curves and category_rows:
+        print("\nCATEGORY_CURVES")
+        if args.mode in ("rank", "phrase_rank"):
+            print("rank_max,category,P,R,F1")
+        else:
+            print("threshold,category,P,R,F1")
+        for row in category_rows:
+            if args.mode in ("rank", "phrase_rank"):
+                print(
+                    f"{int(row['threshold'])},{row['category']},"
+                    f"{row['P']:.4f},{row['R']:.4f},{row['F1']:.4f}"
+                )
+            else:
+                print(
+                    f"{row['threshold']:.4f},{row['category']},"
+                    f"{row['P']:.4f},{row['R']:.4f},{row['F1']:.4f}"
+                )
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
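Two pieces of `analyze()` are easy to misread: the score-mode filter keeps tags scoring strictly above the threshold (and keeps tags with no recorded score), and `oracle_F1` assumes a perfect selector with precision 1, so F1 collapses to 2R / (1 + R). A minimal sketch of both, using the hypothetical helper names `filter_by_score` and `oracle_recall_f1` and made-up tags:

```python
from typing import Dict, Set, Tuple


def filter_by_score(selected: Set[str], scores: Dict[str, float], thr: float) -> Set[str]:
    """Keep tags scoring strictly above thr; tags with no recorded score are kept."""
    return {t for t in selected if scores.get(t) is None or scores[t] > thr}


def oracle_recall_f1(ground_truth: Set[str], available: Set[str]) -> Tuple[float, float]:
    """Upper bound if a perfect selector picked only correct tags from `available`."""
    if not ground_truth:
        return 1.0, 1.0
    r = len(ground_truth & available) / len(ground_truth)
    # With precision fixed at 1, F1 = 2 * 1 * r / (1 + r).
    return r, (2 * r / (1 + r) if r > 0 else 0.0)


# "smile" (0.2) falls below the 0.5 threshold; "solo" has no score and is kept.
kept = filter_by_score({"wolf", "smile", "solo"}, {"wolf": 0.8, "smile": 0.2}, 0.5)
oracle_r, oracle_f1 = oracle_recall_f1({"wolf", "solo", "blue_eyes", "fur"}, kept)
```

The rank modes invert the comparison: a tag survives when its rank is at most the threshold, so `--mode rank --rank-max 100` sweeps from strict to permissive, whereas score mode sweeps from permissive to strict.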

scripts/eval_pipeline.py CHANGED

@@ -39,23 +39,33 @@ Requires:
 from __future__ import annotations
-import argparse
-import json
-import os
-import random
-import sys
-import threading
-import time
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from dataclasses import dataclass, field
 from datetime import datetime
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Set, Tuple
-_REPO_ROOT = Path(__file__).resolve().parents[1]
-if str(_REPO_ROOT) not in sys.path:
-    sys.path.insert(0, str(_REPO_ROOT))
-os.chdir(_REPO_ROOT)
 EVAL_DATA_PATH = _REPO_ROOT / "data" / "eval_samples" / "e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl"
 EVAL_DATA_PATH_RAW = _REPO_ROOT / "data" / "eval_samples" / "e621_sfw_sample_1000_seed123_buffer10000.jsonl"
@@ -124,11 +134,15 @@ class SampleResult:
     # Stage 2
     retrieved_tags: Set[str] = field(default_factory=set)
     retrieval_recall: float = 0.0
-    # Stage 3 — overall
-    selected_tags: Set[str] = field(default_factory=set)
-    selection_precision: float = 0.0
-    selection_recall: float = 0.0
-    selection_f1: float = 0.0
     # Stage 3 — character tags only
     gt_character_tags: Set[str] = field(default_factory=set)
     selected_character_tags: Set[str] = field(default_factory=set)
@@ -190,16 +204,17 @@ def _compute_metrics(predicted: Set[str], ground_truth: Set[str]) -> Tuple[float
     return precision, recall, f1
-def _process_one_sample(
     sample: Dict[str, Any],
     index: int,
     total: int,
     skip_rewrite: bool,
     allow_nsfw: bool,
-    mode: str,
-    chunk_size: int,
-    per_phrase_k: int,
-    temperature: float,
     max_tokens: int,
     verbose: bool,
     print_lock: threading.Lock,
@@ -258,18 +273,24 @@ def _process_one_sample(
         # --- Stage 2: Retrieval ---
         t0 = time.time()
-        retrieval_result = psq_candidates_from_rewrite_phrases(
-            rewrite_phrases=result.rewrite_phrases,
-            allow_nsfw_tags=allow_nsfw,
-            global_k=300,
-            verbose=False,
-        )
         result.stage2_time = time.time() - t0
-        if isinstance(retrieval_result, tuple):
-            candidates, _ = retrieval_result
-        else:
-            candidates = retrieval_result
         result.retrieved_tags = {c.tag for c in candidates}
         if gt_tags:
@@ -294,16 +315,22 @@ def _process_one_sample(
         )
         result.stage3_time = time.time() - t0
-        result.selected_tags = {candidates[idx].tag for idx in picked_indices} if picked_indices else set()
-        # Build per-tag evidence from Stage 3 selection
-        for idx in picked_indices:
-            tag = candidates[idx].tag
-            result.tag_evidence[tag] = {
-                "source": "stage3",
-                "why": tag_why.get(tag, "unknown"),
-                "retrieval_score": round(candidates[idx].score_combined, 4),
-            }
         # Why distribution
         why_counts: Dict[str, int] = {}
@@ -457,15 +484,16 @@ def _prewarm_retrieval_assets() -> None:
     print(f"  Assets loaded in {time.time() - t0:.1f}s")
-def run_eval(
     n_samples: int = 20,
     caption_field: str = "caption_cogvlm",
     skip_rewrite: bool = False,
     allow_nsfw: bool = False,
     mode: str = "chunked_map_union",
-    chunk_size: int = 60,
-    per_phrase_k: int = 2,
-    temperature: float = 0.0,
     max_tokens: int = 512,
     verbose: bool = False,
     shuffle: bool = True,
@@ -473,11 +501,14 @@ def run_eval(
     workers: int = 1,
     min_why: Optional[str] = "strong_implied",
     expand_implications: bool = False,
-    infer_structural: bool = False,
-) -> List[SampleResult]:
-    # Load eval samples — prefer expanded file, fall back to raw
-    eval_path = EVAL_DATA_PATH
     if not eval_path.is_file():
         eval_path = EVAL_DATA_PATH_RAW
         if not eval_path.is_file():
@@ -500,14 +531,17 @@ def run_eval(
                 using_expanded = True
             else:
                 gt_tags = _flatten_ground_truth_tags(row.get("tags_ground_truth_categorized", ""))
-            if not gt_tags:
-                continue
-            # Remove eval-excluded tags from GT
-            gt_tags -= _EVAL_EXCLUDED_TAGS
-            all_samples.append({
-                "id": row.get("id", row.get("row_id", len(all_samples))),
-                "caption": caption.strip(),
-                "gt_tags": gt_tags,
             })
     if using_expanded:
         print("Using implication-expanded ground truth")
@@ -534,13 +568,13 @@ def run_eval(
         # Sequential mode (original behavior)
         results: List[SampleResult] = []
         for i, sample in enumerate(samples):
-            result = _process_one_sample(
-                sample, i, total,
-                skip_rewrite, allow_nsfw, mode, chunk_size,
-                per_phrase_k, temperature, max_tokens, verbose,
-                print_lock, min_why, expand_implications,
-                infer_structural,
-            )
             results.append(result)
     else:
         # Parallel mode
@@ -551,13 +585,13 @@ def run_eval(
         with ThreadPoolExecutor(max_workers=workers) as executor:
             futures = {
                 executor.submit(
-                    _process_one_sample,
-                    sample, i, total,
-                    skip_rewrite, allow_nsfw, mode, chunk_size,
-                    per_phrase_k, temperature, max_tokens, verbose,
-                    print_lock, min_why, expand_implications,
-                    infer_structural,
-                ): i
                 for i, sample in enumerate(samples)
             }
             for future in as_completed(futures):
@@ -784,8 +818,9 @@ def print_summary(results: List[SampleResult]) -> None:
     print("=" * 70)
-def main(argv=None) -> int:
-    ap = argparse.ArgumentParser(description="End-to-end pipeline evaluation")
     ap.add_argument("--n", type=int, default=20, help="Number of samples to evaluate")
     ap.add_argument("--caption-field", default="caption_cogvlm",
                     choices=["caption_cogvlm", "caption_llm_0", "caption_llm_1",
@@ -797,8 +832,10 @@ def main(argv=None) -> int:
     ap.add_argument("--allow-nsfw", action="store_true", help="Allow NSFW tags")
     ap.add_argument("--mode", default="chunked_map_union",
                     choices=["single_shot", "chunked_map_union"])
-    ap.add_argument("--chunk-size", type=int, default=60)
-    ap.add_argument("--per-phrase-k", type=int, default=2)
     ap.add_argument("--temperature", type=float, default=0.0)
     ap.add_argument("--max-tokens", type=int, default=512)
     ap.add_argument("--verbose", "-v", action="store_true", help="Show per-call Stage 3 logs")
@@ -830,10 +867,11 @@ def main(argv=None) -> int:
         caption_field=args.caption_field,
         skip_rewrite=args.skip_rewrite,
         allow_nsfw=args.allow_nsfw,
-        mode=args.mode,
-        chunk_size=args.chunk_size,
-        per_phrase_k=args.per_phrase_k,
-        temperature=args.temperature,
         max_tokens=args.max_tokens,
         verbose=args.verbose,
         shuffle=args.shuffle,
@@ -870,10 +908,11 @@ def main(argv=None) -> int:
         "caption_field": args.caption_field,
         "skip_rewrite": args.skip_rewrite,
         "allow_nsfw": args.allow_nsfw,
-        "mode": args.mode,
-        "chunk_size": args.chunk_size,
-        "per_phrase_k": args.per_phrase_k,
-        "temperature": args.temperature,
         "shuffle": args.shuffle,
         "seed": args.seed,
         "workers": args.workers,
@@ -926,13 +965,17 @@ def main(argv=None) -> int:
                 # Diff sets (small — only the errors, not the full lists)
                 "missed": missed_tags,
                 "extra": extra_tags,
-                # Full tag lists (needed for categorized evaluation)
-                "ground_truth_tags": sorted(r.ground_truth_tags),
-                "selected_tags": sorted(r.selected_tags),
-                # Evidence for extra tags (why did these false positives get through?)
-                "extra_evidence": {t: r.tag_evidence.get(t, {}) for t in extra_tags},
-                # Structural tags inferred
-                "structural": r.structural_tags,
                 # Timing
                 "t1": round(r.stage1_time, 2),
                 "t2": round(r.stage2_time, 2),
@@ -953,9 +996,13 @@ def main(argv=None) -> int:
                 "caption": r.caption,
                 "ground_truth_tags": sorted(r.ground_truth_tags),
                 "rewrite_phrases": r.rewrite_phrases,
-                "retrieved_tags": sorted(r.retrieved_tags),
-                "selected_tags": sorted(r.selected_tags),
-                "implied_tags": sorted(r.implied_tags),
                 "structural_tags": r.structural_tags,
                 "categorized_suggestions": r.categorized_suggestions,
                 "why_counts": r.why_counts,
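
Note on the parallel branch of `run_eval`: the hunk keys each submitted future by its sample index `i`, then iterates with `as_completed`, which yields futures in completion order rather than submission order. The following is a minimal sketch of how such an index map can be used to put results back in input order. The helper name `map_in_order` and the two-argument worker signature are illustrative assumptions, not code from this commit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, List, Sequence


def map_in_order(fn: Callable[[Any, int], Any], items: Sequence[Any], workers: int = 4) -> List[Any]:
    """Run fn(item, index) on a thread pool and return results in input order.

    Hypothetical helper: it mirrors the `{executor.submit(...): i}` pattern
    from the diff, but is not part of eval_pipeline.py.
    """
    results: List[Any] = [None] * len(items)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Map each future back to the index of the item it was created for.
        futures = {executor.submit(fn, item, i): i for i, item in enumerate(items)}
        # as_completed yields in completion order, so use the stored index
        # to write each result into its original slot.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    print(map_in_order(lambda s, i: f"{i}:{s.upper()}", ["a", "b", "c"], workers=2))
```

Calling `future.result()` re-raises any exception thrown inside the worker, so a failing sample surfaces in the main thread instead of being silently dropped.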

 from __future__ import annotations
+import argparse
+import json
+import os
+import random
+import sys
+import threading
+import time
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from dataclasses import dataclass, field
 from datetime import datetime
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Set, Tuple
+_REPO_ROOT = Path(__file__).resolve().parents[1]
+if str(_REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(_REPO_ROOT))
+os.chdir(_REPO_ROOT)
+def _ensure_utf8_stdio() -> None:
+    try:
+        if hasattr(sys.stdout, "reconfigure"):
+            sys.stdout.reconfigure(encoding="utf-8", errors="replace")
+        if hasattr(sys.stderr, "reconfigure"):
+            sys.stderr.reconfigure(encoding="utf-8", errors="replace")
+    except Exception:
+        # Best effort: keep the default stream encoding if reconfigure() fails.
+        pass
 EVAL_DATA_PATH = _REPO_ROOT / "data" / "eval_samples" / "e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl"
 EVAL_DATA_PATH_RAW = _REPO_ROOT / "data" / "eval_samples" / "e621_sfw_sample_1000_seed123_buffer10000.jsonl"
     # Stage 2
     retrieved_tags: Set[str] = field(default_factory=set)
     retrieval_recall: float = 0.0
+    # Stage 3 — overall
+    selected_tags: Set[str] = field(default_factory=set)
+    stage3_selected_tags: Set[str] = field(default_factory=set)
+    stage3_selected_scores: Dict[str, float] = field(default_factory=dict)
+    stage3_selected_ranks: Dict[str, int] = field(default_factory=dict)
+    stage3_selected_phrase_ranks: Dict[str, int] = field(default_factory=dict)
+    selection_precision: float = 0.0
+    selection_recall: float = 0.0
+    selection_f1: float = 0.0
     # Stage 3 — character tags only
     gt_character_tags: Set[str] = field(default_factory=set)
     selected_character_tags: Set[str] = field(default_factory=set)
     return precision, recall, f1
+def _process_one_sample(
     sample: Dict[str, Any],
     index: int,
     total: int,
     skip_rewrite: bool,
     allow_nsfw: bool,
+    mode: str,
+    chunk_size: int,
+    per_phrase_k: int,
+    per_phrase_final_k: int,
+    temperature: float,
     max_tokens: int,
     verbose: bool,
     print_lock: threading.Lock,
         # --- Stage 2: Retrieval ---
         t0 = time.time()
+        retrieval_result = psq_candidates_from_rewrite_phrases(
+            rewrite_phrases=result.rewrite_phrases,
+            allow_nsfw_tags=allow_nsfw,
+            per_phrase_final_k=per_phrase_final_k,
+            global_k=300,
+            return_phrase_ranks=True,
+            verbose=False,
+        )
         result.stage2_time = time.time() - t0
+        phrase_rank_by_tag = {}
+        if isinstance(retrieval_result, tuple):
+            if len(retrieval_result) == 2:
+                candidates, phrase_rank_by_tag = retrieval_result
+            else:
+                candidates = retrieval_result[0]
+        else:
+            candidates = retrieval_result
         result.retrieved_tags = {c.tag for c in candidates}
         if gt_tags:
         )
         result.stage3_time = time.time() - t0
+        result.selected_tags = {candidates[idx].tag for idx in picked_indices} if picked_indices else set()
+        result.stage3_selected_tags = set(result.selected_tags)
+        # Build per-tag evidence from Stage 3 selection
+        rank_by_tag = {c.tag: i + 1 for i, c in enumerate(candidates)}
+        for idx in picked_indices:
+            tag = candidates[idx].tag
+            result.stage3_selected_scores[tag] = round(candidates[idx].score_combined, 4)
+            result.stage3_selected_ranks[tag] = rank_by_tag.get(tag, len(candidates) + 1)
+            if phrase_rank_by_tag:
+                result.stage3_selected_phrase_ranks[tag] = phrase_rank_by_tag.get(tag, len(candidates) + 1)
+            result.tag_evidence[tag] = {
+                "source": "stage3",
+                "why": tag_why.get(tag, "unknown"),
+                "retrieval_score": round(candidates[idx].score_combined, 4),
+            }
         # Why distribution
         why_counts: Dict[str, int] = {}
     print(f"  Assets loaded in {time.time() - t0:.1f}s")
+def run_eval(
     n_samples: int = 20,
     caption_field: str = "caption_cogvlm",
     skip_rewrite: bool = False,
     allow_nsfw: bool = False,
     mode: str = "chunked_map_union",
+    chunk_size: int = 60,
+    per_phrase_k: int = 2,
+    per_phrase_final_k: int = 10,
+    temperature: float = 0.0,
     max_tokens: int = 512,
     verbose: bool = False,
     shuffle: bool = True,
     workers: int = 1,
     min_why: Optional[str] = "strong_implied",
     expand_implications: bool = False,
+    infer_structural: bool = False,
+) -> List[SampleResult]:
+    expand_gt = expand_implications
+    if expand_gt:
+        from psq_rag.retrieval.state import expand_tags_via_implications as _expand_gt_tags
+    # Load eval samples — prefer expanded file, fall back to raw
+    eval_path = EVAL_DATA_PATH
     if not eval_path.is_file():
         eval_path = EVAL_DATA_PATH_RAW
         if not eval_path.is_file():
                 using_expanded = True
             else:
                 gt_tags = _flatten_ground_truth_tags(row.get("tags_ground_truth_categorized", ""))
+            if not gt_tags:
+                continue
+            # Remove eval-excluded tags from GT
+            gt_tags -= _EVAL_EXCLUDED_TAGS
+            if expand_gt:
+                gt_tags, _ = _expand_gt_tags(gt_tags)
+                gt_tags -= _EVAL_EXCLUDED_TAGS
+            all_samples.append({
+                "id": row.get("id", row.get("row_id", len(all_samples))),
+                "caption": caption.strip(),
+                "gt_tags": gt_tags,
             })
     if using_expanded:
         print("Using implication-expanded ground truth")
         # Sequential mode (original behavior)
         results: List[SampleResult] = []
         for i, sample in enumerate(samples):
+            result = _process_one_sample(
+                sample, i, total,
+                skip_rewrite, allow_nsfw, mode, chunk_size,
+                per_phrase_k, per_phrase_final_k, temperature, max_tokens, verbose,
+                print_lock, min_why, expand_implications,
+                infer_structural,
+            )
             results.append(result)
     else:
         # Parallel mode
         with ThreadPoolExecutor(max_workers=workers) as executor:
             futures = {
                 executor.submit(
+                    _process_one_sample,
+                    sample, i, total,
+                    skip_rewrite, allow_nsfw, mode, chunk_size,
+                    per_phrase_k, per_phrase_final_k, temperature, max_tokens, verbose,
+                    print_lock, min_why, expand_implications,
+                    infer_structural,
+                ): i
                 for i, sample in enumerate(samples)
             }
             for future in as_completed(futures):
     print("=" * 70)
+def main(argv=None) -> int:
+    _ensure_utf8_stdio()
+    ap = argparse.ArgumentParser(description="End-to-end pipeline evaluation")
     ap.add_argument("--n", type=int, default=20, help="Number of samples to evaluate")
     ap.add_argument("--caption-field", default="caption_cogvlm",
                     choices=["caption_cogvlm", "caption_llm_0", "caption_llm_1",
     ap.add_argument("--allow-nsfw", action="store_true", help="Allow NSFW tags")
     ap.add_argument("--mode", default="chunked_map_union",
                     choices=["single_shot", "chunked_map_union"])
+    ap.add_argument("--chunk-size", type=int, default=60)
+    ap.add_argument("--per-phrase-k", type=int, default=2)
+    ap.add_argument("--per-phrase-final-k", type=int, default=10,
+                    help="Top-K candidates per phrase after scoring (retrieval cap)")
     ap.add_argument("--temperature", type=float, default=0.0)
     ap.add_argument("--max-tokens", type=int, default=512)
     ap.add_argument("--verbose", "-v", action="store_true", help="Show per-call Stage 3 logs")
         caption_field=args.caption_field,
         skip_rewrite=args.skip_rewrite,
         allow_nsfw=args.allow_nsfw,
+        mode=args.mode,
+        chunk_size=args.chunk_size,
+        per_phrase_k=args.per_phrase_k,
+        per_phrase_final_k=args.per_phrase_final_k,
+        temperature=args.temperature,
         max_tokens=args.max_tokens,
         verbose=args.verbose,
         shuffle=args.shuffle,
         "caption_field": args.caption_field,
         "skip_rewrite": args.skip_rewrite,
         "allow_nsfw": args.allow_nsfw,
+        "mode": args.mode,
+        "chunk_size": args.chunk_size,
+        "per_phrase_k": args.per_phrase_k,
+        "per_phrase_final_k": args.per_phrase_final_k,
+        "temperature": args.temperature,
         "shuffle": args.shuffle,
         "seed": args.seed,
         "workers": args.workers,
                 # Diff sets (small — only the errors, not the full lists)
                 "missed": missed_tags,
                 "extra": extra_tags,
+                # Full tag lists (needed for categorized evaluation)
+                "ground_truth_tags": sorted(r.ground_truth_tags),
+                "selected_tags": sorted(r.selected_tags),
+                "stage3_selected": sorted(r.stage3_selected_tags),
+                "stage3_selected_scores": r.stage3_selected_scores,
+                "stage3_selected_ranks": r.stage3_selected_ranks,
+                "stage3_selected_phrase_ranks": r.stage3_selected_phrase_ranks,
+                # Evidence for extra tags (why did these false positives get through?)
+                "extra_evidence": {t: r.tag_evidence.get(t, {}) for t in extra_tags},
+                # Structural tags inferred
+                "structural": r.structural_tags,
                 # Timing
                 "t1": round(r.stage1_time, 2),
                 "t2": round(r.stage2_time, 2),
                 "caption": r.caption,
                 "ground_truth_tags": sorted(r.ground_truth_tags),
                 "rewrite_phrases": r.rewrite_phrases,
+                "retrieved_tags": sorted(r.retrieved_tags),
+                "selected_tags": sorted(r.selected_tags),
+                "stage3_selected": sorted(r.stage3_selected_tags),
+                "stage3_selected_scores": r.stage3_selected_scores,
+                "stage3_selected_ranks": r.stage3_selected_ranks,
+                "stage3_selected_phrase_ranks": r.stage3_selected_phrase_ranks,
+                "implied_tags": sorted(r.implied_tags),
                 "structural_tags": r.structural_tags,
                 "categorized_suggestions": r.categorized_suggestions,
                 "why_counts": r.why_counts,
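
Note on the Stage 2/3 changes above: the new code accepts either a bare candidate list or a `(candidates, phrase_rank_by_tag)` tuple from retrieval, then records a score, a global rank, and an optional per-phrase rank for every selected tag. The sketch below restates that logic in isolation so it can be exercised without the pipeline. `Candidate`, `split_retrieval_result`, and `selection_evidence` are hypothetical stand-ins; only the `tag` and `score_combined` attributes and the `len(candidates) + 1` fallback rank are taken from the diff.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical stand-in for the retrieval candidate type."""
    tag: str
    score_combined: float


def split_retrieval_result(result: Any) -> Tuple[List[Candidate], Dict[str, int]]:
    """Normalize a retrieval return value to (candidates, phrase_ranks)."""
    if isinstance(result, tuple):
        candidates = list(result[0])
        # Only a 2-tuple carries phrase ranks; other shapes get an empty map.
        phrase_ranks = dict(result[1]) if len(result) == 2 else {}
        return candidates, phrase_ranks
    return list(result), {}


def selection_evidence(
    candidates: Sequence[Candidate],
    phrase_ranks: Dict[str, int],
    picked_indices: Sequence[int],
) -> Dict[str, Dict[str, Any]]:
    """Build per-tag score and rank records for the selected candidates."""
    fallback = len(candidates) + 1  # rank used when a tag has no recorded rank
    rank_by_tag = {c.tag: i + 1 for i, c in enumerate(candidates)}
    evidence: Dict[str, Dict[str, Any]] = {}
    for idx in picked_indices:
        cand = candidates[idx]
        entry: Dict[str, Any] = {
            "score": round(cand.score_combined, 4),
            "rank": rank_by_tag.get(cand.tag, fallback),
        }
        if phrase_ranks:
            entry["phrase_rank"] = phrase_ranks.get(cand.tag, fallback)
        evidence[cand.tag] = entry
    return evidence


if __name__ == "__main__":
    cands = [Candidate("wolf", 0.91234), Candidate("fur", 0.5)]
    found, ranks = split_retrieval_result((cands, {"wolf": 1}))
    print(selection_evidence(found, ranks, [0, 1]))
```

Keeping the normalization in one helper means callers never branch on the return shape, and the shared fallback rank makes "not ranked for any phrase" sort after every real rank in later analysis.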