# Quick Start for New Sessions

## What is Prompt Squirrel?

A RAG system that converts natural-language prompts into e621-style tags for furry art generation.

## Three-Stage Pipeline

The core pipeline has three stages, plus an optional structural pass after Stage 3:

1. **Stage 1 (Rewrite)**: Natural language → tag-shaped phrases (LLM)
2. **Stage 2 (Retrieval)**: Phrases → candidate tags (FastText + TF-IDF/SVD, closed vocabulary)
3. **Stage 3 (Selection)**: Candidates → final selected tags (LLM)
4. **Stage 3s (Structural)**: Selected tags → structural inferences (optional; e.g., clothing → topless)

## Latest Features (Feb 13-14, 2026)

- **Tag Categorization**: Suggestions organized by e621 checklist category (species, clothing, posture, etc.)
- **Category Parser**: Parses the checklist into tiers (CRITICAL/IMPORTANT/NICE_TO_HAVE/META) and constraints
- **Evaluation Metrics**: Per-category P/R/F1 and ranking metrics (MRR, P@K, nDCG)
- **Multi-select Constraints**: Fixed the body_type, species, and gender constraints so they allow multiple tags

## Key Files

- `app.py`: Gradio web interface
- `psq_rag/tagging/categorized_suggestions.py`: Category-based tag suggestions
- `psq_rag/tagging/category_parser.py`: Parses the e621 checklist
- `scripts/eval_pipeline.py`: Main evaluation harness
- `scripts/eval_categorized.py`: Per-category metrics
- `scripts/analyze_threshold_grid.py`: Threshold grid analysis (score / global rank / phrase rank)
- `scripts/analyze_caption_evident_audit.py`: Caption-evident audit vs. retrieval
- `docs/retrieval_contract.md`: Stage 2 spec
- `docs/stage3_contract.md`: Stage 3 spec
- `tagging_checklist.txt`: e621 tagging guidelines

## Running Code

```bash
# Always run from the repo root.
# Paths below use the Windows venv layout (.venv/Scripts); on Linux/macOS use .venv/bin/python.
.venv/Scripts/python.exe -m pip install -r requirements.txt
.venv/Scripts/python.exe app.py
```

## Recent Git History (Last 5 commits)

```
0f73a4b - Fix eval_categorized.py to work with eval_pipeline.py output
ff407fc - Remove binary PNG files (use Hugging Face XET storage instead)
8ba971a - Add eval results for debugging
51b7109 - Add ranking metrics infrastructure to eval pipeline
edba146 - Add per-category evaluation metrics script
```

## Key Contracts to Remember

1. **Stage boundaries are strict**: Don't mix retrieval (Stage 2) with selection (Stage 3).
2. **Keep diffs small**: One focused change per commit.
3. **Code matches contracts**: Update the code to match the docs, not vice versa.
4. **No feature flags**: Delete old code paths; no legacy behavior switches.

## Quick Orientation Commands

```bash
# View project structure
ls -la

# View recent commits
git log --oneline -10

# Check current branch
git branch

# List Python modules
ls -la psq_rag/

# View evaluation results
ls -la data/eval_results/
```

## Common Tasks

- **Add category**: Edit `tagging_checklist.txt`, then update the parser (`psq_rag/tagging/category_parser.py`)
- **Eval changes**: Run `scripts/eval_pipeline.py`, then `scripts/eval_categorized.py`
- **Threshold sweeps**: Run `scripts/analyze_threshold_grid.py` (see `--mode score|rank|phrase_rank`)
- **Caption-evident audit**: Run `scripts/analyze_caption_evident_audit.py`
- **Test retrieval**: Use `scripts/smoke_test.py`
- **Debug Stage 3**: Use `scripts/stage3_debug.py`. `--phrases` is optional; if omitted, the script runs the Stage 1 rewrite first, then Stage 2 retrieval on the rewritten phrases.

## Data Artifacts (Lazy-loaded)

- FastText embeddings (semantic similarity)
- TF-IDF + SVD matrices (context similarity)
- Alias → canonical tag mappings
- Tag counts, implications, groups, wiki definitions

## Eval Datasets

- `data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_expanded.jsonl`: Base eval set (implication-expanded ground truth)
- `data/eval_samples/e621_sfw_sample_1000_seed123_buffer10000_caption_evident.jsonl`: Caption-evident ground-truth subset (10 samples), used to estimate the retrieval ceiling achievable from the caption text alone

## New Eval Features (Feb 2026)

- `eval_pipeline.py` now logs Stage 3 selection scores and ranks:
  - `stage3_selected_scores` (retrieval score)
  - `stage3_selected_ranks` (global rank)
  - `stage3_selected_phrase_ranks` (per-phrase rank)
- New CLI flag `--per-phrase-final-k` controls the per-phrase retrieval cap
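
For reference, the evaluation metrics this project reports (per-category P/R/F1, MRR, P@K, nDCG) can be sketched with their standard textbook definitions. This is a minimal illustration assuming binary relevance (a tag is either in the ground-truth set or not) and a ranked list of predicted tags; it is not the code in `scripts/eval_pipeline.py` or `scripts/eval_categorized.py`, and all function names here are hypothetical.

```python
import math
from typing import Iterable, Sequence, Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision/recall/F1 for one sample (e.g. one category's tags)."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1 / rank of the first relevant tag (1-based), or 0 if none is relevant."""
    for i, tag in enumerate(ranked, start=1):
        if tag in gold:
            return 1.0 / i
    return 0.0


def mean_reciprocal_rank(samples: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MRR: average reciprocal rank over (ranked, gold) pairs."""
    rrs = [reciprocal_rank(ranked, gold) for ranked, gold in samples]
    return sum(rrs) / len(rrs) if rrs else 0.0


def precision_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Fraction of the top-k slots holding a relevant tag (denominator is k)."""
    if k <= 0:
        return 0.0
    return sum(tag in gold for tag in ranked[:k]) / k


def ndcg_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Binary-relevance nDCG@k with the usual log2(rank + 1) discount."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, tag in enumerate(ranked[:k], start=1)
        if tag in gold
    )
    ideal_hits = min(len(gold), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


if __name__ == "__main__":
    # Illustrative data only, not real eval output.
    ranked = ["canine", "fox", "dress", "outside"]
    gold = {"fox", "outside", "smile"}
    print(precision_recall_f1(set(ranked), gold))  # (0.5, 0.666..., 0.571...)
    print(reciprocal_rank(ranked, gold))           # 0.5
    print(precision_at_k(ranked, gold, 2))         # 0.5
    print(round(ndcg_at_k(ranked, gold, 4), 4))    # 0.4982
```

Per-category P/R/F1 follows by restricting both `predicted` and `gold` to one checklist category before calling `precision_recall_f1`.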
## NSFW Handling

- Tags are filtered via `word_rating_probabilities.csv` (threshold 0.95)
- Stage 2 removes NSFW tags when `allow_nsfw_tags=False`
- Stage 3 doesn't need policy flags (any filtering there is defense-in-depth only)
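
The Stage 2 NSFW filter can be pictured roughly as follows. This is a minimal sketch, not the project's implementation. It **assumes** that `word_rating_probabilities.csv` has a `tag` column and a per-tag NSFW probability column (named `nsfw_prob` here, a hypothetical name), and that a tag counts as NSFW when its probability is `>= 0.95`. The real schema, column names, and comparison may differ. `load_nsfw_tags` and `filter_candidates` are hypothetical helpers. Only `allow_nsfw_tags` and the 0.95 threshold come from this doc.

```python
import csv
import io
from typing import Iterable, List, Set, TextIO

NSFW_THRESHOLD = 0.95  # threshold stated in this doc


def load_nsfw_tags(csv_file: TextIO, threshold: float = NSFW_THRESHOLD,
                   prob_column: str = "nsfw_prob") -> Set[str]:
    """Collect tags whose NSFW probability meets the threshold.

    ASSUMPTION: columns `tag` and `nsfw_prob` (hypothetical names);
    check the real CSV header before relying on this.
    """
    nsfw: Set[str] = set()
    for row in csv.DictReader(csv_file):
        if float(row[prob_column]) >= threshold:
            nsfw.add(row["tag"])
    return nsfw


def filter_candidates(candidates: Iterable[str], nsfw_tags: Set[str],
                      allow_nsfw_tags: bool = False) -> List[str]:
    """Drop NSFW candidates unless allow_nsfw_tags is True; order is preserved."""
    if allow_nsfw_tags:
        return list(candidates)
    return [tag for tag in candidates if tag not in nsfw_tags]


if __name__ == "__main__":
    # In-memory stand-in for the real CSV; tag names are placeholders.
    sample = io.StringIO(
        "tag,nsfw_prob\n"
        "fox,0.01\n"
        "nsfw_example,0.99\n"
        "borderline_example,0.90\n"
    )
    nsfw = load_nsfw_tags(sample)
    print(filter_candidates(["fox", "nsfw_example", "borderline_example"], nsfw))
    # ['fox', 'borderline_example']
```

Because a tag just below the cutoff (0.90 in the sample) passes through, the high 0.95 threshold trades recall of the filter for fewer false removals. This is one reason a defense-in-depth check later in the pipeline can still be useful.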