---
language:
- en
license: mit
tags:
- text-to-image
- stable-diffusion-xl
- lora
- art-style
- cartoon
- comic
- black-and-white
library_name: diffusers
base_model: stabilityai/stable-diffusion-xl-base-1.0
datasets:
- camdenbalberg/spy-vs-spy-dataset
---

# Spy vs Spy LoRA

SDXL LoRA for generating **Spy vs Spy** style black-and-white cartoon art. Trained on comic panels and animation frames from MAD Magazine and MadTV.

## Versions

| Version | Images | Base Model | Source | Captioning |
|---------|--------|------------|--------|------------|
| **v1** | 36 | SDXL 1.0 | MAD Magazine comic panels | Manual descriptions expanded by AI |
| **v2** | 220 | SDXL 1.0 | v1 panels + MadTV animation | Voice-guided Claude Vision + hallucination cleanup |
| **v3** | 861 | SDXL 1.0 | MadTV animation (DVD rips) | Gemini video scene descriptions |

Each version includes all epoch checkpoints (every 2 epochs up to epoch 22, plus the final checkpoint), so you can experiment with different training stages. Best results are typically around epochs 10-16.
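Since every epoch is saved, a small script makes it easier to compare checkpoints. The sketch below is a minimal example under stated assumptions, not code shipped with this repository: `checkpoint_path`, `build_prompt`, `build_negative` and `generate` are hypothetical helpers; it assumes the `.safetensors` files have been downloaded locally in the `v1/`, `v2/`, `v3/` layout shown under File Structure; and it assumes a recent diffusers release with the `peft` package installed (needed for `set_adapters`) plus a CUDA GPU. Settings mirror the Usage section: trigger word `spyvspy`, Euler, CFG 7, 25 steps, LoRA weight 0.8, 1024x1024.

```python
from pathlib import Path
from typing import Optional

# Filename prefixes per version, as listed under File Structure.
PREFIXES = {"v1": "spyvspy_sdxl", "v2": "spyvspy_sdxl_v2", "v3": "spyvspy_sdxl_v3"}

# Style tags from the card's prompt format.
STYLE = "black and white ink comic art, bold outlines, high contrast, slapstick cartoon style"

# Negative prompt from the card.
BASE_NEGATIVE = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, "
    "fewer digits, cropped, worst quality, low quality, jpeg artifacts, signature, "
    "watermark, blurry, color, colorful, realistic, photographic, 3d render"
)


def checkpoint_path(version: str, epoch: Optional[int] = None) -> Path:
    """Relative path of a checkpoint; epoch=None selects the final checkpoint."""
    prefix = PREFIXES[version]
    if epoch is None:
        return Path(version) / f"{prefix}.safetensors"
    if epoch % 2 or not 2 <= epoch <= 22:
        raise ValueError("intermediate checkpoints exist only for even epochs 2-22")
    # Intermediate file names zero-pad the epoch to six digits (e.g. -000012).
    return Path(version) / f"{prefix}-{epoch:06d}.safetensors"


def build_prompt(scene: str, appearance: str) -> str:
    """Fill the prompt template: trigger word, scene, appearance, style tags."""
    return f"spyvspy, {scene}, {appearance}, {STYLE}"


def build_negative(only: Optional[str] = None) -> str:
    """only='white' or 'black' appends the single-character terms from the card."""
    if only is None:
        return BASE_NEGATIVE
    other = {"white": "black spy", "black": "white spy"}[only]
    return f"{BASE_NEGATIVE}, {other}, multiple characters, two characters"


def generate(prompt: str, negative: str, lora_file: Path, out_file: str = "spyvspy.png") -> None:
    """Render one image with SDXL base plus the LoRA (assumes CUDA, diffusers, peft)."""
    import torch  # heavy imports kept local so the helpers above run anywhere
    from diffusers import EulerDiscreteScheduler, StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")
    # "euler" sampler with the "normal" schedule.
    pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights(str(lora_file.parent), weight_name=lora_file.name, adapter_name="spyvspy")
    pipe.set_adapters(["spyvspy"], adapter_weights=[0.8])  # recommended starting weight
    image = pipe(
        prompt=prompt,
        negative_prompt=negative,
        guidance_scale=7.0,
        num_inference_steps=25,
        width=1024,
        height=1024,
    ).images[0]
    image.save(out_file)
```

For example, `generate(build_prompt("black spy peeking around a corner holding a lit stick of dynamite", "wearing a fedora hat and trenchcoat with long pointed beak nose and black sclera eyes"), build_negative("black"), checkpoint_path("v3", 12))` renders a single-spy image from the v3 epoch-12 checkpoint. Looping over `checkpoint_path(version, epoch)` with a fixed seed is one way to compare training stages side by side.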
## Usage (v1-v3)

### Trigger Word

All v1-v3 models use a single trigger word: **`spyvspy`**

### Prompt Format

```
spyvspy, {scene description}, {appearance}, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
```

### Example Prompts

**Both spies:**

```
spyvspy, white spy planting a bomb under a table while black spy sneaks up behind with a mallet, both wearing fedora hats and trenchcoats with long pointed beak noses and black sclera eyes, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
```

**Single spy:**

```
spyvspy, black spy peeking around a corner with a mischievous grin holding a lit stick of dynamite, wearing a fedora hat and trenchcoat with long pointed beak nose and black sclera eyes, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
```

**Style only (no specific character):**

```
spyvspy, large explosion cloud with debris and a fedora hat flying through the air, outdoor rooftop setting, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
```

### Recommended Settings

| Setting | Value |
|---------|-------|
| Sampler | euler / dpmpp_2m |
| Scheduler | normal / karras |
| CFG | 7 |
| Steps | 25 |
| LoRA weight | 0.6-0.9 (start at 0.8) |
| Resolution | 1024x1024 or 832x1216 |

### Negative Prompt

```
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, jpeg artifacts, signature, watermark, blurry, color, colorful, realistic, photographic, 3d render
```

For single-character prompts, add the other spy to the negative prompt:

- White spy only: add `black spy, multiple characters, two characters`
- Black spy only: add `white spy, multiple characters, two characters`

## File Structure

```
v1/
  spyvspy_sdxl.safetensors            # Final checkpoint
  spyvspy_sdxl-000002.safetensors     # Epoch 2
  ...
  spyvspy_sdxl-000022.safetensors     # Epoch 22
v2/
  spyvspy_sdxl_v2.safetensors         # Final checkpoint
  spyvspy_sdxl_v2-000002.safetensors
  ...
v3/
  spyvspy_sdxl_v3.safetensors         # Final checkpoint
  spyvspy_sdxl_v3-000002.safetensors
  ...
```

## Training Details

| Parameter | Value |
|-----------|-------|
| Network | LoRA, dim=32, alpha=16 |
| Optimizer | AdamW8bit |
| UNet LR | 1e-4 |
| Text Encoder LR | 5e-5 |
| LR scheduler | cosine_with_restarts (3 cycles) |
| Precision | bf16 |
| Resolution | 1024, bucketed |
| Epochs | 24, checkpoint every 2 |
| Training framework | [kohya_ss sd-scripts](https://github.com/kohya-ss/sd-scripts) |

## Captioning Pipeline

Each version used a progressively more sophisticated captioning approach:

### v1 — Manual + AI Expansion

Descriptions were written manually for each comic panel, then expanded by AI for consistency and detail.

### v2 — Voice-Guided Claude Vision

A custom desktop app (`frame_curator.py`) displayed frames for review. When a frame was kept, a voice note describing the scene was recorded. The transcribed voice note was sent alongside the frame to Claude Vision with a [detailed system prompt](https://github.com/camdenbalberg/spy-vs-spy-lora/blob/main/training/spy-vs-spy-v2/spyvspy_caption_prompt.md) instructing it to write structured scene descriptions. A second pass (`fix_captions.py`) cleaned up hallucinated objects by comparing each AI caption against its voice note.

### v3 — Gemini Video Analysis

Full episodes were sent to Gemini for video-level scene analysis using a [captioning prompt](https://github.com/camdenbalberg/spy-vs-spy-lora/blob/main/training/spy-vs-spy-v3/gemini-prompt.txt). Frames were extracted at the identified timestamps using FFmpeg with yadif deinterlacing, and each frame was reviewed in a custom PySide6 desktop app. The prose captions were then converted to comma-separated tags via the Claude API.
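The v3 frame-extraction step can be sketched in Python as follows. This is not the repository's actual script: `extract_frame_cmd` and `extract_frames` are hypothetical helpers that assume `ffmpeg` is on `PATH` and that the timestamps identified by Gemini are already available as strings ffmpeg accepts (such as `00:01:23.500`). The output file naming is illustrative.

```python
import subprocess
from pathlib import Path


def extract_frame_cmd(video: Path, timestamp: str, out_png: Path) -> list[str]:
    """Build an ffmpeg command that saves one deinterlaced frame at `timestamp`."""
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it exists
        "-ss", timestamp,   # seek on the input, before decoding starts
        "-i", str(video),
        "-vf", "yadif",     # deinterlace the interlaced DVD video
        "-frames:v", "1",   # stop after writing a single frame
        str(out_png),
    ]


def extract_frames(video: Path, timestamps: list[str], out_dir: Path) -> list[Path]:
    """Run ffmpeg once per timestamp and return the written PNG paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for i, ts in enumerate(timestamps):
        out_png = out_dir / f"{video.stem}_{i:04d}.png"
        subprocess.run(extract_frame_cmd(video, ts, out_png), check=True)
        outputs.append(out_png)
    return outputs
```

For example, `extract_frames(Path("episode01.mkv"), ["00:01:23.500", "00:02:05"], Path("frames"))` writes `frames/episode01_0000.png` and `frames/episode01_0001.png`. Placing `-ss` before `-i` keeps seeking fast on full-length episodes, while `yadif` runs on the decoded frames so each saved PNG is deinterlaced.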
## v4 (Coming Soon)

v4 introduces a 5-pass Gemini + Claude merge captioning system, separate character triggers (`white_spy`, `black_spy`), and a web-based frame reviewer, and it is trained on NovaAnimeXL instead of SDXL 1.0. See the [v4 pipeline docs](https://github.com/camdenbalberg/spy-vs-spy-lora/tree/main/training/spy-vs-spy-v4) and [prompt templates](https://github.com/camdenbalberg/spy-vs-spy-lora/tree/main/training/spy-vs-spy-v4/prompts) for details.

## Links

- **Training pipeline & code:** [github.com/camdenbalberg/spy-vs-spy-lora](https://github.com/camdenbalberg/spy-vs-spy-lora)
- **Training dataset:** [camdenbalberg/spy-vs-spy-dataset](https://huggingface.co/datasets/camdenbalberg/spy-vs-spy-dataset)
- **Base model:** [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)