---
language:
- en
license: mit
tags:
- text-to-image
- stable-diffusion-xl
- lora
- art-style
- cartoon
- comic
- black-and-white
library_name: diffusers
base_model: stabilityai/stable-diffusion-xl-base-1.0
datasets:
- camdenbalberg/spy-vs-spy-dataset
---

# Spy vs Spy LoRA

SDXL LoRA for generating **Spy vs Spy** style black-and-white cartoon art. Trained on comic panels and animation frames from MAD Magazine and MadTV.

## Versions

| Version | Images | Base Model | Source | Captioning |
|---------|--------|------------|--------|------------|
| **v1** | 36 | SDXL 1.0 | MAD Magazine comic panels | Manual descriptions expanded by AI |
| **v2** | 220 | SDXL 1.0 | v1 panels + MadTV animation | Voice-guided Claude Vision + hallucination cleanup |
| **v3** | 861 | SDXL 1.0 | MadTV animation (DVD rips) | Gemini video scene descriptions |

Each version includes every intermediate checkpoint (saved every 2 epochs, from epoch 2 through epoch 22) plus the final checkpoint, so you can compare different training stages. Best results are typically found between epochs 10 and 16.

## Usage (v1-v3)

### Trigger Word

All v1-v3 models use a single trigger word: **`spyvspy`**

### Prompt Format

```
spyvspy, {scene description}, {appearance}, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
```
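
The format above can be assembled programmatically. The following is a minimal sketch; `build_prompt`, `TRIGGER`, and `STYLE_SUFFIX` are hypothetical helper names introduced here for illustration, not part of the repository.

```python
# Hypothetical helper: assembles a prompt in the documented format.
TRIGGER = "spyvspy"
STYLE_SUFFIX = (
    "black and white ink comic art, bold outlines, "
    "high contrast, slapstick cartoon style"
)


def build_prompt(scene: str, appearance: str = "") -> str:
    """Join trigger word, scene, optional appearance, and the style suffix."""
    parts = [TRIGGER, scene.strip()]
    if appearance.strip():
        # Appearance is optional, e.g. for style-only prompts.
        parts.append(appearance.strip())
    parts.append(STYLE_SUFFIX)
    return ", ".join(parts)
```

Calling `build_prompt("white spy planting a bomb under a table", "wearing a fedora hat and trenchcoat")` yields a string that starts with the trigger word and ends with the style suffix.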

### Example Prompts

**Both spies:**
```
spyvspy, white spy planting a bomb under a table while black spy sneaks up behind with a mallet, both wearing fedora hats and trenchcoats with long pointed beak noses and black sclera eyes, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
```

**Single spy:**
```
spyvspy, black spy peeking around a corner with a mischievous grin holding a lit stick of dynamite, wearing a fedora hat and trenchcoat with long pointed beak nose and black sclera eyes, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
```

**Style only (no specific character):**
```
spyvspy, large explosion cloud with debris and a fedora hat flying through the air, outdoor rooftop setting, black and white ink comic art, bold outlines, high contrast, slapstick cartoon style
```

### Recommended Settings

| Setting | Value |
|---------|-------|
| Sampler | euler / dpmpp_2m |
| Scheduler | normal / karras |
| CFG | 7 |
| Steps | 25 |
| LoRA weight | 0.6-0.9 (start at 0.8) |
| Resolution | 1024x1024 or 832x1216 |
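
The sampler and scheduler names in the table appear to follow ComfyUI-style naming. For `diffusers` users, the sketch below shows one way these settings might map onto `StableDiffusionXLPipeline`. It assumes a CUDA GPU and a locally downloaded LoRA file; `generate`, `lora_dir`, and `weight_name` are hypothetical names, and `EulerDiscreteScheduler` is used here as the closest match to `euler`.

```python
# Settings taken from the table above (1024x1024 variant).
SETTINGS = {
    "num_inference_steps": 25,
    "guidance_scale": 7.0,
    "width": 1024,
    "height": 1024,
}
LORA_SCALE = 0.8  # suggested starting point within the 0.6-0.9 range


def generate(prompt, negative_prompt, lora_dir, weight_name):
    """Sketch: run SDXL base with the LoRA applied; returns a PIL image."""
    # Imported lazily so the settings can be inspected without torch installed.
    import torch
    from diffusers import EulerDiscreteScheduler, StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    ).to("cuda")  # assumption: a CUDA device is available
    pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
    # lora_dir is a local folder, weight_name a .safetensors file inside it.
    pipe.load_lora_weights(lora_dir, weight_name=weight_name)
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        cross_attention_kwargs={"scale": LORA_SCALE},
        **SETTINGS,
    )
    return result.images[0]
```

For the portrait resolution, swap `width` and `height` for 832 and 1216.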

### Negative Prompt

```
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, jpeg artifacts, signature, watermark, blurry, color, colorful, realistic, photographic, 3d render
```

For single-character prompts, add the other spy to the negative prompt:
- White spy only: add `black spy, multiple characters, two characters`
- Black spy only: add `white spy, multiple characters, two characters`
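
The single-character rule above can be expressed as a small helper. This is an illustrative sketch only; `build_negative` and its constants are hypothetical names.

```python
# Base negative prompt, copied from the block above.
BASE_NEGATIVE = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, "
    "fewer digits, cropped, worst quality, low quality, jpeg artifacts, signature, "
    "watermark, blurry, color, colorful, realistic, photographic, 3d render"
)

# Extra terms that suppress the spy you do NOT want in the image.
SINGLE_SPY_EXTRA = {
    "white": "black spy, multiple characters, two characters",
    "black": "white spy, multiple characters, two characters",
}


def build_negative(only_spy=None):
    """Return the negative prompt; pass "white" or "black" for single-spy scenes."""
    if only_spy is None:
        return BASE_NEGATIVE
    if only_spy not in SINGLE_SPY_EXTRA:
        raise ValueError("only_spy must be 'white', 'black', or None")
    return f"{BASE_NEGATIVE}, {SINGLE_SPY_EXTRA[only_spy]}"
```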

## File Structure

```
v1/
  spyvspy_sdxl.safetensors              # Final checkpoint
  spyvspy_sdxl-000002.safetensors       # Epoch 2
  ...
  spyvspy_sdxl-000022.safetensors       # Epoch 22
v2/
  spyvspy_sdxl_v2.safetensors           # Final checkpoint
  spyvspy_sdxl_v2-000002.safetensors
  ...
v3/
  spyvspy_sdxl_v3.safetensors           # Final checkpoint
  spyvspy_sdxl_v3-000002.safetensors
  ...
```
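
To pick a checkpoint by version and epoch, the naming pattern above can be reproduced in code. A minimal sketch, assuming v2 and v3 follow the same six-digit epoch suffix through epoch 22 as v1; `checkpoint_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Base file names per version, as listed in the tree above.
BASENAMES = {
    "v1": "spyvspy_sdxl",
    "v2": "spyvspy_sdxl_v2",
    "v3": "spyvspy_sdxl_v3",
}
SAVED_EPOCHS = range(2, 23, 2)  # intermediate checkpoints: 2, 4, ..., 22


def checkpoint_path(version, epoch=None):
    """Return the repo-relative path; epoch=None selects the final checkpoint."""
    base = BASENAMES[version]
    if epoch is None:
        return str(PurePosixPath(version) / f"{base}.safetensors")
    if epoch not in SAVED_EPOCHS:
        raise ValueError("intermediate checkpoints exist for even epochs 2-22 only")
    # Epoch number is zero-padded to six digits, e.g. -000010.
    return str(PurePosixPath(version) / f"{base}-{epoch:06d}.safetensors")
```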

## Training Details

| Parameter | Value |
|-----------|-------|
| Network | LoRA, dim=32, alpha=16 |
| Optimizer | AdamW8bit |
| UNet LR | 1e-4 |
| Text Encoder LR | 5e-5 |
| Scheduler | cosine_with_restarts (3 cycles) |
| Precision | bf16 |
| Resolution | 1024, bucketed |
| Epochs | 24, checkpoint every 2 |
| Training framework | [kohya_ss sd-scripts](https://github.com/kohya-ss/sd-scripts) |
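
As a rough guide to reproducing this setup, the sketch below maps the table onto command-line flags of sd-scripts' `sdxl_train_network.py`. The exact command used for these models is not documented here, so treat the flag selection as an assumption; `training_args` and its path parameters are hypothetical.

```python
def training_args(base_model, train_data_dir, output_dir, output_name="spyvspy_sdxl"):
    """Build an argument list mirroring the training table (a sketch, not the author's command)."""
    return [
        "accelerate", "launch", "sdxl_train_network.py",
        "--pretrained_model_name_or_path", base_model,
        "--train_data_dir", train_data_dir,
        "--output_dir", output_dir,
        "--output_name", output_name,
        "--save_model_as", "safetensors",
        "--network_module", "networks.lora",
        "--network_dim", "32",
        "--network_alpha", "16",
        "--optimizer_type", "AdamW8bit",
        "--unet_lr", "1e-4",
        "--text_encoder_lr", "5e-5",
        "--lr_scheduler", "cosine_with_restarts",
        "--lr_scheduler_num_cycles", "3",
        "--mixed_precision", "bf16",
        "--resolution", "1024,1024",
        "--enable_bucket",  # aspect-ratio bucketing
        "--max_train_epochs", "24",
        "--save_every_n_epochs", "2",
    ]
```

The list can be passed to `subprocess.run(...)` from the sd-scripts directory, or joined with `shlex.join(...)` to print a shell command. With dim=32 and alpha=16, the LoRA update is scaled by alpha/dim = 0.5.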

## Captioning Pipeline

Each version used a progressively more sophisticated captioning approach:

### v1 — Manual + AI Expansion
Descriptions were written manually for each comic panel, then expanded by AI for consistency and detail.

### v2 — Voice-Guided Claude Vision
A custom desktop app (`frame_curator.py`) displayed frames for review. For each frame that was kept, a voice note describing the scene was recorded. The transcribed voice note was sent alongside the frame to Claude Vision with a [detailed system prompt](https://github.com/camdenbalberg/spy-vs-spy-lora/blob/main/training/spy-vs-spy-v2/spyvspy_caption_prompt.md) instructing it to write structured scene descriptions. A second pass (`fix_captions.py`) removed hallucinated objects by comparing the AI caption against the voice note.

### v3 — Gemini Video Analysis
Full episodes were sent to Gemini for video-level scene analysis using a [captioning prompt](https://github.com/camdenbalberg/spy-vs-spy-lora/blob/main/training/spy-vs-spy-v3/gemini-prompt.txt). Frames were extracted at the identified timestamps using FFmpeg with yadif deinterlacing, and each frame was reviewed in a custom PySide6 desktop app. The prose captions were then converted to comma-separated tags via the Claude API.
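
The frame-extraction step can be sketched as follows. This is an illustration of grabbing one deinterlaced frame at a timestamp with FFmpeg's `yadif` filter, not the project's actual script; `ffmpeg_frame_cmd`, `extract_frame`, and the file names are hypothetical, and `ffmpeg` is assumed to be on `PATH`.

```python
import subprocess


def ffmpeg_frame_cmd(video, timestamp, out_png):
    """Build an FFmpeg command that saves one deinterlaced frame at `timestamp`."""
    return [
        "ffmpeg",
        "-ss", timestamp,   # seek to the timestamp (e.g. "00:01:23.500")
        "-i", video,
        "-vf", "yadif",     # deinterlace the DVD-sourced video
        "-frames:v", "1",   # write a single video frame
        out_png,
    ]


def extract_frame(video, timestamp, out_png):
    """Run the command; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(ffmpeg_frame_cmd(video, timestamp, out_png), check=True)
```

For example, `extract_frame("episode.mkv", "00:01:23.500", "frame_0001.png")` would write a single PNG for that timestamp.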

## v4 (Coming Soon)

v4 introduces a 5-pass Gemini + Claude merge captioning system, separate character triggers (`white_spy`, `black_spy`), a web-based frame reviewer, and trains on NovaAnimeXL instead of SDXL 1.0. See the [v4 pipeline docs](https://github.com/camdenbalberg/spy-vs-spy-lora/tree/main/training/spy-vs-spy-v4) and [prompt templates](https://github.com/camdenbalberg/spy-vs-spy-lora/tree/main/training/spy-vs-spy-v4/prompts) for details.

## Links

- **Training pipeline & code:** [github.com/camdenbalberg/spy-vs-spy-lora](https://github.com/camdenbalberg/spy-vs-spy-lora)
- **Training dataset:** [camdenbalberg/spy-vs-spy-dataset](https://huggingface.co/datasets/camdenbalberg/spy-vs-spy-dataset)
- **Base model:** [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)