Pixa committed on
Commit 8e2062d · verified · 1 Parent(s): 05e96d0

Upload 4 files

Files changed (3)
  1. README.md +53 -3
  2. app.py +405 -0
  3. requirements.txt +14 -0
README.md CHANGED
@@ -1,3 +1,53 @@
- ---
- license: mit
- ---
+ ---
+ title: LucasArts Pixel Art Style
+ emoji: 🎮
+ colorFrom: yellow
+ colorTo: red
+ sdk: gradio
+ sdk_version: 5.12.0
+ app_file: app.py
+ pinned: true
+ license: mit
+ short_description: Transform face photos into LucasArts adventure-game pixel art!
+ disable_embedding: false
+ ---
+
+ # 🎮 LucasArts Pixel Art Style
+
+ Transform any face photo into LucasArts adventure-game pixel art using AI.
+
+ ## Architecture
+
+ - **Base Model:** AlbedoBase XL v2.1 (SDXL)
+ - **Face Identity:** InstantID ControlNet + IP-Adapter
+ - **Depth Structure:** ZoeDepth ControlNet
+ - **Style LoRA:** LucasArts pixel art (`primerz/pixagram → lucasart.safetensors`)
+ - **Scheduler:** DPMSolver++ with Karras sigmas
+ - **Face Detection:** InsightFace (antelopev2)
+
+ ## How It Works
+
+ 1. Upload a clear face photo
+ 2. Write a prompt describing the character
+ 3. Adjust face/depth/image strength as needed
+ 4. Click "Generate LucasArts Style"
+
+ The pipeline detects your face, extracts identity embeddings, generates a depth map,
+ and uses dual ControlNets to produce a pixel-art image that preserves your likeness
+ while applying the LucasArts adventure-game aesthetic.
+
+ ## Key Parameters
+
+ | Parameter | Default | Effect |
+ |-----------|---------|--------|
+ | Face Identity Strength | 0.85 | Higher = more likeness, less style freedom |
+ | Image Strength | 0.15 | Higher = closer to original photo |
+ | Depth Strength | 0.80 | Higher = more structural preservation |
+ | Guidance Scale | 7.0 | Higher = stronger prompt adherence |
+ | Steps | 20 | More = higher quality, slower |
+
+ ## Credits
+
+ - Inspired by fofr's [face-to-many](https://github.com/fofr/cog-face-to-many)
+ - InstantID by [InstantX](https://huggingface.co/InstantX/InstantID)
+ - LucasArts LoRA from [primerz/pixagram](https://huggingface.co/primerz/pixagram)
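A worked sketch of how the Image Strength and Steps parameters combine: app.py passes Image Strength to the img2img pipeline as `strength=1.0 - image_strength`. Assuming the custom InstantID img2img pipeline truncates its schedule the way diffusers' stock SDXL img2img pipeline does (keeping `min(int(num_inference_steps * strength), num_inference_steps)` steps), the defaults of 20 steps and 0.15 image strength run 17 denoising steps. The helper names below are hypothetical and not part of the Space.

```python
"""Sketch: how 'Image Strength' and 'Steps' combine in img2img (hypothetical helpers)."""


def denoise_strength(image_strength: float) -> float:
    # app.py passes strength=1.0 - image_strength to the pipeline
    return 1.0 - image_strength


def effective_steps(num_steps: int, image_strength: float) -> int:
    # Assumption: diffusers-style img2img truncation, i.e.
    # init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    strength = denoise_strength(image_strength)
    return min(int(num_steps * strength), num_steps)


if __name__ == "__main__":
    for image_strength in (0.0, 0.15, 0.5, 1.0):
        steps = effective_steps(20, image_strength)
        print(f"image_strength={image_strength:.2f} -> {steps} of 20 steps")
```

Under this assumption, raising Image Strength both keeps more of the photo and shortens generation. At the slider maximum of 1.0 the sketch returns 0 steps; how the custom pipeline reacts to that edge case is not shown in this commit.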
app.py ADDED
@@ -0,0 +1,405 @@
+ """
+ LucasArts Pixel Art Style — Hugging Face Space
+
+ Transform face photos into LucasArts adventure-game pixel art using:
+ - SDXL base (AlbedoBase XL v2.1)
+ - InstantID ControlNet for face identity preservation
+ - ZoeDepth ControlNet for structural preservation
+ - LucasArts LoRA (primerz/pixagram → lucasart.safetensors)
+ - DPMSolver++ scheduler (traditional SDXL, not LCM)
+
+ Architecture inspired by fofr's face-to-many.
+ """
+
+ import spaces
+ import gradio as gr
+ import torch
+ import time
+ import cv2
+ import numpy as np
+ from PIL import Image
+
+ torch.jit.script = lambda f: f  # Disable JIT for compatibility
+
+ from huggingface_hub import hf_hub_download, snapshot_download
+ from diffusers.models import ControlNetModel
+ from diffusers import AutoencoderKL, DPMSolverMultistepScheduler
+ from controlnet_aux import ZoeDetector
+ from insightface.app import FaceAnalysis
+
+ from pipeline_stable_diffusion_xl_instantid_img2img import (
+     StableDiffusionXLInstantIDImg2ImgPipeline,
+     draw_kps,
+ )
+
+ # ============================================================
+ # CONFIGURATION
+ # ============================================================
+
+ TITLE = "LucasArts Pixel Art Style"
+ DESCRIPTION = """Transform any face photo into LucasArts adventure-game pixel art.
+ Uses InstantID for face identity + ZoeDepth for structure + LucasArts LoRA style."""
+
+ # Model repos
+ BASE_MODEL_REPO = "frankjoshua/albedobaseXL_v21"
+ VAE_REPO = "madebyollin/sdxl-vae-fp16-fix"
+ INSTANTID_REPO = "InstantX/InstantID"
+ ZOEDEPTH_CN_REPO = "diffusers/controlnet-zoe-depth-sdxl-1.0"
+ ANNOTATOR_REPO = "lllyasviel/Annotators"
+ ANTELOPE_REPO = "DIAMONIK7777/antelopev2"
+
+ # LucasArts LoRA
+ LORA_REPO = "primerz/pixagram"
+ LORA_FILENAME = "lucasart.safetensors"
+ LORA_STRENGTH = 0.9
+ TRIGGER_WORD = "lucasarts style"
+
+ # Generation defaults
+ DEFAULT_PROMPT = "a person"
+ DEFAULT_NEGATIVE = (
+     "ugly, artifacts, blurry, deformed, disfigured, low quality, "
+     "watermark, text, photo-realistic, photography, realistic"
+ )
+ DEFAULT_GUIDANCE_SCALE = 7.0
+ DEFAULT_STEPS = 20
+ DEFAULT_FACE_STRENGTH = 0.85
+ DEFAULT_IMAGE_STRENGTH = 0.15
+ DEFAULT_DEPTH_STRENGTH = 0.8
+
+ DEVICE = "cuda"
+ DTYPE = torch.float16
+
+ # ============================================================
+ # MODEL LOADING (runs once at startup)
+ # ============================================================
+
+ print("=" * 60)
+ print("Loading LucasArts Pixel Art Space")
+ print("=" * 60)
+
+ # 1. InsightFace — face detection & embedding
+ print("\n[1/6] Loading InsightFace (antelopev2)...")
+ st = time.time()
+ snapshot_download(repo_id=ANTELOPE_REPO, local_dir="/data/models/antelopev2")
+ face_app = FaceAnalysis(
+     name="antelopev2",
+     root="/data",
+     providers=["CPUExecutionProvider"],
+ )
+ face_app.prepare(ctx_id=0, det_size=(640, 640))
+ print(f" [OK] InsightFace loaded ({time.time() - st:.1f}s)")
+
+ # 2. InstantID ControlNet
+ print("\n[2/6] Loading InstantID ControlNet...")
+ st = time.time()
+ hf_hub_download(
+     repo_id=INSTANTID_REPO,
+     filename="ControlNetModel/config.json",
+     local_dir="/data/checkpoints",
+ )
+ hf_hub_download(
+     repo_id=INSTANTID_REPO,
+     filename="ControlNetModel/diffusion_pytorch_model.safetensors",
+     local_dir="/data/checkpoints",
+ )
+ hf_hub_download(
+     repo_id=INSTANTID_REPO,
+     filename="ip-adapter.bin",
+     local_dir="/data/checkpoints",
+ )
+ identitynet = ControlNetModel.from_pretrained(
+     "/data/checkpoints/ControlNetModel", torch_dtype=DTYPE
+ )
+ print(f" [OK] InstantID ControlNet loaded ({time.time() - st:.1f}s)")
+
+ # 3. ZoeDepth ControlNet
+ print("\n[3/6] Loading ZoeDepth ControlNet...")
+ st = time.time()
+ zoedepthnet = ControlNetModel.from_pretrained(
+     ZOEDEPTH_CN_REPO, torch_dtype=DTYPE
+ )
+ print(f" [OK] ZoeDepth ControlNet loaded ({time.time() - st:.1f}s)")
+
+ # 4. SDXL Pipeline with dual ControlNet
+ print("\n[4/6] Loading SDXL Pipeline...")
+ st = time.time()
+ vae = AutoencoderKL.from_pretrained(VAE_REPO, torch_dtype=DTYPE)
+ pipe = StableDiffusionXLInstantIDImg2ImgPipeline.from_pretrained(
+     BASE_MODEL_REPO,
+     vae=vae,
+     controlnet=[identitynet, zoedepthnet],
+     torch_dtype=DTYPE,
+ )
+ pipe.scheduler = DPMSolverMultistepScheduler.from_config(
+     pipe.scheduler.config, use_karras_sigmas=True
+ )
+ pipe.load_ip_adapter_instantid("/data/checkpoints/ip-adapter.bin")
+ pipe.set_ip_adapter_scale(0.8)
+ print(f" [OK] Pipeline loaded ({time.time() - st:.1f}s)")
+
+ # 5. Load and fuse LucasArts LoRA
+ print("\n[5/6] Loading LucasArts LoRA...")
+ st = time.time()
+ pipe.load_lora_weights(LORA_REPO, weight_name=LORA_FILENAME)
+ pipe.fuse_lora(lora_scale=LORA_STRENGTH)  # by keyword: fuse_lora's first positional parameter is not the scale
+ print(f" [OK] LoRA fused at strength {LORA_STRENGTH} ({time.time() - st:.1f}s)")
+
+ # 6. ZoeDetector for depth maps
+ print("\n[6/6] Loading ZoeDetector...")
+ st = time.time()
+ zoe = ZoeDetector.from_pretrained(ANNOTATOR_REPO)
+ zoe.to(DEVICE)
+ print(f" [OK] ZoeDetector loaded ({time.time() - st:.1f}s)")
+
+ # Move pipeline to GPU
+ pipe.to(DEVICE)
+
+ print("\n" + "=" * 60)
+ print("All models loaded — ready to generate!")
+ print("=" * 60 + "\n")
+
+
+ # ============================================================
+ # HELPERS
+ # ============================================================
+
+ def center_crop_square(img: Image.Image) -> Image.Image:
+     """Center-crop an image to a square."""
+     square_size = min(img.size)
+     left = (img.width - square_size) / 2
+     top = (img.height - square_size) / 2
+     right = (img.width + square_size) / 2
+     bottom = (img.height + square_size) / 2
+     return img.crop((left, top, right, bottom))
+
+
+ def extract_face(image: Image.Image):
+     """
+     Detect face with InsightFace, return (embedding, keypoints_image).
+     Raises gr.Error if no face found.
+     """
+     bgr = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
+     faces = face_app.get(bgr)
+     if not faces:
+         raise gr.Error(
+             "No face detected in your image. Please upload a clear face photo."
+         )
+     # Use the largest face
+     face_info = sorted(
+         faces,
+         key=lambda x: (x["bbox"][2] - x["bbox"][0]) * (x["bbox"][3] - x["bbox"][1]),
+     )[-1]
+     face_emb = face_info["embedding"]
+     face_kps = draw_kps(image, face_info["kps"])
+     return face_emb, face_kps
+
+
+ # ============================================================
+ # GENERATION
+ # ============================================================
+
+ @spaces.GPU(duration=90)
+ def generate(
+     face_image: Image.Image,
+     prompt: str,
+     negative_prompt: str,
+     face_strength: float,
+     image_strength: float,
+     depth_strength: float,
+     guidance_scale: float,
+     num_steps: int,
+ ) -> tuple:
+     """Generate LucasArts-style pixel art from a face photo."""
+
+     if face_image is None:
+         gr.Warning("Please upload a face photo first!")
+         return None, "No image provided"
+
+     try:
+         # Prepare image (square crop, 1024x1024)
+         face_image = center_crop_square(face_image)
+         face_image = face_image.resize((1024, 1024), Image.LANCZOS)
+
+         # Extract face embedding + keypoints
+         face_emb, face_kps = extract_face(face_image)
+
+         # Generate depth map
+         with torch.no_grad():
+             depth_image = zoe(face_image)
+
+         # Dual control images: [InstantID keypoints, ZoeDepth]
+         w, h = face_kps.size
+         control_images = [face_kps, depth_image.resize((w, h))]
+
+         # Build prompt with trigger word
+         full_prompt = f"{TRIGGER_WORD}, {prompt}" if prompt else TRIGGER_WORD
+         neg = negative_prompt if negative_prompt else None
+
+         # Generate
+         result = pipe(
+             prompt=full_prompt,
+             negative_prompt=neg,
+             image_embeds=face_emb,
+             image=face_image,
+             control_image=control_images,
+             strength=1.0 - image_strength,
+             num_inference_steps=int(num_steps),  # Gradio sliders may pass floats
+             guidance_scale=guidance_scale,
+             controlnet_conditioning_scale=[face_strength, depth_strength],
+             width=1024,
+             height=1024,
+         ).images[0]
+
+         info = (
+             f"Prompt: {full_prompt}\n"
+             f"Steps: {num_steps} | Guidance: {guidance_scale}\n"
+             f"Face: {face_strength} | Image: {image_strength} | Depth: {depth_strength}"
+         )
+
+         return result, info
+
+     except gr.Error:
+         raise
+     except Exception as e:
+         # gr.Error only reaches the UI when raised; constructing it alone does nothing
+         raise gr.Error(f"Generation failed: {e}") from e
+
+
+ # ============================================================
+ # GRADIO UI
+ # ============================================================
+
+ with gr.Blocks(
+     title=TITLE,
+     theme=gr.themes.Soft(primary_hue="amber", secondary_hue="orange"),
+ ) as demo:
+
+     gr.Markdown(f"# 🎮 {TITLE}")
+     gr.Markdown(DESCRIPTION)
+
+     with gr.Row():
+         with gr.Column(scale=1):
+             input_image = gr.Image(
+                 label="📷 Upload a face photo",
+                 type="pil",
+                 height=400,
+             )
+
+             prompt = gr.Textbox(
+                 label="✨ Prompt",
+                 value=DEFAULT_PROMPT,
+                 placeholder="Describe the subject (e.g., a pirate captain, a wizard)...",
+                 lines=2,
+             )
+
+             generate_btn = gr.Button(
+                 "🎮 Generate LucasArts Style",
+                 variant="primary",
+                 size="lg",
+             )
+
+             with gr.Accordion("⚙️ Advanced Settings", open=False):
+                 negative_prompt = gr.Textbox(
+                     label="Negative Prompt",
+                     value=DEFAULT_NEGATIVE,
+                     lines=2,
+                 )
+
+                 face_strength = gr.Slider(
+                     label="Face Identity Strength",
+                     minimum=0.0,
+                     maximum=2.0,
+                     value=DEFAULT_FACE_STRENGTH,
+                     step=0.01,
+                     info="Higher = more face likeness, less creative freedom",
+                 )
+
+                 image_strength = gr.Slider(
+                     label="Image Strength",
+                     minimum=0.0,
+                     maximum=1.0,
+                     value=DEFAULT_IMAGE_STRENGTH,
+                     step=0.01,
+                     info="Higher = more similarity to original photo structure/colors",
+                 )
+
+                 depth_strength = gr.Slider(
+                     label="Depth ControlNet Strength",
+                     minimum=0.0,
+                     maximum=1.0,
+                     value=DEFAULT_DEPTH_STRENGTH,
+                     step=0.01,
+                     info="Higher = more structural preservation from depth map",
+                 )
+
+                 guidance_scale = gr.Slider(
+                     label="Guidance Scale",
+                     minimum=1.0,
+                     maximum=15.0,
+                     value=DEFAULT_GUIDANCE_SCALE,
+                     step=0.1,
+                     info="Higher = stronger prompt adherence",
+                 )
+
+                 num_steps = gr.Slider(
+                     label="Inference Steps",
+                     minimum=10,
+                     maximum=50,
+                     value=DEFAULT_STEPS,
+                     step=1,
+                     info="More steps = higher quality but slower",
+                 )
+
+         with gr.Column(scale=1):
+             output_image = gr.Image(
+                 label="🖼️ LucasArts Style Result",
+                 type="pil",
+                 height=400,
+             )
+
+             gen_info = gr.Textbox(
+                 label="📋 Generation Info",
+                 lines=4,
+                 interactive=False,
+             )
+
+     gr.Markdown("### 💡 Prompt Ideas")
+     gr.Examples(
+         examples=[
+             ["a pirate captain"],
+             ["a wizard in a dark tower"],
+             ["a detective in a noir city"],
+             ["a space adventurer"],
+             ["a medieval knight"],
+         ],
+         inputs=[prompt],
+         label="Click to use",
+     )
+
+     gr.Markdown(
+         "---\n"
+         "**Architecture:** SDXL + InstantID + ZoeDepth ControlNet + LucasArts LoRA \n"
+         "**Scheduler:** DPMSolver++ (Karras) \n"
+         "**Inspired by:** fofr's [face-to-many](https://github.com/fofr/cog-face-to-many)"
+     )
+
+     # Wire up
+     generate_btn.click(
+         fn=generate,
+         inputs=[
+             input_image,
+             prompt,
+             negative_prompt,
+             face_strength,
+             image_strength,
+             depth_strength,
+             guidance_scale,
+             num_steps,
+         ],
+         outputs=[output_image, gen_info],
+     )
+
+
+ if __name__ == "__main__":
+     demo.queue()
+     demo.launch()  # share links are not supported on Hugging Face Spaces
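The two helpers in app.py reduce to simple geometry: `center_crop_square` takes the largest centered square, and `extract_face` keeps the detection whose bounding box has the largest area. The sketch below restates both on plain tuples so the arithmetic can be checked without Pillow or InsightFace. The function names are hypothetical, and the `(x1, y1, x2, y2)` box layout is assumed from the area lambda app.py applies to InsightFace's `bbox` field. It uses integer division, so the crop box never depends on how fractional coordinates are rounded.

```python
from typing import Sequence, Tuple

# Assumed layout (x1, y1, x2, y2), matching how app.py computes area from "bbox"
Box = Tuple[float, float, float, float]


def center_crop_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) of the largest square centered in a width x height image."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def box_area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


def largest_face_index(bboxes: Sequence[Box]) -> int:
    """Index of the detection with the largest box area, mirroring extract_face's choice."""
    if not bboxes:
        raise ValueError("no faces detected")
    return max(range(len(bboxes)), key=lambda i: box_area(bboxes[i]))


if __name__ == "__main__":
    print(center_crop_box(1920, 1080))
    print(largest_face_index([(0, 0, 10, 10), (5, 5, 50, 40), (0, 0, 20, 20)]))
```

With Pillow, `img.crop(center_crop_box(*img.size))` yields a square of the same size as `center_crop_square`, possibly offset by one pixel when the width/height difference is odd. On exact area ties, `max` keeps the first of the equally large faces, whereas app.py's `sorted(...)[-1]` keeps the last.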
requirements.txt ADDED
@@ -0,0 +1,14 @@
+ diffusers>=0.25.0
+ transformers
+ accelerate
+ safetensors
+ torch
+ torchvision
+ controlnet_aux
+ insightface
+ onnxruntime
+ huggingface_hub
+ gradio>=4.0.0
+ opencv-python-headless
+ numpy
+ Pillow