Text-to-Image
Diffusers
Safetensors
GGUF
ZImagePipeline
image-to-image
inpainting
controlnet
z-image-turbo
Commit 62561bb
Parent(s): 6dc45e0

add differential diffusion inpaint

Files changed:
- .gitignore +2 -1
- README.md +66 -22
- assets/inpaint_mask.jpg +3 -0
- assets/mask_1.jpg +3 -0
- assets/{mask_inpaint.jpg → mask_2.jpg} +0 -0
- diffusers_local/pipeline_z_image_control_unified.py +100 -35
- infer_inpaint.py +9 -10
- prepare_mask.py +101 -0
- results/new_tests/{result_inpaint.png → result_inpaint_2.png} +0 -0
- results/new_tests/result_inpaint_default.png +3 -0
- results/new_tests/result_inpaint_diff.png +3 -0
- results/new_tests/result_inpaint_diffinpaint.png +3 -0
.gitignore
CHANGED
@@ -17,4 +17,5 @@ bk/
 outputs/
 original/
 Makefile
-pyproject.toml
+pyproject.toml
+README_.md
README.md
CHANGED
@@ -41,29 +41,54 @@ pip install -r requirements.txt

 ## 🚀 Usage

-
+This repository provides separate, easy-to-use scripts for each generation task.
+
+### High-Level Scripts
+* `infer_t2i.py`: For Text-to-Image generation.
+* `infer_i2i.py`: For Image-to-Image generation.
+* `infer_controlnet.py`: For ControlNet-guided generation (Pose, Canny, Depth, etc.).
+* `infer_inpaint.py`: For all inpainting tasks.
+
+### Hardware Options
+
+#### Option 1: Low VRAM (GGUF) - Recommended
+Use this version if you have limited VRAM (e.g., 6GB - 8GB). It loads the model from a quantized **GGUF** file. To use it, set `use_gguf = True` in the desired inference script and provide the path to the `.gguf` file.
+
+**Key Features:**
+* Loads the unified transformer from a single 4-bit or 8-bit quantized file.
+* Enables aggressive `group_offload` to fit large models on consumer GPUs.

-
-
-* `infer_controlnet.py`: Script for running controlnet inference.
-* `infer_inpaint.py`: Script for running inpaint inference.
-* `infer_t2i.py`: Script for running text-to-image inference.
-* `infer_i2i.py`: Script for running image-to-image inference.
-* `diffusers_local/`: Custom pipeline code (`ZImageControlUnifiedPipeline`) and transformer logic.
-* `requirements.txt`: Python dependencies.
+#### Option 2: High Precision (Diffusers/BF16)
+Use this version if you have ample VRAM (e.g., 24GB+). Set `use_gguf = False` in the script to load the model using the standard `from_pretrained` directory structure for full **BFloat16** precision.

-
+## 🎨 Inpainting Guide

-
-Use this version if you have limited VRAM (e.g., 6GB - 8GB). It loads the model from a quantized **GGUF** file (`z_image_turbo_control_unified_v2.1_q4_k_m.gguf`). Simply configure the `infer_controlnet.py` script to point to the GGUF file.
+The `infer_inpaint.py` script leverages a powerful, unified inpainting system with multiple modes controlled by the `inpaint_mode` parameter.

-
-
-
+### Preparing Your Mask
+For best results, especially when removing objects or dealing with complex edges, it's recommended to pre-process your mask. We provide a utility script for this.
+
+**`prepare_mask.py`**
+This script expands the white areas of your mask and applies a feather (blur) to the edges. This helps to completely cover artifacts from the old image and ensures a smooth, seamless blend with the new generated content.
+
+**Usage:**
+```bash
+python prepare_mask.py <input_mask_path> <output_mask_path> --expand 15 --feather 10
+```
+* `--expand`: Expands the mask to cover "ghosting".
+* `--feather`: Creates a soft gradient for seamless blending.

-###
-
+### Inpainting Modes in `infer_inpaint.py`
+You can choose the inpainting method by setting the `inpaint_mode` variable in the script:

+1. **`inpaint_mode = "default"`**
+* Uses the standard ControlNet-based inpainting. Good for general-purpose tasks.
+
+2. **`inpaint_mode = "diff"`**
+* Uses the "Differential Diffusion" inpainting technique. This method is excellent for preserving the original background texture and lighting perfectly while generating new content in the masked area. It works by composing latents at each step of the diffusion process.
+
+3. **`inpaint_mode = "diff+inpaint"`**
+* Combines both methods. It uses the `diff` mode for background preservation while also feeding the inpainting context to the ControlNet layers. This can be useful for complex scenes where both structural guidance and texture preservation are needed.

 ## 🛠️ Model Features & Configuration (V2)

@@ -76,7 +101,7 @@ Use this version if you have ample VRAM (e.g., 24GB+). Configure `infer_controln

 This optmized V2 model introduces several new features and parameters for enhanced control and flexibility:

-* **Unified Pipeline:** A single pipeline now handles Text-to-Image, Image-to-Image, ControlNet, and Inpainting
+* **Unified Pipeline:** A single pipeline now handles Text-to-Image, Image-to-Image, ControlNet, and and multiple Inpainting modes.
 * **Refiner Scale (`controlnet_refiner_conditioning_scale`):** It provides fine-grained control over the influence of the initial refining layers, allowing for isolated adjustments without the influence of the controlnet's conditioning force.
 * **Optional Refiner (`add_control_noise_refiner=False`):** You can now disable the control noise refiner layers when loading the model to save memory or for different stylistic results.
 * **Inpainting Blur (`mask_blur_radius`):** A parameter to soften the edges of the inpainting mask for smoother transitions.
@@ -98,12 +123,18 @@ The new `controlnet_refiner_conditioning_scale` parameter allows for fine-tuning

 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
 <tr>
-<td>Pose + Inpaint</td>
-<td>
+<td>Pose + Inpaint Image</td>
+<td>Inpaint Mask</td>
+<td>Model Inpaint</td>
+<td>Diff Inpaint</td>
+<td>Diff + Model Inpaint</td>
 </tr>
 <tr>
-<td><img src="assets/
-<td><img src="
+<td><img src="assets/pose.jpg" width="100%" /><img src="assets/inpaint.jpg" width="100%" /></td>
+<td><img src="assets/inpaint_mask.jpg" width="100%" /></td>
+<td><img src="results/new_tests/result_inpaint_default.png" width="100%" /></td>
+<td><img src="results/new_tests/result_inpaint_diff.png" width="100%" /></td>
+<td><img src="results/new_tests/result_inpaint_diffinpaint.png" width="100%" /></td>
 </tr>
 </table>
 <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
@@ -188,3 +219,16 @@ The table below shows the generation results under different combinations of Dif
 | **20** |  |  |  |  |  |  |
 | **30** |  |  |  |  |  |  |
 | **40** |  |  |  |  |  |  |
+
+---
+
+## 📂 Repository Structure
+
+* `./transformer/`: Directory for model weights (GGUF or standard).
+* `infer_controlnet.py`: Script for ControlNet inference.
+* `infer_inpaint.py`: Script for inpainting inference.
+* `infer_t2i.py`: Script for Text-to-Image inference.
+* `infer_i2i.py`: Script for Image-to-Image inference.
+* `prepare_mask.py`: Utility script to process masks for inpainting.
+* `diffusers_local/`: Custom pipeline code.
+* `requirements.txt`: Python dependencies.
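The README says `diff` mode works by composing latents at each step of the diffusion process. The NumPy sketch below illustrates that compositing rule on a toy latent. It is a simplified reading of the pipeline diff, not the repository's code.

It makes three assumptions:
- A "keep" map in [0, 1], where 1 means preserve. The pipeline derives it from the inverted, blurred mask.
- The flow-matching re-noising `(1 - sigma) * x0 + sigma * noise`.
- The helper names `noised_trajectory`, `time_masks` and `composite`, which are illustrative only.

```python
import numpy as np


def noised_trajectory(x0, noise, sigmas):
    """Re-noise the clean latent x0 to every sigma with the flow-matching
    interpolation x_sigma = (1 - sigma) * x0 + sigma * noise (assumed rule)."""
    return np.stack([(1.0 - s) * x0 + s * noise for s in sigmas])


def time_masks(keep_map, num_steps):
    """One boolean mask per step: a pixel is reset to the original while its
    keep value is strictly greater than i / num_steps."""
    thresholds = np.arange(num_steps) / num_steps
    return keep_map[None, ...] > thresholds[:, None, None]


def composite(latents, trajectory, masks, i):
    """Blend at step i: masked pixels take the re-noised original, the rest
    keep whatever the model has generated so far."""
    if i == 0:
        return trajectory[0].copy()
    m = masks[i].astype(latents.dtype)
    return trajectory[i] * m + latents * (1.0 - m)


# Toy 1x3 "latent": keep = 1.0 (preserve), 0.5 (preserve early only), 0.0 (regenerate).
keep_map = np.array([[1.0, 0.5, 0.0]])
sigmas = [1.0, 0.75, 0.5, 0.25]
x0 = np.ones((1, 3))
noise = np.zeros((1, 3))

traj = noised_trajectory(x0, noise, sigmas)
masks = time_masks(keep_map, len(sigmas))
generated = np.full((1, 3), 9.0)  # stand-in for the model's current latent
blended = composite(generated, traj, masks, 2)
print(masks[:, 0, :].astype(int).tolist())  # [[1, 1, 0], [1, 1, 0], [1, 0, 0], [1, 0, 0]]
print(blended.tolist())  # [[0.5, 9.0, 9.0]]
```

A pixel with keep value k is pinned to the re-noised original for roughly the first k fraction of steps and is then left to the model. This is why a feathered mask edge blends in gradually instead of switching at a hard boundary.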
assets/inpaint_mask.jpg
ADDED (stored with Git LFS)

assets/mask_1.jpg
ADDED (stored with Git LFS)

assets/{mask_inpaint.jpg → mask_2.jpg}
RENAMED
File without changes
diffusers_local/pipeline_z_image_control_unified.py
CHANGED
@@ -15,11 +15,12 @@


 import inspect
-from typing import Any, Callable, Dict, List, Optional, Tuple, Union
+from typing import Any, Callable, Dict, List, Literal, Optional, Tuple, Union

 import numpy as np
 import torch
 import torch.nn.functional as F
+import torchvision.transforms as T
 from diffusers import AutoencoderKL, DiffusionPipeline, FlowMatchEulerDiscreteScheduler
 from diffusers.image_processor import PipelineImageInput, VaeImageProcessor
 from diffusers.loaders import FromSingleFileMixin, ZImageLoraLoaderMixin
@@ -467,6 +468,8 @@ class ZImageControlUnifiedPipeline(DiffusionPipeline, ZImageLoraLoaderMixin, Fro
         reference_latents_shape: Tuple,
         device: torch.device,
         dtype: torch.dtype,
+        invert_mask: bool = False,
+        do_unsqueeze: bool = True,
     ) -> torch.Tensor:
         """
         Processes a MASK using the mask_processor, inverts it, resizes it, and formats it for the control_context.
@@ -494,13 +497,18 @@ class ZImageControlUnifiedPipeline(DiffusionPipeline, ZImageLoraLoaderMixin, Fro
             )
             return torch.zeros(placeholder_shape, device=device, dtype=dtype)

-
+        mask_tensor = self.mask_processor.preprocess(mask_image, height=height, width=width)
+        mask_tensor = mask_tensor.to(device=device, dtype=dtype)
+
+        if invert_mask:
+            mask_tensor = 1.0 - mask_tensor

-
-
-
-
-
+        mask_latents = F.interpolate(mask_tensor, size=reference_latents_shape[-2:], mode="nearest")
+
+        if do_unsqueeze:
+            mask_latents = mask_latents.unsqueeze(2)
+
+        return mask_latents

     def prepare_control_latents(
         self, image: PipelineImageInput, width: int, height: int, batch_size: int, num_images_per_prompt: int, device: torch.device, dtype: torch.dtype
@@ -595,7 +603,8 @@ class ZImageControlUnifiedPipeline(DiffusionPipeline, ZImageLoraLoaderMixin, Fro
         prompt: Union[str, List[str]],
         image: Optional[PipelineImageInput] = None,
         mask_image: Optional[PipelineImageInput] = None,
-
+        inpaint_mode: Literal["default", "diff", "diff+inpaint"] = "default",
+        mask_blur_radius: float=8.0,
         control_image: Optional[PipelineImageInput] = None,
         height: Optional[int] = None,
         width: Optional[int] = None,
@@ -630,7 +639,10 @@ class ZImageControlUnifiedPipeline(DiffusionPipeline, ZImageLoraLoaderMixin, Fro
                 The initial image for image-to-image or inpainting modes.
             mask_image (`PipelineImageInput`, *optional*):
                 The mask image for inpainting. White areas are preserved, black areas are inpainted.
-
+            inpaint_mode (`str`, *optional*, defaults to `"default"`):
+                The inpainting mode. Can be "default", "diff", or "diff+inpaint". Determines how the inpainting
+                process is handled.
+            mask_blur_radius (`float`, *optional*, defaults to 8.0):
                 The radius for blurring the edges of the inpainting mask to create a smoother transition.
             control_image (`PipelineImageInput`, *optional*):
                 The conditioning image for control modes (e.g., Canny, depth).
@@ -640,21 +652,21 @@ class ZImageControlUnifiedPipeline(DiffusionPipeline, ZImageLoraLoaderMixin, Fro
                 The width in pixels of the generated image.
             num_inference_steps (`int`, *optional*, defaults to 20):
                 The number of denoising steps. More denoising steps usually lead to a higher quality image at the
-
+                expense of slower inference.
             sigmas (`List[float]`, *optional*):
                 Custom sigmas to use for the denoising process. If not defined, the scheduler's default behavior
-
+                will be used.
             strength (`float`, *optional*, defaults to 1.0):
                 Denoising strength for image-to-image. A value of 1.0 means the initial image is fully replaced,
-
+                while a lower value preserves more of the original image structure. Only used in img2img mode.
             guidance_scale (`float`, *optional*, defaults to 4.0):
                 The scale for classifier-free guidance. A value > 1 enables it. Higher values encourage images
-
+                closer to the prompt, potentially at the cost of quality.
             cfg_normalization (`bool`, *optional*, defaults to False):
                 Whether to apply normalization to the guidance, which can prevent oversaturation.
             cfg_truncation (`float`, *optional*, defaults to 1.0):
                 A value between 0.0 and 1.0 that disables CFG for the final portion of the denoising steps,
-
+                specified as a fraction of total steps. For example, 0.8 disables CFG for the last 20% of steps.
             negative_prompt (`str` or `List[str]`, *optional*):
                 The prompt or prompts not to guide the image generation.
             num_images_per_prompt (`int`, *optional*, defaults to 1):
@@ -698,8 +710,12 @@ class ZImageControlUnifiedPipeline(DiffusionPipeline, ZImageLoraLoaderMixin, Fro
         is_two_stage_control_model = self.transformer.control_in_dim > self.transformer.in_channels if hasattr(self.transformer, "control_in_dim") else False
         device = self._execution_device
         dtype = self.transformer.dtype
-        vae_scale = self.vae_scale_factor * 2
-
+        vae_scale = self.vae_scale_factor * 2
+        has_inpaint_inputs = image is not None and mask_image is not None
+        is_inpaint_control_mode = has_inpaint_inputs and inpaint_mode in ["default", "diff+inpaint"]
+        is_diff_mode = has_inpaint_inputs and inpaint_mode in ["diff", "diff+inpaint"]
+        is_img2img_mode = image is not None and not has_inpaint_inputs
+
         ref_image = control_image or image
         image_height = None
         image_width = None
@@ -742,22 +758,23 @@ class ZImageControlUnifiedPipeline(DiffusionPipeline, ZImageLoraLoaderMixin, Fro
             prompt_embeds_model_input = prompt_embeds + negative_prompt_embeds
         else:
             prompt_embeds_model_input = prompt_embeds
-
-
-        is_img2img_mode = image is not None and not is_inpaint_mode
-
-        if control_image is not None or is_inpaint_mode:
+
+        if control_image is not None or is_inpaint_control_mode:
             control_latents = self.prepare_control_latents(control_image, width, height, batch_size, num_images_per_prompt, device, dtype)

-            if is_two_stage_control_model:
-
+            if is_two_stage_control_model:
+                image_for_inpaint = None if is_diff_mode and not is_inpaint_control_mode else image
+                mask_for_inpaint = None if is_diff_mode and not is_inpaint_control_mode else mask_image
+
+                if is_inpaint_control_mode:
+                    mask_for_inpaint = self._apply_mask_blur(mask_for_inpaint, mask_blur_radius, True)

                 inpaint_latents = self._prepare_image_latents(
-
+                    image_for_inpaint, mask_for_inpaint, width, height, batch_size, num_images_per_prompt, device, dtype
                 )
-
+
                 mask_latents = self._prepare_mask_latents(
-
+                    mask_for_inpaint,
                     width,
                     height,
                     batch_size,
@@ -765,6 +782,8 @@ class ZImageControlUnifiedPipeline(DiffusionPipeline, ZImageLoraLoaderMixin, Fro
                     inpaint_latents.shape,
                     device,
                     dtype,
+                    invert_mask=is_inpaint_control_mode,
+                    do_unsqueeze=True,
                 )
                 control_context = torch.cat([control_latents, mask_latents, inpaint_latents], dim=1)
             else:
@@ -783,7 +802,7 @@ class ZImageControlUnifiedPipeline(DiffusionPipeline, ZImageLoraLoaderMixin, Fro
         timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, sigmas, mu=mu)
         self._num_timesteps = len(timesteps)

-        if is_img2img_mode
+        if is_img2img_mode:
             strength = min(strength, 1.0)
         else:
             strength = 1.0
@@ -798,7 +817,8 @@ class ZImageControlUnifiedPipeline(DiffusionPipeline, ZImageLoraLoaderMixin, Fro

         latent_timestep = timesteps[:1].repeat(effective_batch_size) if strength < 1.0 else None

-        use_image_for_latents = is_img2img_mode
+        use_image_for_latents = is_img2img_mode
+
         latents = self.prepare_latents(
             effective_batch_size,
             self.transformer.in_channels,
@@ -811,33 +831,78 @@ class ZImageControlUnifiedPipeline(DiffusionPipeline, ZImageLoraLoaderMixin, Fro
             timestep=latent_timestep if use_image_for_latents else None,
             latents=latents,
         )
-
+
+        if is_diff_mode:
+            original_image_tensor = self.image_processor.preprocess(image, height=height, width=width).to(device=device, dtype=self.vae.dtype)
+            with torch.no_grad():
+                original_clean_latents = retrieve_latents(self.vae.encode(original_image_tensor), sample_mode="argmax")
+            original_clean_latents = (original_clean_latents - self.vae.config.shift_factor) * self.vae.config.scaling_factor
+            original_clean_latents = original_clean_latents.to(dtype)
+
+            noise = randn_tensor(original_clean_latents.shape, generator=generator, device=device, dtype=dtype)
+            latents_list = []
+            step_indices = [(self.scheduler.timesteps == t).nonzero().item() for t in timesteps]
+            for i in step_indices:
+                sigma = self.scheduler.sigmas[i]
+                noisy_latent = (1.0 - sigma) * original_clean_latents + sigma * noise
+                latents_list.append(noisy_latent)
+
+            original_latents_trajectory = torch.cat(latents_list, dim=0)
+            blurred_mask_image = self._apply_mask_blur(mask_image, mask_blur_radius, True)
+            map_processed = self._prepare_mask_latents(
+                blurred_mask_image,
+                width,
+                height,
+                batch_size,
+                num_images_per_prompt,
+                latents.shape,
+                device,
+                dtype,
+                invert_mask=True,
+                do_unsqueeze=False,
+            )
+
+            thresholds = torch.arange(len(timesteps), device=device, dtype=dtype) / len(timesteps)
+            thresholds = thresholds.view(-1, 1, 1, 1)
+            time_masks = map_processed > thresholds
+
         num_warmup_steps = len(timesteps) - num_steps_to_run * self.scheduler.order
         with torch.inference_mode():
             with self.progress_bar(total=num_steps_to_run) as progress_bar:
                 for i, t in enumerate(timesteps):
                     if self.interrupt:
                         continue
-
+
+                    if is_diff_mode:
+                        if i == 0:
+                            latents = original_latents_trajectory[:1]
+                        else:
+                            current_mask = time_masks[i].to(latents.dtype)
+                            current_original_latent = original_latents_trajectory[i:i+1]
+
+                            if current_mask.ndim == 3:
+                                current_mask = current_mask.unsqueeze(1)
+
+                            latents = current_original_latent * current_mask + latents * (1 - current_mask)
+
                     timestep = t.expand(latents.shape[0])
                     timestep = (1000 - timestep) / 1000
+
                     t_norm = timestep[0].item()
-
                     current_guidance_scale = self.guidance_scale
                     if self.do_classifier_free_guidance and self._cfg_truncation is not None and float(self._cfg_truncation) <= 1:
                         if t_norm > self._cfg_truncation:
                             current_guidance_scale = 0.0
-
                     apply_cfg = self.do_classifier_free_guidance and current_guidance_scale > 0

                     if apply_cfg:
-
-                        latent_model_input = latents_typed.repeat(2, 1, 1, 1)
+                        latent_model_input = latents.repeat(2, 1, 1, 1)
                         timestep_model_input = timestep.repeat(2)
                     else:
-                        latent_model_input = latents
+                        latent_model_input = latents
                         timestep_model_input = timestep

+                    latent_model_input = latent_model_input.to(self.transformer.dtype)
                     latent_model_input = latent_model_input.unsqueeze(2)
                     latent_model_input_list = list(latent_model_input.unbind(dim=0))

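The `cfg_truncation` docstring describes the cutoff as a fraction of total steps. The denoising loop, however, compares a normalized timestep `t_norm = (1000 - t) / 1000` against it.

The pure-Python sketch below mirrors that comparison, so you can see which steps keep guidance for a given list of timesteps. It assumes CFG is otherwise enabled. The evenly spaced timesteps are a hypothetical stand-in for what the scheduler actually returns, which also depends on `shift` and `sigmas`. `cfg_schedule` is an illustrative name, not part of the pipeline.

```python
def cfg_schedule(timesteps, guidance_scale, cfg_truncation=1.0):
    """Per-step guidance scale, mirroring the truncation check in the loop:
    t_norm = (1000 - t) / 1000, and CFG is switched off once t_norm > cfg_truncation."""
    scales = []
    for t in timesteps:
        t_norm = (1000 - t) / 1000
        if cfg_truncation is not None and float(cfg_truncation) <= 1 and t_norm > cfg_truncation:
            scales.append(0.0)
        else:
            scales.append(guidance_scale)
    return scales


# Hypothetical, evenly spaced timesteps: 1000, 900, ..., 100.
timesteps = [1000 - 100 * k for k in range(10)]
print(cfg_schedule(timesteps, 4.0, cfg_truncation=0.8))
# [4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 0.0]
```

With these evenly spaced timesteps, `cfg_truncation=0.8` switches guidance off only on the final step, because the check is a strict `>`. With the scheduler's real (shifted) spacing, the share of affected steps may differ from the 20% suggested by the docstring.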
infer_inpaint.py
CHANGED
|
@@ -11,16 +11,14 @@ from diffusers_local import patch # Apply necessary patches for local diffusers
|
|
| 11 |
from diffusers_local.pipeline_z_image_control_unified import ZImageControlUnifiedPipeline
|
| 12 |
from diffusers_local.z_image_control_transformer_2d import ZImageControlTransformer2DModel
|
| 13 |
|
| 14 |
-
|
| 15 |
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True,garbage_collection_threshold:0.7,max_split_size_mb:1024"
|
| 16 |
|
| 17 |
-
|
| 18 |
def main():
|
| 19 |
# 1. Set params
|
| 20 |
-
BASE_MODEL_ID = "."
|
| 21 |
GGUF_MODEL_FILE = "./transformer/z_image_turbo_control_unified_v2.1_q4_k_m.gguf"
|
| 22 |
GGUF_MODEL_FILE = "./transformer/z_image_turbo_control_unified_v2.1_q8_0.gguf"
|
| 23 |
-
|
| 24 |
use_gguf = True
|
| 25 |
|
| 26 |
# prompt="一位年轻女子站在阳光明媚的海岸线上,白裙在轻拂的海风中微微飘动,裙摆轻盈飞扬。她拥有一头鲜艳的紫色长发,在风中轻盈舞动,发间系着一个精致的黑色蝴蝶结,与身后柔和的蔚蓝天空形成鲜明对比。她面容清秀,眉目精致,肤色白皙细腻,透着一股甜美的青春气息;神情柔和,略带羞涩,目光静静地凝望着远方的地平线,双手自然交叠于身前,手指清晰可见、五指完整、指节自然、姿势优雅放松,仿佛沉浸在思绪之中。背景是辽阔无垠、波光粼粼的大海,阳光洒在海面上,映出温暖的金色光晕,海浪轻轻拍打沙滩,天空湛蓝云朵稀薄。整体画面高清锐利、细节丰富、色彩鲜艳、焦点清晰、8K分辨率、杰作、最佳质量、无模糊、无噪点、无畸变、自然光照、电影级渲染。"
|
|
@@ -29,13 +27,14 @@ def main():
|
|
| 29 |
|
| 30 |
target_height = 1728
|
| 31 |
target_width = 992
|
| 32 |
-
num_inference_steps =
|
| 33 |
guidance_scale = 0 # 2.5
|
| 34 |
controlnet_conditioning_scale = 0.7
|
| 35 |
controlnet_conditioning_refiner_scale = 0.75
|
| 36 |
-
mask_blur_radius =
|
| 37 |
-
seed =
|
| 38 |
shift = 3.0
|
|
|
|
| 39 |
generator = torch.Generator("cuda").manual_seed(seed)
|
| 40 |
|
| 41 |
print("Loading Pipeline...")
|
|
@@ -74,8 +73,7 @@ def main():
|
|
| 74 |
|
| 75 |
pose_image = load_image("assets/pose.jpg")
|
| 76 |
inpaint_image = load_image("assets/inpaint.jpg")
|
| 77 |
-
mask_image = load_image("assets/
|
| 78 |
-
|
| 79 |
start_inference_time = time.time()
|
| 80 |
|
| 81 |
generated_image = pipe(
|
|
@@ -84,7 +82,8 @@ def main():
|
|
| 84 |
image=inpaint_image,
|
| 85 |
control_image=pose_image,
|
| 86 |
mask_image=mask_image,
|
| 87 |
-
mask_blur_radius=mask_blur_radius,
|
|
|
|
| 88 |
height=target_height,
|
| 89 |
width=target_width,
|
| 90 |
num_inference_steps=num_inference_steps,
|
|
|
|
 from diffusers_local.pipeline_z_image_control_unified import ZImageControlUnifiedPipeline
 from diffusers_local.z_image_control_transformer_2d import ZImageControlTransformer2DModel
 
 os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True,garbage_collection_threshold:0.7,max_split_size_mb:1024"
 
 def main():
     # 1. Set params
+    BASE_MODEL_ID = "."
     GGUF_MODEL_FILE = "./transformer/z_image_turbo_control_unified_v2.1_q4_k_m.gguf"
     GGUF_MODEL_FILE = "./transformer/z_image_turbo_control_unified_v2.1_q8_0.gguf"
+
     use_gguf = True
 
     # prompt="A young woman stands on a sunny coastline, her white dress fluttering slightly in the gentle sea breeze, its hem lifting lightly. She has long, vivid purple hair that dances lightly in the wind, with a delicate black bow tied in it, in striking contrast to the soft azure sky behind her. Her face is delicate with refined features and fair, smooth skin, radiating a sweet, youthful air; her expression is gentle and slightly shy, her gaze quietly fixed on the distant horizon, her hands naturally folded in front of her, fingers clearly visible, all five fingers complete, knuckles natural, posture elegant and relaxed, as if lost in thought. The background is a vast, boundless, shimmering sea; sunlight spills across the water, casting a warm golden glow; waves gently lap the beach; the sky is deep blue with thin clouds. Overall image: high definition and sharp, richly detailed, vivid colors, clear focus, 8K resolution, masterpiece, best quality, no blur, no noise, no distortion, natural lighting, cinematic rendering."
 
     target_height = 1728
     target_width = 992
+    num_inference_steps = 25
     guidance_scale = 0 # 2.5
     controlnet_conditioning_scale = 0.7
     controlnet_conditioning_refiner_scale = 0.75
+    mask_blur_radius = 12
+    seed = 48
     shift = 3.0
+    inpaint_mode = "diff+inpaint" # ("default", "diff", "diff+inpaint")
     generator = torch.Generator("cuda").manual_seed(seed)
 
     print("Loading Pipeline...")
 
     pose_image = load_image("assets/pose.jpg")
     inpaint_image = load_image("assets/inpaint.jpg")
+    mask_image = load_image("assets/inpaint_mask.jpg")
     start_inference_time = time.time()
 
     generated_image = pipe(
         image=inpaint_image,
         control_image=pose_image,
         mask_image=mask_image,
+        mask_blur_radius=mask_blur_radius,
+        inpaint_mode=inpaint_mode,
         height=target_height,
         width=target_width,
         num_inference_steps=num_inference_steps,
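The commit message ("add differential diffusion inpaint") and the new `inpaint_mode` values ("default", "diff", "diff+inpaint") suggest that the pipeline can now treat a soft, grayscale mask as a per-pixel change strength rather than a hard on/off region. The pipeline code that implements this is not part of this excerpt, so the block below is only a minimal sketch of the general differential-diffusion idea under stated assumptions: the helper names `step_release_masks` and `blend_step` are hypothetical, plain nested lists stand in for latent tensors, mask values are assumed to lie in [0, 1] with 1.0 meaning "regenerate fully", and the linear threshold schedule is one plausible choice, not necessarily the one this pipeline uses.

```python
from typing import List

Grid = List[List[float]]


def step_release_masks(soft_mask: Grid, num_steps: int) -> List[List[List[bool]]]:
    """Hypothetical helper: build one boolean grid per denoising step.

    Assumed convention: soft_mask values lie in [0, 1], where 1.0 means
    "regenerate fully" and 0.0 means "keep the original". A pixel is
    released (True) at step i once its value exceeds 1 - (i + 1) / num_steps,
    so high values are released early and go through more denoising steps.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    masks = []
    for i in range(num_steps):
        threshold = 1.0 - (i + 1) / num_steps
        masks.append([[value > threshold for value in row] for row in soft_mask])
    return masks


def blend_step(original_noised: Grid, denoised: Grid, released: List[List[bool]]) -> Grid:
    """Hypothetical helper: keep the model output where released, else reset to the noised original."""
    return [
        [d if r else o for o, d, r in zip(o_row, d_row, r_row)]
        for o_row, d_row, r_row in zip(original_noised, denoised, released)
    ]


if __name__ == "__main__":
    masks = step_release_masks([[0.0, 0.5, 1.0]], num_steps=4)
    for i, m in enumerate(masks):
        print(i, m[0])
    # Number of steps each pixel is free to change: [0, 2, 4]
    print([sum(m[0][x] for m in masks) for x in range(3)])
```

With four steps, the pixel at 0.0 is never released, the one at 0.5 is free for the last two steps, and the one at 1.0 is free for all four, so intermediate mask values yield partial edits instead of a hard seam at the mask boundary.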
prepare_mask.py
ADDED
@@ -0,0 +1,101 @@
import argparse
from PIL import Image, ImageFilter

def expand_and_feather_mask(mask_image: Image.Image, expand_pixels: int = 10, feather_radius: int = 8) -> Image.Image:
    """
    Expands the white area of a mask and then smooths its edges using Pillow filters.

    This is useful for preparing inpainting masks to ensure complete coverage of the
    area to be replaced and to create a smooth blend with the surrounding image.

    Args:
        mask_image (PIL.Image.Image): The input mask (black and white). It's
            expected to be a PIL Image.
        expand_pixels (int): The number of pixels to expand (dilate) the white
            area. This helps to cover any "ghosting" from the old image.
        feather_radius (int): The radius of the Gaussian blur used to create the
            soft edge (feathering) effect.

    Returns:
        PIL.Image.Image: The processed mask with expanded and feathered edges.
    """
    # Ensure the mask is in 'L' (grayscale) mode for the filters to work correctly.
    mask = mask_image.convert("L")

    # 1. Expansion (Dilation)
    # The MaxFilter finds the brightest pixel in a kernel window and replaces the
    # center pixel with it. For a black and white image, this causes the white
    # areas to expand.
    if expand_pixels > 0:
        # The filter size must be an odd number. The formula (pixels * 2 + 1)
        # creates a kernel of the correct odd size.
        expand_size = expand_pixels * 2 + 1
        print(f"Expanding mask by {expand_pixels} pixels (filter size: {expand_size}x{expand_size})...")
        mask = mask.filter(ImageFilter.MaxFilter(size=expand_size))

    # 2. Feathering (Gaussian Blur)
    # Applies a Gaussian blur to the expanded mask, creating a smooth
    # gradient from white to black at the edges.
    if feather_radius > 0:
        print(f"Feathering mask with a radius of {feather_radius} pixels...")
        mask = mask.filter(ImageFilter.GaussianBlur(radius=feather_radius))

    return mask

def main():
    """Main function to parse arguments and process the mask."""
    parser = argparse.ArgumentParser(description="Expand and feather an inpainting mask.")

    parser.add_argument(
        "input_path",
        type=str,
        help="Path to the input mask image file."
    )
    parser.add_argument(
        "output_path",
        type=str,
        help="Path to save the processed output mask image file."
    )
    parser.add_argument(
        "--expand",
        type=int,
        default=10,
        help="Number of pixels to expand the white areas of the mask. Default is 10."
    )
    parser.add_argument(
        "--feather",
        type=int,
        default=8,
        help="Radius in pixels for the Gaussian blur (feathering) effect. Default is 8."
    )

    args = parser.parse_args()

    try:
        # Load the input mask
        print(f"Loading mask from: {args.input_path}")
        original_mask = Image.open(args.input_path)
    except FileNotFoundError:
        print(f"Error: Input file not found at '{args.input_path}'")
        return
    except Exception as e:
        print(f"Error loading image: {e}")
        return

    # Process the mask using the function
    processed_mask = expand_and_feather_mask(
        original_mask,
        expand_pixels=args.expand,
        feather_radius=args.feather
    )

    # Save the final mask
    try:
        print(f"Saving processed mask to: {args.output_path}")
        processed_mask.save(args.output_path)
        print("Done!")
    except Exception as e:
        print(f"Error saving image: {e}")

if __name__ == "__main__":
    main()
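Because `expand_and_feather_mask` relies on Pillow's `MaxFilter` and `GaussianBlur`, the effect of its two parameters can be hard to picture. The pure-Python sketch below applies the same two steps, dilation then blur, to a single row of pixel values so that the `expand_pixels * 2 + 1` window and the resulting soft ramp can be inspected directly. It is an illustration, not Pillow's implementation: the function names are hypothetical, treating `radius` as the Gaussian's standard deviation follows how Pillow documents `GaussianBlur`, but Pillow approximates that blur with repeated box blurs, and the border handling here (clamping the window and replicating edge values) is an assumption.

```python
import math
from typing import List


def dilate_1d(row: List[float], expand_pixels: int) -> List[float]:
    """Max filter over a window of 2 * expand_pixels + 1 samples (window clamped at the borders)."""
    n = len(row)
    return [max(row[max(0, i - expand_pixels):min(n, i + expand_pixels + 1)]) for i in range(n)]


def gaussian_blur_1d(row: List[float], radius: float) -> List[float]:
    """Normalized Gaussian blur with sigma = radius, truncated at 3 sigma, edge values replicated."""
    if radius <= 0:
        return list(row)
    half = math.ceil(3 * radius)
    offsets = range(-half, half + 1)
    weights = [math.exp(-(k * k) / (2.0 * radius * radius)) for k in offsets]
    total = sum(weights)
    n = len(row)
    out = []
    for i in range(n):
        # Clamp the sample index so pixels near the border reuse the edge value.
        acc = sum(w * row[min(n - 1, max(0, i + k))] for k, w in zip(offsets, weights))
        out.append(acc / total)
    return out


def expand_and_feather_1d(row: List[float], expand_pixels: int = 10, feather_radius: float = 8) -> List[float]:
    """1D analogue of expand_and_feather_mask: dilate first, then blur."""
    if expand_pixels > 0:
        row = dilate_1d(row, expand_pixels)
    if feather_radius > 0:
        row = gaussian_blur_1d(row, feather_radius)
    return row


if __name__ == "__main__":
    # A single scanline: a 5-pixel white stroke on black.
    scanline = [0.0] * 10 + [255.0] * 5 + [0.0] * 10
    print(dilate_1d(scanline, 2)[6:19])
    print([round(v) for v in expand_and_feather_1d(scanline, 2, 2)])
```

Running it shows the 5-sample stroke widened to 9 samples by the dilation and then blurred into a gradual ramp on both sides, which is the soft edge the inpainting blend depends on. The script itself is invoked with two positional paths plus the optional flags, for example `python prepare_mask.py <input_mask> <output_mask> --expand 10 --feather 8`.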
results/new_tests/{result_inpaint.png → result_inpaint_2.png}
RENAMED
File without changes
results/new_tests/result_inpaint_default.png
ADDED
results/new_tests/result_inpaint_diff.png
ADDED
results/new_tests/result_inpaint_diffinpaint.png
ADDED