Update README.md
README.md
CHANGED
@@ -88,6 +88,24 @@ import torch
 import numpy as np
 import cv2
 
+def HWC3(x):
+    assert x.dtype == np.uint8
+    if x.ndim == 2:
+        x = x[:, :, None]
+    assert x.ndim == 3
+    H, W, C = x.shape
+    assert C == 1 or C == 3 or C == 4
+    if C == 3:
+        return x
+    if C == 1:
+        return np.concatenate([x, x, x], axis=2)
+    if C == 4:
+        color = x[:, :, 0:3].astype(np.float32)
+        alpha = x[:, :, 3:4].astype(np.float32) / 255.0
+        y = color * alpha + 255.0 * (1.0 - alpha)
+        y = y.clip(0, 255).astype(np.uint8)
+        return y
+
 controlnet_conditioning_scale = 1.0
 prompt = "your prompt, the longer the better, you can describe it as detail as possible"
 negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality'
@@ -143,7 +161,10 @@ images[0].save(f"your image save path, png format is usually better than jpg or
 
 ## Training Details
 
-The model is trained using high quality data, only 1 stage training
+The model is trained on high-quality data in a single stage, at the same resolution as sdxl-base, 1024*1024. Following Lvmin Zhang, we generate canny images with random thresholds. It is essential to find proper hyperparameters
+for this data augmentation: making it too easy or too hard hurts model performance. In addition, we randomly mask out a random percentage of each canny image to force the model to learn more of the semantic relationship between the prompt and the lines.
+We use over 10,000,000 carefully annotated images. CogVLM has proved to be a powerful image captioning model [https://github.com/THUDM/CogVLM?tab=readme-ov-file]. For comic images, it is recommended to use the waifu tagger to generate special tags
+[https://huggingface.co/spaces/SmilingWolf/wd-tagger]. More than 64 A100s are used to train the model, and the real batch size is 2560 when using accumulate_grad_batches.
 
 
 ### Training Data
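The augmentation described in the new Training Details text (random Canny thresholds, random masking of the edge map, and a batch size reached through gradient accumulation) might look roughly like the sketch below. It is an illustration only, not the authors' training code: the helper names, the threshold range of 50 to 250, the 64-pixel block masking, the 0.5 maximum mask fraction, and the 10 x 64 x 4 batch split are all assumptions that do not appear in the README.

```python
import numpy as np


def sample_canny_thresholds(rng, lo=50, hi=250):
    """Draw a random (low, high) threshold pair with low <= high.

    The [lo, hi] range is an assumed value, not taken from the README.
    """
    a, b = rng.integers(lo, hi + 1, size=2)
    return int(min(a, b)), int(max(a, b))


def random_canny(image_gray, rng):
    """Run Canny on an 8-bit grayscale image with randomly drawn thresholds."""
    import cv2  # imported lazily so the other helpers work without OpenCV

    low, high = sample_canny_thresholds(rng)
    return cv2.Canny(image_gray, low, high)


def random_mask(edges, rng, max_fraction=0.5, block=64):
    """Zero out a random fraction (at most max_fraction) of square blocks.

    Block-wise masking is a hypothetical choice; the README only says that
    a random percentage of the canny image is masked out.
    """
    out = edges.copy()
    h, w = out.shape[:2]
    rows, cols = -(-h // block), -(-w // block)  # ceiling division
    fraction = rng.uniform(0.0, max_fraction)
    n_masked = int(round(fraction * rows * cols))
    chosen = rng.choice(rows * cols, size=n_masked, replace=False)
    for idx in chosen:
        r, c = divmod(int(idx), cols)
        out[r * block:(r + 1) * block, c * block:(c + 1) * block] = 0
    return out


def effective_batch_size(per_device_batch, num_devices, accumulate_grad_batches):
    """Samples contributing to one optimizer step under gradient accumulation."""
    return per_device_batch * num_devices * accumulate_grad_batches


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a Canny edge map: an all-white 1024x1024 image.
    edges = np.full((1024, 1024), 255, dtype=np.uint8)
    masked = random_mask(edges, rng)
    print("thresholds:", sample_canny_thresholds(rng))
    print("masked fraction:", float((masked == 0).mean()))
    # One hypothetical split that yields 2560; the real split is not stated.
    print("effective batch size:", effective_batch_size(10, 64, 4))
```

With a real image, `random_canny` output can be passed through `random_mask` and then through the `HWC3` helper from the diff above to obtain a three-channel control image.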