| license: apache-2.0 | |
| tags: | |
| - text_to_image | |
| - diffusers | |
| - controlnet | |
| - controlnet-canny-sdxl-1.0 | |
| # Drawing like Midjourney | |
|  | |
| # Controlnet-Canny-Sdxl-1.0 | |
| Hello, I am very happy to announce the controlnet-canny-sdxl-1.0 model, a very powerful ControlNet that can generate high-resolution images visually comparable to Midjourney. | |
| The model was trained on a large amount of high-quality data (over 10,000,000 images) that was carefully filtered and captioned with a powerful vision-language model. In addition, several useful tricks were applied | |
| during training, including data augmentation, multiple losses, and multi-resolution training. With only one training stage, it outperforms the other open-source Canny models | |
| ([diffusers/controlnet-canny-sdxl-1.0], [TheMistoAI/MistoLine]). I am releasing it in the hope of advancing the application of Stable Diffusion models. Canny is one of the most important | |
| models in the ControlNet series and can be applied to many tasks in drawing and design. | |
| ## Model Details | |
| ### Model Description | |
| - **Developed by:** xinsir | |
| - **Model type:** ControlNet_SDXL | |
| - **License:** apache-2.0 | |
| - **Finetuned from model:** stabilityai/stable-diffusion-xl-base-1.0 | |
| ### Model Sources | |
| - **Paper:** https://arxiv.org/abs/2302.05543 | |
| ## Uses | |
| ### Examples | |
| prompt: A closeup of two day of the dead models, looking to the side, large flowered headdress, full dia de Los muertoe make up, lush red lips, butterflies, | |
| flowers, pastel colors, looking to the side, jungle, birds, color harmony , extremely detailed, intricate, ornate, motion, stunning, beautiful, unique, soft lighting | |
|  | |
|  | |
| prompt: ghost with a plague doctor mask in a venice carnaval hyper realistic | |
|  | |
|  | |
| prompt: A picture surrounded by blue stars and gold stars, glowing, dark navy blue and gray tones, distributed in light silver and gold, playful, festive atmosphere, pure fabric, chalk, FHD 8K | |
|  | |
|  | |
| prompt: Delicious vegetarian pizza with champignon mushrooms, tomatoes, mozzarella, peppers and black olives, isolated on white background , transparent isolated white background , top down view, studio photo, transparent png, Clean sharp focus. High end retouching. Food magazine photography. Award winning photography. Advertising photography. Commercial photography | |
|  | |
|  | |
| prompt: a blonde woman in a wedding dress in a maple forest in summer with a flower crown laurel. Watercolor painting in the style of John William Waterhouse. Romanticism. Ethereal light. | |
|  | |
|  | |
| ### Anime Examples (note that you need to change the base model to CounterfeitXL; everything else remains the same) | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
| ## How to Get Started with the Model | |
| Use the code below to get started with the model. | |
| ```python | |
| from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL | |
| from diffusers import DDIMScheduler, EulerAncestralDiscreteScheduler | |
| from PIL import Image | |
| import torch | |
| import numpy as np | |
| import cv2 | |
| def HWC3(x): | |
|     """Convert a uint8 image with 1, 3, or 4 channels to a 3-channel HWC array.""" | |
|     assert x.dtype == np.uint8 | |
|     if x.ndim == 2: | |
|         x = x[:, :, None] | |
|     assert x.ndim == 3 | |
|     H, W, C = x.shape | |
|     assert C == 1 or C == 3 or C == 4 | |
|     if C == 3: | |
|         return x | |
|     if C == 1: | |
|         # Repeat the single channel three times. | |
|         return np.concatenate([x, x, x], axis=2) | |
|     if C == 4: | |
|         # Composite RGBA onto a white background using the alpha channel. | |
|         color = x[:, :, 0:3].astype(np.float32) | |
|         alpha = x[:, :, 3:4].astype(np.float32) / 255.0 | |
|         y = color * alpha + 255.0 * (1.0 - alpha) | |
|         y = y.clip(0, 255).astype(np.uint8) | |
|         return y | |
| controlnet_conditioning_scale = 1.0 | |
| prompt = "your prompt, the longer the better, describe the image in as much detail as possible" | |
| negative_prompt = 'longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality' | |
| eulera_scheduler = EulerAncestralDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler") | |
| controlnet = ControlNetModel.from_pretrained( | |
| "xinsir/controlnet-canny-sdxl-1.0", | |
| torch_dtype=torch.float16 | |
| ) | |
| # When testing with another base model, you also need to change the VAE. | |
| vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16) | |
| pipe = StableDiffusionXLControlNetPipeline.from_pretrained( | |
|     "stabilityai/stable-diffusion-xl-base-1.0", | |
|     controlnet=controlnet, | |
|     vae=vae, | |
|     safety_checker=None, | |
|     torch_dtype=torch.float16, | |
|     scheduler=eulera_scheduler, | |
| ) | |
| # Assumption: a CUDA GPU is available; the float16 weights are not meant for CPU inference. | |
| pipe.to("cuda") | |
| # For best results, resize the image to 1024 * 1024 or to a bucket resolution with the same pixel area. | |
| controlnet_img = cv2.imread("your image path") | |
| height, width, _ = controlnet_img.shape | |
| ratio = np.sqrt(1024. * 1024. / (width * height)) | |
| new_width, new_height = int(width * ratio), int(height * ratio) | |
| controlnet_img = cv2.resize(controlnet_img, (new_width, new_height)) | |
| controlnet_img = cv2.Canny(controlnet_img, 100, 200) | |
| controlnet_img = HWC3(controlnet_img) | |
| controlnet_img = Image.fromarray(controlnet_img) | |
| images = pipe( | |
| prompt, | |
| negative_prompt=negative_prompt, | |
| image=controlnet_img, | |
| controlnet_conditioning_scale=controlnet_conditioning_scale, | |
| width=new_width, | |
| height=new_height, | |
| num_inference_steps=30, | |
| ).images | |
| # PNG usually preserves image quality better than JPG or WebP, but the files are much larger. | |
| images[0].save("your image save path.png") | |
| ``` | |
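
The resize step in the snippet above keeps the aspect ratio while scaling the image to roughly 1024 * 1024 pixels in total. A minimal sketch of that arithmetic as a reusable helper follows. The name `bucket_size` is hypothetical, and snapping each side to a multiple of 8 is an assumption of this sketch; the original snippet simply truncates with `int(...)`.

```python
import math


def bucket_size(width, height, target_area=1024 * 1024, multiple=8):
    """Scale (width, height) to about target_area pixels, keeping the aspect ratio.

    Rounding each side to a multiple of `multiple` is an assumption of this
    sketch, not something the model card's snippet does.
    """
    ratio = math.sqrt(target_area / (width * height))
    new_width = max(multiple, round(width * ratio / multiple) * multiple)
    new_height = max(multiple, round(height * ratio / multiple) * multiple)
    return new_width, new_height


# A 16:9 input keeps its shape but lands near one megapixel.
print(bucket_size(1920, 1080))  # (1368, 768)
```

The returned pair can be passed to `cv2.resize` and to the pipeline's `width` and `height` arguments in place of `new_width` and `new_height`.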
| ## Training Details | |
| The model is trained on high-quality data in a single stage, at the same resolution as sdxl-base, 1024*1024. We use random thresholds to generate the Canny images, as Lvmin Zhang does. It is essential to find proper hyperparameters | |
| for this data augmentation: making it too easy or too hard hurts model performance. In addition, we randomly mask out a random percentage of each Canny image to force the model to learn more of the semantic relationship between the prompt and the lines. | |
| We use over 10,000,000 carefully annotated images. CogVLM has proven to be a powerful image captioning model [https://github.com/THUDM/CogVLM?tab=readme-ov-file]. For comic images, it is recommended to use the waifu tagger to generate special tags | |
| [https://huggingface.co/spaces/SmilingWolf/wd-tagger]. More than 64 A100s were used to train the model, and the effective batch size is 2560 with accumulate_grad_batches. | |
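
The two augmentations described above, random Canny thresholds and random masking of the edge map, could look roughly like the sketch below. This is not the training code: the function names, the threshold ranges, the maximum masked fraction, and the patch-based masking scheme are all assumptions chosen for illustration, since the card does not state the actual hyperparameters.

```python
import numpy as np


def sample_canny_thresholds(rng, low_range=(50, 150), high_range=(150, 300)):
    """Draw a random (low, high) threshold pair with low < high.

    The ranges are illustrative assumptions, not the values used in training.
    """
    low = int(rng.integers(low_range[0], low_range[1]))
    high = int(rng.integers(max(low + 1, high_range[0]), high_range[1]))
    return low, high


def random_mask(edges, rng, max_fraction=0.5, patch=64):
    """Zero out a random share of square patches in an edge map.

    Each patch is dropped with a probability drawn uniformly from
    [0, max_fraction]; both values are assumptions of this sketch.
    """
    h, w = edges.shape[:2]
    fraction = rng.uniform(0.0, max_fraction)
    grid_h, grid_w = -(-h // patch), -(-w // patch)  # ceiling division
    drop = rng.random((grid_h, grid_w)) < fraction
    mask = np.repeat(np.repeat(drop, patch, axis=0), patch, axis=1)[:h, :w]
    out = edges.copy()
    out[mask] = 0
    return out


def augmented_canny(image_bgr, rng):
    """Random-threshold Canny followed by random masking."""
    import cv2  # imported lazily so the helpers above work without OpenCV

    low, high = sample_canny_thresholds(rng)
    edges = cv2.Canny(image_bgr, low, high)
    return random_mask(edges, rng)
```

With `rng = np.random.default_rng(0)`, calling `augmented_canny(cv2.imread("your image path"), rng)` returns a different conditioning image on each call, which is the point of the augmentation: the model cannot rely on one fixed edge density or on a complete outline.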
| ### Training Data | |
| The data comes from many sources, including Midjourney, LAION-5B, Danbooru, and so on. The data is carefully filtered and annotated. | |
| ### Evaluation | |
| In our evaluation, the model achieves a better aesthetic score on real images than stabilityai/stable-diffusion-xl-base-1.0, and comparable performance on cartoon-style images. | |
| The model also shows better control ability when tested with perceptual similarity, owing to stronger data augmentation and more training steps. | |
| In addition, the model is less likely to generate abnormal images, such as those with abnormal human anatomy. | |