
SDXL - VAE

How to use with 🧨 diffusers

You can integrate this VAE into an existing diffusers workflow by passing it as the `vae` argument when loading an SDXL pipeline:

```python
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# Any SDXL checkpoint can be used here; the SDXL base model is shown as an example.
model = "stabilityai/stable-diffusion-xl-base-1.0"
# Load the improved autoencoder and hand it to the pipeline in place of the bundled VAE.
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
pipe = StableDiffusionXLPipeline.from_pretrained(model, vae=vae)
```
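The VAE in the pipeline encodes images into a latent space and decodes them back out of it. The following is a minimal sketch that assumes the kl-f8 layout: "f8" denotes spatial downsampling by a factor of 8, and the latent has 4 channels, matching `latent_channels` in the SDXL VAE config. The helper name `sdxl_vae_latent_shape` is hypothetical and not part of diffusers. Check `vae.config` for your checkpoint before relying on these numbers.

```python
def sdxl_vae_latent_shape(height: int, width: int, batch: int = 1,
                          downsample: int = 8, latent_channels: int = 4) -> tuple:
    """Return the (batch, channels, h, w) latent shape for an image size.

    Hypothetical helper: assumes an f8 autoencoder with 4 latent channels.
    """
    if height % downsample or width % downsample:
        raise ValueError(f"height and width must be multiples of {downsample}")
    return (batch, latent_channels, height // downsample, width // downsample)


print(sdxl_vae_latent_shape(1024, 1024))  # (1, 4, 128, 128)
```

Under these assumptions, a 1024x1024 RGB image (3,145,728 values) maps to a 4x128x128 latent (65,536 values), which is 48x fewer values for the diffusion model to process.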

Model

SDXL is a latent diffusion model: the diffusion operates in the pretrained, learned (and fixed) latent space of an autoencoder. While the bulk of the semantic composition is done by the latent diffusion model, we can improve local, high-frequency details in generated images by improving the quality of the autoencoder. To this end, we train the same autoencoder architecture used for the original Stable Diffusion at a larger batch size (256 vs. 9) and additionally track the weights with an exponential moving average (EMA). The resulting autoencoder outperforms the original model on all evaluated reconstruction metrics (see the table below).
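Tracking weights with an EMA means keeping a shadow copy of the parameters that is updated after each optimizer step as `ema = decay * ema + (1 - decay) * weights`. The following is a framework-free sketch of that update using plain Python lists. The function name `ema_update` and the decay values are illustrative assumptions, since the card does not state the decay used for SDXL-VAE.

```python
def ema_update(ema_params, params, decay=0.999):
    """Blend current weights into the EMA shadow copy, in place.

    ema <- decay * ema + (1 - decay) * params, applied element-wise.
    """
    for name, values in params.items():
        shadow = ema_params[name]
        for i, v in enumerate(values):
            shadow[i] = decay * shadow[i] + (1.0 - decay) * v


params = {"w": [1.0, -2.0]}
ema = {k: list(v) for k, v in params.items()}  # EMA starts as a copy of the weights
for step in range(3):
    params["w"] = [x + 1.0 for x in params["w"]]  # stand-in for an optimizer step
    ema_update(ema, params, decay=0.5)  # small decay only to make the effect visible

print(params["w"])  # [4.0, 1.0]
print(ema["w"])     # [3.125, 0.125]
```

The EMA copy lags behind the raw weights and smooths out step-to-step fluctuations. In this setup, the averaged weights are typically the ones that are evaluated and released.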

Evaluation

SDXL-VAE vs original kl-f8 VAE vs f8-ft-MSE

COCO 2017 (256x256, val, 5000 images)

| Model | rFID | PSNR (dB) | SSIM | PSIM | Link | Comments |
|---|---|---|---|---|---|---|
| SDXL-VAE | 4.42 | 24.7 +/- 3.9 | 0.73 +/- 0.13 | 0.88 +/- 0.27 | https://huggingface.co/stabilityai/sdxl-vae/blob/main/sdxl_vae.safetensors | as used in SDXL |
| original | 4.99 | 23.4 +/- 3.8 | 0.69 +/- 0.14 | 1.01 +/- 0.28 | https://ommer-lab.com/files/latent-diffusion/kl-f8.zip | as used in SD |
| ft-MSE | 4.70 | 24.5 +/- 3.7 | 0.71 +/- 0.13 | 0.92 +/- 0.27 | https://huggingface.co/stabilityai/sd-vae-ft-mse-original/resolve/main/vae-ft-mse-840000-ema-pruned.ckpt | resumed with EMA from ft-EMA, emphasis on MSE (rec. loss = MSE + 0.1 * LPIPS), smoother outputs |
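PSNR, one of the reconstruction metrics in the table, is defined as `10 * log10(MAX^2 / MSE)` between the original and the reconstructed image. The following is a minimal pure-Python sketch over flattened pixel values. It assumes pixels scaled to [0, 1] (use `max_value=255` for 8-bit images). The card does not specify its exact protocol (pixel range, color handling, how per-image scores are aggregated), so numbers from this helper are not directly comparable to the table.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value**2 / MSE)."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no noise, unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


original = [0.0, 0.5, 1.0, 0.25]
reconstructed = [0.1, 0.5, 0.9, 0.25]
print(round(psnr(original, reconstructed), 2))  # 23.01
```

Higher PSNR means the reconstruction is closer to the input in a pixel-wise squared-error sense. This is why the MSE-weighted ft-MSE decoder scores well on it while producing "smoother outputs".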