---
pipeline_tag: text-to-image
library_name: diffusers
tags:
- sdxl
- quantization
- svdquant
- nunchaku
- fp4
- int4
base_model: tonera/fucktastic25DCheckpointPony_30
base_model_relation: quantized
license: apache-2.0
---
# Model Card (SVDQuant)

> **Document language**: Chinese | [English](README.md)
## Model name

- **Model repository**: `tonera/fucktastic25DCheckpointPony_30`
- **Base (Diffusers weights path)**: `tonera/fucktastic25DCheckpointPony_30` (root of this repository)
- **Quantized UNet weights**: `tonera/fucktastic25DCheckpointPony_30/svdq-<precision>_r32-fucktastic25DCheckpointPony_30.safetensors`
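
The `<precision>` placeholder in the weights path can be filled in programmatically. The following is a minimal sketch, assuming the precision is one of `fp4` or `int4` (the two values listed in this repository's tags). `quantized_unet_path` is a hypothetical helper name used for illustration and is not part of Nunchaku or Diffusers.

```python
MODEL = "fucktastic25DCheckpointPony_30"
REPO_ID = f"tonera/{MODEL}"

# Assumption: only the precisions named in the repository tags are published.
KNOWN_PRECISIONS = ("fp4", "int4")


def quantized_unet_path(precision: str) -> str:
    """Build the in-repo path of the rank-32 quantized UNet file (hypothetical helper)."""
    if precision not in KNOWN_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    return f"{REPO_ID}/svdq-{precision}_r32-{MODEL}.safetensors"


if __name__ == "__main__":
    for precision in KNOWN_PRECISIONS:
        print(quantized_unet_path(precision))
```

Rejecting unknown precisions early gives a clearer error than a failed download of a file that does not exist.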
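## Quantization / inference technology

- **Inference engine**: Nunchaku (`https://github.com/nunchaku-ai/nunchaku`)

Nunchaku is a high-performance inference engine for **4-bit (FP4/INT4) low-bit neural networks**. Its main goal is to reduce GPU memory use and speed up inference while preserving as much generation quality as possible. It implements and productionizes post-training quantization schemes such as **SVDQuant**, and uses operator/kernel fusion and related optimizations to reduce the extra overhead introduced by the low-rank branch.

The quantized SDXL weights in this repository (for example `svdq-*_r32-*.safetensors`) are meant to be used with Nunchaku for efficient inference on supported GPUs.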
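
To give a feel for what "4-bit" means in practice, here is a rough sketch of symmetric INT4 quantization of a small group of weights with one shared scale. This is a generic illustration under simplifying assumptions: it does not reproduce SVDQuant's low-rank branch, Nunchaku's kernels, or the FP4 format, and the function names are made up for this example.

```python
def quantize_int4(weights):
    """Map floats to signed 4-bit codes in [-8, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # The largest magnitude maps to code 7; an all-zero group keeps scale 1.0.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes, scale):
    """Recover approximate float weights from 4-bit codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.82, -0.31, 0.05, -0.77, 0.40, 0.0, -0.12, 0.63]
    codes, scale = quantize_int4(weights)
    restored = dequantize_int4(codes, scale)
    max_err = max(abs(w - r) for w, r in zip(weights, restored))
    print("codes:", codes)
    print("max abs error:", max_err, "bound (scale / 2):", scale / 2)
```

Each weight is stored as one of 16 integer levels, so the rounding error per weight is at most half the scale. Schemes such as SVDQuant add further machinery on top of this basic idea to keep quality loss small.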
## Quantization quality (fp8)

```text
PSNR: mean=15.8524 p50=15.7576 p90=18.4809 best=20.5972 worst=11.6408 (N=25)
SSIM: mean=0.644562 p50=0.656934 p90=0.748802 best=0.814231 worst=0.374156 (N=25)
LPIPS: mean=0.321057 p50=0.318117 p90=0.497596 best=0.124587 worst=0.539914 (N=25)
```
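
For readers unfamiliar with these metrics, the sketch below shows how a PSNR value and a `mean / p50 / p90 / best / worst` summary could be computed. It is an illustration only: it assumes 8-bit pixel values, a flat list of pixels per image, and nearest-rank percentiles, none of which are confirmed by this model card. For PSNR and SSIM higher is better; for LPIPS lower is better.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(candidate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def summarize(scores, higher_is_better=True):
    """Summarize per-image scores; percentiles use the nearest-rank method (an assumption)."""
    ordered = sorted(scores)
    n = len(ordered)

    def percentile(q):
        rank = max(1, (q * n + 99) // 100)  # integer ceil(q * n / 100)
        return ordered[rank - 1]

    return {
        "mean": sum(ordered) / n,
        "p50": percentile(50),
        "p90": percentile(90),
        "best": ordered[-1] if higher_is_better else ordered[0],
        "worst": ordered[0] if higher_is_better else ordered[-1],
    }


if __name__ == "__main__":
    reference = [10, 50, 90, 130]
    candidate = [12, 47, 95, 128]
    print(f"PSNR: {psnr(reference, candidate):.2f} dB")
    print(summarize([0.12, 0.32, 0.50, 0.54], higher_is_better=False))
```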
## Performance improvement

The results below compare inference performance (Diffusers vs Nunchaku-UNet).

- **Inference configuration**: `bf16 / steps=30 / guidance_scale=5.0`
- **Resolutions (5 images each, batch=5)**: `1024x1024`, `1024x768`, `768x1024`, `832x1216`, `1216x832`
- **Software versions**: `torch 2.9` / `cuda 12.8` / `nunchaku 1.1.0+torch2.9` / `diffusers 0.37.0.dev0`
- **Optimization flags**: no `torch.compile`, no explicit `cudnn` optimization flags
### Cold-start performance (first image, end to end)

| GPU | Metric | Diffusers | Nunchaku | Speedup | Improvement |
|-----|--------|-----------|----------|---------|-------------|
| RTX 5090 | load | 3.505s | 3.432s | 1.02x | +2.1% |
| RTX 5090 | cold_infer | 2.944s | 2.447s | 1.20x | +16.9% |
| RTX 5090 | cold_e2e | 6.449s | 5.880s | 1.10x | +8.8% |
| RTX 3090 | load | 3.787s | 3.442s | 1.10x | +9.1% |
| RTX 3090 | cold_infer | 7.503s | 5.231s | 1.43x | +30.3% |
| RTX 3090 | cold_e2e | 11.290s | 8.673s | 1.30x | +23.2% |
### Performance over 5 consecutive images after warmup

| GPU | Metric | Diffusers | Nunchaku | Speedup | Improvement |
|-----|--------|-----------|----------|---------|-------------|
| RTX 5090 | total (5 images) | 12.937s | 9.813s | 1.32x | +24.2% |
| RTX 5090 | avg (per image) | 2.587s | 1.963s | 1.32x | +24.2% |
| RTX 3090 | total (5 images) | 33.413s | 22.975s | 1.45x | +31.2% |
| RTX 3090 | avg (per image) | 6.683s | 4.595s | 1.45x | +31.2% |

**Notes**:

- The longer load time on the RTX 3090 is due to the extra processing needed when the quantized weights are loaded for the first time.
- During inference (both cold_infer and after warmup), Nunchaku shows a clear speedup on both GPUs.
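
The speedup and improvement columns appear to follow the usual definitions: baseline time divided by optimized time, and relative time saved. That reading is inferred from the numbers rather than stated in the tables, so treat the sketch below as an assumption. It reproduces the RTX 5090 `cold_infer` row.

```python
def speedup(baseline_s: float, optimized_s: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


def improvement_pct(baseline_s: float, optimized_s: float) -> float:
    """Share of the baseline time that was saved, in percent."""
    return (baseline_s - optimized_s) / baseline_s * 100.0


if __name__ == "__main__":
    # RTX 5090 cold_infer timings taken from the cold-start table.
    diffusers_s, nunchaku_s = 2.944, 2.447
    print(f"{speedup(diffusers_s, nunchaku_s):.2f}x")
    print(f"+{improvement_pct(diffusers_s, nunchaku_s):.1f}%")
```

Because the published timings are rounded to milliseconds, recomputing a row from the table can differ from the listed value in the last digit.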
## Install Nunchaku before use

- **Official installation guide** (treat it as the authoritative source): `https://nunchaku.tech/docs/nunchaku/installation/installation.html`
### (Recommended) Install an official prebuilt wheel

- **Prerequisite**: install `PyTorch >= 2.5` (the actual requirement depends on the wheel you choose)
- **Install the nunchaku wheel**: from GitHub Releases / HuggingFace / ModelScope, pick the wheel that matches your environment (note that `cp311` means Python 3.11):
  - `https://github.com/nunchaku-ai/nunchaku/releases`

```bash
# Example (pick the wheel URL that matches your torch/cuda/python versions)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/vX.Y.Z/nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl
```

- **Note (50-series GPUs)**: `CUDA >= 12.8` is generally recommended, and FP4 models are preferred for better compatibility and performance (defer to the official documentation).
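
Picking the wrong wheel is a common installation failure, so it can help to check the Python tag before installing. The sketch below derives the `cpXY` tag for the running interpreter and compares it with a wheel filename, relying on the standard wheel naming layout (`name-version-pythontag-abitag-platform.whl`). The helper names are hypothetical, and the check covers only the Python tag, not the torch or CUDA build.

```python
import sys


def python_wheel_tag(version_info=sys.version_info) -> str:
    """Return the CPython wheel tag, e.g. 'cp311' for Python 3.11."""
    return f"cp{version_info[0]}{version_info[1]}"


def wheel_matches_python(wheel_filename: str, version_info=sys.version_info) -> bool:
    """Check whether the wheel's Python tag matches the given interpreter version."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = wheel_filename[: -len(".whl")].split("-")
    # The last three fields are: python tag, ABI tag, platform tag.
    return parts[-3] == python_wheel_tag(version_info)


if __name__ == "__main__":
    # Placeholder version (X.Y.Z) kept exactly as in the install example.
    wheel = "nunchaku-X.Y.Z+torch2.9-cp311-cp311-linux_x86_64.whl"
    print("interpreter tag:", python_wheel_tag())
    print("wheel matches this interpreter:", wheel_matches_python(wheel))
```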
## Usage example (Diffusers + Nunchaku UNet)

```python
import torch
from diffusers import StableDiffusionXLPipeline

from nunchaku.models.unets.unet_sdxl import NunchakuSDXLUNet2DConditionModel
from nunchaku.utils import get_precision

MODEL = "fucktastic25DCheckpointPony_30"
REPO_ID = f"tonera/{MODEL}"

if __name__ == "__main__":
    # Load the quantized UNet; get_precision() supplies the precision part of the filename.
    unet = NunchakuSDXLUNet2DConditionModel.from_pretrained(
        f"{REPO_ID}/svdq-{get_precision()}_r32-{MODEL}.safetensors"
    )

    # Build the SDXL pipeline from the base weights, swapping in the quantized UNet.
    pipe = StableDiffusionXLPipeline.from_pretrained(
        REPO_ID,
        unet=unet,
        torch_dtype=torch.bfloat16,
        use_safetensors=True,
    ).to("cuda")

    prompt = "Make Pikachu hold a sign that says 'Nunchaku is awesome', yarn art style, detailed, vibrant colors"
    # Same settings as the benchmark configuration: 30 steps, guidance_scale=5.0.
    image = pipe(prompt=prompt, guidance_scale=5.0, num_inference_steps=30).images[0]
    image.save("sdxl.png")
```