QuantFunc

🤗 Hugging Face | 🤖 ModelScope | 💻 GitHub | 💬 WeChat (微信) | 🎮 Discord

Qwen-Image-Layered-Series

⚠️ Config-only repository — no model weights. This repo contains only QuantFunc per-layer precision configs for Qwen-Image-Layered (RGBA layer decomposition). It does not contain, mirror, or redistribute any model weights. You bring your own official Qwen/Qwen-Image-Layered; these configs only tell the QuantFunc engine how to quantize it at load time, on your own machine.

Powered by the QuantFunc ComfyUI plugin — the fastest diffusion inference engine:

🚀 2x–11x speedup over standard BF16/FP16 Python pipelines.
⚙️ Native C++/CUDA (libquantfunc.so / quantfunc.dll), zero Python model dependencies.
🧩 Universal format adapter — loads diffusers / BFL / HF layouts directly, no manual conversion.
🟢 Full GPU coverage — RTX 20/30/40/50 · A100/H100/H200/B100/B200 (CUDA 12 & 13); native FP4 on Blackwell.

👉 Install the plugin: https://github.com/RealJonathanYip/ComfyUI-QuantFunc

What this repository provides

Just the precision configs — no weights:

Qwen-Image-Layered-Series/
├── config.json                        # copy of the 50x-below INT4 map (HF download-counter query file)
└── precision-config/
    ├── 50x-above-fp4-sample.json      # NVFP4 (FP4 weights, af8wf4 MLP) — RTX 50 / SM120+
    └── 50x-below-int4-sample.json     # INT4 per-group-128 — all SMs (robust fallback)

We deliberately do not host Qwen-Image-Layered weights. The QuantFunc Lighting backend does runtime quantization: you load the official weights and they are quantized in-memory at load, so no pre-quantized checkpoint is ever distributed.

How to use

Obtain the official model yourself — Qwen/Qwen-Image-Layered (diffusers layout). Follow Qwen's distribution channels and license.
Install the QuantFunc ComfyUI plugin: https://github.com/RealJonathanYip/ComfyUI-QuantFunc
Load the official model through the Build Pipeline node (universal format adapter).
Precision config: leave the node on auto detect, which recognizes Qwen-Image-Layered and applies the matching map (NVFP4 on RTX 50 / SM120+, INT4 otherwise), or point it at a file manually.
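
The auto-detect rule in the last step can be sketched as a small function. This is an illustration only, not the plugin's code: `pick_precision_config` is a hypothetical name, and the threshold simply encodes the "SM120+ gets NVFP4, everything else gets INT4" rule stated above.

```python
# Hypothetical helper; the plugin's real selection logic is not part of this repo.
FP4_CONFIG = "precision-config/50x-above-fp4-sample.json"
INT4_CONFIG = "precision-config/50x-below-int4-sample.json"


def pick_precision_config(major: int, minor: int) -> str:
    """Return the config path for a CUDA compute capability (major, minor)."""
    sm = major * 10 + minor  # e.g. (12, 0) -> 120, (8, 9) -> 89
    return FP4_CONFIG if sm >= 120 else INT4_CONFIG
```

If PyTorch is installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair to pass in. Under this rule a datacenter GPU that reports a compute capability below 12.0 falls through to the INT4 map, which matches the tier table in this README.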

Precision configs

Two GPU tiers; auto detect picks between them by SM version:

| File | Target GPU | Scheme |
| --- | --- | --- |
| `50x-above-fp4-sample.json` | RTX 50 / SM120+ | NVFP4 (FP4 e2m1 weights); FP8 activations on the MLP only (`af8wf4`), attention stays W4A4 |
| `50x-below-int4-sample.json` | RTX 20/30/40 + datacenter | INT4 per-group-128 (AUTO_4 → INT4 on all SMs); robust, fully coherent at any SM |

Why the MLP is af8wf4 on the NVFP4 map: use_additional_t_cond and layer3d modulation make the MLP input activations large enough to saturate the per-16-element FP8 microscale used for FP4 activations (e4m3, max 448), which shows up as a green-noise background. Switching the MLP to FP8 activations (per-token FP16 act-scale) removes the artifact; attention tolerates FP4 activations and stays on the fast W4A4 path. The map differs from the base Qwen-Image NVFP4 map by exactly one layer, the MLP up-projection net.0.proj. In both maps the img_mod/txt_mod modulation GEMMs stay INT8.
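
The saturation described above can be reproduced with a toy model of one 16-element FP4 block. This is a simplified sketch, not the engine's kernel: it assumes a plain "scale = amax / 6" rule, does not round the scale itself to e4m3, and ignores any per-tensor scale that might be applied on top. `fake_quant_block` is a hypothetical name.

```python
import math

E2M1_MAX = 6.0      # largest magnitude representable in FP4 e2m1
E4M3_MAX = 448.0    # largest finite magnitude in FP8 e4m3 (the block scale)
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # e2m1 magnitudes


def fake_quant_block(block):
    """Quantize and dequantize one block; return (values, scale_saturated)."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block), False
    ideal_scale = amax / E2M1_MAX          # scale that would map amax to 6
    saturated = ideal_scale > E4M3_MAX     # the scale no longer fits in e4m3
    scale = min(ideal_scale, E4M3_MAX)
    out = []
    for x in block:
        mag = min(abs(x) / scale, E2M1_MAX)                 # clip to FP4 range
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))      # nearest grid point
        out.append(math.copysign(q * scale, x))
    return out, saturated
```

In this model the largest magnitude a block can represent is 448 × 6 = 2688, so an activation of 5000 comes back as 2688 and every smaller value in the same block is crushed toward zero. That is the kind of clipping the FP8 activation path on the MLP avoids.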

⚠️ Companion settings REQUIRED for coherence (not part of the precision map)

base scheduler (configs/qwen-image-base-scheduler.json)
num_inference_steps = 50
true_cfg_scale = 4.0
non-empty negative_prompt
a real RGBA composite input image
resolution 640
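
A quick sanity check of the numeric settings above can catch a misconfigured run before it produces incoherent output. This is a hedged sketch: `check_companion_settings` and the plain-dict layout are assumptions made for illustration and are not part of the plugin; only the key names and values come from the list above. The scheduler file and the RGBA input image are not checked here.

```python
# Hypothetical pre-flight check; the dict layout is an assumption.
REQUIRED = {
    "num_inference_steps": 50,
    "true_cfg_scale": 4.0,
    "resolution": 640,
}


def check_companion_settings(settings: dict) -> list:
    """Return a list of human-readable problems (empty means all good)."""
    problems = []
    for key, expected in REQUIRED.items():
        if settings.get(key) != expected:
            problems.append(
                f"{key} should be {expected}, got {settings.get(key)!r}"
            )
    # The negative prompt must contain something other than whitespace.
    if not str(settings.get("negative_prompt") or "").strip():
        problems.append("negative_prompt must be non-empty")
    return problems
```

For example, passing a settings dict with `num_inference_steps` set to 20 and an empty `negative_prompt` would return two problems, one per violated requirement.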

NVFP4 (50x-above) is SM120+ only: the FP4 path is native sm_120a code and is never PTX-JIT compiled for other architectures. On older GPUs use the INT4 map.

Legal / Attribution

This repository distributes only the QuantFunc precision-config JSON — our own work, Apache-2.0.
It contains no Qwen weights and is not affiliated with, nor endorsed by, the Qwen team.
You are solely responsible for obtaining the official model and complying with its license and terms of use.

Community

🎮 Discord server
💬 Scan the QR code below to join our WeChat group:
