QuantFunc
Hugging Face | ModelScope | GitHub | WeChat (微信) | Discord
Qwen-Image-Layered-Series
⚠️ Config-only repository, no model weights. This repo contains only QuantFunc per-layer precision configs for Qwen-Image-Layered (RGBA layer decomposition). It does not contain, mirror, or redistribute any model weights. You bring your own official Qwen/Qwen-Image-Layered; these configs only tell the QuantFunc engine how to quantize it at load time, on your own machine.
Powered by the QuantFunc ComfyUI plugin, the fastest diffusion inference engine:
- 2x to 11x speedup over standard BF16/FP16 Python pipelines.
- Native C++/CUDA (libquantfunc.so / quantfunc.dll), zero Python model dependencies.
- Universal format adapter: loads diffusers / BFL / HF layouts directly, no manual conversion.
- Full GPU coverage: RTX 20/30/40/50 · A100/H100/H200/B100/B200 (CUDA 12 & 13); native FP4 on Blackwell.
Install the plugin: https://github.com/RealJonathanYip/ComfyUI-QuantFunc
What this repository provides
Just the precision configs, no weights:
Qwen-Image-Layered-Series/
├── config.json                      # = 50x-below INT4 map (HF download-counter query file)
└── precision-config/
    ├── 50x-above-fp4-sample.json    # NVFP4 (FP4 weights, af8wf4 MLP), for RTX 50 / SM120+
    └── 50x-below-int4-sample.json   # INT4 per-group-128, for all SMs (robust fallback)
We deliberately do not host Qwen-Image-Layered weights. The QuantFunc Lighting backend does runtime quantization: you load the official weights and they are quantized in-memory at load, so no pre-quantized checkpoint is ever distributed.
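To make "quantized in-memory at load" more concrete, here is a minimal sketch of symmetric INT4 quantization with one scale per group of 128 values, the scheme named for the INT4 map. It is a generic pure-Python illustration, not QuantFunc's actual kernel; the function names and the choice of mapping the group maximum to code 7 are assumptions made for this example.

```python
def quantize_int4_per_group(weights, group_size=128):
    """Symmetric INT4 quantization with one scale per group (illustrative only)."""
    if len(weights) % group_size != 0:
        raise ValueError("length must be a multiple of group_size")
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        amax = max(abs(w) for w in group)
        # Assumption: map the largest magnitude in the group to code 7.
        scale = amax / 7.0 if amax > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer code and clamp to the INT4 range [-8, 7].
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4_per_group(codes, scales, group_size=128):
    """Reconstruct approximate weights from INT4 codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]
```

Because each group carries its own scale, an outlier only degrades the precision of the 128 values it shares a group with rather than the whole tensor.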
How to use
- Obtain the official model yourself: Qwen/Qwen-Image-Layered (diffusers layout). Follow Qwen's distribution channels and license.
- Install the QuantFunc ComfyUI plugin: https://github.com/RealJonathanYip/ComfyUI-QuantFunc
- Load the official model through the Build Pipeline node (universal format adapter).
- Precision config: leave the node on auto detect (it recognizes Qwen-Image-Layered and applies the right map automatically: NVFP4 on RTX 50 / SM120+, INT4 otherwise), or point it at a file manually.
Precision configs
Two GPU tiers (the auto-detect picks by SM):
| File | Target GPU | Scheme |
|---|---|---|
| 50x-above-fp4-sample.json | RTX 50 / SM120+ | NVFP4 (FP4 e2m1 weights); FP8 activations on the MLP only (af8wf4), attention stays W4A4 |
| 50x-below-int4-sample.json | RTX 20/30/40 + datacenter | INT4 per-group-128 (AUTO_4 → INT4 on all SMs); robust, fully coherent at any SM |
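If you prefer to point the node at a file manually, the tier rule in the table can be sketched as a small helper. This is only an illustration of the SM threshold stated here; `pick_precision_config` is a hypothetical name, and the plugin's own auto-detect logic may consider more than the compute capability.

```python
FP4_CONFIG = "precision-config/50x-above-fp4-sample.json"
INT4_CONFIG = "precision-config/50x-below-int4-sample.json"


def pick_precision_config(major, minor):
    """Return the config path for a CUDA compute capability (major, minor).

    Hypothetical helper: SM120 and above get the NVFP4 map, everything else
    (RTX 20/30/40 and datacenter parts) gets the INT4 map, as in the table.
    """
    sm = major * 10 + minor
    return FP4_CONFIG if sm >= 120 else INT4_CONFIG
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair for the current device, so `pick_precision_config(*torch.cuda.get_device_capability())` would select the file.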
Why the MLP is af8wf4 on the NVFP4 map: use_additional_t_cond + layer3d modulation make the MLP input activations large enough to saturate the per-16 FP8 microscale (e4m3, max 448) used for FP4 activations, which shows up as a green-noise background. Using FP8 activations (with a per-token FP16 act-scale) on the MLP removes the artifact; attention tolerates FP4 activations and stays on the fast W4A4 path. This differs from the base Qwen-Image NVFP4 map by exactly one layer (the MLP up-projection net.0.proj). In both maps the img_mod/txt_mod modulation GEMMs stay INT8.
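The saturation argument can be checked with a little arithmetic. The sketch below is a simplified model, not the engine's code: it assumes each block of 16 activations gets a scale of amax / 6 (6 being the largest FP4 e2m1 magnitude), that the scale is stored in FP8 e4m3 (max 448), and that no additional per-tensor scale is applied. The helper names are hypothetical.

```python
FP4_E2M1_MAX = 6.0    # largest magnitude representable in FP4 e2m1
FP8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 e4m3


def ideal_block_scale(block):
    """Scale that would map the block's largest magnitude onto the FP4 maximum."""
    return max(abs(v) for v in block) / FP4_E2M1_MAX


def scale_saturates(block):
    """True when the ideal scale no longer fits in an e4m3 microscale."""
    return ideal_block_scale(block) > FP8_E4M3_MAX


def clipped_peak(block):
    """Largest magnitude that survives once the scale is clamped to e4m3 range."""
    scale = min(ideal_block_scale(block), FP8_E4M3_MAX)
    return min(max(abs(v) for v in block), scale * FP4_E2M1_MAX)
```

Under these assumptions, any block whose peak exceeds 448 × 6 = 2688 is clipped, which is consistent with large MLP activations breaking the FP4-activation path while smaller attention activations do not.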
⚠️ Companion settings REQUIRED for coherence (not part of the precision map)
- base scheduler (configs/qwen-image-base-scheduler.json)
- num_inference_steps = 50
- true_cfg_scale = 4.0
- non-empty negative_prompt
- a real RGBA composite input image
- resolution 640
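One way to keep these settings from drifting is a small pre-flight check before running a workflow. The sketch below is hypothetical: only num_inference_steps, true_cfg_scale, negative_prompt and the scheduler path come from the list above, while the dictionary layout and the `scheduler_config`, `input_mode` and `resolution` keys are assumptions made for this example.

```python
def check_companion_settings(settings):
    """Return a list of problems; an empty list means all settings match."""
    problems = []
    # "scheduler_config" is a hypothetical key for the base scheduler file.
    if settings.get("scheduler_config") != "configs/qwen-image-base-scheduler.json":
        problems.append("use the base scheduler config")
    if settings.get("num_inference_steps") != 50:
        problems.append("num_inference_steps must be 50")
    if settings.get("true_cfg_scale") != 4.0:
        problems.append("true_cfg_scale must be 4.0")
    if not str(settings.get("negative_prompt") or "").strip():
        problems.append("negative_prompt must be non-empty")
    # "input_mode" is a hypothetical key, e.g. a PIL image's .mode value.
    if settings.get("input_mode") != "RGBA":
        problems.append("input image must be an RGBA composite")
    if settings.get("resolution") != 640:
        problems.append("resolution must be 640")
    return problems
```

A mode check cannot tell whether the image is a real composite rather than an RGB image with a blank alpha channel, so it only catches the most obvious mistake.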
NVFP4 (50x-above) is SM120+ only (FP4 is native sm_120a, never PTX-JIT). On older GPUs use the INT4 map.
Legal / Attribution
- This repository distributes only the QuantFunc precision-config JSON, which is our own work, licensed under Apache-2.0.
- It contains no Qwen weights and is not affiliated with, nor endorsed by, the Qwen team.
- You are solely responsible for obtaining the official model and complying with its license and terms of use.
Community
- Discord server
- Scan the QR code below to join our WeChat group: