FLUX.2-klein-base-4b-INT8-transformer-quants
INT8 (W8A8) quantization variants for FLUX.2-klein-base-4B (4B parameters).
This repository contains multiple INT8 quantization variants for experimentation and comparison.
Status: Static/Max variants (int8-per-row, int8-per-tensor) are available now. SmoothQuant variants are pending and will be added when ready.
| Variant | Algorithm | Scale Mode | Status | Checkpoint |
|---|---|---|---|---|
| int8-per-row | static | per-row | Available | flux-2-klein-base-4b-int8-per-row.safetensors |
| int8-per-tensor | static | per-tensor | Available | flux-2-klein-base-4b-int8-per-tensor.safetensors |
| int8-smoothquant-per-row | smoothquant | per-row | Pending | flux-2-klein-base-4b-int8-smoothquant-per-row.safetensors |
| int8-smoothquant-per-tensor | smoothquant | per-tensor | Pending | flux-2-klein-base-4b-int8-smoothquant-per-tensor.safetensors |
Quantization Details
All variants use NVIDIA TensorRT Model Optimizer (ModelOpt) INT8 (W8A8) quantization:
| Property | Value |
|---|---|
| Framework | NVIDIA ModelOpt |
| Calibration | 768 prompts (256 T2I, 256 editing, 256 composition), 50 steps each |
| Weight Quantization | INT8 symmetric, per-row or per-tensor depending on variant |
| Activation Quantization | Dynamic per-row (quantized on-the-fly at inference, one scale per token) |
| Preserved Layers | Embedder layers (time_embed, context_embedder, x_embedder) and output projection kept in BF16 |
Algorithm × Scale Mode
| | Per-Row | Per-Tensor |
|---|---|---|
| Static (Max) | int8-per-row (available) | int8-per-tensor (available) |
| SmoothQuant | int8-smoothquant-per-row (pending) | int8-smoothquant-per-tensor (pending) |
Algorithm:
- Static (Max): Standard INT8 quantization with calibrated min/max ranges
- SmoothQuant (pending): Migrates quantization difficulty from activations to weights for better accuracy
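The migration idea behind SmoothQuant can be sketched in a few lines. This is a minimal illustration of the published formulation, in which each input channel j gets a smoothing factor s_j = max|X_j|^alpha / max|W_j|^(1 - alpha); it is not the ModelOpt implementation, and the value alpha = 0.5, the helper names, and the toy matrices are assumptions made for the example. Plain Python lists are used to keep the sketch dependency-free.

```python
def column_amax(matrix):
    """Largest absolute value in each column."""
    return [max(abs(row[j]) for row in matrix) for j in range(len(matrix[0]))]


def row_amax(matrix):
    """Largest absolute value in each row."""
    return [max(abs(v) for v in row) for row in matrix]


def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smoothing_factors(act_amax, weight_amax, alpha=0.5):
    """s_j = act_amax_j**alpha / weight_amax_j**(1 - alpha); assumes nonzero inputs."""
    return [a ** alpha / w ** (1.0 - alpha) for a, w in zip(act_amax, weight_amax)]


# Toy data (illustrative only): X is (tokens, in_channels), W is (in_channels, out_channels).
# Input channel 1 carries an activation outlier.
X = [[0.5, 40.0, -0.2], [-0.3, -35.0, 0.4]]
W = [[0.2, -0.1], [0.05, 0.02], [-0.3, 0.25]]

s = smoothing_factors(column_amax(X), row_amax(W), alpha=0.5)
X_smooth = [[v / s[j] for j, v in enumerate(row)] for row in X]  # divide activations
W_smooth = [[v * s[i] for v in row] for i, row in enumerate(W)]  # multiply weights

print("activation amax before:", max(column_amax(X)))
print("activation amax after: ", max(column_amax(X_smooth)))
print("output unchanged:", all(
    abs(p - q) < 1e-9
    for r1, r2 in zip(matmul(X, W), matmul(X_smooth, W_smooth))
    for p, q in zip(r1, r2)
))
```

Because the activations are divided by the same factors the weights are multiplied by, the product is mathematically unchanged, while the activation outlier shrinks and the weights absorb part of the range.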
Scale Mode:
- Per-Row: Independent scale per output channel (finer granularity, higher accuracy)
- Per-Tensor: Single scale per tensor (faster, lower memory, reduced accuracy; in the evaluation below, overall LPIPS rises from 0.0560 for per-row to 0.1025 for per-tensor)
Note: In all variants, input activations are always quantized dynamically per-row at inference time (one scale per token). The scale mode above refers to the weight quantization granularity.
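The difference between the two weight scale modes, and the dynamic per-token activation scale described in the note, can be illustrated with a small W8A8 linear layer. This is a sketch under stated assumptions, not the ModelOpt kernels: the function names, the toy weight matrix (laid out as out_channels × in_channels), and the symmetric range of [-127, 127] are chosen for illustration only.

```python
QMAX = 127  # symmetric INT8 range [-127, 127] (assumed for this sketch)


def amax_scale(values):
    """Scale that maps the largest absolute value onto QMAX."""
    amax = max(abs(v) for v in values)
    return amax / QMAX if amax > 0 else 1.0


def quantize(values, scale):
    """Round to the nearest integer step and clamp to the INT8 range."""
    return [max(-QMAX, min(QMAX, round(v / scale))) for v in values]


def quantize_weight_per_row(weight):
    """One scale per output channel (row of the weight matrix)."""
    scales = [amax_scale(row) for row in weight]
    return [quantize(row, s) for row, s in zip(weight, scales)], scales


def quantize_weight_per_tensor(weight):
    """A single scale shared by every row."""
    scale = amax_scale([v for row in weight for v in row])
    return [quantize(row, scale) for row in weight], [scale] * len(weight)


def w8a8_linear(x, q_weight, w_scales):
    """Quantize each token on the fly, accumulate in integers, then rescale."""
    out = []
    for token in x:
        x_scale = amax_scale(token)  # dynamic: one scale per token
        q_token = quantize(token, x_scale)
        out.append([sum(a * b for a, b in zip(q_token, q_row)) * x_scale * w_scale
                    for q_row, w_scale in zip(q_weight, w_scales)])
    return out


def linear(x, weight):
    """Floating-point reference: x @ weight^T."""
    return [[sum(a * b for a, b in zip(token, row)) for row in weight] for token in x]


def mean_abs_error(ref, out):
    diffs = [abs(p - q) for r1, r2 in zip(ref, out) for p, q in zip(r1, r2)]
    return sum(diffs) / len(diffs)


# Toy weight with one small-magnitude and one large-magnitude output channel.
weight = [[0.01, -0.02, 0.015], [2.0, -1.5, 0.5]]
x = [[1.0, 2.0, -1.0], [0.1, -0.2, 0.3]]

reference = linear(x, weight)
err_per_row = mean_abs_error(reference, w8a8_linear(x, *quantize_weight_per_row(weight)))
err_per_tensor = mean_abs_error(reference, w8a8_linear(x, *quantize_weight_per_tensor(weight)))
print("per-row error:   ", err_per_row)
print("per-tensor error:", err_per_tensor)
```

In this toy case the per-tensor scale is set by the large row, so the small row collapses to a handful of integer levels. The per-row variant gives that row its own scale and keeps its resolution.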
Evaluation Results
All variants are compared against the BF16 baseline using identical prompts, seeds, and resolution.
Overall Metrics
| Variant | CLIP ↑ | LPIPS ↓ | PSNR ↑ | MSE ↓ | FID ↓ |
|---|---|---|---|---|---|
| BF16 (baseline) | 0.6491 | n/a | n/a | n/a | n/a |
| FP8 (reference) | 0.6487 | 0.0645 | 24.21 | 477.33 | 34.52 |
| int8-per-row | 0.6486 | 0.0560 | 26.03 | 445.43 | 28.03 |
| int8-per-tensor | 0.6495 | 0.1025 | 22.02 | 743.48 | 45.29 |
| int8-smoothquant-per-row | pending | pending | pending | pending | pending |
| int8-smoothquant-per-tensor | pending | pending | pending | pending | pending |
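For readers who want to reproduce the pixel-level metrics on their own outputs, MSE and PSNR between a BF16 image and a quantized image can be computed as below. This is a minimal sketch, not the evaluation script used for the table: it assumes 8-bit pixel values in the 0 to 255 range (the magnitude of the MSE column suggests that scale, but the card does not state it), and it treats each image as a flat list of values.

```python
import math


def mse(reference, candidate):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(reference) != len(candidate):
        raise ValueError("images must have the same number of values")
    return sum((a - b) ** 2 for a, b in zip(reference, candidate)) / len(reference)


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, candidate)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy pixel values standing in for a BF16 output and a quantized output.
bf16_pixels = [0, 64, 128, 255]
int8_pixels = [2, 60, 130, 250]

print("MSE: ", mse(bf16_pixels, int8_pixels))   # squared diffs 4 + 16 + 4 + 25 = 49, so 12.25
print("PSNR:", psnr(bf16_pixels, int8_pixels))
```

If the table's values are averages over many images, its PSNR and MSE columns need not satisfy the single-image identity above, since the mean of per-image PSNR differs from the PSNR of the mean MSE.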
Text-to-Image
| Variant | CLIP ↑ | LPIPS ↓ | PSNR ↑ |
|---|---|---|---|
| FP8 (reference) | 0.6452 | 0.0804 | 22.51 |
| int8-per-row | 0.6450 | 0.0649 | 24.35 |
| int8-per-tensor | 0.6457 | 0.1364 | 20.17 |
| int8-smoothquant-per-row | pending | pending | pending |
| int8-smoothquant-per-tensor | pending | pending | pending |
Dramatic chiaroscuro portrait of a cellist mid-performance, single spotlight from above, instrument bow caught in motion blur, concert hall darkness
Stained glass window design depicting the four elements, lead came outlines, rich jewel tones of ruby, sapphire, emerald, and topaz
Editing
| Variant | CLIP ↑ | LPIPS ↓ | PSNR ↑ |
|---|---|---|---|
| FP8 (reference) | 0.6420 | 0.0309 | 28.49 |
| int8-per-row | 0.6418 | 0.0215 | 31.25 |
| int8-per-tensor | 0.6424 | 0.0588 | 25.71 |
| int8-smoothquant-per-row | pending | pending | pending |
| int8-smoothquant-per-tensor | pending | pending | pending |
Base: A bicycle leaning against a stone wall in a village
Edit: Transform the village into an underwater coral reef scene, the bicycle covered in barnacles and sea anemones, fish swimming around
Base: A food truck parked on a city street at noon
Edit: Change the street to a Venice canal with the food truck floating on a gondola platform, evening golden hour lighting
Composition
| Variant | CLIP ↑ | LPIPS ↓ | PSNR ↑ |
|---|---|---|---|
| FP8 (reference) | 0.6591 | 0.0824 | 21.63 |
| int8-per-row | 0.6591 | 0.0817 | 22.48 |
| int8-per-tensor | 0.6603 | 0.1123 | 20.19 |
| int8-smoothquant-per-row | pending | pending | pending |
| int8-smoothquant-per-tensor | pending | pending | pending |
Create a zen garden where the raked sand patterns flow into and around a giant ramen bowl as the central stone
A clockwork mechanical wolf made of brass gears howling at the full moon on the snowy ridge, steam rising from its joints
Usage
Code release coming soon. A pip-installable loader library is in preparation.
In the meantime, these checkpoints can be tested with ComfyUI using the ComfyUI-Flux2-INT8 custom node. Per-row quantization support is available via PR #24.
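Independently of any loader, a downloaded checkpoint can be inspected to see which tensors are stored as INT8 and which remain in BF16. The sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON table of names, dtypes, and shapes) using the standard library. It is not the forthcoming loader, the helper names are hypothetical, and the tensor names and dtype mix inside these particular checkpoints are not documented here, so the output has to be read from the file itself.

```python
import json
import struct
import sys
from collections import Counter


def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} without loading weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return header


def dtype_histogram(header):
    """Count tensors per stored dtype string (for example "I8" or "BF16")."""
    return Counter(entry["dtype"] for entry in header.values())


if __name__ == "__main__":
    # Usage: python inspect_checkpoint.py flux-2-klein-base-4b-int8-per-row.safetensors
    for checkpoint_path in sys.argv[1:]:
        tensors = read_safetensors_header(checkpoint_path)
        print(checkpoint_path, dict(dtype_histogram(tensors)))
```

Reading only the header keeps the check cheap, since none of the tensor data is loaded into memory.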
Technical Details
| Property | Value |
|---|---|
| Base Model | FLUX.2-klein-base-4B |
| Parameters | 4B |
| Quantization | INT8 (W8A8) via NVIDIA ModelOpt |
| Calibration | 768 prompts (256 per task), 50 steps each |
| Activation Quantization | Dynamic per-row (quantized on-the-fly at inference) |
| Preserved Layers | Embedder layers and output projection kept in BF16 |
| Inference Steps | 50 |
| Guidance Scale | 4.0 |
License
This model inherits the license from the base model: Apache 2.0.
Base model: black-forest-labs/FLUX.2-klein-base-4B








