FLUX.2-klein-base-4b-INT8-transformer-quants

INT8 (W8A8) quantization variants for FLUX.2-klein-base-4B (4B parameters).

This repository contains multiple INT8 quantization variants for experimentation and comparison.

Status: Static/Max variants (int8-per-row, int8-per-tensor) are available now. SmoothQuant variants are pending and will be added when ready.

Variant	Algorithm	Scale Mode	Status	Checkpoint
int8-per-row	static	per-row	✅ Available	`flux-2-klein-base-4b-int8-per-row.safetensors`
int8-per-tensor	static	per-tensor	✅ Available	`flux-2-klein-base-4b-int8-per-tensor.safetensors`
int8-smoothquant-per-row	smoothquant	per-row	🔜 Pending	`flux-2-klein-base-4b-int8-smoothquant-per-row.safetensors`
int8-smoothquant-per-tensor	smoothquant	per-tensor	🔜 Pending	`flux-2-klein-base-4b-int8-smoothquant-per-tensor.safetensors`

Quantization Details

All variants use NVIDIA TensorRT Model Optimizer (ModelOpt) INT8 (W8A8) quantization:

Property	Value
Framework	NVIDIA ModelOpt
Calibration	768 prompts (256 T2I, 256 editing, 256 composition), 50 steps each
Weight Quantization	INT8 symmetric — per-row or per-tensor depending on variant
Activation Quantization	Dynamic per-row (quantized on-the-fly at inference, one scale per token)
Preserved Layers	Embedder layers (time_embed, context_embedder, x_embedder) and output projection kept in BF16

Algorithm × Scale Mode

	Per-Row	Per-Tensor
Static (Max)	`int8-per-row` ✅	`int8-per-tensor` ✅
SmoothQuant	`int8-smoothquant-per-row` 🔜	`int8-smoothquant-per-tensor` 🔜

Algorithm:

Static (Max): Standard max-calibration INT8 quantization, with each scale set from the calibrated absolute-maximum value
SmoothQuant (pending): Migrates quantization difficulty from activations to weights for better accuracy
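The idea behind SmoothQuant can be sketched with a toy example. The snippet below is an illustration only, not the ModelOpt implementation: it uses the per-input-channel factor s_j = max|X_j|^alpha / max|W_j|^(1 - alpha) from the SmoothQuant paper, and the value alpha = 0.5, the helper names, and the small matrices are assumptions made for this sketch. Dividing activations by s and multiplying weights by s leaves the layer output unchanged while shrinking the activation outlier.

```python
# Illustrative SmoothQuant-style scaling; not the ModelOpt implementation.
# alpha = 0.5 and the toy matrices below are assumptions for this sketch.

def linear(x, w):
    """Compute x @ w.T for x (tokens x in_features) and w (out_features x in_features)."""
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]

def smooth_scales(x, w, alpha=0.5):
    """Per-input-channel factor s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    n_in = len(x[0])
    act_max = [max(abs(row[j]) for row in x) for j in range(n_in)]
    w_max = [max(abs(row[j]) for row in w) for j in range(n_in)]
    return [a ** alpha / m ** (1 - alpha) for a, m in zip(act_max, w_max)]

# Hypothetical activations with an outlier in input channel 1.
x = [[0.5, 40.0, -1.0],
     [-0.3, -36.0, 0.8]]
# Hypothetical weight, out_features x in_features.
w = [[0.2, 0.05, -0.4],
     [-0.1, 0.03, 0.3]]

s = smooth_scales(x, w)
x_smooth = [[v / s[j] for j, v in enumerate(row)] for row in x]   # easier to quantize
w_smooth = [[v * s[j] for j, v in enumerate(row)] for row in w]   # absorbs the difficulty

reference = linear(x, w)
smoothed = linear(x_smooth, w_smooth)   # mathematically identical to reference
```

In this toy case the largest activation magnitude drops from 40 to about 1.4, at the cost of a wider weight range in the same channel.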

Scale Mode:

Per-Row: Independent scale per output channel (finer granularity, higher accuracy)
Per-Tensor: Single scale per tensor (faster, lower memory, reduced accuracy; see Evaluation Results)
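The difference between the two scale modes can be illustrated with a minimal sketch of symmetric INT8 weight quantization. The weight values and helper names below are hypothetical and chosen so that one output channel has a much smaller range than the other; this is not the code used to produce the checkpoints.

```python
# Symmetric INT8 weight quantization at two granularities (toy sketch).

def quantize(values, scale):
    """Round to the nearest integer step and clamp to the symmetric range [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def per_tensor(w):
    """One scale shared by the whole tensor (repeated per row for convenience)."""
    scale = max(abs(v) for row in w for v in row) / 127
    return [quantize(row, scale) for row in w], [scale] * len(w)

def per_row(w):
    """One scale per row, i.e. per output channel."""
    scales = [max(abs(v) for v in row) / 127 for row in w]
    return [quantize(row, s) for row, s in zip(w, scales)], scales

def row_errors(w, q, scales):
    """Largest absolute dequantization error in each row."""
    return [max(abs(v - qi * s) for v, qi in zip(row, qrow))
            for row, qrow, s in zip(w, q, scales)]

# Hypothetical weight (out_features x in_features); row 1 dominates the tensor range.
w = [[0.010, -0.020, 0.015],
     [1.500, -2.000, 0.750]]

err_tensor = row_errors(w, *per_tensor(w))
err_row = row_errors(w, *per_row(w))
```

With a shared scale, the small-range row 0 is squeezed into one or two integer levels; with per-row scales it uses the full INT8 range, so its error is far smaller. Row 1 sets the tensor-wide maximum, so it is quantized identically in both modes. The price of per-row mode is storing one scale per output channel instead of a single value.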

Note: In all variants, input activations are always quantized dynamically per-row at inference time (one scale per token). The scale mode above refers to the weight quantization granularity.
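To make the W8A8 data flow concrete, here is a rough pure-Python sketch of a linear layer with per-row INT8 weights (quantized once, offline) and per-token INT8 activations (quantized on each call). The function names and data are hypothetical, and a real kernel would run the integer products on INT8 hardware with INT32 accumulation rather than in Python.

```python
# Toy W8A8 linear layer: per-row INT8 weights plus dynamic per-token INT8 activations.

def quantize_rows(m):
    """Symmetric INT8 quantization with one scale per row of m."""
    scales = [max(max(abs(v) for v in row), 1e-12) / 127 for row in m]
    q = [[max(-127, min(127, round(v / s))) for v in row] for row, s in zip(m, scales)]
    return q, scales

def w8a8_linear(x, w_q, w_scales):
    """Compute x @ w.T with integer accumulation, then rescale to floating point."""
    x_q, x_scales = quantize_rows(x)   # dynamic: one scale per token, computed per call
    out = []
    for xr, xs in zip(x_q, x_scales):
        # Integer dot product, then multiply by the token scale and the channel scale.
        out.append([sum(a * b for a, b in zip(xr, wr)) * xs * ws
                    for wr, ws in zip(w_q, w_scales)])
    return out

# Hypothetical data: 2 tokens x 3 input features, 2 output features.
x = [[0.5, -1.0, 0.25],
     [8.0, 4.0, -2.0]]
w = [[0.2, 0.05, -0.4],
     [-0.1, 0.03, 0.3]]

w_q, w_scales = quantize_rows(w)       # done once, offline
y_int8 = w8a8_linear(x, w_q, w_scales)
y_ref = [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]
```

Because each token gets its own scale, the small-magnitude first token is not penalized by the larger second token. For a per-tensor weight variant, `w_scales` would simply hold the same value for every output channel.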

Evaluation Results

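Each variant is compared against the BF16 baseline using identical prompts, seeds, and resolution. LPIPS, PSNR, MSE, and FID measure deviation from the BF16 outputs, so the baseline row has no values for them.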

Overall Metrics

Variant	CLIP ↑	LPIPS ↓	PSNR ↑	MSE ↓	FID ↓
BF16 (baseline)	0.6491	—	—	—	—
FP8 (reference)	0.6487	0.0645	24.21	477.33	34.52
int8-per-row	0.6486	0.0560	26.03	445.43	28.03
int8-per-tensor	0.6495	0.1025	22.02	743.48	45.29
int8-smoothquant-per-row	—	—	—	—	—
int8-smoothquant-per-tensor	—	—	—	—	—
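For readers less familiar with the pixel-level metrics, the sketch below shows how MSE and PSNR are commonly computed for one image pair. The evaluation script is not part of this card, so the 0-255 pixel scale, the per-image averaging, and the toy pixel values are all assumptions. If PSNR is averaged per image, the mean PSNR is generally higher than the PSNR derived from the mean MSE, which would be consistent with the PSNR and MSE columns above not satisfying the formula jointly.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized pixel sequences (0-255 scale)."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)

# Hypothetical flattened 8-bit images: a baseline output and two quantized outputs.
reference = [10, 50, 90, 130, 170, 210]
close = [12, 49, 91, 128, 171, 209]
far = [30, 20, 120, 100, 200, 180]

# Averaging per-image PSNR is not the same as taking the PSNR of the average MSE.
scores = [psnr(reference, img) for img in (close, far)]
mean_psnr = sum(scores) / len(scores)
mean_mse = (mse(reference, close) + mse(reference, far)) / 2
psnr_of_mean_mse = 10 * math.log10(255.0 ** 2 / mean_mse)
```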

Text-to-Image

Variant	CLIP ↑	LPIPS ↓	PSNR ↑
FP8 (reference)	0.6452	0.0804	22.51
int8-per-row	0.6450	0.0649	24.35
int8-per-tensor	0.6457	0.1364	20.17
int8-smoothquant-per-row	—	—	—
int8-smoothquant-per-tensor	—	—	—

Sample prompts:

Dramatic chiaroscuro portrait of a cellist mid-performance, single spotlight from above, instrument bow caught in motion blur, concert hall darkness

Stained glass window design depicting the four elements, lead came outlines, rich jewel tones of ruby, sapphire, emerald, and topaz

Editing

Variant	CLIP ↑	LPIPS ↓	PSNR ↑
FP8 (reference)	0.6420	0.0309	28.49
int8-per-row	0.6418	0.0215	31.25
int8-per-tensor	0.6424	0.0588	25.71
int8-smoothquant-per-row	—	—	—
int8-smoothquant-per-tensor	—	—	—

Sample prompts:

Base: A bicycle leaning against a stone wall in a village

Edit: Transform the village into an underwater coral reef scene, the bicycle covered in barnacles and sea anemones, fish swimming around

Base: A food truck parked on a city street at noon

Edit: Change the street to a Venice canal with the food truck floating on a gondola platform, evening golden hour lighting

Composition

Variant	CLIP ↑	LPIPS ↓	PSNR ↑
FP8 (reference)	0.6591	0.0824	21.63
int8-per-row	0.6591	0.0817	22.48
int8-per-tensor	0.6603	0.1123	20.19
int8-smoothquant-per-row	—	—	—
int8-smoothquant-per-tensor	—	—	—

Sample prompts:

Create a zen garden where the raked sand patterns flow into and around a giant ramen bowl as the central stone

A clockwork mechanical wolf made of brass gears howling at the full moon on the snowy ridge, steam rising from its joints

Usage

🚧 Code release coming soon. A pip-installable loader library is in preparation.

In the meantime, these checkpoints can be tested with ComfyUI using the ComfyUI-Flux2-INT8 custom node. Per-row quantization support is available via PR #24.
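Until the loader is released, a checkpoint can at least be inspected without extra dependencies. The sketch below reads the JSON header that the safetensors format places at the start of every file (an 8-byte little-endian length followed by the header itself) and counts tensors per stored dtype. The function names are hypothetical, and the expectation that quantized weights appear as `I8` while preserved layers appear as `BF16` is an assumption about these checkpoints, not something this card confirms.

```python
import json
import struct

def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} from a .safetensors file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))   # little-endian u64 header size
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)                     # optional free-form metadata
    return header

def dtype_summary(header):
    """Count tensors per stored dtype, e.g. {"I8": ..., "BF16": ...}."""
    counts = {}
    for info in header.values():
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + 1
    return counts

# Example usage (requires a downloaded checkpoint):
# header = read_safetensors_header("flux-2-klein-base-4b-int8-per-row.safetensors")
# print(dtype_summary(header))
```

Only the header is read, so this runs quickly even on multi-gigabyte files and does not load any weights into memory.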

Technical Details

Property	Value
Base Model	FLUX.2-klein-base-4B
Parameters	4B
Quantization	INT8 (W8A8) via NVIDIA ModelOpt
Calibration	768 prompts (256 per task), 50 steps each
Activation Quantization	Dynamic per-row (quantized on-the-fly at inference)
Preserved Layers	Embedder layers and output projection kept in BF16
Inference Steps	50
Guidance Scale	4.0

License

This model inherits the license from the base model: Apache 2.0.

Model tree for vistralis/FLUX.2-klein-base-4b-INT8-transformer-quants

Base model: black-forest-labs/FLUX.2-klein-base-4B

Quantized (8): this model