---
license: other
license_name: flux-non-commercial-license
tags:
- flux
- flux2
- quantized
- int8
- transformer
base_model: black-forest-labs/FLUX.2-klein-9B
base_model_relation: quantized
library_name: diffusers
pipeline_tag: text-to-image
---
# FLUX.2 [klein] 9B (step-distilled) - INT8 (W8A8) Transformer
**Quantized transformer checkpoint** for [FLUX.2 [klein] 9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) (step-distilled).
INT8 weight and activation quantization via NVIDIA ModelOpt with calibrated input scales.
> **Note**: This repo contains only the quantized **transformer** weights.
> The text encoder, VAE, tokenizer, and scheduler are loaded from the base model:
> [`black-forest-labs/FLUX.2-klein-9B`](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B).
## Model Details
| Property | Value |
|----------|-------|
| Base Model | [black-forest-labs/FLUX.2-klein-9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) |
| Parameters | 9B |
| Quantization Format | INT8 (W8A8) |
| Quantization Type | Weight + Activation (W8A8) |
| Compression | ~2x vs BF16 |
| Weight dtype | `int8` |
| Scale dtype | `float32` |
| Key format | Single-file safetensors |
| Checkpoint | `flux-2-klein-9b-int8.safetensors` |
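
The ~2x figure follows from storage width: BF16 stores each weight in 2 bytes, INT8 in 1. The sketch below is a back-of-the-envelope estimate only. It assumes a round 9 billion parameters and ignores the `float32` scales and the layers kept in BF16, both of which push the real ratio slightly below 2x.

```python
# Back-of-the-envelope size estimate. PARAMS is a round-number assumption,
# not the exact parameter count of the checkpoint.
PARAMS = 9_000_000_000
BYTES_BF16 = 2
BYTES_INT8 = 1

def checkpoint_gib(params: int, bytes_per_param: int) -> float:
    """Size in GiB if every parameter used the given storage width."""
    return params * bytes_per_param / 2**30

bf16_gib = checkpoint_gib(PARAMS, BYTES_BF16)
int8_gib = checkpoint_gib(PARAMS, BYTES_INT8)
ratio = bf16_gib / int8_gib
print(f"BF16 ~{bf16_gib:.1f} GiB, INT8 ~{int8_gib:.1f} GiB, ratio ~{ratio:.1f}x")
```
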
## Quantization Details
| Property | Value |
|----------|-------|
| Framework | [NVIDIA TensorRT Model Optimizer (ModelOpt)](https://github.com/NVIDIA/TensorRT-Model-Optimizer) |
| Calibration Method | NVIDIA ModelOpt `max` (per-channel max abs) |
| Calibration Dataset | 256 samples from 768 diverse prompts (256 T2I, 256 editing, 256 composition) |
| Denoising Steps (calibration) | 4 per sample |
| Weight Quantization | Per-channel symmetric (axis=0) |
| Activation Quantization | Per-tensor via baked `input_scale` / `weight_scale` tensors |
| Preserved Layers | Embedder layers (time_embed, context_embedder, x_embedder) and output projection kept in BF16 |
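
To make the scheme in the table concrete, here is a minimal, dependency-free sketch of symmetric W8A8 quantization: one `max`-calibrated scale per weight row and a single pre-computed scale for the activations. It is not ModelOpt's implementation. The function names are hypothetical, and it assumes weights are laid out as `[out_features, in_features]`, so that axis 0 indexes output channels.

```python
# Illustrative only: plain-Python W8A8 arithmetic, not ModelOpt code.

def quantize_weight_per_channel(weight):
    """Symmetric INT8 quantization with one scale per row (axis=0)."""
    q_rows, scales = [], []
    for row in weight:
        amax = max(abs(v) for v in row)            # "max" calibration
        scale = amax / 127.0 if amax > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def quantize_activation_per_tensor(x, input_scale):
    """Quantize activations with a single scale fixed at calibration time."""
    return [max(-127, min(127, round(v / input_scale))) for v in x]

def int8_linear(x_q, input_scale, w_q, weight_scales):
    """Integer dot products, rescaled to floating point afterwards."""
    out = []
    for row, w_scale in zip(w_q, weight_scales):
        acc = sum(a * b for a, b in zip(x_q, row))  # integer accumulation
        out.append(acc * input_scale * w_scale)
    return out

weight = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
x = [1.0, -2.0, 0.5]
input_scale = 2.0 / 127.0   # stand-in for a calibrated value: max|x| / 127

w_q, w_scales = quantize_weight_per_channel(weight)
x_q = quantize_activation_per_tensor(x, input_scale)
approx = int8_linear(x_q, input_scale, w_q, w_scales)
exact = [sum(a * b for a, b in zip(x, row)) for row in weight]
print(approx, exact)
```

The second row has much smaller magnitudes than the first. Because each row gets its own scale, it still uses the full INT8 range, which is the usual motivation for per-channel weight scales.
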
## Evaluation
Evaluated on **48 prompts**: 16 text-to-image (T2I), 16 editing, and 16 composition. Both BF16 baseline and INT8 outputs are generated with identical prompts and seeds, then scored independently.
### Understanding the Metrics
We report two categories of metrics:
**Text-Image Alignment**, which measures output quality independently:
- **CLIP Score ↑**: Uses OpenAI's CLIP model to score how well each generated image matches its text prompt. Both BF16 and quantized models are evaluated independently against the same prompts; this is *not* a comparison between the two outputs, but an independent quality measure for each. Higher is better (typical range: 0.25–0.35).
**Fidelity**, which measures how closely the quantized output matches the BF16 baseline:
- **LPIPS ↓** (Learned Perceptual Image Patch Similarity): Uses a neural network to judge perceptual similarity the way a human would. Unlike pixel-level metrics, LPIPS captures structural and textural differences. 0 = perceptually identical, 1 = completely different. Values below 0.1 indicate very high fidelity.
- **PSNR ↑** (Peak Signal-to-Noise Ratio): Measures pixel-level accuracy in decibels. Higher values mean less error. 20–30 dB is typical for quantized model comparisons; 30+ dB is excellent.
- **FID ↓** (Fréchet Inception Distance): Compares the statistical distribution of *all* generated images (not individual pairs). Lower means the quantized model produces images from the same visual distribution as BF16. Sensitive to sample size, so our 48-image evaluation provides a directional signal rather than a definitive score.
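
Of the fidelity metrics, PSNR is the only one that needs no neural network, so it can be sketched in a few lines. The version below assumes 8-bit pixel values (peak 255) passed as flat sequences. The color space and averaging used for the numbers in this card are not stated, so treat it as an illustration of the formula, not as the evaluation script.

```python
import math

def psnr(reference, candidate, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel sequences."""
    if len(reference) != len(candidate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf            # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy pixel values, made up for illustration.
bf16_pixels = [10, 200, 35, 90, 255, 0]
int8_pixels = [12, 195, 35, 97, 250, 3]
score = psnr(bf16_pixels, int8_pixels)
print(f"{score:.2f} dB")
```
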
### Text-Image Alignment (CLIP Score ↑)
CLIP score measures how well the generated image matches the text prompt (higher = better). Both models are evaluated independently:
| Model | CLIP Score |
|-------|------------|
| BF16 (baseline) | 0.6426 |
| INT8 | 0.6422 |
### Fidelity vs BF16 Baseline
These metrics measure how closely the quantized output matches the BF16 reference:
| Metric | Value | Description |
|--------|-------|-------------|
| **LPIPS** ↓ | 0.0615 | Perceptual distance (0 = identical) |
| **PSNR** ↑ | 22.34 dB | Signal-to-noise ratio |
| **FID** ↓ | 32.27 | Distribution distance |
### Per-Task Breakdown
| Task | CLIP ↑ | LPIPS ↓ | PSNR ↑ |
|------|--------|---------|--------|
| Text-to-Image | 0.6549 | 0.0450 | 22.71 dB |
| Editing | 0.6279 | 0.0763 | 21.95 dB |
| Composition | 0.6440 | 0.0633 | 22.36 dB |
### Comparison with FP8 (Reference)
Black Forest Labs officially provides **FP8** quantized checkpoints for FLUX.2 Klein. However, FP8 (float8_e4m3fn) requires hardware support introduced with NVIDIA Ada Lovelace (RTX 40-series / L4 / L40). **INT8** offers a quantized alternative at the same ~2× compression ratio for GPUs that lack native FP8 support (e.g., Ampere, Turing, or non-NVIDIA hardware with INT8 acceleration).
The table below compares both formats against the same BF16 baseline (CLIP 0.6426), evaluated with identical prompts and seeds:
| Metric | INT8 | FP8 |
|--------|---:|---:|
| **CLIP** ↑ | 0.6422 | 0.6419 |
| **LPIPS** ↓ | 0.0615 | 0.0559 |
| **PSNR** ↑ | 22.34 dB | 23.14 dB |
| **FID** ↓ | 32.27 | 28.91 |
#### Per-Task Breakdown
| Task | INT8 CLIP ↑ | FP8 CLIP ↑ | INT8 LPIPS ↓ | FP8 LPIPS ↓ | INT8 PSNR ↑ | FP8 PSNR ↑ |
|------|---:|---:|---:|---:|---:|---:|
| Text-to-Image | 0.6549 | 0.6547 | 0.0450 | 0.0452 | 22.71 dB | 22.85 dB |
| Editing | 0.6279 | 0.6297 | 0.0763 | 0.0598 | 21.95 dB | 23.04 dB |
| Composition | 0.6440 | 0.6415 | 0.0633 | 0.0627 | 22.36 dB | 23.53 dB |
### Visual Comparison (BF16 vs INT8)
All images generated with identical prompts and seeds (4 denoising steps, 1024×1024).
#### Text-to-Image
> *"Oil painting of a stormy seascape in the style of J.M.W. Turner, violent waves crashing against rocks, ship barely visible in mist, thick impasto texture"*

#### Image Editing
> **Base**: *"A red sports car parked in a garage"*
> **Edit**: *"Change the car color to yellow and make the garage look like a futuristic space hangar"*

#### Multi-Reference Composition (2 references)
> **Ref 1**: *"A weathered bronze statue of a Greek philosopher"*
> **Ref 2**: *"A lush tropical rainforest canopy"*
> **Compose**: *"The statue is being reclaimed by the jungle, with vines and flowers growing over its features"*

### INT8 Performance Benchmarks
> Measured on **NVIDIA RTX 5090** with PyTorch 2.10.0+cu130 and CUDA 13.0. Full INT8 stack (INT8 transformer + INT8 text encoder). Resolution: 1024×1024.

| Model | Steps | Eager | Compiled | Throughput | VRAM |
|-------|------:|------:|---------:|-----------:|-----:|
| klein-4b | 4 | 1.77s | 0.72s | 1.387 img/s | 11.25 GB |
| klein-base-4b | 50 | 33.25s | 9.92s | 0.101 img/s | 11.26 GB |
| **klein-9b** (this model) | 4 | 3.04s | 1.09s | 0.917 img/s | 20.15 GB |
| klein-base-9b | 50 | 62.49s | 18.70s | 0.053 img/s | 20.16 GB |

> `torch.compile` speedup: **2.5×** (klein-4b), **3.4×** (klein-base-4b), **2.8×** (klein-9b), **3.3×** (klein-base-9b)
>
> **Why the large speedup?** Our pipeline loads INT8 weights using [TorchAO](https://github.com/pytorch/ao), which represents linear layers as W8A8 quantized tensors. In eager mode, each quantized matmul dispatches separate CUDA kernels for dequantization and computation. With `torch.compile`, the full graph is traced and these operations are fused into optimized Triton kernels that perform dequantize + matmul in a single pass, eliminating kernel launch overhead and intermediate memory traffic.
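
The speedup figures can be reproduced from the latencies in the benchmark table above. The sketch below assumes the Throughput column is the reciprocal of the compiled latency, which matches the published values up to rounding of the latencies; this is an inference from the numbers, not something the benchmark states.

```python
# Latencies in seconds per image, copied from the benchmark table:
# (eager, compiled) for each model.
latency = {
    "klein-4b": (1.77, 0.72),
    "klein-base-4b": (33.25, 9.92),
    "klein-9b": (3.04, 1.09),
    "klein-base-9b": (62.49, 18.70),
}

def summarize(eager_s, compiled_s):
    """Return (torch.compile speedup, images per second when compiled)."""
    speedup = eager_s / compiled_s
    throughput = 1.0 / compiled_s   # assumed definition of the Throughput column
    return speedup, throughput

for name, (eager_s, compiled_s) in latency.items():
    speedup, throughput = summarize(eager_s, compiled_s)
    print(f"{name}: {speedup:.1f}x speedup, {throughput:.3f} img/s")
```
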
## Usage
> **Code release coming soon.** A pip-installable loader library is in preparation.
### Compatibility
This checkpoint uses the **official FLUX.2 single-file safetensors format**, the same key layout and structure used by
Black Forest Labs for their official FP8 and NVFP4 quantized models. Any loader that supports
quantized FLUX.2 single-file checkpoints can load this INT8 checkpoint.
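
Until the loader is released, the checkpoint layout can still be inspected without loading any weights. A safetensors file begins with an 8-byte little-endian header length followed by a JSON header describing each tensor, so the standard library is enough. The sketch below is not the upcoming loader. The helper names are hypothetical, and it assumes scale tensors live under keys ending in `input_scale` or `weight_scale`.

```python
import json
import struct
from collections import Counter

def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))   # little-endian uint64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)                      # optional metadata entry
    return header

def summarize_checkpoint(path):
    """Count tensors per dtype and collect the quantization scale keys."""
    header = read_safetensors_header(path)
    dtypes = Counter(entry["dtype"] for entry in header.values())
    # Assumption: scale tensors are stored under keys with these suffixes.
    scale_keys = [k for k in header if k.endswith(("input_scale", "weight_scale"))]
    return dtypes, scale_keys

# Usage, with the file name from the Model Details table:
# dtypes, scale_keys = summarize_checkpoint("flux-2-klein-9b-int8.safetensors")
# print(dtypes, len(scale_keys))
```

If the checkpoint matches the tables in this card, the dtype counts should be dominated by `I8` weights, with `F32` scales and a smaller number of `BF16` tensors for the preserved layers.
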
## License
This model inherits the license from the base model: **[FLUX Non-Commercial](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B/blob/main/LICENSE.md)**.
## Acknowledgments
- [Black Forest Labs](https://blackforestlabs.ai/) for FLUX.2
- [NVIDIA](https://nvidia.com/) for ModelOpt quantization tools
- [TorchAO](https://github.com/pytorch/ao) for quantized tensor runtime