Commit fdb108e · verified · committed by Vistralis Labs
1 parent: 52b8975

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +129 -0
  2. flux-2-klein-9b-int8.safetensors +3 -0
README.md ADDED
---
license: other
license_name: flux-non-commercial-license
tags:
- flux
- flux2
- quantized
- int8
- transformer
base_model: black-forest-labs/FLUX.2-klein-9B
base_model_relation: quantized
library_name: diffusers
pipeline_tag: text-to-image
---

# FLUX.2 [klein] 9B (step-distilled): INT8 (W8A8) Transformer

**Quantized transformer checkpoint** for [FLUX.2 [klein] 9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) (step-distilled).

INT8 weight and activation quantization via NVIDIA ModelOpt with calibrated input scales.

> **Note**: This repo contains only the quantized **transformer** weights.
> The text encoder, VAE, tokenizer, and scheduler are loaded from the base model:
> [`black-forest-labs/FLUX.2-klein-9B`](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B).

## Model Details

| Property | Value |
|----------|-------|
| Base Model | [black-forest-labs/FLUX.2-klein-9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) |
| Parameters | 9B |
| Quantization Format | INT8 (W8A8) |
| Quantization Type | Weight + Activation (W8A8) |
| Compression | ~2× vs BF16 |
| Weight dtype | `int8` |
| Scale dtype | `float32` |
| File format | Single-file safetensors (official FLUX.2 key layout) |
| Checkpoint | `flux-2-klein-9b-int8.safetensors` |

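The "~2×" figure can be sanity-checked by comparing the nominal parameter count with the checkpoint size recorded in the Git LFS pointer for `flux-2-klein-9b-int8.safetensors` (9,439,871,616 bytes). The sketch below makes two assumptions. First, the "9B" count applies to the transformer alone. Second, a BF16 copy would store 2 bytes per parameter. Both are approximations, because "9B" is a rounded figure.

```python
# Back-of-the-envelope size check; "9B" is a rounded, nominal parameter count (assumption).
NOMINAL_PARAMS = 9_000_000_000
BF16_BYTES_PER_PARAM = 2
INT8_BYTES_PER_PARAM = 1
CHECKPOINT_BYTES = 9_439_871_616  # "size" field of the Git LFS pointer

bf16_estimate = NOMINAL_PARAMS * BF16_BYTES_PER_PARAM  # ~18 GB if every weight were BF16
int8_estimate = NOMINAL_PARAMS * INT8_BYTES_PER_PARAM  # ~9 GB for pure 1-byte weights
compression = bf16_estimate / CHECKPOINT_BYTES

print(f"checkpoint: {CHECKPOINT_BYTES / 1e9:.2f} GB ({CHECKPOINT_BYTES / 2**30:.2f} GiB)")
print(f"overhead over pure INT8: {(CHECKPOINT_BYTES - int8_estimate) / 1e9:.2f} GB")
print(f"estimated compression vs BF16: {compression:.2f}x")
```

This prints 9.44 GB (8.79 GiB), 0.44 GB of overhead, and a ratio of 1.91x. The extra 0.44 GB over a pure 1-byte-per-parameter estimate is plausibly explained by three things: the layers kept in BF16, the `float32` scale tensors, and rounding in the nominal "9B" count. The card does not break this down.
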
## Quantization Details

| Property | Value |
|----------|-------|
| Framework | [NVIDIA TensorRT Model Optimizer (ModelOpt)](https://github.com/NVIDIA/TensorRT-Model-Optimizer) |
| Calibration Method | NVIDIA ModelOpt `max` (absolute maximum; per-channel for weights, per-tensor for activations) |
| Calibration Dataset | 256 samples from 768 diverse prompts (256 T2I, 256 editing, 256 composition) |
| Denoising Steps (calibration) | 4 per sample |
| Weight Quantization | Per-channel symmetric (axis=0) |
| Activation Quantization | Per-tensor, static; calibrated scales stored in the checkpoint as `input_scale` (weights use `weight_scale`) |
| Preserved Layers | Embedder layers (`time_embed`, `context_embedder`, `x_embedder`) and the output projection kept in BF16 |

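A toy linear layer illustrates the scheme in the table:

- Weights get one symmetric INT8 scale per output channel (each row, axis 0).
- Activations get a single per-tensor scale, fixed ahead of time by `max` calibration.
- The integer dot products are rescaled by `input_scale * weight_scale`.

This is a simplified pure-Python sketch for intuition only. It assumes a symmetric [-127, 127] range and round-to-nearest. It is not ModelOpt's code or the kernels a loader would use, and all function names are hypothetical.

```python
from typing import List, Tuple

QMAX = 127  # assumed symmetric INT8 range [-127, 127]


def _clamp(v: int) -> int:
    return max(-QMAX, min(QMAX, v))


def quantize_weight_per_channel(w: List[List[float]]) -> Tuple[List[List[int]], List[float]]:
    """Symmetric INT8 quantization with one scale per output channel (row of a [out, in] matrix)."""
    q_rows, scales = [], []
    for row in w:
        amax = max(abs(v) for v in row)
        scale = amax / QMAX if amax > 0 else 1.0
        q_rows.append([_clamp(round(v / scale)) for v in row])
        scales.append(scale)
    return q_rows, scales


def calibrate_input_scale(calibration_batches: List[List[float]]) -> float:
    """'max' calibration: one per-tensor scale from the largest |activation| observed."""
    amax = max(abs(v) for batch in calibration_batches for v in batch)
    return amax / QMAX if amax > 0 else 1.0


def int8_linear(x: List[float], q_w: List[List[int]], weight_scales: List[float],
                input_scale: float) -> List[float]:
    """Approximate W @ x: quantize x with the fixed input_scale, multiply in integers, rescale."""
    q_x = [_clamp(round(v / input_scale)) for v in x]  # values beyond the calibrated range saturate
    out = []
    for q_row, w_scale in zip(q_w, weight_scales):
        acc = sum(a * b for a, b in zip(q_row, q_x))  # integer accumulation (int32 on real hardware)
        out.append(acc * input_scale * w_scale)
    return out


# Toy example: a 2x3 weight matrix and two calibration activation vectors.
w = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]
q_w, w_scales = quantize_weight_per_channel(w)
s_in = calibrate_input_scale([[0.9, -0.4, 0.2], [0.1, 1.2, -0.6]])

x = [0.7, -0.2, 0.5]
y_int8 = int8_linear(x, q_w, w_scales, s_in)
y_ref = [sum(a * b for a, b in zip(row, x)) for row in w]
print(y_int8, y_ref)
```

Here each output lands within a few thousandths of the full-precision result. In the full model, such per-layer rounding errors accumulate across layers and denoising steps. The end-to-end fidelity metrics in the Evaluation section capture that accumulated effect.
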
## Evaluation

Evaluated on **48 prompts**: 16 each for text-to-image (T2I), editing, and composition. The BF16 baseline and the INT8 (W8A8) model generate images from identical prompts and seeds. CLIP is scored on each model's outputs separately, while the fidelity metrics compare the INT8 outputs against the BF16 ones.

<details>
<summary><strong>📊 Understanding the Metrics</strong></summary>

We report two categories of metrics:

**Text-Image Alignment**: measures each model's output quality on its own.
- **CLIP Score ↑**: Uses OpenAI's CLIP model to score how well each generated image matches its text prompt. The BF16 and quantized models are each evaluated against the same prompts. This is *not* a comparison between the two outputs but a separate quality measure for each. Higher is better. Absolute values depend on the CLIP variant and score scaling, so the scores are most meaningful when compared with each other rather than with figures from other setups.

**Fidelity**: measures how closely the quantized output matches the BF16 baseline.
- **LPIPS ↓** (Learned Perceptual Image Patch Similarity): Compares deep-network features of two images to estimate perceptual similarity, and is designed to correlate with human judgments. Unlike pixel-level metrics, LPIPS captures structural and textural differences. 0 means perceptually identical, and larger values mean greater perceptual difference (in practice usually between 0 and 1). Values below 0.1 generally indicate high fidelity.
- **PSNR ↑** (Peak Signal-to-Noise Ratio): Measures pixel-level error in decibels; higher values mean less error. 20–30 dB is typical for quantized-model comparisons, and 30+ dB is excellent.
- **FID ↓** (Fréchet Inception Distance): Compares the statistical distributions of *all* generated images rather than individual pairs. Lower means the quantized model's outputs are closer in distribution to the BF16 outputs. FID is sensitive to sample size, so our 48-image evaluation provides a directional signal rather than a definitive score.

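For reference, the standard FID definition works in two steps. It first fits a Gaussian to Inception-network features of each image set (here, the BF16 outputs $r$ and the INT8 outputs $g$). It then measures the distance between the two Gaussians:

```math
\mathrm{FID}(r, g) = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Here $\mu$ and $\Sigma$ are the feature mean and covariance of each set. The commonly used Inception features are 2048-dimensional, and a covariance estimated from 48 images has rank at most 47. This is one concrete reason a 48-image FID should be read as directional only. The card does not state which FID implementation or feature layer was used.
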
</details>

### Text-Image Alignment (CLIP Score ↑)

CLIP score measures how well the generated image matches the text prompt (higher = better). Both models are evaluated independently:

| Model | CLIP Score |
|-------|------------|
| BF16 (baseline) | 0.6426 |
| INT8 (W8A8) | 0.6422 |

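CLIP-based scores rest on the cosine similarity between an image embedding and a text embedding from the same CLIP model. The card does not say which CLIP variant or scaling was used, so the sketch below shows only that core step. Made-up 4-dimensional vectors stand in for real embeddings, which have hundreds of dimensions. The sketch also makes "evaluated independently" concrete: each model's images are compared with the prompts, never with each other. Function names are illustrative.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_alignment(image_embs: List[Sequence[float]], text_embs: List[Sequence[float]]) -> float:
    """Average image-vs-prompt similarity for ONE model: image i is scored only against prompt i."""
    scores = [cosine_similarity(img, txt) for img, txt in zip(image_embs, text_embs)]
    return sum(scores) / len(scores)


# Hypothetical embeddings for two prompts and each model's two generated images.
prompt_embs = [[1.0, 0.2, 0.0, 0.1], [0.0, 1.0, 0.3, 0.0]]
bf16_images = [[0.9, 0.3, 0.1, 0.1], [0.1, 0.8, 0.4, 0.0]]
int8_images = [[0.9, 0.3, 0.2, 0.1], [0.1, 0.8, 0.4, 0.1]]

print("BF16:", round(mean_alignment(bf16_images, prompt_embs), 4))
print("INT8:", round(mean_alignment(int8_images, prompt_embs), 4))
```
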
### Fidelity vs BF16 Baseline

These metrics measure how closely the quantized output matches the BF16 reference:

| Metric | Value | Description |
|--------|-------|-------------|
| **LPIPS** ↓ | 0.0615 | Perceptual distance (0 = identical) |
| **PSNR** ↑ | 22.34 dB | Peak signal-to-noise ratio |
| **FID** ↓ | 32.27 | Distribution distance |

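PSNR follows directly from the mean squared error (MSE) between a reference image and a test image: $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where MAX is the largest possible pixel value. The sketch below implements that definition on flat lists of pixel values, then inverts it to put the reported 22.34 dB in perspective. It assumes 8-bit images (MAX = 255) and a single PSNR over all pixels and channels. The card does not specify how its PSNR was averaged.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_val ** 2 / mse)


def rmse_from_psnr(psnr_db: float, max_val: float = 255.0) -> float:
    """Invert the PSNR formula to get the root-mean-square pixel error."""
    return math.sqrt(max_val ** 2 / 10 ** (psnr_db / 10))


# Toy 4-pixel "images" with 0-255 grey levels.
print(psnr([10, 20, 30, 40], [12, 18, 33, 40]))

# What the reported 22.34 dB means on the 0-255 scale.
err = rmse_from_psnr(22.34)
print(f"RMSE ~ {err:.1f} grey levels ({err / 255:.1%} of full range)")
```

For the reported 22.34 dB, this gives an RMS error of about 19.5 grey levels, roughly 7.6% of the full intensity range.
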
### Per-Task Breakdown

| Task | CLIP ↑ (INT8) | LPIPS ↓ (vs BF16) | PSNR ↑ (vs BF16) |
|------|--------|---------|--------|
| Text-to-Image | 0.6549 | 0.0450 | 22.71 dB |
| Editing | 0.6279 | 0.0763 | 21.95 dB |
| Composition | 0.6440 | 0.0633 | 22.36 dB |

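Each task has 16 prompts. If the overall scores are simple averages over all 48 prompts, they should equal the unweighted mean of the three per-task rows. The check below assumes exactly that. It reproduces the overall LPIPS (0.0615) and PSNR (22.34 dB). It also gives a CLIP mean of about 0.6423, which is within rounding of the INT8 score (0.6422) rather than the BF16 baseline (0.6426).

```python
# Per-task values copied from the Per-Task Breakdown table.
per_task = {
    "Text-to-Image": {"clip": 0.6549, "lpips": 0.0450, "psnr_db": 22.71},
    "Editing": {"clip": 0.6279, "lpips": 0.0763, "psnr_db": 21.95},
    "Composition": {"clip": 0.6440, "lpips": 0.0633, "psnr_db": 22.36},
}

# Equal weighting is valid only because every task has the same number of prompts (16).
means = {
    metric: sum(task[metric] for task in per_task.values()) / len(per_task)
    for metric in ("clip", "lpips", "psnr_db")
}
print({metric: round(value, 4) for metric, value in means.items()})
# {'clip': 0.6423, 'lpips': 0.0615, 'psnr_db': 22.34}
```
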
### Comparison with FP8 (E4M3) Reference

Black Forest Labs officially provides **FP8 (E4M3)** quantized checkpoints for FLUX.2 [klein]. However, native FP8 (`float8_e4m3fn`) compute requires GPU hardware support introduced with NVIDIA's Hopper and Ada Lovelace generations (e.g., RTX 40-series, L4, L40). **INT8 (W8A8)** offers a quantized alternative at the same ~2× compression ratio for GPUs without native FP8 support (e.g., Ampere, Turing, or non-NVIDIA hardware with INT8 acceleration).

The table below compares both formats against the same BF16 baseline (CLIP 0.6426), evaluated with identical prompts and seeds:

| Metric | INT8 (W8A8) | FP8 (E4M3) |
|--------|---:|---:|
| **CLIP** ↑ | 0.6422 | 0.6419 |
| **LPIPS** ↓ | 0.0615 | 0.0559 |
| **PSNR** ↑ | 22.34 dB | 23.14 dB |
| **FID** ↓ | 32.27 | 28.91 |

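The hardware split described for FP8 and INT8 can be expressed as a small helper that maps a CUDA compute capability to a preferred 8-bit format. This is a hypothetical sketch, not part of any released loader, and it covers NVIDIA GPUs only. The thresholds are assumptions based on NVIDIA's published architecture numbers: Turing = 7.5, Ampere = 8.0/8.6/8.7, Ada Lovelace = 8.9, Hopper = 9.0.

```python
from typing import Optional, Tuple


def preferred_8bit_format(compute_capability: Tuple[int, int]) -> Optional[str]:
    """Pick an 8-bit checkpoint format for an NVIDIA GPU (hypothetical helper)."""
    if compute_capability >= (8, 9):   # Ada Lovelace, Hopper and newer: native FP8 tensor cores
        return "fp8_e4m3"
    if compute_capability >= (7, 5):   # Turing and Ampere: INT8 tensor cores, no native FP8
        return "int8_w8a8"
    return None                        # older GPUs: neither fast path is assumed here


for name, cc in [("T4", (7, 5)), ("A100", (8, 0)), ("RTX 3090", (8, 6)), ("L4", (8, 9)), ("H100", (9, 0))]:
    print(f"{name:8s} {cc} -> {preferred_8bit_format(cc)}")

# On a machine with PyTorch and a CUDA GPU, the capability can be queried with:
#   import torch
#   preferred_8bit_format(torch.cuda.get_device_capability())
```

A real loader would also need to confirm that matching INT8 or FP8 kernels are available in the installed software stack.
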
## Usage

> **🚧 Code release coming soon.** A pip-installable loader library is in preparation.

### Compatibility

This checkpoint uses the **official FLUX.2 single-file safetensors format**, with the same key layout and structure that Black Forest Labs uses for its official FP8 and NVFP4 quantized models. Loaders that already support quantized FLUX.2 single-file checkpoints should therefore be able to read this INT8 checkpoint, provided they also implement INT8 (W8A8) layers.

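A safetensors file starts with an 8-byte little-endian header length. A JSON header follows, mapping each tensor name to its dtype, shape, and byte offsets. Reading only that header is a cheap way to see what a single-file checkpoint contains before picking a loader. For example, it shows which layers are `I8` with `input_scale`/`weight_scale` companions and which stayed `BF16`. The official `safetensors` library's `safe_open` can do this too; the sketch below needs only the standard library. The tensor names in the demo are hypothetical stand-ins, not this checkpoint's actual keys.

```python
import io
import json
import struct
from collections import Counter
from typing import BinaryIO, Dict, List


def read_safetensors_header(fp: BinaryIO) -> Dict[str, dict]:
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} without reading tensor data."""
    (header_len,) = struct.unpack("<Q", fp.read(8))  # 8-byte little-endian header size
    header = json.loads(fp.read(header_len))
    header.pop("__metadata__", None)  # optional string-to-string metadata, not a tensor
    return header


def dtype_counts(header: Dict[str, dict]) -> Counter:
    """How many tensors use each safetensors dtype code (e.g. I8, BF16, F32)."""
    return Counter(info["dtype"] for info in header.values())


def layers_with_scales(header: Dict[str, dict]) -> List[str]:
    """Module prefixes that carry both an input_scale and a weight_scale tensor."""
    leaves: Dict[str, set] = {}
    for name in header:
        prefix, _, leaf = name.rpartition(".")
        leaves.setdefault(prefix, set()).add(leaf)
    return sorted(p for p, names in leaves.items() if {"input_scale", "weight_scale"} <= names)


# Self-contained demo: an in-memory header with hypothetical tensor names.
demo = {
    "__metadata__": {"format": "pt"},
    "blocks.0.linear.weight": {"dtype": "I8", "shape": [4, 4], "data_offsets": [0, 16]},
    "blocks.0.linear.weight_scale": {"dtype": "F32", "shape": [4], "data_offsets": [16, 32]},
    "blocks.0.linear.input_scale": {"dtype": "F32", "shape": [], "data_offsets": [32, 36]},
    "x_embedder.weight": {"dtype": "BF16", "shape": [4, 4], "data_offsets": [36, 68]},
}
payload = json.dumps(demo).encode("utf-8")
hdr = read_safetensors_header(io.BytesIO(struct.pack("<Q", len(payload)) + payload))
print(dtype_counts(hdr))
print(layers_with_scales(hdr))

# For the real file (only the header is read, not the ~9.4 GB of tensor data):
# with open("flux-2-klein-9b-int8.safetensors", "rb") as f:
#     print(dtype_counts(read_safetensors_header(f)))
```
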
## License

This model inherits the license from the base model: **FLUX Non-Commercial**.

## Acknowledgments

- [Black Forest Labs](https://blackforestlabs.ai/) for FLUX.2
- [NVIDIA](https://nvidia.com/) for ModelOpt quantization tools

flux-2-klein-9b-int8.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:fc4f91ad7ae1e6d23c591461173328587f02144fb72ed0798ba5dd7eb53d1c08
size 9439871616
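
In the Git LFS pointer format, `oid` is the SHA-256 of the file contents and `size` is its length in bytes. A downloaded copy of `flux-2-klein-9b-int8.safetensors` can therefore be checked against the values above. A minimal standard-library sketch follows; the function names are illustrative.

```python
import hashlib
import os

EXPECTED_SHA256 = "fc4f91ad7ae1e6d23c591461173328587f02144fb72ed0798ba5dd7eb53d1c08"
EXPECTED_SIZE = 9_439_871_616  # bytes


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so a ~9.4 GB checkpoint never sits in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_lfs_object(path: str, expected_sha256: str, expected_size: int) -> bool:
    """Cheap size check first; hash the whole file only if the size already matches."""
    if os.path.getsize(path) != expected_size:
        return False
    return sha256_of_file(path) == expected_sha256


# Example (adjust the path to wherever the checkpoint was downloaded):
# print(verify_lfs_object("flux-2-klein-9b-int8.safetensors", EXPECTED_SHA256, EXPECTED_SIZE))
```
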