Upload folder using huggingface_hub

Files changed (14)

.gitattributes +10 -0
README.md +168 -0
assets/comparison_composition_1.png +3 -0
assets/comparison_composition_2.png +3 -0
assets/comparison_editing_1.png +3 -0
assets/comparison_editing_2.png +3 -0
assets/comparison_t2i_1.png +3 -0
assets/comparison_t2i_2.png +3 -0
assets/reference_composition_1.png +3 -0
assets/reference_composition_2.png +3 -0
assets/reference_editing_1.png +3 -0
assets/reference_editing_2.png +3 -0
flux-2-klein-base-4b-int8-per-row.safetensors +3 -0
flux-2-klein-base-4b-int8-per-tensor.safetensors +3 -0

.gitattributes CHANGED

@@ -33,3 +33,13 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+assets/comparison_composition_1.png filter=lfs diff=lfs merge=lfs -text
+assets/comparison_composition_2.png filter=lfs diff=lfs merge=lfs -text
+assets/comparison_editing_1.png filter=lfs diff=lfs merge=lfs -text
+assets/comparison_editing_2.png filter=lfs diff=lfs merge=lfs -text
+assets/comparison_t2i_1.png filter=lfs diff=lfs merge=lfs -text
+assets/comparison_t2i_2.png filter=lfs diff=lfs merge=lfs -text
+assets/reference_composition_1.png filter=lfs diff=lfs merge=lfs -text
+assets/reference_composition_2.png filter=lfs diff=lfs merge=lfs -text
+assets/reference_editing_1.png filter=lfs diff=lfs merge=lfs -text
+assets/reference_editing_2.png filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,168 @@

+---
+library_name: diffusers
+license: apache-2.0
+base_model: black-forest-labs/FLUX.2-klein-base-4B
+base_model_relation: quantized
+tags:
+  - flux
+  - flux2
+  - quantized
+  - int8
+  - transformer
+  - nvidia-modelopt
+pipeline_tag: text-to-image
+---
+# FLUX.2-klein-base-4b-INT8-transformer-quants
+INT8 (W8A8) quantized transformer checkpoints for [FLUX.2-klein-base-4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-base-4B) (4B parameters).
+This repository collects several quantization variants for experimentation and comparison.
+> **Status**: Static/Max variants (`int8-per-row`, `int8-per-tensor`) are available now.
+> SmoothQuant variants are pending and will be added when ready.
+| Variant | Algorithm | Scale Mode | Status | Checkpoint |
+|---------|-----------|------------|--------|------------|
+| int8-per-row | static | per-row | ✅ Available | `flux-2-klein-base-4b-int8-per-row.safetensors` |
+| int8-per-tensor | static | per-tensor | ✅ Available | `flux-2-klein-base-4b-int8-per-tensor.safetensors` |
+| int8-smoothquant-per-row | smoothquant | per-row | 🔜 Pending | `flux-2-klein-base-4b-int8-smoothquant-per-row.safetensors` |
+| int8-smoothquant-per-tensor | smoothquant | per-tensor | 🔜 Pending | `flux-2-klein-base-4b-int8-smoothquant-per-tensor.safetensors` |
+## Quantization Details
+All variants use [NVIDIA TensorRT Model Optimizer (ModelOpt)](https://github.com/NVIDIA/TensorRT-Model-Optimizer)
+INT8 (W8A8) quantization:
+| Property | Value |
+|----------|-------|
+| Framework | NVIDIA ModelOpt |
+| Calibration | 768 prompts (256 T2I, 256 editing, 256 composition), 50 steps each |
+| Weight Quantization | INT8 symmetric — per-row or per-tensor depending on variant |
+| Activation Quantization | Dynamic per-row (quantized on-the-fly at inference, one scale per token) |
+| Preserved Layers | Embedder layers (time_embed, context_embedder, x_embedder) and output projection kept in BF16 |
+### Algorithm × Scale Mode
+| | **Per-Row** | **Per-Tensor** |
+|---|---|---|
+| **Static (Max)** | `int8-per-row` ✅ | `int8-per-tensor` ✅ |
+| **SmoothQuant** | `int8-smoothquant-per-row` 🔜 | `int8-smoothquant-per-tensor` 🔜 |
+**Algorithm:**
+- **Static (Max)**: Standard symmetric INT8 quantization with scales derived from the calibrated absolute-maximum (amax) range
+- **SmoothQuant** *(pending)*: Migrates quantization difficulty from activations to weights for better accuracy
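The idea behind the pending SmoothQuant variants can be sketched in a few lines of plain Python. This is an illustrative sketch based on the published SmoothQuant formulation, not the ModelOpt implementation used for this repository: the helper names (`smooth_scales`, `apply_smoothing`), the toy tensors, and the migration strength `alpha=0.5` are all assumptions. It shows that dividing activations by a per-input-channel factor and multiplying the matching weight rows by the same factor leaves the layer output unchanged while shrinking the activation outliers.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smooth_scales(x, w, alpha=0.5):
    """Per-input-channel smoothing factors s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).

    x is tokens x in_channels, w is in_channels x out_channels.
    Assumes every channel has a nonzero maximum (hypothetical toy setting).
    """
    act_max = [max(abs(v) for v in col) for col in zip(*x)]
    w_max = [max(abs(v) for v in row) for row in w]
    return [(a ** alpha) / (m ** (1 - alpha)) for a, m in zip(act_max, w_max)]


def apply_smoothing(x, w, s):
    """Divide activation channel j by s_j and multiply weight row j by s_j."""
    x_s = [[v / sj for v, sj in zip(row, s)] for row in x]
    w_s = [[v * sj for v in row] for row, sj in zip(w, s)]
    return x_s, w_s


def max_abs(matrix):
    return max(abs(v) for row in matrix for v in row)


# Toy data: activation channel 1 carries outliers, its weights are small.
x = [[0.1, 20.0], [-0.2, -35.0]]
w = [[0.5, -0.3], [0.02, 0.01]]

s = smooth_scales(x, w)
x_s, w_s = apply_smoothing(x, w, s)

y_ref = matmul(x, w)         # original layer output
y_smooth = matmul(x_s, w_s)  # output after smoothing, mathematically identical

print("activation range before/after:", max_abs(x), max_abs(x_s))
print("weight range before/after:", max_abs(w), max_abs(w_s))
```

After smoothing, the activation range is far smaller and the weight range only moderately larger, which is what makes both tensors easier to represent in INT8.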
+**Scale Mode:**
+- **Per-Row**: Independent scale per output channel (finer granularity, closer to the BF16 baseline)
+- **Per-Tensor**: Single scale per tensor (faster, lower memory, lower fidelity: overall LPIPS 0.1025 vs. 0.0560 for per-row in the evaluation below)
+> **Note**: In all variants, input activations are always quantized **dynamically per-row** at inference time
+> (one scale per token). The scale mode above refers to the **weight** quantization granularity.
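The difference between the two weight scale modes, and the per-token activation scales, can be illustrated with a minimal symmetric INT8 quantizer. This is a sketch under stated assumptions rather than the ModelOpt code path: the function names, the toy matrices, and the choice of `qmax=127` are hypothetical and chosen only for illustration.

```python
def quantize_symmetric(values, qmax=127):
    """Quantize floats to INT8 using one shared symmetric (absmax) scale."""
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_per_tensor(matrix):
    """Per-tensor mode: a single scale shared by every element."""
    flat = [v for row in matrix for v in row]
    q_flat, scale = quantize_symmetric(flat)
    cols = len(matrix[0])
    q = [q_flat[i:i + cols] for i in range(0, len(q_flat), cols)]
    return q, [scale] * len(matrix)


def quantize_per_row(matrix):
    """Per-row mode: one scale per row (output channel, or token for activations)."""
    pairs = [quantize_symmetric(row) for row in matrix]
    return [q for q, _ in pairs], [s for _, s in pairs]


def dequantize(q, scales):
    return [[v * s for v in row] for row, s in zip(q, scales)]


def row_error(original_row, restored_row):
    """Largest absolute reconstruction error within one row."""
    return max(abs(a - b) for a, b in zip(original_row, restored_row))


# Toy weights: row 0 has a much smaller range than row 1.
weights = [[0.01, -0.02, 0.03], [1.0, -2.0, 3.0]]

q_t, s_t = quantize_per_tensor(weights)
q_r, s_r = quantize_per_row(weights)

small_row_err_tensor = row_error(weights[0], dequantize(q_t, s_t)[0])
small_row_err_row = row_error(weights[0], dequantize(q_r, s_r)[0])
print("row 0 error, per-tensor:", small_row_err_tensor)
print("row 0 error, per-row:   ", small_row_err_row)

# Dynamic activation quantization: scales are computed per token at inference
# time from the activations themselves, not from calibration data.
activations = [[0.5, -1.0, 0.25], [8.0, 4.0, -2.0]]  # two tokens
q_a, s_a = quantize_per_row(activations)
```

With a shared scale, the small-range row is dominated by the large-range row and loses most of its resolution; giving each row its own scale avoids that, which matches the accuracy gap between the two variants.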
+## Evaluation Results
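+All variants are compared against the BF16 baseline using identical prompts, seeds, and resolution. LPIPS, PSNR, MSE, and FID measure distance from the BF16 outputs, which is why the baseline row has no values for them.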
+### Overall Metrics
+| Variant | CLIP ↑ | LPIPS ↓ | PSNR ↑ | MSE ↓ | FID ↓ |
+|---------|--------|---------|--------|-------|-------|
+| BF16 (baseline) | 0.6491 | — | — | — | — |
+| FP8 (reference) | 0.6487 | 0.0645 | 24.21 | 477.33 | 34.52 |
+| int8-per-row | 0.6486 | 0.0560 | 26.03 | 445.43 | 28.03 |
+| int8-per-tensor | 0.6495 | 0.1025 | 22.02 | 743.48 | 45.29 |
+| int8-smoothquant-per-row | — | — | — | — | — |
+| int8-smoothquant-per-tensor | — | — | — | — | — |
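For readers unfamiliar with the pixel-level metrics, the sketch below shows how MSE and PSNR between a baseline image and a quantized-model image are commonly computed. It assumes 8-bit pixel values in the range 0 to 255, which is consistent with the magnitude of the MSE column but is not stated in the README; how the reported numbers were aggregated across images is also not stated, so table values cannot be reproduced from this snippet. The function names and pixel data are hypothetical.

```python
import math


def mse(img_a, img_b):
    """Mean squared error between two equally sized flat pixel sequences."""
    diffs = [(a - b) ** 2 for a, b in zip(img_a, img_b)]
    return sum(diffs) / len(diffs)


def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(img_a, img_b)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy flattened "images": baseline output vs. quantized-model output.
baseline = [0, 64, 128, 255]
quantized = [2, 60, 130, 250]

print("MSE: ", mse(baseline, quantized))
print("PSNR:", psnr(baseline, quantized))
```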
+### Text-to-Image
+| Variant | CLIP ↑ | LPIPS ↓ | PSNR ↑ |
+|---------|--------|---------|--------|
+| FP8 (reference) | 0.6452 | 0.0804 | 22.51 |
+| int8-per-row | 0.6450 | 0.0649 | 24.35 |
+| int8-per-tensor | 0.6457 | 0.1364 | 20.17 |
+| int8-smoothquant-per-row | — | — | — |
+| int8-smoothquant-per-tensor | — | — | — |
+> Dramatic chiaroscuro portrait of a cellist mid-performance, single spotlight from above, instrument bow caught in motion blur, concert hall darkness
+![Text-to-Image 1 — BF16 vs FP8 vs INT8](assets/comparison_t2i_1.png)
+> Stained glass window design depicting the four elements, lead came outlines, rich jewel tones of ruby, sapphire, emerald, and topaz
+![Text-to-Image 2 — BF16 vs FP8 vs INT8](assets/comparison_t2i_2.png)
+### Editing
+| Variant | CLIP ↑ | LPIPS ↓ | PSNR ↑ |
+|---------|--------|---------|--------|
+| FP8 (reference) | 0.6420 | 0.0309 | 28.49 |
+| int8-per-row | 0.6418 | 0.0215 | 31.25 |
+| int8-per-tensor | 0.6424 | 0.0588 | 25.71 |
+| int8-smoothquant-per-row | — | — | — |
+| int8-smoothquant-per-tensor | — | — | — |
+> **Base:** A bicycle leaning against a stone wall in a village
+>
+> **Edit:** Transform the village into an underwater coral reef scene, the bicycle covered in barnacles and sea anemones, fish swimming around
+![Editing 1 — reference](assets/reference_editing_1.png)
+![Editing 1 — BF16 vs FP8 vs INT8](assets/comparison_editing_1.png)
+> **Base:** A food truck parked on a city street at noon
+>
+> **Edit:** Change the street to a Venice canal with the food truck floating on a gondola platform, evening golden hour lighting
+![Editing 2 — reference](assets/reference_editing_2.png)
+![Editing 2 — BF16 vs FP8 vs INT8](assets/comparison_editing_2.png)
+### Composition
+| Variant | CLIP ↑ | LPIPS ↓ | PSNR ↑ |
+|---------|--------|---------|--------|
+| FP8 (reference) | 0.6591 | 0.0824 | 21.63 |
+| int8-per-row | 0.6591 | 0.0817 | 22.48 |
+| int8-per-tensor | 0.6603 | 0.1123 | 20.19 |
+| int8-smoothquant-per-row | — | — | — |
+| int8-smoothquant-per-tensor | — | — | — |
+> Create a zen garden where the raked sand patterns flow into and around a giant ramen bowl as the central stone
+![Composition 1 — reference](assets/reference_composition_1.png)
+![Composition 1 — BF16 vs FP8 vs INT8](assets/comparison_composition_1.png)
+> A clockwork mechanical wolf made of brass gears howling at the full moon on the snowy ridge, steam rising from its joints
+![Composition 2 — reference](assets/reference_composition_2.png)
+![Composition 2 — BF16 vs FP8 vs INT8](assets/comparison_composition_2.png)
+## Usage
+> **🚧 Code release coming soon.** A pip-installable loader library is in preparation.
+In the meantime, these checkpoints can be tested with ComfyUI using the
+[ComfyUI-Flux2-INT8](https://github.com/BobJohnson24/ComfyUI-Flux2-INT8) custom node.
+Per-row quantization support is available via
+[PR #24](https://github.com/BobJohnson24/ComfyUI-Flux2-INT8/pull/24).
+## Technical Details
+| Property | Value |
+|----------|-------|
+| Base Model | [FLUX.2-klein-base-4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-base-4B) |
+| Parameters | 4B |
+| Quantization | INT8 (W8A8) via NVIDIA ModelOpt |
+| Calibration | 768 prompts (256 per task), 50 steps each |
+| Activation Quantization | Dynamic per-row (quantized on-the-fly at inference) |
+| Preserved Layers | Embedder layers and output projection kept in BF16 |
+| Inference Steps | 50 |
+| Guidance Scale | 4.0 |
+## License
+This model inherits the license from the base model: **[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)**.