Upload folder using huggingface_hub


Files changed (2)

README.md +129 -0
flux-2-klein-9b-int8.safetensors +3 -0

README.md ADDED

	@@ -0,0 +1,129 @@

+---
+license: other
+tags:
+  - flux
+  - flux2
+  - quantized
+  - int8
+  - transformer
+base_model: black-forest-labs/FLUX.2-klein-9B
+base_model_relation: quantized
+library_name: diffusers
+pipeline_tag: text-to-image
+---
+# FLUX.2 [klein] 9B (step-distilled) — INT8 (W8A8) Transformer
+**Quantized transformer checkpoint** for [FLUX.2 [klein] 9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) (step-distilled).
+INT8 weight and activation quantization via NVIDIA ModelOpt with calibrated input scales.
+> **Note**: This repo contains only the quantized **transformer** weights.
+> The text encoder, VAE, tokenizer, and scheduler are loaded from the base model:
+> [`black-forest-labs/FLUX.2-klein-9B`](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B).
+## Model Details
+| Property | Value |
+|----------|-------|
+| Base Model | [black-forest-labs/FLUX.2-klein-9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) |
+| Parameters | 9B |
+| Quantization Format | INT8 (W8A8) |
+| Quantization Type | Weight + Activation (W8A8) |
+| Compression | ~2× vs BF16 |
+| Weight dtype | `int8` |
+| Scale dtype | `float32` |
+| Key format | Single-file safetensors |
+| Checkpoint | `flux-2-klein-9b-int8.safetensors` |
+## Quantization Details
+| Property | Value |
+|----------|-------|
+| Framework | [NVIDIA TensorRT Model Optimizer (ModelOpt)](https://github.com/NVIDIA/TensorRT-Model-Optimizer) |
+| Calibration Method | NVIDIA ModelOpt `max` (per-channel max abs) |
+| Calibration Dataset | 256 samples from 768 diverse prompts (256 T2I, 256 editing, 256 composition) |
+| Denoising Steps (calibration) | 4 per sample |
+| Weight Quantization | Per-channel symmetric (axis=0) |
+| Activation Quantization | Per-tensor, using calibrated `input_scale` tensors stored in the checkpoint alongside the `weight_scale` tensors |
+| Preserved Layers | Embedder layers (time_embed, context_embedder, x_embedder) and output projection kept in BF16 |
+## Evaluation
+Evaluated on **48 prompts** (16 each for T2I, editing, and composition). The BF16 baseline and INT8 (W8A8) outputs are generated with identical prompts and seeds, then scored independently.
+<details>
+<summary><strong>📊 Understanding the Metrics</strong></summary>
+We report two categories of metrics:
+**Text-Image Alignment** — measures output quality independently:
+- **CLIP Score ↑**: Uses OpenAI's CLIP model to score how well each generated image matches its text prompt. The BF16 and quantized models are each scored against the same prompts; this is *not* a comparison between the two outputs but an independent quality measure for each. Higher is better. Raw CLIP cosine similarity typically falls in the 0.25–0.35 range; the scores reported below are higher than that, so compare them with each other rather than against that range.
+**Fidelity** — measures how closely the quantized output matches the BF16 baseline:
+- **LPIPS ↓** (Learned Perceptual Image Patch Similarity): Uses a neural network to judge perceptual similarity the way a human would. Unlike pixel-level metrics, LPIPS captures structural and textural differences. 0 = perceptually identical, 1 = completely different. Values below 0.1 indicate very high fidelity.
+- **PSNR ↑** (Peak Signal-to-Noise Ratio): Measures pixel-level accuracy in decibels. Higher values mean less error. 20–30 dB is typical for quantized model comparisons; 30+ dB is excellent.
+- **FID ↓** (Fréchet Inception Distance): Compares the statistical distribution of *all* generated images (not individual pairs). Lower means the quantized model produces images from the same visual distribution as BF16. Sensitive to sample size — our 48-image evaluation provides a directional signal rather than a definitive score.
+</details>
+### Text-Image Alignment (CLIP Score ↑)
+CLIP score measures how well the generated image matches the text prompt (higher = better). Both models are evaluated independently:
+| Model | CLIP Score |
+|-------|------------|
+| BF16 (baseline) | 0.6426 |
+| INT8 (W8A8) | 0.6422 |
+### Fidelity vs BF16 Baseline
+These metrics measure how closely the quantized output matches the BF16 reference:
+| Metric | Value | Description |
+|--------|-------|-------------|
+| **LPIPS** ↓ | 0.0615 | Perceptual distance (0 = identical) |
+| **PSNR** ↑ | 22.34 dB | Signal-to-noise ratio |
+| **FID** ↓ | 32.27 | Distribution distance |
+### Per-Task Breakdown
+| Task | CLIP ↑ | LPIPS ↓ | PSNR ↑ |
+|------|--------|---------|--------|
+| Text-to-Image | 0.6549 | 0.0450 | 22.71 dB |
+| Editing | 0.6279 | 0.0763 | 21.95 dB |
+| Composition | 0.6440 | 0.0633 | 22.36 dB |
+### Comparison with FP8 (E4M3) (Reference)
+Black Forest Labs officially provides **FP8 (E4M3)** quantized checkpoints for FLUX.2 [klein]. However, FP8 (`float8_e4m3fn`) requires native hardware support, which on NVIDIA GPUs is available from Ada Lovelace (RTX 40-series / L4 / L40) and Hopper (H100) onward. **INT8 (W8A8)** offers a quantized alternative at the same ~2× compression ratio for GPUs that lack native FP8 support (e.g., Ampere, Turing, or non-NVIDIA hardware with INT8 acceleration).
+The table below compares both formats against the same BF16 baseline (CLIP 0.6426), evaluated with identical prompts and seeds:
+| Metric | INT8 (W8A8) | FP8 (E4M3) |
+|--------|---:|---:|
+| **CLIP** ↑ | 0.6422 | 0.6419 |
+| **LPIPS** ↓ | 0.0615 | 0.0559 |
+| **PSNR** ↑ | 22.34 dB | 23.14 dB |
+| **FID** ↓ | 32.27 | 28.91 |
+## Usage
+> **🚧 Code release coming soon.** A pip-installable loader library is in preparation.
+### Compatibility
+This checkpoint uses the **official FLUX.2 single-file safetensors format** — the same key layout and structure used by
+Black Forest Labs for their official FP8 and NVFP4 quantized models. Any loader that supports
+quantized FLUX.2 single-file checkpoints can load this INT8 checkpoint.
+## License
+This model inherits the license from the base model: **FLUX Non-Commercial**.
+## Acknowledgments
+- [Black Forest Labs](https://blackforestlabs.ai/) for FLUX.2
+- [NVIDIA](https://nvidia.com/) for ModelOpt quantization tools
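
The README's Quantization Details table describes the weights as per-channel symmetric INT8 (axis=0) with `max` calibration. The sketch below illustrates that idea in plain Python: one scale per output row, chosen so the largest magnitude in the row maps to 127. It is only an illustration under those assumptions, not ModelOpt's implementation; the function names `quantize_per_channel` and `dequantize` and the toy weight matrix are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_per_channel(weight: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric INT8 quantization with one scale per row (axis=0)."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        # Assumption: "max" calibration maps the largest magnitude to 127.
        # All-zero rows get a dummy scale of 1.0 to avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Round to the nearest integer and clamp to the symmetric range.
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: List[List[int]], scales: List[float]) -> Matrix:
    """Recover approximate float weights from INT8 values and row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


# Toy 2x3 weight matrix whose rows have very different magnitudes.
weight = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
q, scales = quantize_per_channel(weight)
restored = dequantize(q, scales)
max_err = max(
    abs(a - b) for ra, rb in zip(weight, restored) for a, b in zip(ra, rb)
)
print("int8 rows:", q)
print("scales:", scales)
print("max abs reconstruction error:", max_err)
```

Because each row has its own scale, the small-magnitude second row keeps roughly the same relative precision as the first; a single per-tensor scale would round most of that row to a handful of integer levels.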

flux-2-klein-9b-int8.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fc4f91ad7ae1e6d23c591461173328587f02144fb72ed0798ba5dd7eb53d1c08
+size 9439871616
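
The three lines above form a Git LFS pointer: the repository stores only the SHA-256 digest and byte size, and the checkpoint itself lives in LFS storage. As a minimal sketch, a downloaded file could be checked against such a pointer as shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the self-check uses a small temporary file rather than the real checkpoint.

```python
import hashlib
import os
import tempfile

# Pointer contents copied from the diff above (without the leading "+").
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:fc4f91ad7ae1e6d23c591461173328587f02144fb72ed0798ba5dd7eb53d1c08
size 9439871616
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and digest both match the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first; skips hashing on a size mismatch
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Stream in chunks so a multi-gigabyte file is never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(f"expected size: {pointer['size'] / 1e9:.2f} GB")

# Self-check on a tiny temporary file standing in for a real download.
payload = b"stand-in for checkpoint bytes"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as f:
        f.write(payload)
    demo_pointer = {
        "algo": "sha256",
        "digest": hashlib.sha256(payload).hexdigest(),
        "size": len(payload),
    }
    demo_ok = matches_pointer(demo_path, demo_pointer)
    demo_bad = matches_pointer(demo_path, {**demo_pointer, "size": 1})
print("demo file matches its pointer:", demo_ok)
```

To check the real checkpoint, `matches_pointer` would be called with the local path of the downloaded `flux-2-klein-9b-int8.safetensors` and the parsed `pointer`.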