---
base_model: Qwen/Qwen3-8B
library_name: transformers
tags:
  - quantized
  - int8
  - compressed-tensors
  - llm-compressor
  - flux2
  - text-encoder
license: apache-2.0
---

# Qwen3-8B-INT8

INT8 (W8A8) quantized version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B), created using [llm-compressor](https://github.com/vllm-project/llm-compressor) with calibrated quantization.

## Overview

| Property | Value |
|:---|:---|
| **Base Model** | Qwen/Qwen3-8B |
| **Parameters** | 8.19B |
| **Quantization** | INT8 (W8A8) |
| **Format** | `compressed-tensors` |
| **Tool** | llm-compressor |
| **Disk Size** | ~9.4 GB (2 shards) |
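
The disk size can be roughly cross-checked from the parameter count. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes a vocabulary of 151,936 tokens, a hidden size of 4096, and untied input and output embeddings. None of these are stated in this card, so check them against the base model's `config.json`. It also assumes the embeddings and `lm_head` stay in BF16 (2 bytes per parameter) while everything else is stored as INT8 (1 byte), and it ignores norm weights and quantization scales.

```python
# Rough disk-size estimate for the quantized checkpoint.
TOTAL_PARAMS = 8.19e9   # from the Overview table
VOCAB_SIZE = 151_936    # assumption: taken from the base model config
HIDDEN_SIZE = 4096      # assumption: taken from the base model config

# Input embeddings plus lm_head (assumed untied), both left unquantized.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE
unquantized_params = 2 * embedding_params

# Everything else is treated as INT8 Linear weights (1 byte each).
quantized_params = TOTAL_PARAMS - unquantized_params

size_bytes = quantized_params * 1 + unquantized_params * 2
size_gb = size_bytes / 1e9
print(f"Estimated size: {size_gb:.2f} GB")
```

Under these assumptions the estimate lands close to the ~9.4 GB listed above.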

## Intended Use

Intended as a quantized text encoder for [Flux 2 Klein 9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) image generation pipelines. The model is architecturally identical to the Klein 9B text encoder.

## Quantization Details

- **Scheme**: W8A8 (8-bit integer weights and 8-bit integer activations)
- **Targets**: All `Linear` layers (excluding `lm_head`)
- **Calibration**: 256 samples from C4, sequential pipeline with CPU offloading
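
To illustrate what W8A8 means arithmetically, here is a minimal pure-Python sketch of one quantized linear layer. It assumes symmetric quantization with one scale per output channel for weights and one scale per token for activations. This card does not state the granularity or whether activation scales are static or dynamic, so treat those choices as illustrative. The function names are hypothetical and are not part of llm-compressor.

```python
def quantize_symmetric(values):
    """Map a list of floats to int8 values in [-127, 127] with one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def w8a8_linear(x, weight):
    """Approximate y = W @ x for one token x; weight is a list of output rows."""
    xq, x_scale = quantize_symmetric(x)  # per-token activation scale (assumed)
    outputs = []
    for row in weight:
        # In a real checkpoint the weights are quantized once, offline;
        # it is done inline here only to keep the sketch short.
        wq, w_scale = quantize_symmetric(row)  # per-output-channel scale (assumed)
        acc = sum(a * b for a, b in zip(xq, wq))  # integer multiply-accumulate
        outputs.append(acc * x_scale * w_scale)   # rescale back to float
    return outputs


x = [0.5, -1.25, 2.0, 0.1]
weight = [[0.2, -0.4, 0.1, 0.3], [-0.7, 0.05, 0.6, -0.2]]

approx = w8a8_linear(x, weight)
exact = [sum(a * b for a, b in zip(x, row)) for row in weight]
print("int8 result: ", approx)
print("float result:", exact)
```

The integer accumulation is the part that INT8 tensor cores accelerate. The two scale multiplications at the end recover an approximation of the floating-point output.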

## Hardware Requirements

- **Minimum**: Any CUDA GPU with INT8 tensor core support
- **Fallback**: Dequantizes to BF16 on unsupported hardware
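
A quick way to see which path a given GPU is likely to take is to check its CUDA compute capability. The helper below is a hedged sketch: it assumes that INT8 tensor cores are available from compute capability 7.5 (NVIDIA Turing) onward, a threshold that is not stated in this card. Whether INT8 kernels are actually used also depends on the inference runtime. The function name is hypothetical.

```python
def has_int8_tensor_cores(major: int, minor: int) -> bool:
    """Return True for compute capability 7.5 or newer (assumed threshold)."""
    return (major, minor) >= (7, 5)


# With PyTorch installed, the capability of the current device can be read with:
#   import torch
#   major, minor = torch.cuda.get_device_capability()
for capability in [(7, 0), (7, 5), (8, 6)]:
    path = "INT8" if has_int8_tensor_cores(*capability) else "BF16 fallback"
    print(capability, "->", path)
```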