---
base_model: Qwen/Qwen3-8B
library_name: transformers
tags:
- quantized
- int8
- compressed-tensors
- llm-compressor
- flux2
- text-encoder
license: apache-2.0
---

# Qwen3-8B-INT8

INT8 (W8A8) quantized version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B), created using [llm-compressor](https://github.com/vllm-project/llm-compressor) with calibrated quantization.

## Overview

| Property | Value |
|:---|:---|
| **Base Model** | Qwen/Qwen3-8B |
| **Parameters** | 8.19B |
| **Quantization** | INT8 (W8A8) |
| **Format** | `compressed-tensors` |
| **Tool** | llm-compressor |
| **Disk Size** | ~9.4 GB (2 shards) |

## Intended Use

Quantized text encoder for [Flux 2 Klein 9B](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) image generation pipelines. Architecturally identical to the Klein 9B text encoder.

## Quantization Details

- **Scheme**: W8A8 (8-bit integer weights and activations)
- **Targets**: All `Linear` layers (excluding `lm_head`)
- **Calibration**: 256 samples from C4, sequential pipeline with CPU offloading

## Hardware Requirements

- **Minimum**: Any CUDA GPU with INT8 tensor core support
- **Fallback**: Dequantizes to BF16 on unsupported hardware
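
## Illustration: W8A8 Arithmetic

To make the W8A8 scheme concrete, here is a minimal pure-Python sketch of symmetric 8-bit quantization applied to a single dot product, which is the core operation of a `Linear` layer. It is an illustration only, not the llm-compressor implementation. The per-tensor scale, the `[-127, 127]` range, and the helper names `quantize_int8` and `int8_dot` are assumptions made for this example; the granularity actually used by this checkpoint is recorded in its quantization config and may differ.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (assumed scheme, for illustration).

    Maps floats to integers in [-127, 127] using one shared scale.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(weights, activations):
    """Dot product with both operands quantized to 8-bit integers (W8A8)."""
    q_w, scale_w = quantize_int8(weights)
    q_a, scale_a = quantize_int8(activations)
    # Accumulate in integers, then rescale once at the end.
    accumulator = sum(w * a for w, a in zip(q_w, q_a))
    return accumulator * scale_w * scale_a


# Made-up values, chosen only to show the rounding error.
weights = [0.12, -0.53, 0.97, -0.08]
activations = [1.5, -2.0, 0.25, 3.0]

exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print(f"exact={exact:.4f} approx={approx:.4f} error={abs(exact - approx):.4f}")
```

The quantized result tracks the floating-point one closely but not exactly. That gap is the error calibration is meant to keep small, by choosing scales from representative data (here, the 256 C4 samples).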