---
language:
- en
library_name: vllm
pipeline_tag: text-generation
tags:
- text-generation
- conversational
- compressed-tensors
- PTQ
- w8a16
- quantized
base_model_relation: quantized
quantized_by: TheHouseOfTheDude
license: other
base_model:
- ConicCat/Qwen3.5-27B-Writer
---

# Qwen3.5-27B-Writer_PTQ (W8A16, Post-Training Quantization)

## Overview

This repository provides a **W8A16 PTQ (Post-Training Quantized)** version of **Qwen3.5-27B-Writer**.

AWQ and GPTQ are also post-training methods, but both rely on a calibration dataset. This model was instead quantized with a **data-free PTQ pipeline that uses no calibration dataset**. The quantization is applied in a **single one-shot pass**. This makes it fast and simple, and the output distribution stays close to the base model (see the mean KLD under PTQ Quality Metrics).

---

## Key Highlights

- **Quantization Type:** PTQ (Post-Training Quantization)
- **Scheme:** W8A16
  - Weights: INT8 (per-channel symmetric)
  - Activations: FP16/BF16 (unchanged)
- **Calibration Dataset:** ❌ None (not required)
- **Method:** `llmcompressor.oneshot` pipeline
- **Target Layers:** Linear layers only
- **Ignored Layers:**
  - `lm_head`
  - `visual` modules
  - `linear_attn`
  - `mtp`

---

## Quantization Details

This quant was created using a **QuantizationModifier** recipe:

- **Targets:** Linear layers
- **Scheme:** W8A16
- **Approach:** One-shot PTQ (no calibration pass)
- **Preserves:** Model structure, tokenizer, and chat template

---

## PTQ Quality Metrics

- **Mean KLD (vs. base model):** 0.001895
- **Total Positions:** 204,700
- **Time Elapsed:** 1176.74 seconds (about 19.6 minutes)
- **Throughput:** 173.96 positions/sec

---

## Example Usage (vLLM)

```bash
pip install -U vllm

# --tensor-parallel-size 2 shards the model across two GPUs; adjust it to your hardware.
vllm serve TheHouseOfTheDude/Qwen3.5-27B-Writer_PTQ \
  --quantization compressed-tensors \
  --tensor-parallel-size 2 \
  --dtype bfloat16
```

---

## Notes

- No calibration dataset required
- Fast, one-shot quantization pipeline
- Designed for the vLLM runtime (compressed-tensors format)

---

## Credits

- Base Model: Qwen3.5-27B-Writer (`ConicCat/Qwen3.5-27B-Writer`)
- Quantization: TheHouseOfTheDude
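---

## Sketch: Per-Channel Symmetric INT8 Weights

To make the W8A16 scheme concrete, here is a minimal pure-Python sketch of per-channel symmetric INT8 weight quantization for a single Linear layer. It is an illustration only and not the `llmcompressor` or vLLM implementation. It makes three assumptions: each output channel (row) gets one scale equal to `max|w| / 127`, values are rounded to nearest, and the integer range is [-127, 127]. The function names `quantize_per_channel_int8`, `dequantize`, and `w8a16_linear` are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_per_channel_int8(
    weight: Sequence[Sequence[float]],
) -> Tuple[List[List[int]], List[float]]:
    """Quantize a [out_features][in_features] matrix to INT8, one scale per output row."""
    q_weight: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        # Symmetric scheme: zero-point is 0, so the scale maps max|w| to 127.
        # An all-zero row gets scale 1.0 to avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # Python's round() rounds halves to even; real kernels may round differently.
        q_row = [max(-127, min(127, round(w / scale))) for w in row]
        q_weight.append(q_row)
        scales.append(scale)
    return q_weight, scales


def dequantize(q_weight: List[List[int]], scales: List[float]) -> List[List[float]]:
    """Recover approximate float weights: w is approximately q * scale."""
    return [[q * s for q in q_row] for q_row, s in zip(q_weight, scales)]


def w8a16_linear(x: Sequence[float], q_weight: List[List[int]], scales: List[float]) -> List[float]:
    """y = W @ x with INT8 weights and floating-point activations (16-bit in practice).

    Each row's dot product over the integer weights is rescaled by that row's scale.
    """
    return [s * sum(q * xi for q, xi in zip(q_row, x)) for q_row, s in zip(q_weight, scales)]


if __name__ == "__main__":
    W = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.03]]
    x = [1.0, 2.0, 3.0]
    q, s = quantize_per_channel_int8(W)
    ref = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    print("int8 weights:", q)
    print("W8A16 output:", w8a16_linear(x, q, s))
    print("float output:", ref)
```

Because activations stay in floating point, only the weights carry rounding error. Per element, that error is at most half a quantization step (`scale / 2`), and the bound does not depend on any calibration data, which is the property a data-free W8A16 pass relies on.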
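---

## Sketch: Computing Mean KLD

The mean KLD in PTQ Quality Metrics averages a per-position KL divergence between the next-token distributions of the base and quantized models. The evaluation script is not part of this card, so the sketch below rests on assumptions. It takes raw logits per position, assumes the direction KL(P_base || P_quant) in nats, and uses hypothetical helper names (`log_softmax`, `kl_divergence`, `mean_kld`). It also checks that the reported throughput equals total positions divided by elapsed time.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def kl_divergence(base_logits: Sequence[float], quant_logits: Sequence[float]) -> float:
    """KL(P_base || P_quant) in nats for one token position (assumed direction)."""
    log_p = log_softmax(base_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(
    base_positions: Sequence[Sequence[float]],
    quant_positions: Sequence[Sequence[float]],
) -> float:
    """Average the per-position KL divergence over every evaluated token position."""
    if not base_positions or len(base_positions) != len(quant_positions):
        raise ValueError("need the same non-zero number of positions from both models")
    total = sum(kl_divergence(b, q) for b, q in zip(base_positions, quant_positions))
    return total / len(base_positions)


# Throughput as reported under PTQ Quality Metrics: positions / wall-clock seconds.
total_positions = 204_700
elapsed_seconds = 1176.74
throughput = total_positions / elapsed_seconds  # about 173.955, reported as 173.96
```

For a real model, each logits vector spans the full vocabulary. A practical evaluator would typically vectorize this computation instead of looping in Python.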