---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3.5-9B
tags:
- qwen
- gptq
- quantized
- math
- causal-lm
library_name: transformers
pipeline_tag: text-generation
---

# Qwen3.5-9B-GPTQ-INT8

This model is a GPTQ-quantized version of `Qwen/Qwen3.5-9B` with a normalized, text-only `config.json`.

## Quantization

- Method: GPTQ
- Bits: 8
- Group size: 128
- desc_act: False
- damp_percent: 0.1
- Calibration preset: math_qa_cot
- Calibration dataset: `zwhe99/DeepMath-103K` (split `train`)
- Max calibration samples: 128
- Max sequence length: 16384

## Reproduction

```bash
uv run python quantization/quantize_qwen35_9b_gptq.py \
  --model-name Qwen/Qwen3.5-9B \
  --output-dir /workspace/lowbit-math-reasoning/experiments/models/Qwen3.5-9B-GPTQ-INT8 \
  --dataset-name zwhe99/DeepMath-103K \
  --dataset-config '' \
  --dataset-split train \
  --calibration-preset math_qa_cot \
  --question-column question \
  --answer-column r1_solution_1 \
  --text-column r1_solution_1 \
  --max-calibration-samples 128 \
  --max-seq-len 16384 \
  --bits 8 \
  --group-size 128 \
  --damp-percent 0.1
```

The current quantization script rewrites `config.json` after `save_pretrained()` so that the exported checkpoint uses the same text-only `qwen3_5_text` layout as the INT4 checkpoint that already loads correctly.

## Validation

This normalized-config checkpoint was re-evaluated on GSM8K. It matched the original INT8 accuracy (EM 0.96) while raising throughput from 105.98 to 150.84 tok/s, roughly a 1.42x speedup.

- Original INT8 (before config normalization): EM 0.96, 105.98 tok/s
- Fixed-config INT8: EM 0.96, 150.84 tok/s

## Notes

- This repository contains quantized weights only.
- The checkpoint is intended for text-only evaluation.
- vLLM loads this checkpoint with its `gptq_marlin` quantization method.
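To confirm that an exported checkpoint has the normalized, text-only layout described under Reproduction, you can inspect its `config.json` directly. The helper below is a minimal, hypothetical sketch and is not part of the quantization script. It assumes the config stores the architecture in `model_type` and the GPTQ settings in a `quantization_config` object with the keys `quant_method`, `bits`, `group_size`, `desc_act`, and `damp_percent`, which is the usual layout for GPTQ checkpoints in transformers. Adjust the key names if your export differs.

```python
import json
from pathlib import Path

# Expected values taken from the "Quantization" section of this card.
# ASSUMPTION: key names follow the common transformers GPTQ config layout.
EXPECTED_MODEL_TYPE = "qwen3_5_text"
EXPECTED_QUANT = {
    "quant_method": "gptq",
    "bits": 8,
    "group_size": 128,
    "desc_act": False,
    "damp_percent": 0.1,
}


def check_config(config_path: Path) -> list[str]:
    """Hypothetical helper: list mismatches between config.json and the expected layout.

    An empty list means the config matches the normalized text-only GPTQ INT8 layout.
    """
    config = json.loads(Path(config_path).read_text())
    problems = []

    # The normalized checkpoint should use the text-only model type.
    model_type = config.get("model_type")
    if model_type != EXPECTED_MODEL_TYPE:
        problems.append(f"model_type={model_type!r}, expected {EXPECTED_MODEL_TYPE!r}")

    # Compare each expected GPTQ setting; a missing key is reported as None.
    quant = config.get("quantization_config", {})
    for key, expected in EXPECTED_QUANT.items():
        actual = quant.get(key)
        if actual != expected:
            problems.append(f"quantization_config.{key}={actual!r}, expected {expected!r}")

    return problems
```

Point `check_config` at the `config.json` in the output directory. A non-empty result, for example a multimodal `model_type`, suggests the config was not rewritten after `save_pretrained()`.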