---
library_name: mlx
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/LICENSE
language:
- en
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- chat
- mlx
- quantization
- bias-evaluation
- q3
---

# qwen2.5-7b-instruct-q3 (MLX, CBA artifact)

MLX-format 3-bit (Q3) variant of [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).

This is one of the **15 model artifacts** from the paper:

> **Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels**
> Plawan Kumar Rath, Rahul Maliakkal. *IEEE Cloud Summit 2026*.
> Code:
> arXiv:

## Quantization

Weight-only post-training quantization via `mlx_lm.convert`:

- **bits:** 3
- **group_size:** 64
- **mode:** affine

## How this artifact was produced

```bash
python -m mlx_lm.convert \
  --hf-path Qwen/Qwen2.5-7B-Instruct \
  --mlx-path ./qwen2.5-7b-instruct-q3 \
  --quantize \
  --q-bits 3 \
  --q-group-size 64
```

This is the **exact** artifact used to produce the inference results in §4.3 of the paper (911,100 records over BBQ ambiguous, 5 seeds × 12,148 items × 15 configs).

## Usage (MLX)

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("plawanrath/qwen2.5-7b-instruct-q3-mlx-cba")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    tokenize=False,
)

print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```

Or via CLI:

```bash
mlx_lm.generate --model plawanrath/qwen2.5-7b-instruct-q3-mlx-cba --prompt "Hello!"
```

## Paper findings relevant to this variant

The paper documents a **dose-response** relationship between quantization aggressiveness and emergent stereotypical behavior on BBQ ambiguous questions:

| Variant | % of BF16-unbiased items that became biased |
|---|---|
| Q8 | 0.1–0.9% |
| Q6 | 0.3–1.3% |
| Q4 | 2.2–5.6% |
| Q3 | 6.0–21.1% |

These changes are largely **invisible to perplexity** (<0.5% shift at Q8, <3% at Q4 across all three families). A perplexity check alone is therefore not enough to clear a compressed instruction-tuned model for fairness-sensitive tasks; evaluate bias directly before deployment.

## Model details

- **Base model:** [`Qwen/Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- **Family:** Qwen2
- **Parameters:** 7.6B
- **Precision:** 3-bit (Q3)
- **Format:** MLX (Apple Silicon)
- **Conversion framework:** [`mlx-lm`](https://github.com/ml-explore/mlx-lm)

## License

Inherited from the base model (`apache-2.0`). See the upstream model page for the full license text.

## Citation

```bibtex
@inproceedings{rath2026quantization,
  title         = {Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels},
  author        = {Rath, Plawan Kumar and Maliakkal, Rahul},
  booktitle     = {IEEE Cloud Summit 2026},
  year          = {2026},
  eprint        = {2605.15208},
  archivePrefix = {arXiv},
  url           = {https://arxiv.org/abs/2605.15208}
}
```
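
## Sketch: 3-bit affine quantization with group size 64

The settings listed under Quantization (`bits: 3`, `group_size: 64`, `mode: affine`) can be illustrated with a minimal pure-Python sketch. This is not the `mlx_lm.convert` implementation: the function names are hypothetical, choosing the scale and offset from each group's min and max is an assumption, and the random weights stand in for a real weight row. Real kernels also pack the codes into integers rather than keeping Python lists.

```python
import random


def affine_quantize_group(values, bits=3):
    """Map one group of floats to integer codes sharing a scale and offset.

    Hypothetical helper: scale/offset are taken from the group's min and max.
    """
    levels = (1 << bits) - 1  # 3 bits -> codes 0..7
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, offset):
    """Reconstruct approximate floats from codes: w ~ code * scale + offset."""
    return [c * scale + offset for c in codes]


def quantize_row(row, bits=3, group_size=64):
    """Quantize and immediately dequantize a row, one group at a time."""
    assert len(row) % group_size == 0, "row length must be a multiple of group_size"
    out = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        codes, scale, offset = affine_quantize_group(group, bits)
        out.extend(dequantize_group(codes, scale, offset))
    return out


# Stand-in data: 256 synthetic "weights" (4 groups of 64), not real model weights.
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]
approx = quantize_row(row, bits=3, group_size=64)

max_err = max(abs(a - b) for a, b in zip(row, approx))
print(f"max abs reconstruction error: {max_err:.5f}")
```

Each group of 64 weights shares one scale and one offset, so only 8 distinct values are available per group and the worst-case rounding error is half a quantization step. If the scale and offset are each stored in 16 bits (an assumption about the storage format), the overhead is 32/64 = 0.5 bits per weight, for roughly 3.5 bits per weight in total.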