---
base_model:
- Qwen/Qwen3.6-35B-A3B
tags:
- qwen
- fp8
- vllm
- compressed-tensors
name: RedHatAI/Qwen3.6-35B-A3B-FP8-dynamic
---

# FP8 Quantized RedHatAI/Qwen3.6-35B-A3B-FP8-dynamic

This is a preliminary version (and subject to change) of FP8 quantized [Qwen/Qwen3.6-35B-A3B ](https://huggingface.co/Qwen/Qwen3.6-35B-A3B ) model. 
The model has both weights and activations quantized to FP8 with [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor).

It is compatible and tested against vllm main. Deploy it with: `vllm serve RedHatAI/Qwen3.6-35B-A3B-FP8-dynamic --reasoning-parser qwen3`


# Preliminary Evaluations

1) GSM8K Platinum:
```
lm_eval --model local-chat-completions \
  --tasks gsm8k_platinum_cot_llama \
  --model_args "model=RedHatAI/Qwen3.6-35B-A3B-FP8-dynamic,max_length=262144,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=128,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=1200" \
  --num_fewshot 0 \
  --apply_chat_template \
  --gen_kwargs "do_sample=True,temperature=1.0,top_p=0.95,top_k=20,min_p=0.0,max_gen_toks=64000,presence_penalty=1.5,repetition_penalty=1.0,seed=5678"


```

Recovery:

|          | Qwen/Qwen3.6-35B-A3B | RedHatAI/Qwen3.6-35B-A3B-FP8-dynamic<br> (this model) |
| -------- | :--------------------: | :------------------------------------: |
| Accuracy | 95.86                 | 96.44                                   |
| Recovery | \-                     | 100.60%                                  |


**Note**: More rigorous evaluations are currently in progress and will be available soon.