---
base_model:
- Qwen/Qwen3.6-35B-A3B
tags:
- qwen
- fp8
- vllm
- compressed-tensors
name: RedHatAI/Qwen3.6-35B-A3B-FP8-dynamic
---

# FP8 Quantized RedHatAI/Qwen3.6-35B-A3B-FP8-dynamic

This is a preliminary version (subject to change) of an FP8-quantized [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) model.
Both the weights and the activations are quantized to FP8 with [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor).
The model is compatible with, and has been tested against, the vLLM main branch.

Deploy it with:

`vllm serve RedHatAI/Qwen3.6-35B-A3B-FP8-dynamic --reasoning-parser qwen3`

# Preliminary Evaluations

1) GSM8K Platinum:

```
lm_eval --model local-chat-completions \
  --tasks gsm8k_platinum_cot_llama \
  --model_args "model=RedHatAI/Qwen3.6-35B-A3B-FP8-dynamic,max_length=262144,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=128,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=1200" \
  --num_fewshot 0 \
  --apply_chat_template \
  --gen_kwargs "do_sample=True,temperature=1.0,top_p=0.95,top_k=20,min_p=0.0,max_gen_toks=64000,presence_penalty=1.5,repetition_penalty=1.0,seed=5678"
```

Recovery:

| | Qwen/Qwen3.6-35B-A3B | RedHatAI/Qwen3.6-35B-A3B-FP8-dynamic (this model) |
| -------- | :--------------------: | :------------------------------------: |
| Accuracy (%) | 95.86 | 96.44 |
| Recovery | - | 100.60% |

**Note**: More rigorous evaluations are currently in progress and will be available soon.
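The recovery figure in the GSM8K Platinum table is consistent with the quantized model's accuracy expressed as a percentage of the baseline model's accuracy. The short sketch below assumes that definition (the card does not state it explicitly); `recovery_pct` is a hypothetical helper, not part of lm-eval or llm-compressor. Because the two accuracies are shown rounded to two decimals, the result matches the table's 100.60% only to within that rounding.

```python
def recovery_pct(quantized_score: float, baseline_score: float) -> float:
    """Return the quantized model's score as a percentage of the baseline score.

    Assumed definition: recovery = 100 * quantized / baseline.
    """
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return 100.0 * quantized_score / baseline_score


# GSM8K Platinum accuracies from the table (already rounded to two decimals).
baseline = 95.86   # Qwen/Qwen3.6-35B-A3B
quantized = 96.44  # RedHatAI/Qwen3.6-35B-A3B-FP8-dynamic

print(f"Recovery: {recovery_pct(quantized, baseline):.1f}%")  # prints "Recovery: 100.6%"
```

A recovery above 100% here means the FP8 run scored slightly higher than the baseline on this single sampled evaluation. It does not mean that quantization improves the model in general.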
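The introduction states that both weights and activations are quantized to FP8. As an illustration only, the pure-Python sketch below simulates one common "FP8 dynamic" layout: symmetric FP8 E4M3 values, one static scale per weight output channel, and one scale per activation token that is recomputed at inference time. This granularity is an assumption suggested by the model name. The card itself does not document it. `round_to_fp8_e4m3`, `quantize_rows`, and `fp8_linear` are hypothetical helpers, not llm-compressor or vLLM APIs.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the float8 E4M3 ("e4m3fn") format


def round_to_fp8_e4m3(x: float) -> float:
    """Round x to the nearest float8 E4M3 value (saturating), returned as a Python float."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)  # saturate instead of overflowing
    if mag == 0.0:
        return 0.0
    if mag < 2.0 ** -6:
        step = 2.0 ** -9  # subnormal range: fixed spacing of 2^-9
    else:
        _, exp = math.frexp(mag)  # mag = m * 2**exp with 0.5 <= m < 1
        step = math.ldexp(1.0, exp - 4)  # 3 mantissa bits -> 8 steps per power of two
    return sign * round(mag / step) * step


def quantize_rows(matrix):
    """Symmetric FP8 quantization with one scale per row (each row's max |value| maps to 448)."""
    q, scales = [], []
    for row in matrix:
        amax = max(abs(v) for v in row)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        q.append([round_to_fp8_e4m3(v / scale) for v in row])
        scales.append(scale)
    return q, scales


def fp8_linear(x, weight):
    """Simulated y = x @ weight.T with FP8 operands and full-precision accumulation."""
    # Weights (out_features x in_features): per-output-channel scales, computable once offline.
    qw, w_scales = quantize_rows(weight)
    # Activations (tokens x in_features): per-token scales, recomputed on every call ("dynamic").
    qx, x_scales = quantize_rows(x)
    return [
        [x_scales[t] * w_scales[o] * sum(a * b for a, b in zip(qx[t], qw[o]))
         for o in range(len(weight))]
        for t in range(len(x))
    ]


# Toy example: 2 tokens with 3 input features, and 2 output features.
x = [[0.5, -1.25, 2.0], [10.0, 0.1, -3.0]]
w = [[0.02, -0.07, 0.11], [0.3, 0.0, -0.05]]
exact = [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]
approx = fp8_linear(x, w)
print("exact: ", exact)
print("approx:", approx)
```

Per-token scales let a token with large activations (the second row of `x`) use its own scale instead of forcing a coarse shared scale on every token. Production kernels work on packed FP8 tensors with fused matrix multiplications. The list-based loops here only show where each scale comes from.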