---
license: apache-2.0
base_model:
- openai/gpt-oss-120b
---

# Model Overview

- **Model Architecture:** gpt-oss-120b
  - **Input:** Text
  - **Output:** Text
- **Supported Hardware Microarchitecture:** AMD MI350/MI355
- **ROCm:** 7.0
- **Operating System(s):** Linux
- **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
- **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html)
  - **Weight quantization:** OCP MXFP4, Static
  - **Activation quantization:** FP8, Dynamic
- **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)

This model was built from gpt-oss-120b by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) quantization, with MXFP4 weights and FP8 activations.

# Model Quantization

The model was quantized from [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). The weights are quantized to MXFP4 and the activations to FP8. The `lm_head`, self-attention, and router layers are excluded from quantization.

**Quantization scripts:**
```
cd Quark/examples/torch/language_modeling/llm_ptq/
exclude_layers="*lm_head *self_attn* *router*"
python3 internal_scripts/quantize_quark.py \
    --model_dir openai/gpt-oss-120b \
    --quant_scheme w_mxfp4_a_fp8 \
    --exclude_layers $exclude_layers \
    --num_calib_data 512 \
    --output_dir amd/gpt-oss120b-w-mxfp4-a-fp8 \
    --model_export hf_format \
    --multi_gpu
```

# Deployment

### Use with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend.

## Evaluation

The model was evaluated on the AIME25 and GPQA Diamond benchmarks with `low` reasoning effort.

### Accuracy

| Benchmark | gpt-oss-120b | gpt-oss120b-w-mxfp4-a-fp8 (this model) | Recovery |
| --- | --- | --- | --- |
| AIME25 | 65.25 | 67.12 | 102.87% |
| GPQA Diamond | 51.67 | 53.42 | 103.39% |
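
The Recovery column appears to be the quantized model's score expressed as a percentage of the baseline score. The card does not state the formula, so the definition below is an assumption, though it reproduces the reported values. A minimal sketch, with the helper name `recovery` chosen here for illustration:

```python
def recovery(baseline: float, quantized: float) -> float:
    """Return the quantized score as a percentage of the baseline score.

    Assumed definition: recovery = quantized / baseline * 100.
    """
    return quantized / baseline * 100.0


# (baseline, quantized) scores copied from the accuracy table.
scores = {
    "AIME25": (65.25, 67.12),
    "GPQA Diamond": (51.67, 53.42),
}

for name, (base, quant) in scores.items():
    print(f"{name}: {recovery(base, quant):.2f}%")
```

A value above 100% means the quantized model scored higher than the baseline on that benchmark in this run.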