kylesayrs committed · Commit 116a6a9 · verified · 1 Parent(s): 94caf23

Update README.md

  1. README.md +18 -0
README.md CHANGED
@@ -24,6 +24,9 @@ vllm serve RedHatAI/DeepSeek-V4-Flash-NVFP4-FP8 --tensor-parallel-size 4 --port
24
  ```
25
 
26
  ## Evaluation
27
  ```bash
28
  python tests/evals/gsm8k/gsm8k_eval.py
29
  ```
@@ -38,4 +41,19 @@ Total output tokens: 116217
38
  Output tokens per second: 671.752
39
  ```
40
 
41
  For more details on how this model was created and run in LLM Compressor, please contact Kyle Sayers on the vLLM Slack: https://communityinviter.com/apps/vllm-dev/join-vllm-developers-slack
 
24
  ```
25
 
26
  ## Evaluation
27
+ This model recovers noticeably less than the full accuracy of the base model, because the base model was itself released in a quantized format and because the mxfp4 and nvfp4 formats differ.
28
+ More advanced techniques such as GPTQ can be used to increase accuracy recovery beyond this model's current state.
29
+
30
  ```bash
31
  python tests/evals/gsm8k/gsm8k_eval.py
32
  ```
 
41
  Output tokens per second: 671.752
42
  ```
43
 
44
+ ```bash
45
+ python3 tests/evals/mmlu_pro/mmlu_pro_eval.py --port 8089
46
+ ```
47
+
48
+ ```
49
+ Results:
50
+ Category: all
51
+ Accuracy: 0.554
52
+ Invalid responses: 0.000
53
+ Total latency: 112.065 s
54
+ Questions per second: 107.366
55
+ Total output tokens: 24076
56
+ Output tokens per second: 214.840
57
+ ```
58
+
59
  For more details on how this model was created and run in LLM Compressor, please contact Kyle Sayers on the vLLM Slack: https://communityinviter.com/apps/vllm-dev/join-vllm-developers-slack
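
As a rough sanity check on the MMLU-Pro results added in this commit, the sketch below re-derives the throughput figure and the implied question count from the reported totals. It assumes that each rate is simply a total divided by `Total latency`; the eval script itself is not shown here, so that relationship is an assumption, and `derive_metrics` is a hypothetical helper name, not part of the eval script.

```python
# Figures copied from the MMLU-Pro results block in this commit.
TOTAL_LATENCY_S = 112.065
QUESTIONS_PER_SECOND = 107.366
TOTAL_OUTPUT_TOKENS = 24076
REPORTED_TOKENS_PER_SECOND = 214.840
ACCURACY = 0.554


def derive_metrics(latency_s, qps, output_tokens, accuracy):
    """Re-derive summary numbers, assuming each rate is total / latency."""
    # Assumption: questions per second = number of questions / total latency.
    n_questions = round(qps * latency_s)
    return {
        "questions": n_questions,
        "tokens_per_second": output_tokens / latency_s,
        "tokens_per_question": output_tokens / n_questions,
        # Accuracy is reported to three decimals, so this count is approximate.
        "approx_correct": round(accuracy * n_questions),
    }


metrics = derive_metrics(
    TOTAL_LATENCY_S, QUESTIONS_PER_SECOND, TOTAL_OUTPUT_TOKENS, ACCURACY
)
for name, value in metrics.items():
    print(f"{name}: {value}")

# Difference between the re-derived and the reported throughput.
print("throughput delta:",
      abs(metrics["tokens_per_second"] - REPORTED_TOKENS_PER_SECOND))
```

Under that assumption, the re-derived tokens-per-second value should agree with the reported `214.840` to within rounding. The same approach applies to the GSM8K block if its totals are substituted.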