MSGEncrypted/minicpm5-1b-math-lora

@@ -18,9 +18,11 @@ Trained, evaluated, and gated on [Modal](https://modal.com/docs/guide) via `rese
 ## Benchmark gate
-- eval profile: `math`
 - gate: **PASSED**
 | check | value | result |
 | --- | ---: | --- |
 | gsm8k >= 0.05 | 0.4000 | pass |
@@ -29,6 +31,20 @@ Trained, evaluated, and gated on [Modal](https://modal.com/docs/guide) via `rese
 | hellaswag regress <= 0.03 | 0.0000 | pass |
 | piqa regress <= 0.03 | 0.0200 | pass |
 ## lm-eval results
 | task | metric | baseline | candidate | delta |

 ## Benchmark gate
+- skill eval profile: `math`
 - gate: **PASSED**
+### Skill checks
 | check | value | result |
 | --- | ---: | --- |
 | gsm8k >= 0.05 | 0.4000 | pass |
 | hellaswag regress <= 0.03 | 0.0000 | pass |
 | piqa regress <= 0.03 | 0.0200 | pass |
+- general eval profile: `compare_study`
+### General checks
+| check | value | result |
+| --- | ---: | --- |
+| arc_easy regress <= 0.03 | -0.0300 | pass |
+| arc_challenge regress <= 0.03 | -0.0400 | pass |
+| hellaswag regress <= 0.03 | 0.0100 | pass |
+| piqa regress <= 0.03 | 0.0100 | pass |
+| boolq regress <= 0.03 | -0.0300 | pass |
+| gsm8k regress <= 0.03 | -0.0700 | pass |
 ## lm-eval results
 | task | metric | baseline | candidate | delta |