MSGEncrypted committed on
Commit e574f0b · verified · 1 Parent(s): fc04389

Publish math-lora (gate passed: gsm8k)
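
The gate named in the commit message can be read as a set of threshold checks, matching the `check | value | result` rows in this commit's README change. What follows is a minimal sketch, not the project's actual gating code: the `Check` class, the `kind` labels, and `gate_passed` are hypothetical names introduced for illustration. The numeric values are copied from the README tables. The sign convention for "regress" is not stated in the source, so the sketch only compares each value to its threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One gate row. Hypothetical structure, not taken from the project."""
    name: str
    value: float
    threshold: float
    kind: str  # "min": value must be >= threshold; "max_regress": value must be <= threshold

    def passed(self) -> bool:
        if self.kind == "min":
            return self.value >= self.threshold
        if self.kind == "max_regress":
            return self.value <= self.threshold
        raise ValueError(f"unknown check kind: {self.kind}")


def gate_passed(checks) -> bool:
    """The gate passes only if every individual check passes."""
    return all(c.passed() for c in checks)


# Values copied from the README tables in this commit.
checks = [
    Check("gsm8k", 0.4000, 0.05, "min"),
    Check("hellaswag regress", 0.0000, 0.03, "max_regress"),
    Check("piqa regress", 0.0200, 0.03, "max_regress"),
    Check("gsm8k regress", -0.0700, 0.03, "max_regress"),
]

for c in checks:
    print(f"{c.name}: {'pass' if c.passed() else 'fail'}")
print("gate:", "PASSED" if gate_passed(checks) else "FAILED")
```

Under this reading, a negative regress value passes simply because it is below the 0.03 ceiling, which is consistent with the `pass` results recorded for `arc_easy`, `arc_challenge`, `boolq`, and `gsm8k` in the general checks.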

Files changed (1)
  1. README.md +17 -1
README.md CHANGED
@@ -18,9 +18,11 @@ Trained, evaluated, and gated on [Modal](https://modal.com/docs/guide) via `rese
 
  ## Benchmark gate
 
- - eval profile: `math`
+ - skill eval profile: `math`
  - gate: **PASSED**
 
+ ### Skill checks
+
  | check | value | result |
  | --- | ---: | --- |
  | gsm8k >= 0.05 | 0.4000 | pass |
@@ -29,6 +31,20 @@ Trained, evaluated, and gated on [Modal](https://modal.com/docs/guide) via `rese
  | hellaswag regress <= 0.03 | 0.0000 | pass |
  | piqa regress <= 0.03 | 0.0200 | pass |
 
+ - general eval profile: `compare_study`
+
+ ### General checks
+
+ | check | value | result |
+ | --- | ---: | --- |
+ | arc_easy regress <= 0.03 | -0.0300 | pass |
+ | arc_challenge regress <= 0.03 | -0.0400 | pass |
+ | hellaswag regress <= 0.03 | 0.0100 | pass |
+ | piqa regress <= 0.03 | 0.0100 | pass |
+ | boolq regress <= 0.03 | -0.0300 | pass |
+ | gsm8k regress <= 0.03 | -0.0700 | pass |
+
+
  ## lm-eval results
 
  | task | metric | baseline | candidate | delta |