CK0607 committed on
Commit 02f3a11 · verified
1 Parent(s): 2b3ed97

Round 5 upload logs/round5.log
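
The uploaded log writes one `[ROUND5_CELL_DONE]` line per completed (N, seed, task) cell of the scaling sweep. A minimal Python sketch for reading those lines back is given here, assuming the line shape seen in this commit stays stable; the names `CELL_RE`, `parse_cell`, and `summarize` are hypothetical helpers and not part of the pipeline that produced the log.

```python
import ast
import re
from collections import Counter

# Assumed line shape, copied from the log in this commit:
# [timestamp] [ROUND5_CELL_DONE] N=<int> seed=<int> task=<name> anchors=[...]
CELL_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[ROUND5_CELL_DONE\] "
    r"N=(?P<n>\d+) seed=(?P<seed>\d+) task=(?P<task>\S+) anchors=(?P<anchors>\[.*\])"
)


def parse_cell(line):
    """Return a dict for a ROUND5_CELL_DONE line, or None for any other line."""
    m = CELL_RE.search(line)
    if m is None:
        return None
    return {
        "timestamp": m.group("ts"),
        "n": int(m.group("n")),
        "seed": int(m.group("seed")),
        "task": m.group("task"),
        # The anchors field is printed as a Python list literal of strings.
        "anchors": ast.literal_eval(m.group("anchors")),
    }


def summarize(lines):
    """Count completed cells per N and collect cells whose anchor count differs from N."""
    cells = [c for c in map(parse_cell, lines) if c is not None]
    per_n = Counter(c["n"] for c in cells)
    mismatched = [c for c in cells if len(c["anchors"]) != c["n"]]
    return per_n, mismatched


# Two lines taken from the log: one sweep cell and one non-cell status line.
sample = [
    "+ [2026-05-05 07:53:42] [ROUND5_CELL_DONE] N=4 seed=11 task=gsm_hard "
    "anchors=['r4:multiarith', 'r4:mbpp_sanitized', 'r5:aqua_rat_numeric', 'r5:medmcqa_easy']",
    "+ [2026-05-05 07:53:30] [ROUND5] Scaling sweep over usable Round4 held-outs",
]
per_n, mismatched = summarize(sample)
print(dict(per_n), len(mismatched))
```

Checking that each cell lists exactly N anchors is a cheap sanity test on the sweep; the leading `+ ` diff marker is tolerated because the pattern is searched for anywhere in the line.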

Files changed (1)
  1. logs/round5.log +149 -0
logs/round5.log ADDED
@@ -0,0 +1,149 @@
+ [2026-05-05 07:53:01] [ROUND5] Auditing requested new anchors
+ [2026-05-05 07:53:22] Dataset audit kept=8 dropped=[] domain_counts={'math': 3, 'code': 3, 'science': 2}
+ [2026-05-05 07:53:22] [ROUND5] Kept candidates: ['aqua_rat_numeric', 'math_counting_easy', 'mawps', 'mbpp_sanitized', 'humaneval', 'conala_curated', 'medmcqa_easy', 'pubmedqa_pqal']; dropped=[]
+ [2026-05-05 07:53:22] Launching 16 LoRA trainings across 8 workers
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / aqua_rat_numeric
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / humaneval
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / aqua_rat_numeric
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / math_counting_easy
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / humaneval
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / math_counting_easy
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / conala_curated
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / conala_curated
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / mawps
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / mawps
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / mbpp_sanitized
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / medmcqa_easy
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / medmcqa_easy
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / pubmedqa_pqal
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / mbpp_sanitized
+ [2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / pubmedqa_pqal
+ [2026-05-05 07:53:30] Available anchors in /workspace/round3_out/round4: 16 counts={'math': 6, 'code': 3, 'science': 7}
+ [2026-05-05 07:53:30] [ROUND5] Pool size=24 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:53:30] [ROUND5] Scaling sweep over usable Round4 held-outs
+ [2026-05-05 07:53:42] [ROUND5_CELL_DONE] N=4 seed=11 task=gsm_hard anchors=['r4:multiarith', 'r4:mbpp_sanitized', 'r5:aqua_rat_numeric', 'r5:medmcqa_easy']
+ [2026-05-05 07:53:46] [ROUND5_CELL_DONE] N=4 seed=11 task=gsm8k_test_500 anchors=['r4:multiarith', 'r4:mbpp_sanitized', 'r5:aqua_rat_numeric', 'r5:medmcqa_easy']
+ [2026-05-05 07:53:50] [ROUND5_CELL_DONE] N=4 seed=11 task=mbpp_test_held anchors=['r4:multiarith', 'r4:mbpp_sanitized', 'r5:aqua_rat_numeric', 'r5:medmcqa_easy']
+ [2026-05-05 07:53:54] [ROUND5_CELL_DONE] N=4 seed=11 task=mbpp_plus anchors=['r4:multiarith', 'r4:mbpp_sanitized', 'r5:aqua_rat_numeric', 'r5:medmcqa_easy']
+ [2026-05-05 07:53:59] [ROUND5_CELL_DONE] N=4 seed=11 task=openbookqa_test anchors=['r4:multiarith', 'r4:mbpp_sanitized', 'r5:aqua_rat_numeric', 'r5:medmcqa_easy']
+ [2026-05-05 07:54:03] [ROUND5_CELL_DONE] N=4 seed=22 task=gsm_hard anchors=['r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:humaneval', 'r5:medmcqa_easy']
+ [2026-05-05 07:54:08] [ROUND5_CELL_DONE] N=4 seed=22 task=gsm8k_test_500 anchors=['r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:humaneval', 'r5:medmcqa_easy']
+ [2026-05-05 07:54:12] [ROUND5_CELL_DONE] N=4 seed=22 task=mbpp_test_held anchors=['r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:humaneval', 'r5:medmcqa_easy']
+ [2026-05-05 07:54:17] [ROUND5_CELL_DONE] N=4 seed=22 task=mbpp_plus anchors=['r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:humaneval', 'r5:medmcqa_easy']
+ [2026-05-05 07:54:21] [ROUND5_CELL_DONE] N=4 seed=22 task=openbookqa_test anchors=['r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:humaneval', 'r5:medmcqa_easy']
+ [2026-05-05 07:54:26] [ROUND5_CELL_DONE] N=4 seed=33 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy']
+ [2026-05-05 07:54:30] [ROUND5_CELL_DONE] N=4 seed=33 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy']
+ [2026-05-05 07:54:34] [ROUND5_CELL_DONE] N=4 seed=33 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy']
+ [2026-05-05 07:54:38] [ROUND5_CELL_DONE] N=4 seed=33 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy']
+ [2026-05-05 07:54:43] [ROUND5_CELL_DONE] N=4 seed=33 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy']
+ [2026-05-05 07:54:47] [ROUND5_CELL_DONE] N=4 seed=44 task=gsm_hard anchors=['r4:sciq', 'r4:humaneval', 'r4:math_algebra_easy', 'r4:aqua_rat']
+ [2026-05-05 07:54:51] [ROUND5_CELL_DONE] N=4 seed=44 task=gsm8k_test_500 anchors=['r4:sciq', 'r4:humaneval', 'r4:math_algebra_easy', 'r4:aqua_rat']
+ [2026-05-05 07:54:55] [ROUND5_CELL_DONE] N=4 seed=44 task=mbpp_test_held anchors=['r4:sciq', 'r4:humaneval', 'r4:math_algebra_easy', 'r4:aqua_rat']
+ [2026-05-05 07:54:59] [ROUND5_CELL_DONE] N=4 seed=44 task=mbpp_plus anchors=['r4:sciq', 'r4:humaneval', 'r4:math_algebra_easy', 'r4:aqua_rat']
+ [2026-05-05 07:55:03] [ROUND5_CELL_DONE] N=4 seed=44 task=openbookqa_test anchors=['r4:sciq', 'r4:humaneval', 'r4:math_algebra_easy', 'r4:aqua_rat']
+ [2026-05-05 07:55:08] [ROUND5_CELL_DONE] N=4 seed=55 task=gsm_hard anchors=['r4:sciq', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:humaneval']
+ [2026-05-05 07:55:12] [ROUND5_CELL_DONE] N=4 seed=55 task=gsm8k_test_500 anchors=['r4:sciq', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:humaneval']
+ [2026-05-05 07:55:16] [ROUND5_CELL_DONE] N=4 seed=55 task=mbpp_test_held anchors=['r4:sciq', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:humaneval']
+ [2026-05-05 07:55:21] [ROUND5_CELL_DONE] N=4 seed=55 task=mbpp_plus anchors=['r4:sciq', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:humaneval']
+ [2026-05-05 07:55:25] [ROUND5_CELL_DONE] N=4 seed=55 task=openbookqa_test anchors=['r4:sciq', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:humaneval']
+ [2026-05-05 07:55:32] [ROUND5_CELL_DONE] N=8 seed=11 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:multiarith', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
+ [2026-05-05 07:55:39] [ROUND5_CELL_DONE] N=8 seed=11 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:multiarith', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
+ [2026-05-05 07:55:46] [ROUND5_CELL_DONE] N=8 seed=11 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:multiarith', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
+ [2026-05-05 07:55:53] [ROUND5_CELL_DONE] N=8 seed=11 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:multiarith', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
+ [2026-05-05 07:56:00] [ROUND5_CELL_DONE] N=8 seed=11 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:multiarith', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
+ [2026-05-05 07:56:07] [ROUND5_CELL_DONE] N=8 seed=22 task=gsm_hard anchors=['r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:56:14] [ROUND5_CELL_DONE] N=8 seed=22 task=gsm8k_test_500 anchors=['r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:56:21] [ROUND5_CELL_DONE] N=8 seed=22 task=mbpp_test_held anchors=['r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:56:28] [ROUND5_CELL_DONE] N=8 seed=22 task=mbpp_plus anchors=['r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:56:35] [ROUND5_CELL_DONE] N=8 seed=22 task=openbookqa_test anchors=['r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:56:41] [ROUND5_CELL_DONE] N=8 seed=33 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:56:48] [ROUND5_CELL_DONE] N=8 seed=33 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:56:54] [ROUND5_CELL_DONE] N=8 seed=33 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:57:01] [ROUND5_CELL_DONE] N=8 seed=33 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:57:08] [ROUND5_CELL_DONE] N=8 seed=33 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:57:15] [ROUND5_CELL_DONE] N=8 seed=44 task=gsm_hard anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized']
+ [2026-05-05 07:57:22] [ROUND5_CELL_DONE] N=8 seed=44 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized']
+ [2026-05-05 07:57:28] [ROUND5_CELL_DONE] N=8 seed=44 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized']
+ [2026-05-05 07:57:35] [ROUND5_CELL_DONE] N=8 seed=44 task=mbpp_plus anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized']
+ [2026-05-05 07:57:41] [ROUND5_CELL_DONE] N=8 seed=44 task=openbookqa_test anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized']
+ [2026-05-05 07:57:48] [ROUND5_CELL_DONE] N=8 seed=55 task=gsm_hard anchors=['r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval']
+ [2026-05-05 07:57:54] [ROUND5_CELL_DONE] N=8 seed=55 task=gsm8k_test_500 anchors=['r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval']
+ [2026-05-05 07:58:00] [ROUND5_CELL_DONE] N=8 seed=55 task=mbpp_test_held anchors=['r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval']
+ [2026-05-05 07:58:07] [ROUND5_CELL_DONE] N=8 seed=55 task=mbpp_plus anchors=['r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval']
+ [2026-05-05 07:58:13] [ROUND5_CELL_DONE] N=8 seed=55 task=openbookqa_test anchors=['r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval']
+ [2026-05-05 07:58:22] [ROUND5_CELL_DONE] N=12 seed=11 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
+ [2026-05-05 07:58:31] [ROUND5_CELL_DONE] N=12 seed=11 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
+ [2026-05-05 07:58:39] [ROUND5_CELL_DONE] N=12 seed=11 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
+ [2026-05-05 07:58:47] [ROUND5_CELL_DONE] N=12 seed=11 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
+ [2026-05-05 07:58:56] [ROUND5_CELL_DONE] N=12 seed=11 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
+ [2026-05-05 07:59:04] [ROUND5_CELL_DONE] N=12 seed=22 task=gsm_hard anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:59:13] [ROUND5_CELL_DONE] N=12 seed=22 task=gsm8k_test_500 anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:59:21] [ROUND5_CELL_DONE] N=12 seed=22 task=mbpp_test_held anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:59:29] [ROUND5_CELL_DONE] N=12 seed=22 task=mbpp_plus anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:59:38] [ROUND5_CELL_DONE] N=12 seed=22 task=openbookqa_test anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:59:46] [ROUND5_CELL_DONE] N=12 seed=33 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:mbpp_sanitized', 'r5:pubmedqa_pqal']
+ [2026-05-05 07:59:54] [ROUND5_CELL_DONE] N=12 seed=33 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:mbpp_sanitized', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:00:03] [ROUND5_CELL_DONE] N=12 seed=33 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:mbpp_sanitized', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:00:12] [ROUND5_CELL_DONE] N=12 seed=33 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:mbpp_sanitized', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:00:20] [ROUND5_CELL_DONE] N=12 seed=33 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:mbpp_sanitized', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:00:29] [ROUND5_CELL_DONE] N=12 seed=44 task=gsm_hard anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:00:38] [ROUND5_CELL_DONE] N=12 seed=44 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:00:47] [ROUND5_CELL_DONE] N=12 seed=44 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:00:55] [ROUND5_CELL_DONE] N=12 seed=44 task=mbpp_plus anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:01:03] [ROUND5_CELL_DONE] N=12 seed=44 task=openbookqa_test anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:01:12] [ROUND5_CELL_DONE] N=12 seed=55 task=gsm_hard anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated']
+ [2026-05-05 08:01:20] [ROUND5_CELL_DONE] N=12 seed=55 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated']
+ [2026-05-05 08:01:28] [ROUND5_CELL_DONE] N=12 seed=55 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated']
+ [2026-05-05 08:01:37] [ROUND5_CELL_DONE] N=12 seed=55 task=mbpp_plus anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated']
+ [2026-05-05 08:01:45] [ROUND5_CELL_DONE] N=12 seed=55 task=openbookqa_test anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated']
+ [2026-05-05 08:01:54] [ROUND5_CELL_DONE] N=16 seed=11 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy']
+ [2026-05-05 08:02:03] [ROUND5_CELL_DONE] N=16 seed=11 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy']
+ [2026-05-05 08:02:13] [ROUND5_CELL_DONE] N=16 seed=11 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy']
+ [2026-05-05 08:02:22] [ROUND5_CELL_DONE] N=16 seed=11 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy']
+ [2026-05-05 08:02:31] [ROUND5_CELL_DONE] N=16 seed=11 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy']
+ [2026-05-05 08:02:40] [ROUND5_CELL_DONE] N=16 seed=22 task=gsm_hard anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:02:49] [ROUND5_CELL_DONE] N=16 seed=22 task=gsm8k_test_500 anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:02:58] [ROUND5_CELL_DONE] N=16 seed=22 task=mbpp_test_held anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:03:07] [ROUND5_CELL_DONE] N=16 seed=22 task=mbpp_plus anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:03:16] [ROUND5_CELL_DONE] N=16 seed=22 task=openbookqa_test anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:03:25] [ROUND5_CELL_DONE] N=16 seed=33 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:03:34] [ROUND5_CELL_DONE] N=16 seed=33 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:03:43] [ROUND5_CELL_DONE] N=16 seed=33 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:03:52] [ROUND5_CELL_DONE] N=16 seed=33 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:04:01] [ROUND5_CELL_DONE] N=16 seed=33 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:04:11] [ROUND5_CELL_DONE] N=16 seed=44 task=gsm_hard anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:04:20] [ROUND5_CELL_DONE] N=16 seed=44 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:04:29] [ROUND5_CELL_DONE] N=16 seed=44 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:04:38] [ROUND5_CELL_DONE] N=16 seed=44 task=mbpp_plus anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:04:47] [ROUND5_CELL_DONE] N=16 seed=44 task=openbookqa_test anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:04:56] [ROUND5_CELL_DONE] N=16 seed=55 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:05:06] [ROUND5_CELL_DONE] N=16 seed=55 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:05:15] [ROUND5_CELL_DONE] N=16 seed=55 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:05:24] [ROUND5_CELL_DONE] N=16 seed=55 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:05:33] [ROUND5_CELL_DONE] N=16 seed=55 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
+ [2026-05-05 08:05:45] [ROUND5_CELL_DONE] N=24 seed=11 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:05:57] [ROUND5_CELL_DONE] N=24 seed=11 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:06:09] [ROUND5_CELL_DONE] N=24 seed=11 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:06:21] [ROUND5_CELL_DONE] N=24 seed=11 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:06:33] [ROUND5_CELL_DONE] N=24 seed=11 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:06:45] [ROUND5_CELL_DONE] N=24 seed=22 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:06:57] [ROUND5_CELL_DONE] N=24 seed=22 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:07:09] [ROUND5_CELL_DONE] N=24 seed=22 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:07:21] [ROUND5_CELL_DONE] N=24 seed=22 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:07:33] [ROUND5_CELL_DONE] N=24 seed=22 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:07:44] [ROUND5_CELL_DONE] N=24 seed=33 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:07:56] [ROUND5_CELL_DONE] N=24 seed=33 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:08:07] [ROUND5_CELL_DONE] N=24 seed=33 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:08:20] [ROUND5_CELL_DONE] N=24 seed=33 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:08:31] [ROUND5_CELL_DONE] N=24 seed=33 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:08:43] [ROUND5_CELL_DONE] N=24 seed=44 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:08:55] [ROUND5_CELL_DONE] N=24 seed=44 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:09:07] [ROUND5_CELL_DONE] N=24 seed=44 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:09:18] [ROUND5_CELL_DONE] N=24 seed=44 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:09:30] [ROUND5_CELL_DONE] N=24 seed=44 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:09:41] [ROUND5_CELL_DONE] N=24 seed=55 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:09:53] [ROUND5_CELL_DONE] N=24 seed=55 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:10:04] [ROUND5_CELL_DONE] N=24 seed=55 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:10:16] [ROUND5_CELL_DONE] N=24 seed=55 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ [2026-05-05 08:10:28] [ROUND5_CELL_DONE] N=24 seed=55 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
+ It seems you are trying to upload a large folder at once. This might take some time and then fail if the folder is too large. For such cases, it is recommended to upload in smaller batches or to use `HfApi().upload_large_folder(...)`/`hf upload-large-folder` instead. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#upload-a-large-folder.