[2026-05-05 07:53:01] [ROUND5] Auditing requested new anchors
[2026-05-05 07:53:22] Dataset audit kept=8 dropped=[] domain_counts={'math': 3, 'code': 3, 'science': 2}
[2026-05-05 07:53:22] [ROUND5] Kept candidates: ['aqua_rat_numeric', 'math_counting_easy', 'mawps', 'mbpp_sanitized', 'humaneval', 'conala_curated', 'medmcqa_easy', 'pubmedqa_pqal']; dropped=[]
[2026-05-05 07:53:22] Launching 16 LoRA trainings across 8 workers
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / aqua_rat_numeric
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / humaneval
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / aqua_rat_numeric
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / math_counting_easy
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / humaneval
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / math_counting_easy
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / conala_curated
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / conala_curated
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / mawps
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / mawps
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / mbpp_sanitized
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / medmcqa_easy
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / medmcqa_easy
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] Qwen/Qwen2.5-3B-Instruct / pubmedqa_pqal
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / mbpp_sanitized
[2026-05-05 07:53:29] [SKIP_TRAIN_EXISTS] meta-llama/Llama-3.2-3B-Instruct / pubmedqa_pqal
[2026-05-05 07:53:30] Available anchors in /workspace/round3_out/round4: 16 counts={'math': 6, 'code': 3, 'science': 7}
[2026-05-05 07:53:30] [ROUND5] Pool size=24 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 07:53:30] [ROUND5] Scaling sweep over usable Round4 held-outs
[2026-05-05 07:53:42] [ROUND5_CELL_DONE] N=4 seed=11 task=gsm_hard anchors=['r4:multiarith', 'r4:mbpp_sanitized', 'r5:aqua_rat_numeric', 'r5:medmcqa_easy']
[2026-05-05 07:53:46] [ROUND5_CELL_DONE] N=4 seed=11 task=gsm8k_test_500 anchors=['r4:multiarith', 'r4:mbpp_sanitized', 'r5:aqua_rat_numeric', 'r5:medmcqa_easy']
[2026-05-05 07:53:50] [ROUND5_CELL_DONE] N=4 seed=11 task=mbpp_test_held anchors=['r4:multiarith', 'r4:mbpp_sanitized', 'r5:aqua_rat_numeric', 'r5:medmcqa_easy']
[2026-05-05 07:53:54] [ROUND5_CELL_DONE] N=4 seed=11 task=mbpp_plus anchors=['r4:multiarith', 'r4:mbpp_sanitized', 'r5:aqua_rat_numeric', 'r5:medmcqa_easy']
[2026-05-05 07:53:59] [ROUND5_CELL_DONE] N=4 seed=11 task=openbookqa_test anchors=['r4:multiarith', 'r4:mbpp_sanitized', 'r5:aqua_rat_numeric', 'r5:medmcqa_easy']
[2026-05-05 07:54:03] [ROUND5_CELL_DONE] N=4 seed=22 task=gsm_hard anchors=['r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:humaneval', 'r5:medmcqa_easy']
[2026-05-05 07:54:08] [ROUND5_CELL_DONE] N=4 seed=22 task=gsm8k_test_500 anchors=['r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:humaneval', 'r5:medmcqa_easy']
[2026-05-05 07:54:12] [ROUND5_CELL_DONE] N=4 seed=22 task=mbpp_test_held anchors=['r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:humaneval', 'r5:medmcqa_easy']
[2026-05-05 07:54:17] [ROUND5_CELL_DONE] N=4 seed=22 task=mbpp_plus anchors=['r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:humaneval', 'r5:medmcqa_easy']
[2026-05-05 07:54:21] [ROUND5_CELL_DONE] N=4 seed=22 task=openbookqa_test anchors=['r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:humaneval', 'r5:medmcqa_easy']
[2026-05-05 07:54:26] [ROUND5_CELL_DONE] N=4 seed=33 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy']
[2026-05-05 07:54:30] [ROUND5_CELL_DONE] N=4 seed=33 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy']
[2026-05-05 07:54:34] [ROUND5_CELL_DONE] N=4 seed=33 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy']
[2026-05-05 07:54:38] [ROUND5_CELL_DONE] N=4 seed=33 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy']
[2026-05-05 07:54:43] [ROUND5_CELL_DONE] N=4 seed=33 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy']
[2026-05-05 07:54:47] [ROUND5_CELL_DONE] N=4 seed=44 task=gsm_hard anchors=['r4:sciq', 'r4:humaneval', 'r4:math_algebra_easy', 'r4:aqua_rat']
[2026-05-05 07:54:51] [ROUND5_CELL_DONE] N=4 seed=44 task=gsm8k_test_500 anchors=['r4:sciq', 'r4:humaneval', 'r4:math_algebra_easy', 'r4:aqua_rat']
[2026-05-05 07:54:55] [ROUND5_CELL_DONE] N=4 seed=44 task=mbpp_test_held anchors=['r4:sciq', 'r4:humaneval', 'r4:math_algebra_easy', 'r4:aqua_rat']
[2026-05-05 07:54:59] [ROUND5_CELL_DONE] N=4 seed=44 task=mbpp_plus anchors=['r4:sciq', 'r4:humaneval', 'r4:math_algebra_easy', 'r4:aqua_rat']
[2026-05-05 07:55:03] [ROUND5_CELL_DONE] N=4 seed=44 task=openbookqa_test anchors=['r4:sciq', 'r4:humaneval', 'r4:math_algebra_easy', 'r4:aqua_rat']
[2026-05-05 07:55:08] [ROUND5_CELL_DONE] N=4 seed=55 task=gsm_hard anchors=['r4:sciq', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:humaneval']
[2026-05-05 07:55:12] [ROUND5_CELL_DONE] N=4 seed=55 task=gsm8k_test_500 anchors=['r4:sciq', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:humaneval']
[2026-05-05 07:55:16] [ROUND5_CELL_DONE] N=4 seed=55 task=mbpp_test_held anchors=['r4:sciq', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:humaneval']
[2026-05-05 07:55:21] [ROUND5_CELL_DONE] N=4 seed=55 task=mbpp_plus anchors=['r4:sciq', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:humaneval']
[2026-05-05 07:55:25] [ROUND5_CELL_DONE] N=4 seed=55 task=openbookqa_test anchors=['r4:sciq', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:humaneval']
[2026-05-05 07:55:32] [ROUND5_CELL_DONE] N=8 seed=11 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:multiarith', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
[2026-05-05 07:55:39] [ROUND5_CELL_DONE] N=8 seed=11 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:multiarith', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
[2026-05-05 07:55:46] [ROUND5_CELL_DONE] N=8 seed=11 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:multiarith', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
[2026-05-05 07:55:53] [ROUND5_CELL_DONE] N=8 seed=11 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:multiarith', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
[2026-05-05 07:56:00] [ROUND5_CELL_DONE] N=8 seed=11 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:multiarith', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
[2026-05-05 07:56:07] [ROUND5_CELL_DONE] N=8 seed=22 task=gsm_hard anchors=['r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 07:56:14] [ROUND5_CELL_DONE] N=8 seed=22 task=gsm8k_test_500 anchors=['r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 07:56:21] [ROUND5_CELL_DONE] N=8 seed=22 task=mbpp_test_held anchors=['r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 07:56:28] [ROUND5_CELL_DONE] N=8 seed=22 task=mbpp_plus anchors=['r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 07:56:35] [ROUND5_CELL_DONE] N=8 seed=22 task=openbookqa_test anchors=['r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 07:56:41] [ROUND5_CELL_DONE] N=8 seed=33 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:pubmedqa_pqal']
[2026-05-05 07:56:48] [ROUND5_CELL_DONE] N=8 seed=33 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:pubmedqa_pqal']
[2026-05-05 07:56:54] [ROUND5_CELL_DONE] N=8 seed=33 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:pubmedqa_pqal']
[2026-05-05 07:57:01] [ROUND5_CELL_DONE] N=8 seed=33 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:pubmedqa_pqal']
[2026-05-05 07:57:08] [ROUND5_CELL_DONE] N=8 seed=33 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:pubmedqa_pqal']
[2026-05-05 07:57:15] [ROUND5_CELL_DONE] N=8 seed=44 task=gsm_hard anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized']
[2026-05-05 07:57:22] [ROUND5_CELL_DONE] N=8 seed=44 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized']
[2026-05-05 07:57:28] [ROUND5_CELL_DONE] N=8 seed=44 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized']
[2026-05-05 07:57:35] [ROUND5_CELL_DONE] N=8 seed=44 task=mbpp_plus anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized']
[2026-05-05 07:57:41] [ROUND5_CELL_DONE] N=8 seed=44 task=openbookqa_test anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized']
[2026-05-05 07:57:48] [ROUND5_CELL_DONE] N=8 seed=55 task=gsm_hard anchors=['r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval']
[2026-05-05 07:57:54] [ROUND5_CELL_DONE] N=8 seed=55 task=gsm8k_test_500 anchors=['r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval']
[2026-05-05 07:58:00] [ROUND5_CELL_DONE] N=8 seed=55 task=mbpp_test_held anchors=['r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval']
[2026-05-05 07:58:07] [ROUND5_CELL_DONE] N=8 seed=55 task=mbpp_plus anchors=['r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval']
[2026-05-05 07:58:13] [ROUND5_CELL_DONE] N=8 seed=55 task=openbookqa_test anchors=['r4:sciq', 'r4:arc_easy', 'r4:humaneval', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval']
[2026-05-05 07:58:22] [ROUND5_CELL_DONE] N=12 seed=11 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
[2026-05-05 07:58:31] [ROUND5_CELL_DONE] N=12 seed=11 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
[2026-05-05 07:58:39] [ROUND5_CELL_DONE] N=12 seed=11 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
[2026-05-05 07:58:47] [ROUND5_CELL_DONE] N=12 seed=11 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
[2026-05-05 07:58:56] [ROUND5_CELL_DONE] N=12 seed=11 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:medmcqa_easy']
[2026-05-05 07:59:04] [ROUND5_CELL_DONE] N=12 seed=22 task=gsm_hard anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 07:59:13] [ROUND5_CELL_DONE] N=12 seed=22 task=gsm8k_test_500 anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 07:59:21] [ROUND5_CELL_DONE] N=12 seed=22 task=mbpp_test_held anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 07:59:29] [ROUND5_CELL_DONE] N=12 seed=22 task=mbpp_plus anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 07:59:38] [ROUND5_CELL_DONE] N=12 seed=22 task=openbookqa_test anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 07:59:46] [ROUND5_CELL_DONE] N=12 seed=33 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:mbpp_sanitized', 'r5:pubmedqa_pqal']
[2026-05-05 07:59:54] [ROUND5_CELL_DONE] N=12 seed=33 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:mbpp_sanitized', 'r5:pubmedqa_pqal']
[2026-05-05 08:00:03] [ROUND5_CELL_DONE] N=12 seed=33 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:mbpp_sanitized', 'r5:pubmedqa_pqal']
[2026-05-05 08:00:12] [ROUND5_CELL_DONE] N=12 seed=33 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:mbpp_sanitized', 'r5:pubmedqa_pqal']
[2026-05-05 08:00:20] [ROUND5_CELL_DONE] N=12 seed=33 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:mbpp_sanitized', 'r5:pubmedqa_pqal']
[2026-05-05 08:00:29] [ROUND5_CELL_DONE] N=12 seed=44 task=gsm_hard anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:00:38] [ROUND5_CELL_DONE] N=12 seed=44 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:00:47] [ROUND5_CELL_DONE] N=12 seed=44 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:00:55] [ROUND5_CELL_DONE] N=12 seed=44 task=mbpp_plus anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:01:03] [ROUND5_CELL_DONE] N=12 seed=44 task=openbookqa_test anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mbpp_sanitized', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:01:12] [ROUND5_CELL_DONE] N=12 seed=55 task=gsm_hard anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated']
[2026-05-05 08:01:20] [ROUND5_CELL_DONE] N=12 seed=55 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated']
[2026-05-05 08:01:28] [ROUND5_CELL_DONE] N=12 seed=55 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated']
[2026-05-05 08:01:37] [ROUND5_CELL_DONE] N=12 seed=55 task=mbpp_plus anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated']
[2026-05-05 08:01:45] [ROUND5_CELL_DONE] N=12 seed=55 task=openbookqa_test anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated']
[2026-05-05 08:01:54] [ROUND5_CELL_DONE] N=16 seed=11 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy']
[2026-05-05 08:02:03] [ROUND5_CELL_DONE] N=16 seed=11 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy']
[2026-05-05 08:02:13] [ROUND5_CELL_DONE] N=16 seed=11 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy']
[2026-05-05 08:02:22] [ROUND5_CELL_DONE] N=16 seed=11 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy']
[2026-05-05 08:02:31] [ROUND5_CELL_DONE] N=16 seed=11 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:arc_easy', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy']
[2026-05-05 08:02:40] [ROUND5_CELL_DONE] N=16 seed=22 task=gsm_hard anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:02:49] [ROUND5_CELL_DONE] N=16 seed=22 task=gsm8k_test_500 anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:02:58] [ROUND5_CELL_DONE] N=16 seed=22 task=mbpp_test_held anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:03:07] [ROUND5_CELL_DONE] N=16 seed=22 task=mbpp_plus anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:03:16] [ROUND5_CELL_DONE] N=16 seed=22 task=openbookqa_test anchors=['r4:mbpp', 'r4:openbookqa', 'r4:svamp', 'r4:mmlu_high_school_biology', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:03:25] [ROUND5_CELL_DONE] N=16 seed=33 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:03:34] [ROUND5_CELL_DONE] N=16 seed=33 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:03:43] [ROUND5_CELL_DONE] N=16 seed=33 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:03:52] [ROUND5_CELL_DONE] N=16 seed=33 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:04:01] [ROUND5_CELL_DONE] N=16 seed=33 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:svamp', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:04:11] [ROUND5_CELL_DONE] N=16 seed=44 task=gsm_hard anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:04:20] [ROUND5_CELL_DONE] N=16 seed=44 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:04:29] [ROUND5_CELL_DONE] N=16 seed=44 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:04:38] [ROUND5_CELL_DONE] N=16 seed=44 task=mbpp_plus anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:04:47] [ROUND5_CELL_DONE] N=16 seed=44 task=openbookqa_test anchors=['r4:gsm8k', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mbpp_sanitized', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:04:56] [ROUND5_CELL_DONE] N=16 seed=55 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:05:06] [ROUND5_CELL_DONE] N=16 seed=55 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:05:15] [ROUND5_CELL_DONE] N=16 seed=55 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:05:24] [ROUND5_CELL_DONE] N=16 seed=55 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:05:33] [ROUND5_CELL_DONE] N=16 seed=55 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:multiarith', 'r4:humaneval', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy']
[2026-05-05 08:05:45] [ROUND5_CELL_DONE] N=24 seed=11 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:05:57] [ROUND5_CELL_DONE] N=24 seed=11 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:06:09] [ROUND5_CELL_DONE] N=24 seed=11 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:06:21] [ROUND5_CELL_DONE] N=24 seed=11 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:06:33] [ROUND5_CELL_DONE] N=24 seed=11 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:06:45] [ROUND5_CELL_DONE] N=24 seed=22 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:06:57] [ROUND5_CELL_DONE] N=24 seed=22 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:07:09] [ROUND5_CELL_DONE] N=24 seed=22 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:07:21] [ROUND5_CELL_DONE] N=24 seed=22 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:07:33] [ROUND5_CELL_DONE] N=24 seed=22 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:07:44] [ROUND5_CELL_DONE] N=24 seed=33 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:07:56] [ROUND5_CELL_DONE] N=24 seed=33 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:08:07] [ROUND5_CELL_DONE] N=24 seed=33 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:08:20] [ROUND5_CELL_DONE] N=24 seed=33 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:08:31] [ROUND5_CELL_DONE] N=24 seed=33 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:08:43] [ROUND5_CELL_DONE] N=24 seed=44 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal']
[2026-05-05 08:08:55] [ROUND5_CELL_DONE] N=24 seed=44 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 
'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal'] [2026-05-05 08:09:07] [ROUND5_CELL_DONE] N=24 seed=44 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal'] [2026-05-05 08:09:18] [ROUND5_CELL_DONE] N=24 seed=44 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal'] [2026-05-05 08:09:30] [ROUND5_CELL_DONE] N=24 seed=44 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal'] [2026-05-05 08:09:41] [ROUND5_CELL_DONE] N=24 seed=55 task=gsm_hard anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 
'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal'] [2026-05-05 08:09:53] [ROUND5_CELL_DONE] N=24 seed=55 task=gsm8k_test_500 anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal'] [2026-05-05 08:10:04] [ROUND5_CELL_DONE] N=24 seed=55 task=mbpp_test_held anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal'] [2026-05-05 08:10:16] [ROUND5_CELL_DONE] N=24 seed=55 task=mbpp_plus anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal'] [2026-05-05 08:10:28] 
[ROUND5_CELL_DONE] N=24 seed=55 task=openbookqa_test anchors=['r4:gsm8k', 'r4:mbpp', 'r4:sciq', 'r4:arc_easy', 'r4:openbookqa', 'r4:svamp', 'r4:multiarith', 'r4:mmlu_high_school_biology', 'r4:math_counting_easy', 'r4:humaneval', 'r4:mmlu_high_school_physics', 'r4:mbpp_sanitized', 'r4:mmlu_elementary_math', 'r4:math_algebra_easy', 'r4:aqua_rat', 'r4:medmcqa_easy', 'r5:aqua_rat_numeric', 'r5:math_counting_easy', 'r5:mawps', 'r5:mbpp_sanitized', 'r5:humaneval', 'r5:conala_curated', 'r5:medmcqa_easy', 'r5:pubmedqa_pqal'] It seems you are trying to upload a large folder at once. This might take some time and then fail if the folder is too large. For such cases, it is recommended to upload in smaller batches or to use `HfApi().upload_large_folder(...)`/`hf upload-large-folder` instead. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#upload-a-large-folder.