The repository for codeparrot/apps contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/codeparrot/apps.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N]
[2026-05-04 20:40:15] [DATASET_FAIL] codeparrot/apps cfg=all: ValueError: The repository for codeparrot/apps contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/codeparrot/apps.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
[2026-05-04 20:40:31] [DATASET_FAIL] deepmind/code_contests cfg=None: ValueError: Loading this dataset requires you to execute custom code contained in the dataset repository on your local machine. Please set the option `trust_remote_code=True` to permit loading of this dataset.
[2026-05-04 20:42:47] Dataset audit kept=28 dropped=['apps_introductory', 'codecontests_easy'] domain_counts={'math': 8, 'code': 6, 'science': 8}
[2026-05-04 20:42:47] Launching 56 LoRA trainings across 8 workers
[2026-05-04 20:42:56] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / gsm8k -> /workspace/round3_out/round3/X/gsm8k
[2026-05-04 20:42:56] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / gsm8k -> /workspace/round3_out/round3/Y/gsm8k
[2026-05-04 20:42:56] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / svamp -> /workspace/round3_out/round3/X/svamp
[2026-05-04 20:42:56] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / svamp -> /workspace/round3_out/round3/Y/svamp
[2026-05-04 20:42:56] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / asdiv -> /workspace/round3_out/round3/X/asdiv
[2026-05-04 20:42:56] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / asdiv -> /workspace/round3_out/round3/Y/asdiv
[2026-05-04 20:42:56] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / mawps -> /workspace/round3_out/round3/X/mawps
[2026-05-04 20:42:56] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / mawps -> /workspace/round3_out/round3/Y/mawps
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.3779, 'grad_norm': 2.049877405166626, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.7977, 'grad_norm': 1.8697302341461182, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.0465, 'grad_norm': 0.9380105137825012, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.8553, 'grad_norm': 0.9028997421264648, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.8056, 'grad_norm': 1.2171504497528076, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 9.0053, 'train_samples_per_second': 88.836, 'train_steps_per_second': 11.105, 'train_loss': 1.3920844507217407, 'epoch': 1.0}
[2026-05-04 20:44:18] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / mawps
[2026-05-04 20:44:18] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / math_counting_easy -> /workspace/round3_out/round3/Y/math_counting_easy
{'loss': 3.8352, 'grad_norm': 1.7309024333953857, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.4826, 'grad_norm': 1.8195528984069824, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.0809, 'grad_norm': 0.47451654076576233, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0325, 'grad_norm': 0.4937838017940521, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.0522, 'grad_norm': 0.6705223917961121, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 9.0138, 'train_samples_per_second': 88.753, 'train_steps_per_second': 11.094, 'train_loss': 1.4255642485618591, 'epoch': 1.0}
[2026-05-04 20:44:18] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / asdiv
[2026-05-04 20:44:18] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / math_algebra_easy -> /workspace/round3_out/round3/Y/math_algebra_easy
{'loss': 3.8734, 'grad_norm': 1.8566179275512695, 'learning_rate': 4e-05, 'epoch': 0.011363636363636364}
{'loss': 2.4913, 'grad_norm': 1.728808045387268, 'learning_rate': 0.00017268924518431438, 'epoch': 0.2840909090909091}
{'loss': 1.1234, 'grad_norm': 0.6941857933998108, 'learning_rate': 8.679103583582979e-05, 'epoch': 0.5681818181818182}
{'loss': 1.0455, 'grad_norm': 0.8766822218894958, 'learning_rate': 1.1863693398535114e-05, 'epoch': 0.8522727272727273}
{'train_runtime': 7.4337, 'train_samples_per_second': 94.165, 'train_steps_per_second': 11.838, 'train_loss': 1.4885666478763928, 'epoch': 1.0}
[2026-05-04 20:44:18] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / svamp
[2026-05-04 20:44:18] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / aqua_rat -> /workspace/round3_out/round3/Y/aqua_rat
{'loss': 3.6954, 'grad_norm': 1.6379107236862183, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.3591, 'grad_norm': 1.3714717626571655, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.2221, 'grad_norm': 0.5543839335441589, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.1481, 'grad_norm': 0.4706638753414154, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.2071, 'grad_norm': 0.4777754247188568, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 9.9447, 'train_samples_per_second': 80.445, 'train_steps_per_second': 10.056, 'train_loss': 1.4974398279190064, 'epoch': 1.0}
[2026-05-04 20:44:19] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / gsm8k
[2026-05-04 20:44:19] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / multiarith -> /workspace/round3_out/round3/Y/multiarith
{'loss': 3.426, 'grad_norm': 3.807481527328491, 'learning_rate': 4e-05, 'epoch': 0.011363636363636364}
{'loss': 1.965, 'grad_norm': 1.614963173866272, 'learning_rate': 0.00017268924518431438, 'epoch': 0.2840909090909091}
{'loss': 0.7795, 'grad_norm': 0.9405693411827087, 'learning_rate': 8.679103583582979e-05, 'epoch': 0.5681818181818182}
{'loss': 0.5861, 'grad_norm': 1.0020023584365845, 'learning_rate': 1.1863693398535114e-05, 'epoch': 0.8522727272727273}
{'train_runtime': 9.67, 'train_samples_per_second': 72.389, 'train_steps_per_second': 9.1, 'train_loss': 1.0466151427138934, 'epoch': 1.0}
[2026-05-04 20:44:20] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / svamp
[2026-05-04 20:44:20] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / aqua_rat -> /workspace/round3_out/round3/X/aqua_rat
{'loss': 3.4613, 'grad_norm': 3.515136480331421, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.0761, 'grad_norm': 1.8351175785064697, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 0.8332, 'grad_norm': 0.6909456253051758, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.6822, 'grad_norm': 0.635485827922821, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.7196, 'grad_norm': 0.6340501308441162, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 11.1162, 'train_samples_per_second': 71.967, 'train_steps_per_second': 8.996, 'train_loss': 1.091609320640564, 'epoch': 1.0}
[2026-05-04 20:44:22] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / asdiv
[2026-05-04 20:44:22] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / math_algebra_easy -> /workspace/round3_out/round3/X/math_algebra_easy
{'loss': 4.3819, 'grad_norm': 4.1420087814331055, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.6276, 'grad_norm': 1.437969446182251, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.1135, 'grad_norm': 0.6151057481765747, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.8251, 'grad_norm': 0.7535384297370911, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.7516, 'grad_norm': 1.1021652221679688, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 11.4232, 'train_samples_per_second': 70.033, 'train_steps_per_second': 8.754, 'train_loss': 1.3469935846328736, 'epoch': 1.0}
[2026-05-04 20:44:22] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / mawps
[2026-05-04 20:44:22] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / math_counting_easy -> /workspace/round3_out/round3/X/math_counting_easy
{'loss': 2.649, 'grad_norm': 3.195756196975708, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.331, 'grad_norm': 1.597908616065979, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 0.2739, 'grad_norm': 0.7369517087936401, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.1692, 'grad_norm': 0.5164644122123718, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.1625, 'grad_norm': 0.6150388717651367, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 11.6208, 'train_samples_per_second': 68.842, 'train_steps_per_second': 8.605, 'train_loss': 0.49733108043670654, 'epoch': 1.0}
[2026-05-04 20:44:22] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / gsm8k
[2026-05-04 20:44:22] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / multiarith -> /workspace/round3_out/round3/X/multiarith
{'loss': 2.94, 'grad_norm': 1.3758476972579956, 'learning_rate': 0.0001, 'epoch': 0.045454545454545456}
{'train_runtime': 2.9453, 'train_samples_per_second': 57.379, 'train_steps_per_second': 7.469, 'train_loss': 2.453034845265475, 'epoch': 1.0}
[2026-05-04 20:44:39] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / math_counting_easy
[2026-05-04 20:44:39] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / mbpp_sanitized -> /workspace/round3_out/round3/Y/mbpp_sanitized
{'loss': 3.844, 'grad_norm': 1.6947200298309326, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.018867924528301886}
{'loss': 2.4453, 'grad_norm': 1.7593973875045776, 'learning_rate': 0.00011873813145857249, 'epoch': 0.4716981132075472}
{'loss': 1.11, 'grad_norm': 0.6478723883628845, 'learning_rate': 1.771274927131139e-06, 'epoch': 0.9433962264150944}
{'train_runtime': 5.3419, 'train_samples_per_second': 78.623, 'train_steps_per_second': 9.922, 'train_loss': 1.7630185181239866, 'epoch': 1.0}
[2026-05-04 20:44:42] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / multiarith
[2026-05-04 20:44:42] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / mbpp -> /workspace/round3_out/round3/Y/mbpp
{'loss': 3.6026, 'grad_norm': 1.7872775793075562, 'learning_rate': 5e-05, 'epoch': 0.015384615384615385}
{'loss': 2.2121, 'grad_norm': 1.6280862092971802, 'learning_rate': 0.000146997674302732, 'epoch': 0.38461538461538464}
{'loss': 0.8391, 'grad_norm': 0.46003976464271545, 'learning_rate': 2.8384781168560693e-05, 'epoch': 0.7692307692307693}
{'train_runtime': 7.4106, 'train_samples_per_second': 69.9, 'train_steps_per_second': 8.771, 'train_loss': 1.395078703073355, 'epoch': 1.0}
[2026-05-04 20:44:43] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / math_algebra_easy
[2026-05-04 20:44:43] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / codealpaca_mini -> /workspace/round3_out/round3/Y/codealpaca_mini
{'loss': 2.4397, 'grad_norm': 2.9430527687072754, 'learning_rate': 0.0001, 'epoch': 0.045454545454545456}
{'train_runtime': 3.2574, 'train_samples_per_second': 51.881, 'train_steps_per_second': 6.754, 'train_loss': 1.7074141827496616, 'epoch': 1.0}
[2026-05-04 20:44:43] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / math_counting_easy
[2026-05-04 20:44:43] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / mbpp_sanitized -> /workspace/round3_out/round3/X/mbpp_sanitized
{'loss': 3.4398, 'grad_norm': 1.4278801679611206, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1996, 'grad_norm': 1.3018642663955688, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 0.9979, 'grad_norm': 0.4383689761161804, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.968, 'grad_norm': 0.5814603567123413, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.9748, 'grad_norm': 0.5396610498428345, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 10.5067, 'train_samples_per_second': 76.142, 'train_steps_per_second': 9.518, 'train_loss': 1.2974739861488342, 'epoch': 1.0}
[2026-05-04 20:44:47] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / aqua_rat
[2026-05-04 20:44:47] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / humaneval -> /workspace/round3_out/round3/Y/humaneval
{'loss': 3.1334, 'grad_norm': 3.7447922229766846, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.018867924528301886}
{'loss': 1.6419, 'grad_norm': 1.3764970302581787, 'learning_rate': 0.00011873813145857249, 'epoch': 0.4716981132075472}
{'loss': 0.5897, 'grad_norm': 1.7237893342971802, 'learning_rate': 1.771274927131139e-06, 'epoch': 0.9433962264150944}
{'train_runtime': 6.9214, 'train_samples_per_second': 60.681, 'train_steps_per_second': 7.657, 'train_loss': 1.1092271534901745, 'epoch': 1.0}
[2026-05-04 20:44:47] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / multiarith
[2026-05-04 20:44:47] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / mbpp -> /workspace/round3_out/round3/X/mbpp
{'loss': 3.1268, 'grad_norm': 3.655898094177246, 'learning_rate': 5e-05, 'epoch': 0.015384615384615385}
{'loss': 1.5301, 'grad_norm': 1.5200306177139282, 'learning_rate': 0.000146997674302732, 'epoch': 0.38461538461538464}
{'loss': 0.4029, 'grad_norm': 0.9035202860832214, 'learning_rate': 2.8384781168560693e-05, 'epoch': 0.7692307692307693}
{'train_runtime': 9.3111, 'train_samples_per_second': 55.632, 'train_steps_per_second': 6.981, 'train_loss': 0.827459405018733, 'epoch': 1.0}
[2026-05-04 20:44:49] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / math_algebra_easy
[2026-05-04 20:44:49] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / codealpaca_mini -> /workspace/round3_out/round3/X/codealpaca_mini
{'loss': 3.3511, 'grad_norm': 3.5977487564086914, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.7232, 'grad_norm': 0.9934104084968567, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 0.6148, 'grad_norm': 0.4904949367046356, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.4761, 'grad_norm': 0.48462048172950745, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.4958, 'grad_norm': 0.5735081434249878, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.0224, 'train_samples_per_second': 61.433, 'train_steps_per_second': 7.679, 'train_loss': 0.8437291312217713, 'epoch': 1.0}
[2026-05-04 20:44:51] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / aqua_rat
[2026-05-04 20:44:51] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / humaneval -> /workspace/round3_out/round3/X/humaneval
{'loss': 1.9847, 'grad_norm': 0.7501554489135742, 'learning_rate': 0.0002, 'epoch': 0.06666666666666667}
{'train_runtime': 3.0475, 'train_samples_per_second': 39.376, 'train_steps_per_second': 4.922, 'train_loss': 2.0691701332728067, 'epoch': 1.0}
[2026-05-04 20:44:58] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / mbpp_sanitized
[2026-05-04 20:44:58] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / arc_easy -> /workspace/round3_out/round3/Y/arc_easy
{'loss': 1.9847, 'grad_norm': 0.7501554489135742, 'learning_rate': 0.0002, 'epoch': 0.06666666666666667}
{'train_runtime': 3.0571, 'train_samples_per_second': 39.254, 'train_steps_per_second': 4.907, 'train_loss': 2.0691701332728067, 'epoch': 1.0}
[2026-05-04 20:45:02] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / mbpp
[2026-05-04 20:45:02] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / conala_curated -> /workspace/round3_out/round3/Y/conala_curated
{'loss': 1.7788, 'grad_norm': 1.2002047300338745, 'learning_rate': 0.0002, 'epoch': 0.06666666666666667}
{'train_runtime': 3.0606, 'train_samples_per_second': 39.208, 'train_steps_per_second': 4.901, 'train_loss': 1.7401427745819091, 'epoch': 1.0}
[2026-05-04 20:45:03] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / mbpp_sanitized
[2026-05-04 20:45:03] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / arc_easy -> /workspace/round3_out/round3/X/arc_easy
{'loss': 1.7788, 'grad_norm': 1.2002047300338745, 'learning_rate': 0.0002, 'epoch': 0.06666666666666667}
{'train_runtime': 3.0014, 'train_samples_per_second': 39.982, 'train_steps_per_second': 4.998, 'train_loss': 1.7401427745819091, 'epoch': 1.0}
[2026-05-04 20:45:06] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / mbpp
[2026-05-04 20:45:06] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / conala_curated -> /workspace/round3_out/round3/X/conala_curated
{'loss': 1.9355, 'grad_norm': 0.7733321785926819, 'learning_rate': 0.0001, 'epoch': 0.047619047619047616}
{'train_runtime': 4.1614, 'train_samples_per_second': 39.41, 'train_steps_per_second': 5.046, 'train_loss': 1.8369369677134924, 'epoch': 1.0}
[2026-05-04 20:45:08] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / humaneval
[2026-05-04 20:45:08] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / livecodebench_easy -> /workspace/round3_out/round3/Y/livecodebench_easy
{'loss': 1.6076, 'grad_norm': 1.2655946016311646, 'learning_rate': 0.0001, 'epoch': 0.047619047619047616}
{'train_runtime': 4.0712, 'train_samples_per_second': 40.283, 'train_steps_per_second': 5.158, 'train_loss': 1.4079822926294236, 'epoch': 1.0}
[2026-05-04 20:45:11] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / humaneval
[2026-05-04 20:45:11] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / livecodebench_easy -> /workspace/round3_out/round3/X/livecodebench_easy
{'loss': 3.1696, 'grad_norm': 1.3494852781295776, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.9552, 'grad_norm': 1.5449222326278687, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 0.7763, 'grad_norm': 0.42445844411849976, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.7042, 'grad_norm': 0.5568932890892029, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.6963, 'grad_norm': 0.5444496870040894, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.9095, 'train_samples_per_second': 57.515, 'train_steps_per_second': 7.189, 'train_loss': 1.0451295495033264, 'epoch': 1.0}
[2026-05-04 20:45:14] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / codealpaca_mini
[2026-05-04 20:45:14] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / sciq -> /workspace/round3_out/round3/Y/sciq
{'loss': 3.0203, 'grad_norm': 2.763631820678711, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.6338, 'grad_norm': 0.9826322793960571, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 0.6742, 'grad_norm': 0.45481133460998535, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.5435, 'grad_norm': 0.4968443512916565, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.5313, 'grad_norm': 0.5033280849456787, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 15.1456, 'train_samples_per_second': 52.821, 'train_steps_per_second': 6.603, 'train_loss': 0.8595820713043213, 'epoch': 1.0}
[2026-05-04 20:45:20] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / codealpaca_mini
[2026-05-04 20:45:20] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / sciq -> /workspace/round3_out/round3/X/sciq
{'loss': 3.8484, 'grad_norm': 1.6398802995681763, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.5496, 'grad_norm': 1.2934966087341309, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.1213, 'grad_norm': 0.6275230646133423, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0787, 'grad_norm': 0.5565957427024841, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.0154, 'grad_norm': 0.7117006778717041, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 10.472, 'train_samples_per_second': 76.394, 'train_steps_per_second': 9.549, 'train_loss': 1.4542333102226257, 'epoch': 1.0}
[2026-05-04 20:45:24] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / arc_easy
[2026-05-04 20:45:24] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / mmlu_high_school_biology -> /workspace/round3_out/round3/Y/mmlu_high_school_biology
{'loss': 4.0334, 'grad_norm': 1.80250084400177, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 3.0129, 'grad_norm': 1.8618228435516357, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.2127, 'grad_norm': 0.7751524448394775, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0915, 'grad_norm': 0.734074056148529, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.1144, 'grad_norm': 0.8494988083839417, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 10.6985, 'train_samples_per_second': 74.777, 'train_steps_per_second': 9.347, 'train_loss': 1.618064522743225, 'epoch': 1.0}
[2026-05-04 20:45:28] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / conala_curated
[2026-05-04 20:45:28] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / openbookqa -> /workspace/round3_out/round3/Y/openbookqa
{'loss': 3.8847, 'grad_norm': 3.8068673610687256, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.2447, 'grad_norm': 1.1755918264389038, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 0.9851, 'grad_norm': 0.35951128602027893, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.8872, 'grad_norm': 0.37428414821624756, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.8302, 'grad_norm': 0.49721041321754456, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.279, 'train_samples_per_second': 60.246, 'train_steps_per_second': 7.531, 'train_loss': 1.2532208013534545, 'epoch': 1.0}
[2026-05-04 20:45:31] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / arc_easy
[2026-05-04 20:45:31] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / mmlu_high_school_biology -> /workspace/round3_out/round3/X/mmlu_high_school_biology
{'loss': 3.0268, 'grad_norm': 1.3463287353515625, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.9402, 'grad_norm': 1.4100244045257568, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 0.7752, 'grad_norm': 0.5019293427467346, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.7028, 'grad_norm': 0.6079208254814148, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.6955, 'grad_norm': 0.46237239241600037, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 12.4258, 'train_samples_per_second': 64.382, 'train_steps_per_second': 8.048, 'train_loss': 1.0392812609672546, 'epoch': 1.0}
[2026-05-04 20:45:35] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / livecodebench_easy
[2026-05-04 20:45:35] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / medmcqa_easy -> /workspace/round3_out/round3/Y/medmcqa_easy
{'loss': 3.8778, 'grad_norm': 3.8687405586242676, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 3.0304, 'grad_norm': 1.769575595855713, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.1852, 'grad_norm': 0.7959341406822205, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.916, 'grad_norm': 0.6567080616950989, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.9269, 'grad_norm': 0.6883994936943054, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.4591, 'train_samples_per_second': 59.439, 'train_steps_per_second': 7.43, 'train_loss': 1.5230862832069396, 'epoch': 1.0}
[2026-05-04 20:45:36] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / conala_curated
[2026-05-04 20:45:36] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / openbookqa -> /workspace/round3_out/round3/X/openbookqa
{'loss': 4.2159, 'grad_norm': 1.9379470348358154, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.7024, 'grad_norm': 1.6266368627548218, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.0166, 'grad_norm': 0.6357600092887878, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.9528, 'grad_norm': 0.5494163036346436, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.9749, 'grad_norm': 0.5835414528846741, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 10.0026, 'train_samples_per_second': 79.98, 'train_steps_per_second': 9.997, 'train_loss': 1.4267976379394531, 'epoch': 1.0}
[2026-05-04 20:45:38] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / sciq
[2026-05-04 20:45:38] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / mmlu_elementary_math -> /workspace/round3_out/round3/Y/mmlu_elementary_math
{'loss': 2.8013, 'grad_norm': 2.7357492446899414, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.6436, 'grad_norm': 0.8919246196746826, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 0.6866, 'grad_norm': 0.6452653408050537, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.5328, 'grad_norm': 0.7082944512367249, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.5261, 'grad_norm': 0.45447829365730286, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.8528, 'train_samples_per_second': 57.75, 'train_steps_per_second': 7.219, 'train_loss': 0.8588425946235657, 'epoch': 1.0}
[2026-05-04 20:45:39] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / livecodebench_easy
[2026-05-04 20:45:39] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / medmcqa_easy -> /workspace/round3_out/round3/X/medmcqa_easy
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.4057, 'grad_norm': 1.1825029850006104, 'learning_rate': 0.0001, 'epoch': 0.02564102564102564}
{'loss': 2.3912, 'grad_norm': 1.1029789447784424, 'learning_rate': 6.271435222196916e-05, 'epoch': 0.6410256410256411}
{'train_runtime': 4.4176, 'train_samples_per_second': 70.174, 'train_steps_per_second': 8.828, 'train_loss': 2.092302285707914, 'epoch': 1.0}
[2026-05-04 20:45:42] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / mmlu_high_school_biology
[2026-05-04 20:45:42] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / math_algebra_medium -> /workspace/round3_out/round3/Y/math_algebra_medium
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.4852, 'grad_norm': 4.620140552520752, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.4565, 'grad_norm': 1.5218631029129028, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 0.9192, 'grad_norm': 0.482086718082428, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.7591, 'grad_norm': 0.38656720519065857, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.7959, 'grad_norm': 0.5061098337173462, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.3467, 'train_samples_per_second': 59.94, 'train_steps_per_second': 7.492, 'train_loss': 1.252965097427368, 'epoch': 1.0}
[2026-05-04 20:45:47] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / sciq
[2026-05-04 20:45:47] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / mmlu_elementary_math -> /workspace/round3_out/round3/X/mmlu_elementary_math
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.3715, 'grad_norm': 2.6516366004943848, 'learning_rate': 0.0001, 'epoch': 0.02564102564102564}
{'loss': 2.1153, 'grad_norm': 0.9119428396224976, 'learning_rate': 6.271435222196916e-05, 'epoch': 0.6410256410256411}
{'train_runtime': 5.5161, 'train_samples_per_second': 56.199, 'train_steps_per_second': 7.07, 'train_loss': 1.898055449510232, 'epoch': 1.0}
[2026-05-04 20:45:50] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / mmlu_high_school_biology
[2026-05-04 20:45:50] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / math_algebra_medium -> /workspace/round3_out/round3/X/math_algebra_medium
{'loss': 4.4204, 'grad_norm': 1.818544626235962, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.9636, 'grad_norm': 1.5192375183105469, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.3084, 'grad_norm': 0.5148280262947083, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.2156, 'grad_norm': 0.5908874273300171, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.2111, 'grad_norm': 0.571685254573822, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 10.3268, 'train_samples_per_second': 77.469, 'train_steps_per_second': 9.684, 'train_loss': 1.6892347764968871, 'epoch': 1.0}
[2026-05-04 20:45:52] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / openbookqa
[2026-05-04 20:45:52] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / mmlu_high_school_physics -> /workspace/round3_out/round3/Y/mmlu_high_school_physics
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.3515, 'grad_norm': 1.393470287322998, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.020833333333333332}
{'loss': 2.1993, 'grad_norm': 1.3444740772247314, 'learning_rate': 0.00010348994967025012, 'epoch': 0.5208333333333334}
{'train_runtime': 5.1387, 'train_samples_per_second': 73.559, 'train_steps_per_second': 9.341, 'train_loss': 1.6621684432029724, 'epoch': 1.0}
[2026-05-04 20:46:01] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / mmlu_elementary_math
[2026-05-04 20:46:01] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / gsm_hard -> /workspace/round3_out/round3/Y/gsm_hard
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 4.6846, 'grad_norm': 4.238743305206299, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.8008, 'grad_norm': 1.5389924049377441, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.223, 'grad_norm': 0.592785120010376, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0619, 'grad_norm': 0.6057922840118408, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.0473, 'grad_norm': 0.5479642748832703, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.2537, 'train_samples_per_second': 60.361, 'train_steps_per_second': 7.545, 'train_loss': 1.552119369506836, 'epoch': 1.0}
[2026-05-04 20:46:04] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / openbookqa
[2026-05-04 20:46:04] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / mmlu_high_school_physics -> /workspace/round3_out/round3/X/mmlu_high_school_physics
{'loss': 3.7883, 'grad_norm': 1.696028232574463, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.5512, 'grad_norm': 1.5602266788482666, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.0373, 'grad_norm': 0.6100544333457947, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0008, 'grad_norm': 0.540784478187561, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.011, 'grad_norm': 0.6599298119544983, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 10.2205, 'train_samples_per_second': 78.274, 'train_steps_per_second': 9.784, 'train_loss': 1.4124403762817384, 'epoch': 1.0}
[2026-05-04 20:46:05] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / medmcqa_easy
[2026-05-04 20:46:05] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / pubmedqa_pqal -> /workspace/round3_out/round3/Y/pubmedqa_pqal
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 2.9821, 'grad_norm': 1.171698808670044, 'learning_rate': 0.0002, 'epoch': 0.05263157894736842}
{'train_runtime': 2.4218, 'train_samples_per_second': 62.35, 'train_steps_per_second': 7.845, 'train_loss': 2.3227735193152177, 'epoch': 1.0}
[2026-05-04 20:46:08] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / mmlu_high_school_physics
[2026-05-04 20:46:08] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / humaneval_plus -> /workspace/round3_out/round3/Y/humaneval_plus
{'loss': 3.1326, 'grad_norm': 3.205897092819214, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.020833333333333332}
{'loss': 1.8333, 'grad_norm': 1.1280487775802612, 'learning_rate': 0.00010348994967025012, 'epoch': 0.5208333333333334}
{'train_runtime': 6.4482, 'train_samples_per_second': 58.621, 'train_steps_per_second': 7.444, 'train_loss': 1.4304720560709636, 'epoch': 1.0}
[2026-05-04 20:46:08] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / mmlu_elementary_math
[2026-05-04 20:46:08] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / gsm_hard -> /workspace/round3_out/round3/X/gsm_hard
{'loss': 4.045, 'grad_norm': 4.027966022491455, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.4115, 'grad_norm': 1.3882919549942017, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.0318, 'grad_norm': 0.5832227468490601, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.9003, 'grad_norm': 0.4231211841106415, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.9012, 'grad_norm': 0.44992396235466003, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 13.0637, 'train_samples_per_second': 61.239, 'train_steps_per_second': 7.655, 'train_loss': 1.3275136470794677, 'epoch': 1.0}
[2026-05-04 20:46:09] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / medmcqa_easy
[2026-05-04 20:46:09] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / pubmedqa_pqal -> /workspace/round3_out/round3/X/pubmedqa_pqal
{'loss': 3.4737, 'grad_norm': 1.69599449634552, 'learning_rate': 4e-05, 'epoch': 0.010101010101010102}
{'loss': 2.2159, 'grad_norm': 1.5975748300552368, 'learning_rate': 0.0001784799385278661, 'epoch': 0.25252525252525254}
{'loss': 0.9123, 'grad_norm': 0.4921441674232483, 'learning_rate': 0.00010667926337451217, 'epoch': 0.5050505050505051}
{'loss': 0.8223, 'grad_norm': 0.4936054050922394, 'learning_rate': 3.0480757232535772e-05, 'epoch': 0.7575757575757576}
{'train_runtime': 11.4032, 'train_samples_per_second': 69.279, 'train_steps_per_second': 8.682, 'train_loss': 1.2137589141575977, 'epoch': 1.0}
[2026-05-04 20:46:09] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / math_algebra_medium
[2026-05-04 20:46:09] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / mmlu_college_chemistry -> /workspace/round3_out/round3/Y/mmlu_college_chemistry
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.1078, 'grad_norm': 3.6280457973480225, 'learning_rate': 4e-05, 'epoch': 0.010101010101010102}
{'loss': 1.4901, 'grad_norm': 1.5740772485733032, 'learning_rate': 0.0001784799385278661, 'epoch': 0.25252525252525254}
{'loss': 0.3388, 'grad_norm': 0.7208344340324402, 'learning_rate': 0.00010667926337451217, 'epoch': 0.5050505050505051}
{'loss': 0.216, 'grad_norm': 0.7119245529174805, 'learning_rate': 3.0480757232535772e-05, 'epoch': 0.7575757575757576}
{'train_runtime': 14.0475, 'train_samples_per_second': 56.238, 'train_steps_per_second': 7.048, 'train_loss': 0.582133741089792, 'epoch': 1.0}
[2026-05-04 20:46:18] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / math_algebra_medium
[2026-05-04 20:46:18] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / mmlu_college_chemistry -> /workspace/round3_out/round3/X/mmlu_college_chemistry
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 2.8817, 'grad_norm': 2.6565637588500977, 'learning_rate': 0.0002, 'epoch': 0.05263157894736842}
{'train_runtime': 2.7949, 'train_samples_per_second': 54.028, 'train_steps_per_second': 6.798, 'train_loss': 1.9655195035432513, 'epoch': 1.0}
[2026-05-04 20:46:22] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / mmlu_high_school_physics
[2026-05-04 20:46:22] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / humaneval_plus -> /workspace/round3_out/round3/X/humaneval_plus
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.6477, 'grad_norm': 1.3868966102600098, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.6738, 'grad_norm': 1.604661464691162, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.4509, 'grad_norm': 0.7328211665153503, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.4074, 'grad_norm': 0.5431607961654663, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.3685, 'grad_norm': 0.5300050377845764, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 10.6898, 'train_samples_per_second': 74.838, 'train_steps_per_second': 9.355, 'train_loss': 1.7348880672454834, 'epoch': 1.0}
[2026-05-04 20:46:26] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / gsm_hard
[2026-05-04 20:46:26] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / arc_challenge -> /workspace/round3_out/round3/Y/arc_challenge
{'loss': 3.3627, 'grad_norm': 1.3451015949249268, 'learning_rate': 0.0002, 'epoch': 0.07692307692307693}
{'train_runtime': 1.557, 'train_samples_per_second': 64.228, 'train_steps_per_second': 8.35, 'train_loss': 2.8729841709136963, 'epoch': 1.0}
[2026-05-04 20:46:27] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / mmlu_college_chemistry
{'loss': 1.956, 'grad_norm': 0.7723997831344604, 'learning_rate': 0.0001, 'epoch': 0.047619047619047616}
{'train_runtime': 4.1598, 'train_samples_per_second': 39.425, 'train_steps_per_second': 5.048, 'train_loss': 1.8341497636976696, 'epoch': 1.0}
[2026-05-04 20:46:29] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / humaneval_plus
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.3164, 'grad_norm': 3.0634682178497314, 'learning_rate': 0.0002, 'epoch': 0.07692307692307693}
{'train_runtime': 1.8981, 'train_samples_per_second': 52.684, 'train_steps_per_second': 6.849, 'train_loss': 2.554827488385714, 'epoch': 1.0}
[2026-05-04 20:46:34] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / mmlu_college_chemistry
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.5894, 'grad_norm': 3.1051745414733887, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.4591, 'grad_norm': 1.0540207624435425, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.2827, 'grad_norm': 0.4910883903503418, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.1604, 'grad_norm': 0.3621053099632263, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.1318, 'grad_norm': 0.3286433219909668, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 12.9461, 'train_samples_per_second': 61.794, 'train_steps_per_second': 7.724, 'train_loss': 1.5198212552070618, 'epoch': 1.0}
[2026-05-04 20:46:38] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / gsm_hard
[2026-05-04 20:46:38] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / arc_challenge -> /workspace/round3_out/round3/X/arc_challenge
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 1.618, 'grad_norm': 1.2775498628616333, 'learning_rate': 0.0001, 'epoch': 0.047619047619047616}
{'train_runtime': 3.9997, 'train_samples_per_second': 41.003, 'train_steps_per_second': 5.25, 'train_loss': 1.409454345703125, 'epoch': 1.0}
[2026-05-04 20:46:39] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / humaneval_plus
{'loss': 2.8625, 'grad_norm': 0.7109804749488831, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.2998, 'grad_norm': 0.7325422167778015, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.6904, 'grad_norm': 0.3608345091342926, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.6525, 'grad_norm': 0.2973387837409973, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.6613, 'grad_norm': 0.3263608515262604, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 24.4659, 'train_samples_per_second': 32.699, 'train_steps_per_second': 4.087, 'train_loss': 1.831635184288025, 'epoch': 1.0}
[2026-05-04 20:46:45] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / pubmedqa_pqal
[2026-05-04 20:46:45] [TRAIN_START] meta-llama/Llama-3.2-3B-Instruct / mbpp_plus -> /workspace/round3_out/round3/Y/mbpp_plus
{'loss': 3.5572, 'grad_norm': 1.3985272645950317, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.4469, 'grad_norm': 1.441445231437683, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.159, 'grad_norm': 0.49961817264556885, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.0744, 'grad_norm': 0.5765140652656555, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.1043, 'grad_norm': 0.46788185834884644, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 9.6401, 'train_samples_per_second': 82.986, 'train_steps_per_second': 10.373, 'train_loss': 1.4572549843788147, 'epoch': 1.0}
[2026-05-04 20:46:48] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / arc_challenge
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 2.5926, 'grad_norm': 1.2384510040283203, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1127, 'grad_norm': 0.45936235785484314, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.6208, 'grad_norm': 0.4088560938835144, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 1.5064, 'grad_norm': 0.22045007348060608, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 1.5147, 'grad_norm': 0.2344006896018982, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 23.9399, 'train_samples_per_second': 33.417, 'train_steps_per_second': 4.177, 'train_loss': 1.6934243583679198, 'epoch': 1.0}
[2026-05-04 20:46:49] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / pubmedqa_pqal
[2026-05-04 20:46:49] [TRAIN_START] Qwen/Qwen2.5-3B-Instruct / mbpp_plus -> /workspace/round3_out/round3/X/mbpp_plus
WARNING:accelerate.utils.other:Detected kernel version 4.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
{'loss': 3.5605, 'grad_norm': 3.3108983039855957, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 2.1969, 'grad_norm': 1.3170727491378784, 'learning_rate': 0.00017891405093963938, 'epoch': 0.25}
{'loss': 1.022, 'grad_norm': 0.3568710386753082, 'learning_rate': 0.00010825793454723325, 'epoch': 0.5}
{'loss': 0.8942, 'grad_norm': 0.35281091928482056, 'learning_rate': 3.227184283742591e-05, 'epoch': 0.75}
{'loss': 0.9196, 'grad_norm': 0.3574436902999878, 'learning_rate': 0.0, 'epoch': 1.0}
{'train_runtime': 12.0215, 'train_samples_per_second': 66.547, 'train_steps_per_second': 8.318, 'train_loss': 1.2718245792388916, 'epoch': 1.0}
[2026-05-04 20:47:02] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / arc_challenge
{'loss': 2.3857, 'grad_norm': 0.9954519867897034, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.020833333333333332}
{'loss': 1.7177, 'grad_norm': 0.8668803572654724, 'learning_rate': 0.00010348994967025012, 'epoch': 0.5208333333333334}
{'train_runtime': 7.9656, 'train_samples_per_second': 47.454, 'train_steps_per_second': 6.026, 'train_loss': 1.3500454326470692, 'epoch': 1.0}
[2026-05-04 20:47:05] [TRAIN_DONE] meta-llama/Llama-3.2-3B-Instruct / mbpp_plus
{'loss': 2.1308, 'grad_norm': 1.6519144773483276, 'learning_rate': 6.666666666666667e-05, 'epoch': 0.020833333333333332}
{'loss': 1.4398, 'grad_norm': 0.6850056052207947, 'learning_rate': 0.00010348994967025012, 'epoch': 0.5208333333333334}
{'train_runtime': 8.1327, 'train_samples_per_second': 46.479, 'train_steps_per_second': 5.902, 'train_loss': 1.171514943242073, 'epoch': 1.0}
[2026-05-04 20:47:08] [TRAIN_DONE] Qwen/Qwen2.5-3B-Instruct / mbpp_plus
[2026-05-04 20:47:10] [ANCHOR_MISSING] apps_introductory
[2026-05-04 20:47:10] [ANCHOR_MISSING] codecontests_easy
[2026-05-04 20:47:10] Available anchors: 22 counts={'math': 8, 'code': 6, 'science': 8}
[2026-05-04 20:47:10] [EXP1] Building/evaluating main mapping table
[2026-05-04 20:47:11] [EXP1_TASK] gsm_hard
/usr/local/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/usr/local/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(