------------> log file ==runs2/stsb/OUTPUT_ID/log_bs32_lr3e-05_20221124_035004_897265.txt Namespace(aug_train=False, data_dir='/home.local/jianwei/datasets/nlp/glue_data/STS-B', do_eval=False, early_stop=False, early_stop_metric='accuracy', eval_step=120, gradient_accumulation_steps=1, learning_rate=3e-05, local_rank=0, lr_scheduler_type=, max_length=128, max_train_steps=None, model_name_or_path='/home.local/jianwei/workspace/archive/SparseOptimizer/output/Layer_7_12_Hid_160_768_Head_10_12_IMRatio_3.5', num_train_epochs=30, num_warmup_steps=0, output_dir='runs2/stsb/OUTPUT_ID', pad_to_max_length=False, per_device_eval_batch_size=32, per_device_train_batch_size=32, print_step=5, save_last=False, seed=None, task_name='stsb', train_file=None, use_slow_tokenizer=False, validation_file=None, weight_decay=0.0) Distributed environment: NO Num processes: 1 Process index: 0 Local process index: 0 Device: cuda Mixed precision type: fp16 Sample 4674 of the training set: (tensor([ 101, 10079, 3629, 3102, 2048, 12632, 2336, 102, 10079, 4894, 8563, 2340, 12632, 2336, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor(3.)). Sample 112 of the training set: (tensor([ 101, 1037, 2879, 2003, 9361, 1037, 21854, 11563, 1012, 102, 1037, 2879, 2003, 2559, 2012, 1037, 8094, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor(3.8000)). 
Sample 4529 of the training set: (tensor([ 101, 3725, 3844, 2015, 2091, 4264, 2004, 3586, 6240, 9446, 6561, 2605, 102, 7327, 7767, 1005, 4340, 1005, 2000, 10663, 3586, 4168, 4017, 9446, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), tensor(3.)). ***** Running training ***** Num examples = 5749 Num Epochs = 30 Instantaneous batch size per device = 32 Total train batch size (w. 
parallel, distributed & accumulation) = 32 Gradient Accumulation steps = 1 Total optimization steps = 5400 000005/005400, loss: 10.919802, avg_loss: 10.361012 000010/005400, loss: 10.181771, avg_loss: 9.698319 000015/005400, loss: 7.925676, avg_loss: 9.470798 000020/005400, loss: 9.417774, avg_loss: 9.406657 000025/005400, loss: 11.084503, avg_loss: 9.472598 000030/005400, loss: 7.033692, avg_loss: 9.447541 000035/005400, loss: 9.298050, avg_loss: 9.386552 000040/005400, loss: 8.342388, avg_loss: 9.386284 000045/005400, loss: 9.821406, avg_loss: 9.483932 000050/005400, loss: 9.257509, avg_loss: 9.586469 000055/005400, loss: 8.752683, avg_loss: 9.634466 000060/005400, loss: 6.560993, avg_loss: 9.559579 000065/005400, loss: 9.872775, avg_loss: 9.555094 000070/005400, loss: 9.549786, avg_loss: 9.570707 000075/005400, loss: 9.400767, avg_loss: 9.534591 000080/005400, loss: 9.152719, avg_loss: 9.532532 000085/005400, loss: 10.023327, avg_loss: 9.548860 000090/005400, loss: 8.150848, avg_loss: 9.539588 000095/005400, loss: 8.193304, avg_loss: 9.481100 000100/005400, loss: 8.688814, avg_loss: 9.431278 000105/005400, loss: 9.266927, avg_loss: 9.390692 000110/005400, loss: 7.621550, avg_loss: 9.346891 000115/005400, loss: 6.959364, avg_loss: 9.281569 000120/005400, loss: 9.679270, avg_loss: 9.291935 000125/005400, loss: 8.002371, avg_loss: 9.251488 000130/005400, loss: 8.983469, avg_loss: 9.210261 000135/005400, loss: 7.914767, avg_loss: 9.138828 000140/005400, loss: 9.368698, avg_loss: 9.096162 000145/005400, loss: 7.681985, avg_loss: 9.041475 000150/005400, loss: 7.530379, avg_loss: 8.976686 000155/005400, loss: 9.263411, avg_loss: 8.941676 000160/005400, loss: 7.710734, avg_loss: 8.895261 000165/005400, loss: 8.456438, avg_loss: 8.831314 000170/005400, loss: 6.155419, avg_loss: 8.772081 000175/005400, loss: 8.032525, avg_loss: 8.692823 000180/005400, loss: 4.489757, avg_loss: 8.613270 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per 
device = 32 epoch 0, step 180/5400: {'pearson': 0.21495300918671972, 'spearmanr': 0.18778433070729544} 000185/005400, loss: 7.743507, avg_loss: 8.577872 000190/005400, loss: 6.030101, avg_loss: 8.510907 000195/005400, loss: 3.536020, avg_loss: 8.447573 000200/005400, loss: 6.482443, avg_loss: 8.383874 000205/005400, loss: 7.495704, avg_loss: 8.331849 000210/005400, loss: 7.830889, avg_loss: 8.285714 000215/005400, loss: 7.473868, avg_loss: 8.219836 000220/005400, loss: 6.685350, avg_loss: 8.164707 000225/005400, loss: 4.961877, avg_loss: 8.111880 000230/005400, loss: 5.369789, avg_loss: 8.067034 000235/005400, loss: 4.157079, avg_loss: 8.007677 000240/005400, loss: 6.113519, avg_loss: 7.971732 000245/005400, loss: 4.965279, avg_loss: 7.912654 000250/005400, loss: 3.810572, avg_loss: 7.868684 000255/005400, loss: 5.212838, avg_loss: 7.827637 000260/005400, loss: 5.060454, avg_loss: 7.780213 000265/005400, loss: 3.830095, avg_loss: 7.720051 000270/005400, loss: 5.186792, avg_loss: 7.666411 000275/005400, loss: 5.622235, avg_loss: 7.623420 000280/005400, loss: 4.717276, avg_loss: 7.579897 000285/005400, loss: 4.819950, avg_loss: 7.529627 000290/005400, loss: 5.464397, avg_loss: 7.489963 000295/005400, loss: 5.470286, avg_loss: 7.442070 000300/005400, loss: 3.843780, avg_loss: 7.396591 000305/005400, loss: 3.396843, avg_loss: 7.349362 000310/005400, loss: 4.573213, avg_loss: 7.293336 000315/005400, loss: 4.345067, avg_loss: 7.247148 000320/005400, loss: 4.538530, avg_loss: 7.205413 000325/005400, loss: 3.374168, avg_loss: 7.165096 000330/005400, loss: 3.680195, avg_loss: 7.118695 000335/005400, loss: 3.798603, avg_loss: 7.071580 000340/005400, loss: 4.418723, avg_loss: 7.028340 000345/005400, loss: 2.651713, avg_loss: 6.979162 000350/005400, loss: 4.138247, avg_loss: 6.931522 000355/005400, loss: 4.034257, avg_loss: 6.891976 000360/005400, loss: 3.947625, avg_loss: 6.853448 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 
epoch 1, step 360/5400: {'pearson': 0.1693196142024497, 'spearmanr': 0.1520939753827761} 000365/005400, loss: 4.066084, avg_loss: 6.813658 000370/005400, loss: 2.446641, avg_loss: 6.765847 000375/005400, loss: 3.652923, avg_loss: 6.724291 000380/005400, loss: 2.925441, avg_loss: 6.683929 000385/005400, loss: 3.510277, avg_loss: 6.641427 000390/005400, loss: 3.712820, avg_loss: 6.597568 000395/005400, loss: 2.864999, avg_loss: 6.559540 000400/005400, loss: 2.363536, avg_loss: 6.515924 000405/005400, loss: 3.202157, avg_loss: 6.472551 000410/005400, loss: 2.507275, avg_loss: 6.427726 000415/005400, loss: 2.655454, avg_loss: 6.383974 000420/005400, loss: 3.361968, avg_loss: 6.343995 000425/005400, loss: 2.212350, avg_loss: 6.302114 000430/005400, loss: 2.654854, avg_loss: 6.260254 000435/005400, loss: 2.469006, avg_loss: 6.219799 000440/005400, loss: 2.423651, avg_loss: 6.179832 000445/005400, loss: 1.999993, avg_loss: 6.138898 000450/005400, loss: 3.104252, avg_loss: 6.101511 000455/005400, loss: 2.722913, avg_loss: 6.065750 000460/005400, loss: 2.014916, avg_loss: 6.027135 000465/005400, loss: 2.255650, avg_loss: 5.989807 000470/005400, loss: 2.582577, avg_loss: 5.953727 000475/005400, loss: 2.268125, avg_loss: 5.922364 000480/005400, loss: 2.132411, avg_loss: 5.883134 000485/005400, loss: 2.092988, avg_loss: 5.850251 000490/005400, loss: 2.469923, avg_loss: 5.816648 000495/005400, loss: 1.907046, avg_loss: 5.776493 000500/005400, loss: 2.188262, avg_loss: 5.740701 000505/005400, loss: 1.522663, avg_loss: 5.703157 000510/005400, loss: 1.982296, avg_loss: 5.667968 000515/005400, loss: 2.409446, avg_loss: 5.635783 000520/005400, loss: 1.887568, avg_loss: 5.603417 000525/005400, loss: 2.210217, avg_loss: 5.572377 000530/005400, loss: 2.381753, avg_loss: 5.541968 000535/005400, loss: 2.081358, avg_loss: 5.511043 000540/005400, loss: 2.770565, avg_loss: 5.483432 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 2, step 
540/5400: {'pearson': 0.5585231671416229, 'spearmanr': 0.5968823171253705} 000545/005400, loss: 2.294569, avg_loss: 5.453700 000550/005400, loss: 1.879893, avg_loss: 5.425492 000555/005400, loss: 2.054521, avg_loss: 5.396323 000560/005400, loss: 2.426673, avg_loss: 5.367306 000565/005400, loss: 1.785937, avg_loss: 5.339570 000570/005400, loss: 2.125966, avg_loss: 5.312624 000575/005400, loss: 2.204447, avg_loss: 5.285309 000580/005400, loss: 1.977976, avg_loss: 5.257472 000585/005400, loss: 1.667451, avg_loss: 5.227795 000590/005400, loss: 2.013373, avg_loss: 5.196637 000595/005400, loss: 1.661575, avg_loss: 5.165030 000600/005400, loss: 1.761523, avg_loss: 5.134732 000605/005400, loss: 1.165827, avg_loss: 5.104335 000610/005400, loss: 1.423938, avg_loss: 5.073547 000615/005400, loss: 1.275937, avg_loss: 5.045039 000620/005400, loss: 1.456807, avg_loss: 5.016568 000625/005400, loss: 1.428447, avg_loss: 4.988611 000630/005400, loss: 1.340862, avg_loss: 4.960019 000635/005400, loss: 1.158772, avg_loss: 4.931690 000640/005400, loss: 1.279753, avg_loss: 4.903349 000645/005400, loss: 1.495990, avg_loss: 4.875799 000650/005400, loss: 1.418819, avg_loss: 4.847696 000655/005400, loss: 1.233781, avg_loss: 4.819313 000660/005400, loss: 0.825644, avg_loss: 4.790388 000665/005400, loss: 1.236975, avg_loss: 4.763828 000670/005400, loss: 1.427844, avg_loss: 4.737722 000675/005400, loss: 1.194959, avg_loss: 4.710323 000680/005400, loss: 1.298458, avg_loss: 4.683432 000685/005400, loss: 1.250220, avg_loss: 4.658527 000690/005400, loss: 1.528629, avg_loss: 4.632631 000695/005400, loss: 0.912524, avg_loss: 4.608270 000700/005400, loss: 0.927178, avg_loss: 4.583745 000705/005400, loss: 1.425212, avg_loss: 4.560665 000710/005400, loss: 1.385559, avg_loss: 4.537204 000715/005400, loss: 1.303016, avg_loss: 4.512879 000720/005400, loss: 1.559370, avg_loss: 4.489179 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 3, step 720/5400: 
{'pearson': 0.7538161883822286, 'spearmanr': 0.7339178388810693} 000725/005400, loss: 1.238668, avg_loss: 4.465238 000730/005400, loss: 0.980286, avg_loss: 4.441211 000735/005400, loss: 1.036360, avg_loss: 4.417511 000740/005400, loss: 0.733856, avg_loss: 4.392772 000745/005400, loss: 0.911818, avg_loss: 4.369401 000750/005400, loss: 0.923820, avg_loss: 4.346304 000755/005400, loss: 1.057140, avg_loss: 4.322930 000760/005400, loss: 0.671852, avg_loss: 4.300414 000765/005400, loss: 1.206062, avg_loss: 4.278674 000770/005400, loss: 1.131734, avg_loss: 4.256801 000775/005400, loss: 0.727703, avg_loss: 4.235108 000780/005400, loss: 0.807117, avg_loss: 4.213659 000785/005400, loss: 0.656183, avg_loss: 4.193212 000790/005400, loss: 1.307819, avg_loss: 4.173065 000795/005400, loss: 1.318512, avg_loss: 4.152132 000800/005400, loss: 0.923019, avg_loss: 4.130066 000805/005400, loss: 0.358175, avg_loss: 4.109530 000810/005400, loss: 0.568605, avg_loss: 4.090200 000815/005400, loss: 0.538159, avg_loss: 4.070320 000820/005400, loss: 0.791279, avg_loss: 4.050106 000825/005400, loss: 0.646954, avg_loss: 4.029779 000830/005400, loss: 0.696995, avg_loss: 4.010299 000835/005400, loss: 0.851315, avg_loss: 3.991444 000840/005400, loss: 0.953209, avg_loss: 3.972254 000845/005400, loss: 0.639867, avg_loss: 3.953383 000850/005400, loss: 0.828691, avg_loss: 3.934936 000855/005400, loss: 0.851312, avg_loss: 3.916689 000860/005400, loss: 0.913527, avg_loss: 3.898366 000865/005400, loss: 0.769578, avg_loss: 3.880448 000870/005400, loss: 0.780589, avg_loss: 3.862448 000875/005400, loss: 0.973308, avg_loss: 3.845167 000880/005400, loss: 0.829930, avg_loss: 3.827655 000885/005400, loss: 0.766140, avg_loss: 3.810577 000890/005400, loss: 0.563716, avg_loss: 3.793318 000895/005400, loss: 1.047082, avg_loss: 3.776392 000900/005400, loss: 0.866974, avg_loss: 3.759473 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 4, step 900/5400: {'pearson': 
0.8037387020413668, 'spearmanr': 0.8107612065966875} 000905/005400, loss: 0.766670, avg_loss: 3.742716 000910/005400, loss: 0.644619, avg_loss: 3.725242 000915/005400, loss: 0.794441, avg_loss: 3.709127 000920/005400, loss: 0.583735, avg_loss: 3.693077 000925/005400, loss: 0.467945, avg_loss: 3.676897 000930/005400, loss: 0.635556, avg_loss: 3.661061 000935/005400, loss: 0.546880, avg_loss: 3.644815 000940/005400, loss: 0.442663, avg_loss: 3.628318 000945/005400, loss: 0.683668, avg_loss: 3.612901 000950/005400, loss: 0.656306, avg_loss: 3.597124 000955/005400, loss: 0.710459, avg_loss: 3.582143 000960/005400, loss: 0.503140, avg_loss: 3.567162 000965/005400, loss: 0.659339, avg_loss: 3.552504 000970/005400, loss: 0.707433, avg_loss: 3.537561 000975/005400, loss: 0.965483, avg_loss: 3.523352 000980/005400, loss: 0.855915, avg_loss: 3.508989 000985/005400, loss: 0.649465, avg_loss: 3.494453 000990/005400, loss: 0.513151, avg_loss: 3.480050 000995/005400, loss: 0.907288, avg_loss: 3.465789 001000/005400, loss: 0.461537, avg_loss: 3.451433 001005/005400, loss: 0.496157, avg_loss: 3.437648 001010/005400, loss: 0.989706, avg_loss: 3.424380 001015/005400, loss: 0.754088, avg_loss: 3.410539 001020/005400, loss: 0.731938, avg_loss: 3.396362 001025/005400, loss: 0.844449, avg_loss: 3.382560 001030/005400, loss: 0.346046, avg_loss: 3.368838 001035/005400, loss: 0.518788, avg_loss: 3.355767 001040/005400, loss: 0.714191, avg_loss: 3.342353 001045/005400, loss: 0.800863, avg_loss: 3.329591 001050/005400, loss: 0.538331, avg_loss: 3.316277 001055/005400, loss: 0.645015, avg_loss: 3.303465 001060/005400, loss: 0.451743, avg_loss: 3.290423 001065/005400, loss: 0.482815, avg_loss: 3.277498 001070/005400, loss: 0.428583, avg_loss: 3.264993 001075/005400, loss: 0.905002, avg_loss: 3.253403 001080/005400, loss: 0.423076, avg_loss: 3.241331 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 5, step 1080/5400: {'pearson': 
0.8115941618503355, 'spearmanr': 0.8282434089896973} 001085/005400, loss: 0.671325, avg_loss: 3.228906 001090/005400, loss: 0.510681, avg_loss: 3.216880 001095/005400, loss: 0.503297, avg_loss: 3.204282 001100/005400, loss: 0.476207, avg_loss: 3.192340 001105/005400, loss: 0.287384, avg_loss: 3.180409 001110/005400, loss: 0.838371, avg_loss: 3.168479 001115/005400, loss: 0.561100, avg_loss: 3.157175 001120/005400, loss: 0.461640, avg_loss: 3.145364 001125/005400, loss: 0.672549, avg_loss: 3.133893 001130/005400, loss: 0.443830, avg_loss: 3.122226 001135/005400, loss: 0.465307, avg_loss: 3.110867 001140/005400, loss: 0.763562, avg_loss: 3.099965 001145/005400, loss: 0.561359, avg_loss: 3.088965 001150/005400, loss: 0.411171, avg_loss: 3.077866 001155/005400, loss: 0.406792, avg_loss: 3.066446 001160/005400, loss: 0.503313, avg_loss: 3.055675 001165/005400, loss: 0.475825, avg_loss: 3.045274 001170/005400, loss: 0.584800, avg_loss: 3.034442 001175/005400, loss: 0.465069, avg_loss: 3.023891 001180/005400, loss: 0.494697, avg_loss: 3.013498 001185/005400, loss: 0.544740, avg_loss: 3.003423 001190/005400, loss: 0.406965, avg_loss: 2.992763 001195/005400, loss: 0.268987, avg_loss: 2.982255 001200/005400, loss: 0.495571, avg_loss: 2.972160 001205/005400, loss: 0.538762, avg_loss: 2.961849 001210/005400, loss: 0.478300, avg_loss: 2.952011 001215/005400, loss: 0.338071, avg_loss: 2.942186 001220/005400, loss: 0.505288, avg_loss: 2.932113 001225/005400, loss: 0.570436, avg_loss: 2.922221 001230/005400, loss: 0.523959, avg_loss: 2.912277 001235/005400, loss: 0.491099, avg_loss: 2.902272 001240/005400, loss: 0.559447, avg_loss: 2.892549 001245/005400, loss: 0.753339, avg_loss: 2.883637 001250/005400, loss: 0.602193, avg_loss: 2.874184 001255/005400, loss: 0.302810, avg_loss: 2.864446 001260/005400, loss: 0.636528, avg_loss: 2.855367 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 6, step 1260/5400: {'pearson': 
0.8244669741341696, 'spearmanr': 0.8347289521968146} 001265/005400, loss: 0.576356, avg_loss: 2.845884 001270/005400, loss: 0.356003, avg_loss: 2.836981 001275/005400, loss: 0.282959, avg_loss: 2.827679 001280/005400, loss: 0.471389, avg_loss: 2.818289 001285/005400, loss: 0.291599, avg_loss: 2.809166 001290/005400, loss: 0.309215, avg_loss: 2.799846 001295/005400, loss: 0.440720, avg_loss: 2.790764 001300/005400, loss: 0.452717, avg_loss: 2.781574 001305/005400, loss: 0.379403, avg_loss: 2.772831 001310/005400, loss: 0.740967, avg_loss: 2.764373 001315/005400, loss: 0.554469, avg_loss: 2.755583 001320/005400, loss: 0.422943, avg_loss: 2.747635 001325/005400, loss: 0.613703, avg_loss: 2.739164 001330/005400, loss: 0.333465, avg_loss: 2.730182 001335/005400, loss: 0.531835, avg_loss: 2.721662 001340/005400, loss: 0.447510, avg_loss: 2.713335 001345/005400, loss: 0.487799, avg_loss: 2.705467 001350/005400, loss: 0.629011, avg_loss: 2.697427 001355/005400, loss: 0.316717, avg_loss: 2.688931 001360/005400, loss: 0.483824, avg_loss: 2.680822 001365/005400, loss: 0.420798, avg_loss: 2.672428 001370/005400, loss: 0.312988, avg_loss: 2.664160 001375/005400, loss: 0.253772, avg_loss: 2.655796 001380/005400, loss: 0.507312, avg_loss: 2.648081 001385/005400, loss: 0.423927, avg_loss: 2.640514 001390/005400, loss: 0.488432, avg_loss: 2.632712 001395/005400, loss: 0.496802, avg_loss: 2.624703 001400/005400, loss: 0.411566, avg_loss: 2.617226 001405/005400, loss: 0.620914, avg_loss: 2.609520 001410/005400, loss: 0.529554, avg_loss: 2.602186 001415/005400, loss: 0.377586, avg_loss: 2.594550 001420/005400, loss: 0.537113, avg_loss: 2.587398 001425/005400, loss: 0.502925, avg_loss: 2.579730 001430/005400, loss: 0.501363, avg_loss: 2.572518 001435/005400, loss: 0.523148, avg_loss: 2.564881 001440/005400, loss: 0.283889, avg_loss: 2.557591 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 7, step 1440/5400: {'pearson': 
0.8356315632016451, 'spearmanr': 0.8428067774651329} 001445/005400, loss: 0.283461, avg_loss: 2.549994 001450/005400, loss: 0.473319, avg_loss: 2.542806 001455/005400, loss: 0.465852, avg_loss: 2.535222 001460/005400, loss: 0.452470, avg_loss: 2.528097 001465/005400, loss: 0.528226, avg_loss: 2.521023 001470/005400, loss: 0.372980, avg_loss: 2.513948 001475/005400, loss: 0.580186, avg_loss: 2.507289 001480/005400, loss: 0.250609, avg_loss: 2.500083 001485/005400, loss: 0.373619, avg_loss: 2.492741 001490/005400, loss: 0.313954, avg_loss: 2.485812 001495/005400, loss: 0.421009, avg_loss: 2.478890 001500/005400, loss: 0.417312, avg_loss: 2.472097 001505/005400, loss: 0.419549, avg_loss: 2.465457 001510/005400, loss: 0.567841, avg_loss: 2.458859 001515/005400, loss: 0.221651, avg_loss: 2.452159 001520/005400, loss: 0.323677, avg_loss: 2.445824 001525/005400, loss: 0.563059, avg_loss: 2.439246 001530/005400, loss: 0.273469, avg_loss: 2.432506 001535/005400, loss: 0.230308, avg_loss: 2.425987 001540/005400, loss: 0.275917, avg_loss: 2.419360 001545/005400, loss: 0.490302, avg_loss: 2.412818 001550/005400, loss: 0.171527, avg_loss: 2.406091 001555/005400, loss: 0.499564, avg_loss: 2.399561 001560/005400, loss: 0.583477, avg_loss: 2.393275 001565/005400, loss: 0.422795, avg_loss: 2.387004 001570/005400, loss: 0.356273, avg_loss: 2.380570 001575/005400, loss: 0.442116, avg_loss: 2.374079 001580/005400, loss: 0.380964, avg_loss: 2.367966 001585/005400, loss: 0.454051, avg_loss: 2.361857 001590/005400, loss: 0.292075, avg_loss: 2.355417 001595/005400, loss: 0.433962, avg_loss: 2.349358 001600/005400, loss: 0.253748, avg_loss: 2.343178 001605/005400, loss: 0.277990, avg_loss: 2.337058 001610/005400, loss: 0.658840, avg_loss: 2.331389 001615/005400, loss: 0.284291, avg_loss: 2.325417 001620/005400, loss: 0.347131, avg_loss: 2.319557 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 8, step 1620/5400: {'pearson': 
0.840875635131036, 'spearmanr': 0.8391187190190564} 001625/005400, loss: 0.468040, avg_loss: 2.313795 001630/005400, loss: 0.377569, avg_loss: 2.307781 001635/005400, loss: 0.373161, avg_loss: 2.301947 001640/005400, loss: 0.542144, avg_loss: 2.296237 001645/005400, loss: 0.394721, avg_loss: 2.290366 001650/005400, loss: 0.313285, avg_loss: 2.284578 001655/005400, loss: 0.458701, avg_loss: 2.278912 001660/005400, loss: 0.294037, avg_loss: 2.273092 001665/005400, loss: 0.288020, avg_loss: 2.267503 001670/005400, loss: 0.372206, avg_loss: 2.261890 001675/005400, loss: 0.439113, avg_loss: 2.256269 001680/005400, loss: 0.265594, avg_loss: 2.250567 001685/005400, loss: 0.307823, avg_loss: 2.244832 001690/005400, loss: 0.214900, avg_loss: 2.239233 001695/005400, loss: 0.430367, avg_loss: 2.234019 001700/005400, loss: 0.428587, avg_loss: 2.228347 001705/005400, loss: 0.466478, avg_loss: 2.223007 001710/005400, loss: 0.406999, avg_loss: 2.217425 001715/005400, loss: 0.249302, avg_loss: 2.211718 001720/005400, loss: 0.449824, avg_loss: 2.206581 001725/005400, loss: 0.200499, avg_loss: 2.201121 001730/005400, loss: 0.528394, avg_loss: 2.196022 001735/005400, loss: 0.420790, avg_loss: 2.190833 001740/005400, loss: 0.393591, avg_loss: 2.185567 001745/005400, loss: 0.292256, avg_loss: 2.180424 001750/005400, loss: 0.401385, avg_loss: 2.175266 001755/005400, loss: 0.294124, avg_loss: 2.169960 001760/005400, loss: 0.363119, avg_loss: 2.164699 001765/005400, loss: 0.390154, avg_loss: 2.159830 001770/005400, loss: 0.313013, avg_loss: 2.154815 001775/005400, loss: 0.308711, avg_loss: 2.149686 001780/005400, loss: 0.483320, avg_loss: 2.144812 001785/005400, loss: 0.379410, avg_loss: 2.139796 001790/005400, loss: 0.422236, avg_loss: 2.134915 001795/005400, loss: 0.511399, avg_loss: 2.130093 001800/005400, loss: 0.423039, avg_loss: 2.125146 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 9, step 1800/5400: {'pearson': 
0.8342714757320445, 'spearmanr': 0.8376185602281018} 001805/005400, loss: 0.486487, avg_loss: 2.120132 001810/005400, loss: 0.270155, avg_loss: 2.115208 001815/005400, loss: 0.227492, avg_loss: 2.110093 001820/005400, loss: 0.346458, avg_loss: 2.105187 001825/005400, loss: 0.426929, avg_loss: 2.100322 001830/005400, loss: 0.117478, avg_loss: 2.095436 001835/005400, loss: 0.279193, avg_loss: 2.090488 001840/005400, loss: 0.387577, avg_loss: 2.085845 001845/005400, loss: 0.250648, avg_loss: 2.081071 001850/005400, loss: 0.303584, avg_loss: 2.076289 001855/005400, loss: 0.405041, avg_loss: 2.071732 001860/005400, loss: 0.166183, avg_loss: 2.066910 001865/005400, loss: 0.319343, avg_loss: 2.062141 001870/005400, loss: 0.317750, avg_loss: 2.057461 001875/005400, loss: 0.315497, avg_loss: 2.052864 001880/005400, loss: 0.338883, avg_loss: 2.048301 001885/005400, loss: 0.322422, avg_loss: 2.043658 001890/005400, loss: 0.136494, avg_loss: 2.038912 001895/005400, loss: 0.384212, avg_loss: 2.034422 001900/005400, loss: 0.386642, avg_loss: 2.029817 001905/005400, loss: 0.336843, avg_loss: 2.025262 001910/005400, loss: 0.378603, avg_loss: 2.020888 001915/005400, loss: 0.244922, avg_loss: 2.016456 001920/005400, loss: 0.388475, avg_loss: 2.012008 001925/005400, loss: 0.275199, avg_loss: 2.007573 001930/005400, loss: 0.286381, avg_loss: 2.003031 001935/005400, loss: 0.408020, avg_loss: 1.998613 001940/005400, loss: 0.296814, avg_loss: 1.994459 001945/005400, loss: 0.221215, avg_loss: 1.990418 001950/005400, loss: 0.386474, avg_loss: 1.986272 001955/005400, loss: 0.186999, avg_loss: 1.981974 001960/005400, loss: 0.353515, avg_loss: 1.977982 001965/005400, loss: 0.220710, avg_loss: 1.973756 001970/005400, loss: 0.522696, avg_loss: 1.969660 001975/005400, loss: 0.318528, avg_loss: 1.965668 001980/005400, loss: 0.256884, avg_loss: 1.961408 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 10, step 1980/5400: {'pearson': 
0.8390370712384592, 'spearmanr': 0.8380421225427299} 001985/005400, loss: 0.339906, avg_loss: 1.957260 001990/005400, loss: 0.177573, avg_loss: 1.953061 001995/005400, loss: 0.434594, avg_loss: 1.949158 002000/005400, loss: 0.394058, avg_loss: 1.945127 002005/005400, loss: 0.284734, avg_loss: 1.941044 002010/005400, loss: 0.441842, avg_loss: 1.937158 002015/005400, loss: 0.370813, avg_loss: 1.933077 002020/005400, loss: 0.231465, avg_loss: 1.929090 002025/005400, loss: 0.401823, avg_loss: 1.925187 002030/005400, loss: 0.417580, avg_loss: 1.921148 002035/005400, loss: 0.233858, avg_loss: 1.917078 002040/005400, loss: 0.179666, avg_loss: 1.913157 002045/005400, loss: 0.260741, avg_loss: 1.909101 002050/005400, loss: 0.221551, avg_loss: 1.905037 002055/005400, loss: 0.234906, avg_loss: 1.901112 002060/005400, loss: 0.170529, avg_loss: 1.897019 002065/005400, loss: 0.246520, avg_loss: 1.893189 002070/005400, loss: 0.221311, avg_loss: 1.889234 002075/005400, loss: 0.181704, avg_loss: 1.885389 002080/005400, loss: 0.418144, avg_loss: 1.881511 002085/005400, loss: 0.207121, avg_loss: 1.877616 002090/005400, loss: 0.250038, avg_loss: 1.873798 002095/005400, loss: 0.266151, avg_loss: 1.869941 002100/005400, loss: 0.329553, avg_loss: 1.866257 002105/005400, loss: 0.316394, avg_loss: 1.862574 002110/005400, loss: 0.202054, avg_loss: 1.858893 002115/005400, loss: 0.558679, avg_loss: 1.855374 002120/005400, loss: 0.305135, avg_loss: 1.851792 002125/005400, loss: 0.306204, avg_loss: 1.848025 002130/005400, loss: 0.354196, avg_loss: 1.844382 002135/005400, loss: 0.513295, avg_loss: 1.840886 002140/005400, loss: 0.338046, avg_loss: 1.837288 002145/005400, loss: 0.233815, avg_loss: 1.833621 002150/005400, loss: 0.303081, avg_loss: 1.830035 002155/005400, loss: 0.217688, avg_loss: 1.826318 002160/005400, loss: 0.223059, avg_loss: 1.822730 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 11, step 2160/5400: {'pearson': 
0.8434982902424131, 'spearmanr': 0.8445651086908786} 002165/005400, loss: 0.237432, avg_loss: 1.819061 002170/005400, loss: 0.283776, avg_loss: 1.815507 002175/005400, loss: 0.309928, avg_loss: 1.811960 002180/005400, loss: 0.256525, avg_loss: 1.808401 002185/005400, loss: 0.282268, avg_loss: 1.804922 002190/005400, loss: 0.277528, avg_loss: 1.801368 002195/005400, loss: 0.345856, avg_loss: 1.797885 002200/005400, loss: 0.393328, avg_loss: 1.794652 002205/005400, loss: 0.224377, avg_loss: 1.791248 002210/005400, loss: 0.219291, avg_loss: 1.787713 002215/005400, loss: 0.147671, avg_loss: 1.784197 002220/005400, loss: 0.339344, avg_loss: 1.780853 002225/005400, loss: 0.219361, avg_loss: 1.777467 002230/005400, loss: 0.280020, avg_loss: 1.774036 002235/005400, loss: 0.261592, avg_loss: 1.770745 002240/005400, loss: 0.293255, avg_loss: 1.767543 002245/005400, loss: 0.260899, avg_loss: 1.764155 002250/005400, loss: 0.251379, avg_loss: 1.760734 002255/005400, loss: 0.180517, avg_loss: 1.757394 002260/005400, loss: 0.237342, avg_loss: 1.754018 002265/005400, loss: 0.348091, avg_loss: 1.750775 002270/005400, loss: 0.169205, avg_loss: 1.747420 002275/005400, loss: 0.308270, avg_loss: 1.744165 002280/005400, loss: 0.265926, avg_loss: 1.740912 002285/005400, loss: 0.269741, avg_loss: 1.737594 002290/005400, loss: 0.368088, avg_loss: 1.734481 002295/005400, loss: 0.288817, avg_loss: 1.731510 002300/005400, loss: 0.151223, avg_loss: 1.728326 002305/005400, loss: 0.314602, avg_loss: 1.725295 002310/005400, loss: 0.204679, avg_loss: 1.722112 002315/005400, loss: 0.288287, avg_loss: 1.718930 002320/005400, loss: 0.245926, avg_loss: 1.715852 002325/005400, loss: 0.204663, avg_loss: 1.712662 002330/005400, loss: 0.215070, avg_loss: 1.709556 002335/005400, loss: 0.190882, avg_loss: 1.706442 002340/005400, loss: 0.224660, avg_loss: 1.703429 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 12, step 2340/5400: {'pearson': 
0.8415414818553372, 'spearmanr': 0.8425621296013649} 002345/005400, loss: 0.207369, avg_loss: 1.700278 002350/005400, loss: 0.261497, avg_loss: 1.697250 002355/005400, loss: 0.230280, avg_loss: 1.694103 002360/005400, loss: 0.262285, avg_loss: 1.690920 002365/005400, loss: 0.151266, avg_loss: 1.687904 002370/005400, loss: 0.269719, avg_loss: 1.684892 002375/005400, loss: 0.354083, avg_loss: 1.681934 002380/005400, loss: 0.237291, avg_loss: 1.678996 002385/005400, loss: 0.186130, avg_loss: 1.676010 002390/005400, loss: 0.260663, avg_loss: 1.673000 002395/005400, loss: 0.203245, avg_loss: 1.669989 002400/005400, loss: 0.309466, avg_loss: 1.667078 002405/005400, loss: 0.167727, avg_loss: 1.664065 002410/005400, loss: 0.180444, avg_loss: 1.661110 002415/005400, loss: 0.205075, avg_loss: 1.658129 002420/005400, loss: 0.251971, avg_loss: 1.655157 002425/005400, loss: 0.503691, avg_loss: 1.652340 002430/005400, loss: 0.361796, avg_loss: 1.649719 002435/005400, loss: 0.220655, avg_loss: 1.646866 002440/005400, loss: 0.364590, avg_loss: 1.644123 002445/005400, loss: 0.387156, avg_loss: 1.641263 002450/005400, loss: 0.321079, avg_loss: 1.638517 002455/005400, loss: 0.165761, avg_loss: 1.635770 002460/005400, loss: 0.270390, avg_loss: 1.632963 002465/005400, loss: 0.202102, avg_loss: 1.630213 002470/005400, loss: 0.162662, avg_loss: 1.627334 002475/005400, loss: 0.141903, avg_loss: 1.624407 002480/005400, loss: 0.130965, avg_loss: 1.621656 002485/005400, loss: 0.185001, avg_loss: 1.618813 002490/005400, loss: 0.237992, avg_loss: 1.616033 002495/005400, loss: 0.158510, avg_loss: 1.613217 002500/005400, loss: 0.259753, avg_loss: 1.610477 002505/005400, loss: 0.108687, avg_loss: 1.607702 002510/005400, loss: 0.179495, avg_loss: 1.604972 002515/005400, loss: 0.267883, avg_loss: 1.602195 002520/005400, loss: 0.205575, avg_loss: 1.599474 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 13, step 2520/5400: {'pearson': 
0.8425599117367437, 'spearmanr': 0.8414850205786223} 002525/005400, loss: 0.199653, avg_loss: 1.596711 002530/005400, loss: 0.201341, avg_loss: 1.593993 002535/005400, loss: 0.203724, avg_loss: 1.591213 002540/005400, loss: 0.254623, avg_loss: 1.588562 002545/005400, loss: 0.369073, avg_loss: 1.585980 002550/005400, loss: 0.106891, avg_loss: 1.583355 002555/005400, loss: 0.136818, avg_loss: 1.580702 002560/005400, loss: 0.231878, avg_loss: 1.577973 002565/005400, loss: 0.156474, avg_loss: 1.575269 002570/005400, loss: 0.236511, avg_loss: 1.572622 002575/005400, loss: 0.257811, avg_loss: 1.570007 002580/005400, loss: 0.468576, avg_loss: 1.567428 002585/005400, loss: 0.163139, avg_loss: 1.564778 002590/005400, loss: 0.436930, avg_loss: 1.562216 002595/005400, loss: 0.196596, avg_loss: 1.559604 002600/005400, loss: 0.232763, avg_loss: 1.557100 002605/005400, loss: 0.164102, avg_loss: 1.554545 002610/005400, loss: 0.258984, avg_loss: 1.551967 002615/005400, loss: 0.188581, avg_loss: 1.549408 002620/005400, loss: 0.215384, avg_loss: 1.546768 002625/005400, loss: 0.165978, avg_loss: 1.544174 002630/005400, loss: 0.254275, avg_loss: 1.541621 002635/005400, loss: 0.260447, avg_loss: 1.539074 002640/005400, loss: 0.257019, avg_loss: 1.536569 002645/005400, loss: 0.304152, avg_loss: 1.534171 002650/005400, loss: 0.172311, avg_loss: 1.531694 002655/005400, loss: 0.217652, avg_loss: 1.529228 002660/005400, loss: 0.431580, avg_loss: 1.526855 002665/005400, loss: 0.342930, avg_loss: 1.524416 002670/005400, loss: 0.281481, avg_loss: 1.521985 002675/005400, loss: 0.115055, avg_loss: 1.519483 002680/005400, loss: 0.190243, avg_loss: 1.517189 002685/005400, loss: 0.173296, avg_loss: 1.514757 002690/005400, loss: 0.374071, avg_loss: 1.512512 002695/005400, loss: 0.322947, avg_loss: 1.510223 002700/005400, loss: 0.133452, avg_loss: 1.507823 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 14, step 2700/5400: {'pearson': 
0.8428262938537643, 'spearmanr': 0.8418967117492774} 002705/005400, loss: 0.165615, avg_loss: 1.505401 002710/005400, loss: 0.191277, avg_loss: 1.503067 002715/005400, loss: 0.186724, avg_loss: 1.500670 002720/005400, loss: 0.166687, avg_loss: 1.498308 002725/005400, loss: 0.173368, avg_loss: 1.495943 002730/005400, loss: 0.182292, avg_loss: 1.493571 002735/005400, loss: 0.094817, avg_loss: 1.491132 002740/005400, loss: 0.151966, avg_loss: 1.488704 002745/005400, loss: 0.118933, avg_loss: 1.486331 002750/005400, loss: 0.150439, avg_loss: 1.484025 002755/005400, loss: 0.220458, avg_loss: 1.481835 002760/005400, loss: 0.165892, avg_loss: 1.479519 002765/005400, loss: 0.226839, avg_loss: 1.477155 002770/005400, loss: 0.181736, avg_loss: 1.474844 002775/005400, loss: 0.103294, avg_loss: 1.472493 002780/005400, loss: 0.152098, avg_loss: 1.470169 002785/005400, loss: 0.210727, avg_loss: 1.467948 002790/005400, loss: 0.218008, avg_loss: 1.465678 002795/005400, loss: 0.303881, avg_loss: 1.463492 002800/005400, loss: 0.149363, avg_loss: 1.461267 002805/005400, loss: 0.278521, avg_loss: 1.459031 002810/005400, loss: 0.177459, avg_loss: 1.456765 002815/005400, loss: 0.147072, avg_loss: 1.454549 002820/005400, loss: 0.154193, avg_loss: 1.452240 002825/005400, loss: 0.118995, avg_loss: 1.450022 002830/005400, loss: 0.306946, avg_loss: 1.447801 002835/005400, loss: 0.203090, avg_loss: 1.445593 002840/005400, loss: 0.196348, avg_loss: 1.443464 002845/005400, loss: 0.113525, avg_loss: 1.441222 002850/005400, loss: 0.305031, avg_loss: 1.439138 002855/005400, loss: 0.179518, avg_loss: 1.436929 002860/005400, loss: 0.317867, avg_loss: 1.434791 002865/005400, loss: 0.244391, avg_loss: 1.432654 002870/005400, loss: 0.201873, avg_loss: 1.430598 002875/005400, loss: 0.332513, avg_loss: 1.428486 002880/005400, loss: 0.174545, avg_loss: 1.426279 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 15, step 2880/5400: {'pearson': 
0.8465462185651544, 'spearmanr': 0.8451574856196069} 002885/005400, loss: 0.140292, avg_loss: 1.424148 002890/005400, loss: 0.180590, avg_loss: 1.422041 002895/005400, loss: 0.276235, avg_loss: 1.419968 002900/005400, loss: 0.079708, avg_loss: 1.417818 002905/005400, loss: 0.178860, avg_loss: 1.415680 002910/005400, loss: 0.191974, avg_loss: 1.413542 002915/005400, loss: 0.160231, avg_loss: 1.411524 002920/005400, loss: 0.179065, avg_loss: 1.409382 002925/005400, loss: 0.261529, avg_loss: 1.407299 002930/005400, loss: 0.196875, avg_loss: 1.405278 002935/005400, loss: 0.172792, avg_loss: 1.403193 002940/005400, loss: 0.132129, avg_loss: 1.401091 002945/005400, loss: 0.143233, avg_loss: 1.398991 002950/005400, loss: 0.098005, avg_loss: 1.396972 002955/005400, loss: 0.216378, avg_loss: 1.394936 002960/005400, loss: 0.168641, avg_loss: 1.392847 002965/005400, loss: 0.200968, avg_loss: 1.390786 002970/005400, loss: 0.125896, avg_loss: 1.388788 002975/005400, loss: 0.244486, avg_loss: 1.386788 002980/005400, loss: 0.157024, avg_loss: 1.384753 002985/005400, loss: 0.131733, avg_loss: 1.382739 002990/005400, loss: 0.180723, avg_loss: 1.380701 002995/005400, loss: 0.213533, avg_loss: 1.378717 003000/005400, loss: 0.149431, avg_loss: 1.376713 003005/005400, loss: 0.145573, avg_loss: 1.374738 003010/005400, loss: 0.142425, avg_loss: 1.372738 003015/005400, loss: 0.273710, avg_loss: 1.370737 003020/005400, loss: 0.164532, avg_loss: 1.368793 003025/005400, loss: 0.354658, avg_loss: 1.366944 003030/005400, loss: 0.162812, avg_loss: 1.365036 003035/005400, loss: 0.225085, avg_loss: 1.363150 003040/005400, loss: 0.237793, avg_loss: 1.361249 003045/005400, loss: 0.175477, avg_loss: 1.359304 003050/005400, loss: 0.220884, avg_loss: 1.357379 003055/005400, loss: 0.116397, avg_loss: 1.355440 003060/005400, loss: 0.180262, avg_loss: 1.353549 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 16, step 3060/5400: {'pearson': 
0.8475945534372652, 'spearmanr': 0.8462737598699491} 003065/005400, loss: 0.208348, avg_loss: 1.351671 003070/005400, loss: 0.162787, avg_loss: 1.349760 003075/005400, loss: 0.204459, avg_loss: 1.347873 003080/005400, loss: 0.243172, avg_loss: 1.346001 003085/005400, loss: 0.105318, avg_loss: 1.344059 003090/005400, loss: 0.143131, avg_loss: 1.342206 003095/005400, loss: 0.170449, avg_loss: 1.340305 003100/005400, loss: 0.208828, avg_loss: 1.338421 003105/005400, loss: 0.186506, avg_loss: 1.336552 003110/005400, loss: 0.138573, avg_loss: 1.334692 003115/005400, loss: 0.199446, avg_loss: 1.332886 003120/005400, loss: 0.178179, avg_loss: 1.331061 003125/005400, loss: 0.158329, avg_loss: 1.329155 003130/005400, loss: 0.132958, avg_loss: 1.327291 003135/005400, loss: 0.117738, avg_loss: 1.325437 003140/005400, loss: 0.187024, avg_loss: 1.323590 003145/005400, loss: 0.285563, avg_loss: 1.321791 003150/005400, loss: 0.126655, avg_loss: 1.320009 003155/005400, loss: 0.246144, avg_loss: 1.318180 003160/005400, loss: 0.222086, avg_loss: 1.316403 003165/005400, loss: 0.088263, avg_loss: 1.314602 003170/005400, loss: 0.159250, avg_loss: 1.312750 003175/005400, loss: 0.232737, avg_loss: 1.311048 003180/005400, loss: 0.150258, avg_loss: 1.309249 003185/005400, loss: 0.149525, avg_loss: 1.307465 003190/005400, loss: 0.175701, avg_loss: 1.305661 003195/005400, loss: 0.224868, avg_loss: 1.303942 003200/005400, loss: 0.151383, avg_loss: 1.302172 003205/005400, loss: 0.216179, avg_loss: 1.300442 003210/005400, loss: 0.197382, avg_loss: 1.298647 003215/005400, loss: 0.174374, avg_loss: 1.296861 003220/005400, loss: 0.146824, avg_loss: 1.295138 003225/005400, loss: 0.172476, avg_loss: 1.293391 003230/005400, loss: 0.180328, avg_loss: 1.291636 003235/005400, loss: 0.219937, avg_loss: 1.289898 003240/005400, loss: 0.152960, avg_loss: 1.288163 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 17, step 3240/5400: {'pearson': 
0.8504202206275068, 'spearmanr': 0.8473922892792047} 003245/005400, loss: 0.171524, avg_loss: 1.286479 003250/005400, loss: 0.116338, avg_loss: 1.284752 003255/005400, loss: 0.086406, avg_loss: 1.283072 003260/005400, loss: 0.150628, avg_loss: 1.281353 003265/005400, loss: 0.139414, avg_loss: 1.279617 003270/005400, loss: 0.193610, avg_loss: 1.277939 003275/005400, loss: 0.235554, avg_loss: 1.276249 003280/005400, loss: 0.166258, avg_loss: 1.274573 003285/005400, loss: 0.263752, avg_loss: 1.272963 003290/005400, loss: 0.303736, avg_loss: 1.271314 003295/005400, loss: 0.119213, avg_loss: 1.269609 003300/005400, loss: 0.132104, avg_loss: 1.267901 003305/005400, loss: 0.143845, avg_loss: 1.266212 003310/005400, loss: 0.115098, avg_loss: 1.264532 003315/005400, loss: 0.288430, avg_loss: 1.262901 003320/005400, loss: 0.173986, avg_loss: 1.261220 003325/005400, loss: 0.120085, avg_loss: 1.259552 003330/005400, loss: 0.248743, avg_loss: 1.257920 003335/005400, loss: 0.139627, avg_loss: 1.256220 003340/005400, loss: 0.147467, avg_loss: 1.254561 003345/005400, loss: 0.142301, avg_loss: 1.252920 003350/005400, loss: 0.156088, avg_loss: 1.251271 003355/005400, loss: 0.151669, avg_loss: 1.249613 003360/005400, loss: 0.214872, avg_loss: 1.248012 003365/005400, loss: 0.198525, avg_loss: 1.246435 003370/005400, loss: 0.088710, avg_loss: 1.244759 003375/005400, loss: 0.120682, avg_loss: 1.243157 003380/005400, loss: 0.180583, avg_loss: 1.241588 003385/005400, loss: 0.228067, avg_loss: 1.240034 003390/005400, loss: 0.126767, avg_loss: 1.238442 003395/005400, loss: 0.125910, avg_loss: 1.236902 003400/005400, loss: 0.139716, avg_loss: 1.235308 003405/005400, loss: 0.080612, avg_loss: 1.233692 003410/005400, loss: 0.212925, avg_loss: 1.232123 003415/005400, loss: 0.131897, avg_loss: 1.230545 003420/005400, loss: 0.205202, avg_loss: 1.228983 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 18, step 3420/5400: {'pearson': 
0.8498450703665391, 'spearmanr': 0.8479951774929629} 003425/005400, loss: 0.122507, avg_loss: 1.227378 003430/005400, loss: 0.250203, avg_loss: 1.225830 003435/005400, loss: 0.173522, avg_loss: 1.224317 003440/005400, loss: 0.087732, avg_loss: 1.222770 003445/005400, loss: 0.154733, avg_loss: 1.221229 003450/005400, loss: 0.217941, avg_loss: 1.219678 003455/005400, loss: 0.137303, avg_loss: 1.218170 003460/005400, loss: 0.112234, avg_loss: 1.216591 003465/005400, loss: 0.150905, avg_loss: 1.215047 003470/005400, loss: 0.158825, avg_loss: 1.213517 003475/005400, loss: 0.173023, avg_loss: 1.212032 003480/005400, loss: 0.178021, avg_loss: 1.210536 003485/005400, loss: 0.247019, avg_loss: 1.209080 003490/005400, loss: 0.072551, avg_loss: 1.207569 003495/005400, loss: 0.162839, avg_loss: 1.206022 003500/005400, loss: 0.189042, avg_loss: 1.204516 003505/005400, loss: 0.173782, avg_loss: 1.203007 003510/005400, loss: 0.138777, avg_loss: 1.201515 003515/005400, loss: 0.177656, avg_loss: 1.200013 003520/005400, loss: 0.103750, avg_loss: 1.198508 003525/005400, loss: 0.169574, avg_loss: 1.197020 003530/005400, loss: 0.119396, avg_loss: 1.195545 003535/005400, loss: 0.264826, avg_loss: 1.194100 003540/005400, loss: 0.098011, avg_loss: 1.192637 003545/005400, loss: 0.088810, avg_loss: 1.191125 003550/005400, loss: 0.107876, avg_loss: 1.189654 003555/005400, loss: 0.157520, avg_loss: 1.188192 003560/005400, loss: 0.176217, avg_loss: 1.186812 003565/005400, loss: 0.111337, avg_loss: 1.185342 003570/005400, loss: 0.166201, avg_loss: 1.183889 003575/005400, loss: 0.171814, avg_loss: 1.182409 003580/005400, loss: 0.112979, avg_loss: 1.181004 003585/005400, loss: 0.119157, avg_loss: 1.179598 003590/005400, loss: 0.114437, avg_loss: 1.178189 003595/005400, loss: 0.155447, avg_loss: 1.176771 003600/005400, loss: 0.157078, avg_loss: 1.175317 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 19, step 3600/5400: {'pearson': 
0.8482436057295935, 'spearmanr': 0.8472426908693901} 003605/005400, loss: 0.154441, avg_loss: 1.173877 003610/005400, loss: 0.100947, avg_loss: 1.172478 003615/005400, loss: 0.125365, avg_loss: 1.171029 003620/005400, loss: 0.106434, avg_loss: 1.169605 003625/005400, loss: 0.130245, avg_loss: 1.168211 003630/005400, loss: 0.134600, avg_loss: 1.166787 003635/005400, loss: 0.266648, avg_loss: 1.165400 003640/005400, loss: 0.144939, avg_loss: 1.164021 003645/005400, loss: 0.106222, avg_loss: 1.162577 003650/005400, loss: 0.117357, avg_loss: 1.161193 003655/005400, loss: 0.202359, avg_loss: 1.159805 003660/005400, loss: 0.166776, avg_loss: 1.158439 003665/005400, loss: 0.107025, avg_loss: 1.157045 003670/005400, loss: 0.143284, avg_loss: 1.155661 003675/005400, loss: 0.198224, avg_loss: 1.154297 003680/005400, loss: 0.280506, avg_loss: 1.152964 003685/005400, loss: 0.130698, avg_loss: 1.151564 003690/005400, loss: 0.129304, avg_loss: 1.150198 003695/005400, loss: 0.137243, avg_loss: 1.148803 003700/005400, loss: 0.097097, avg_loss: 1.147449 003705/005400, loss: 0.144787, avg_loss: 1.146119 003710/005400, loss: 0.127824, avg_loss: 1.144796 003715/005400, loss: 0.176846, avg_loss: 1.143457 003720/005400, loss: 0.100565, avg_loss: 1.142128 003725/005400, loss: 0.080043, avg_loss: 1.140760 003730/005400, loss: 0.125706, avg_loss: 1.139474 003735/005400, loss: 0.117341, avg_loss: 1.138159 003740/005400, loss: 0.158067, avg_loss: 1.136843 003745/005400, loss: 0.151995, avg_loss: 1.135553 003750/005400, loss: 0.277281, avg_loss: 1.134297 003755/005400, loss: 0.133230, avg_loss: 1.132962 003760/005400, loss: 0.186799, avg_loss: 1.131718 003765/005400, loss: 0.205163, avg_loss: 1.130425 003770/005400, loss: 0.157280, avg_loss: 1.129118 003775/005400, loss: 0.250720, avg_loss: 1.127838 003780/005400, loss: 0.138770, avg_loss: 1.126563 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 20, step 3780/5400: {'pearson': 
0.8516633883376111, 'spearmanr': 0.848796837026541} 003785/005400, loss: 0.280053, avg_loss: 1.125306 003790/005400, loss: 0.119360, avg_loss: 1.124036 003795/005400, loss: 0.150453, avg_loss: 1.122750 003800/005400, loss: 0.150021, avg_loss: 1.121459 003805/005400, loss: 0.077378, avg_loss: 1.120170 003810/005400, loss: 0.148403, avg_loss: 1.118906 003815/005400, loss: 0.178699, avg_loss: 1.117645 003820/005400, loss: 0.149582, avg_loss: 1.116330 003825/005400, loss: 0.128546, avg_loss: 1.115013 003830/005400, loss: 0.268229, avg_loss: 1.113776 003835/005400, loss: 0.195517, avg_loss: 1.112531 003840/005400, loss: 0.208493, avg_loss: 1.111268 003845/005400, loss: 0.193140, avg_loss: 1.110029 003850/005400, loss: 0.088294, avg_loss: 1.108804 003855/005400, loss: 0.149382, avg_loss: 1.107547 003860/005400, loss: 0.198664, avg_loss: 1.106283 003865/005400, loss: 0.126898, avg_loss: 1.104976 003870/005400, loss: 0.129632, avg_loss: 1.103741 003875/005400, loss: 0.123535, avg_loss: 1.102518 003880/005400, loss: 0.165960, avg_loss: 1.101254 003885/005400, loss: 0.138942, avg_loss: 1.100020 003890/005400, loss: 0.128230, avg_loss: 1.098769 003895/005400, loss: 0.104971, avg_loss: 1.097568 003900/005400, loss: 0.085618, avg_loss: 1.096347 003905/005400, loss: 0.126211, avg_loss: 1.095101 003910/005400, loss: 0.172208, avg_loss: 1.093876 003915/005400, loss: 0.134293, avg_loss: 1.092629 003920/005400, loss: 0.130413, avg_loss: 1.091402 003925/005400, loss: 0.126139, avg_loss: 1.090250 003930/005400, loss: 0.133957, avg_loss: 1.089023 003935/005400, loss: 0.235973, avg_loss: 1.087812 003940/005400, loss: 0.145638, avg_loss: 1.086616 003945/005400, loss: 0.101992, avg_loss: 1.085413 003950/005400, loss: 0.126402, avg_loss: 1.084220 003955/005400, loss: 0.117492, avg_loss: 1.083047 003960/005400, loss: 0.130239, avg_loss: 1.081839 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 21, step 3960/5400: {'pearson': 
0.847780716202824, 'spearmanr': 0.8469865580881132} 003965/005400, loss: 0.064479, avg_loss: 1.080611 003970/005400, loss: 0.171006, avg_loss: 1.079406 003975/005400, loss: 0.085861, avg_loss: 1.078165 003980/005400, loss: 0.095522, avg_loss: 1.076947 003985/005400, loss: 0.130852, avg_loss: 1.075774 003990/005400, loss: 0.134866, avg_loss: 1.074596 003995/005400, loss: 0.074542, avg_loss: 1.073402 004000/005400, loss: 0.116856, avg_loss: 1.072228 004005/005400, loss: 0.105077, avg_loss: 1.071026 004010/005400, loss: 0.125664, avg_loss: 1.069845 004015/005400, loss: 0.103024, avg_loss: 1.068720 004020/005400, loss: 0.128571, avg_loss: 1.067519 004025/005400, loss: 0.112454, avg_loss: 1.066343 004030/005400, loss: 0.150399, avg_loss: 1.065202 004035/005400, loss: 0.073474, avg_loss: 1.064020 004040/005400, loss: 0.209692, avg_loss: 1.062895 004045/005400, loss: 0.126410, avg_loss: 1.061723 004050/005400, loss: 0.168801, avg_loss: 1.060549 004055/005400, loss: 0.159003, avg_loss: 1.059423 004060/005400, loss: 0.153256, avg_loss: 1.058304 004065/005400, loss: 0.139507, avg_loss: 1.057153 004070/005400, loss: 0.196264, avg_loss: 1.056009 004075/005400, loss: 0.153815, avg_loss: 1.054860 004080/005400, loss: 0.182606, avg_loss: 1.053734 004085/005400, loss: 0.093651, avg_loss: 1.052585 004090/005400, loss: 0.138306, avg_loss: 1.051489 004095/005400, loss: 0.125193, avg_loss: 1.050385 004100/005400, loss: 0.086516, avg_loss: 1.049260 004105/005400, loss: 0.120107, avg_loss: 1.048157 004110/005400, loss: 0.246864, avg_loss: 1.047057 004115/005400, loss: 0.120596, avg_loss: 1.045902 004120/005400, loss: 0.121840, avg_loss: 1.044833 004125/005400, loss: 0.141377, avg_loss: 1.043755 004130/005400, loss: 0.130236, avg_loss: 1.042661 004135/005400, loss: 0.077593, avg_loss: 1.041535 004140/005400, loss: 0.096709, avg_loss: 1.040430 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 22, step 4140/5400: {'pearson': 
0.8516815294443599, 'spearmanr': 0.8481674736867748} 004145/005400, loss: 0.107884, avg_loss: 1.039325 004150/005400, loss: 0.163758, avg_loss: 1.038213 004155/005400, loss: 0.107559, avg_loss: 1.037071 004160/005400, loss: 0.221394, avg_loss: 1.035994 004165/005400, loss: 0.095282, avg_loss: 1.034890 004170/005400, loss: 0.115735, avg_loss: 1.033791 004175/005400, loss: 0.120850, avg_loss: 1.032702 004180/005400, loss: 0.148173, avg_loss: 1.031638 004185/005400, loss: 0.150222, avg_loss: 1.030549 004190/005400, loss: 0.167705, avg_loss: 1.029474 004195/005400, loss: 0.080327, avg_loss: 1.028393 004200/005400, loss: 0.163523, avg_loss: 1.027316 004205/005400, loss: 0.091747, avg_loss: 1.026259 004210/005400, loss: 0.146581, avg_loss: 1.025186 004215/005400, loss: 0.138113, avg_loss: 1.024116 004220/005400, loss: 0.126675, avg_loss: 1.023037 004225/005400, loss: 0.174576, avg_loss: 1.021997 004230/005400, loss: 0.192664, avg_loss: 1.020943 004235/005400, loss: 0.075478, avg_loss: 1.019836 004240/005400, loss: 0.152823, avg_loss: 1.018803 004245/005400, loss: 0.116004, avg_loss: 1.017760 004250/005400, loss: 0.151843, avg_loss: 1.016687 004255/005400, loss: 0.198972, avg_loss: 1.015644 004260/005400, loss: 0.158850, avg_loss: 1.014584 004265/005400, loss: 0.140898, avg_loss: 1.013570 004270/005400, loss: 0.102441, avg_loss: 1.012552 004275/005400, loss: 0.116065, avg_loss: 1.011494 004280/005400, loss: 0.093895, avg_loss: 1.010467 004285/005400, loss: 0.091400, avg_loss: 1.009428 004290/005400, loss: 0.135847, avg_loss: 1.008452 004295/005400, loss: 0.131350, avg_loss: 1.007404 004300/005400, loss: 0.086305, avg_loss: 1.006382 004305/005400, loss: 0.149123, avg_loss: 1.005382 004310/005400, loss: 0.077175, avg_loss: 1.004378 004315/005400, loss: 0.130131, avg_loss: 1.003319 004320/005400, loss: 0.081299, avg_loss: 1.002324 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 23, step 4320/5400: {'pearson': 
0.8516975249826064, 'spearmanr': 0.848650349577711} 004325/005400, loss: 0.162849, avg_loss: 1.001306 004330/005400, loss: 0.106408, avg_loss: 1.000275 004335/005400, loss: 0.112816, avg_loss: 0.999257 004340/005400, loss: 0.117222, avg_loss: 0.998237 004345/005400, loss: 0.163939, avg_loss: 0.997231 004350/005400, loss: 0.132185, avg_loss: 0.996227 004355/005400, loss: 0.120796, avg_loss: 0.995204 004360/005400, loss: 0.110429, avg_loss: 0.994174 004365/005400, loss: 0.176232, avg_loss: 0.993202 004370/005400, loss: 0.108074, avg_loss: 0.992212 004375/005400, loss: 0.172169, avg_loss: 0.991219 004380/005400, loss: 0.115517, avg_loss: 0.990262 004385/005400, loss: 0.121775, avg_loss: 0.989272 004390/005400, loss: 0.126275, avg_loss: 0.988271 004395/005400, loss: 0.107515, avg_loss: 0.987280 004400/005400, loss: 0.086793, avg_loss: 0.986292 004405/005400, loss: 0.124593, avg_loss: 0.985330 004410/005400, loss: 0.132320, avg_loss: 0.984378 004415/005400, loss: 0.167460, avg_loss: 0.983421 004420/005400, loss: 0.143033, avg_loss: 0.982441 004425/005400, loss: 0.123850, avg_loss: 0.981486 004430/005400, loss: 0.095938, avg_loss: 0.980500 004435/005400, loss: 0.135725, avg_loss: 0.979532 004440/005400, loss: 0.081924, avg_loss: 0.978545 004445/005400, loss: 0.122266, avg_loss: 0.977577 004450/005400, loss: 0.101690, avg_loss: 0.976582 004455/005400, loss: 0.159788, avg_loss: 0.975642 004460/005400, loss: 0.173950, avg_loss: 0.974689 004465/005400, loss: 0.140697, avg_loss: 0.973738 004470/005400, loss: 0.118950, avg_loss: 0.972768 004475/005400, loss: 0.090251, avg_loss: 0.971827 004480/005400, loss: 0.142421, avg_loss: 0.970929 004485/005400, loss: 0.093378, avg_loss: 0.969989 004490/005400, loss: 0.132023, avg_loss: 0.969045 004495/005400, loss: 0.177814, avg_loss: 0.968117 004500/005400, loss: 0.098508, avg_loss: 0.967177 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 24, step 4500/5400: {'pearson': 
0.8521285961729241, 'spearmanr': 0.8483649029133034} 004505/005400, loss: 0.110328, avg_loss: 0.966241 004510/005400, loss: 0.102069, avg_loss: 0.965301 004515/005400, loss: 0.162850, avg_loss: 0.964387 004520/005400, loss: 0.151388, avg_loss: 0.963471 004525/005400, loss: 0.087982, avg_loss: 0.962517 004530/005400, loss: 0.107709, avg_loss: 0.961596 004535/005400, loss: 0.093900, avg_loss: 0.960650 004540/005400, loss: 0.087082, avg_loss: 0.959718 004545/005400, loss: 0.125253, avg_loss: 0.958788 004550/005400, loss: 0.091174, avg_loss: 0.957888 004555/005400, loss: 0.064617, avg_loss: 0.956960 004560/005400, loss: 0.079881, avg_loss: 0.956014 004565/005400, loss: 0.103206, avg_loss: 0.955083 004570/005400, loss: 0.143321, avg_loss: 0.954173 004575/005400, loss: 0.149977, avg_loss: 0.953271 004580/005400, loss: 0.092622, avg_loss: 0.952352 004585/005400, loss: 0.126467, avg_loss: 0.951438 004590/005400, loss: 0.094483, avg_loss: 0.950528 004595/005400, loss: 0.107402, avg_loss: 0.949638 004600/005400, loss: 0.082824, avg_loss: 0.948713 004605/005400, loss: 0.195657, avg_loss: 0.947822 004610/005400, loss: 0.124441, avg_loss: 0.946935 004615/005400, loss: 0.121700, avg_loss: 0.946019 004620/005400, loss: 0.148236, avg_loss: 0.945115 004625/005400, loss: 0.140154, avg_loss: 0.944207 004630/005400, loss: 0.165997, avg_loss: 0.943339 004635/005400, loss: 0.098995, avg_loss: 0.942421 004640/005400, loss: 0.120260, avg_loss: 0.941555 004645/005400, loss: 0.125061, avg_loss: 0.940660 004650/005400, loss: 0.112413, avg_loss: 0.939759 004655/005400, loss: 0.104798, avg_loss: 0.938884 004660/005400, loss: 0.105972, avg_loss: 0.937982 004665/005400, loss: 0.137016, avg_loss: 0.937103 004670/005400, loss: 0.086489, avg_loss: 0.936211 004675/005400, loss: 0.130223, avg_loss: 0.935320 004680/005400, loss: 0.067240, avg_loss: 0.934422 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 25, step 4680/5400: {'pearson': 
0.8527350758782244, 'spearmanr': 0.8507346588341773} 004685/005400, loss: 0.153279, avg_loss: 0.933544 004690/005400, loss: 0.084691, avg_loss: 0.932667 004695/005400, loss: 0.127423, avg_loss: 0.931791 004700/005400, loss: 0.123418, avg_loss: 0.930916 004705/005400, loss: 0.102127, avg_loss: 0.930041 004710/005400, loss: 0.094543, avg_loss: 0.929162 004715/005400, loss: 0.112821, avg_loss: 0.928298 004720/005400, loss: 0.094509, avg_loss: 0.927409 004725/005400, loss: 0.148458, avg_loss: 0.926568 004730/005400, loss: 0.084626, avg_loss: 0.925698 004735/005400, loss: 0.074723, avg_loss: 0.924825 004740/005400, loss: 0.101060, avg_loss: 0.923955 004745/005400, loss: 0.202314, avg_loss: 0.923109 004750/005400, loss: 0.055448, avg_loss: 0.922249 004755/005400, loss: 0.179162, avg_loss: 0.921398 004760/005400, loss: 0.109892, avg_loss: 0.920548 004765/005400, loss: 0.128364, avg_loss: 0.919729 004770/005400, loss: 0.064578, avg_loss: 0.918864 004775/005400, loss: 0.119190, avg_loss: 0.918013 004780/005400, loss: 0.106971, avg_loss: 0.917168 004785/005400, loss: 0.103674, avg_loss: 0.916293 004790/005400, loss: 0.086906, avg_loss: 0.915437 004795/005400, loss: 0.078381, avg_loss: 0.914585 004800/005400, loss: 0.075235, avg_loss: 0.913733 004805/005400, loss: 0.089770, avg_loss: 0.912904 004810/005400, loss: 0.104229, avg_loss: 0.912068 004815/005400, loss: 0.100126, avg_loss: 0.911218 004820/005400, loss: 0.118553, avg_loss: 0.910386 004825/005400, loss: 0.143384, avg_loss: 0.909569 004830/005400, loss: 0.102282, avg_loss: 0.908771 004835/005400, loss: 0.148549, avg_loss: 0.907961 004840/005400, loss: 0.182026, avg_loss: 0.907128 004845/005400, loss: 0.190450, avg_loss: 0.906318 004850/005400, loss: 0.206217, avg_loss: 0.905508 004855/005400, loss: 0.064875, avg_loss: 0.904663 004860/005400, loss: 0.099118, avg_loss: 0.903843 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 26, step 4860/5400: {'pearson': 
0.8501907223027365, 'spearmanr': 0.8489084429386828} 004865/005400, loss: 0.155720, avg_loss: 0.903033 004870/005400, loss: 0.123857, avg_loss: 0.902210 004875/005400, loss: 0.106955, avg_loss: 0.901405 004880/005400, loss: 0.141843, avg_loss: 0.900608 004885/005400, loss: 0.101737, avg_loss: 0.899809 004890/005400, loss: 0.159319, avg_loss: 0.899006 004895/005400, loss: 0.095495, avg_loss: 0.898173 004900/005400, loss: 0.134695, avg_loss: 0.897373 004905/005400, loss: 0.062803, avg_loss: 0.896564 004910/005400, loss: 0.132602, avg_loss: 0.895749 004915/005400, loss: 0.117661, avg_loss: 0.894927 004920/005400, loss: 0.134668, avg_loss: 0.894128 004925/005400, loss: 0.089291, avg_loss: 0.893325 004930/005400, loss: 0.116079, avg_loss: 0.892549 004935/005400, loss: 0.092115, avg_loss: 0.891750 004940/005400, loss: 0.132650, avg_loss: 0.890975 004945/005400, loss: 0.062088, avg_loss: 0.890193 004950/005400, loss: 0.062359, avg_loss: 0.889396 004955/005400, loss: 0.086961, avg_loss: 0.888640 004960/005400, loss: 0.155230, avg_loss: 0.887873 004965/005400, loss: 0.110812, avg_loss: 0.887072 004970/005400, loss: 0.068260, avg_loss: 0.886263 004975/005400, loss: 0.156115, avg_loss: 0.885500 004980/005400, loss: 0.124095, avg_loss: 0.884712 004985/005400, loss: 0.126226, avg_loss: 0.883916 004990/005400, loss: 0.083915, avg_loss: 0.883123 004995/005400, loss: 0.083612, avg_loss: 0.882345 005000/005400, loss: 0.129824, avg_loss: 0.881565 005005/005400, loss: 0.131232, avg_loss: 0.880788 005010/005400, loss: 0.122785, avg_loss: 0.879998 005015/005400, loss: 0.103774, avg_loss: 0.879201 005020/005400, loss: 0.090597, avg_loss: 0.878425 005025/005400, loss: 0.084932, avg_loss: 0.877641 005030/005400, loss: 0.107362, avg_loss: 0.876879 005035/005400, loss: 0.127814, avg_loss: 0.876147 005040/005400, loss: 0.170438, avg_loss: 0.875395 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 27, step 5040/5400: {'pearson': 
0.8554061134436448, 'spearmanr': 0.8524378109427393} 005045/005400, loss: 0.076105, avg_loss: 0.874619 005050/005400, loss: 0.103730, avg_loss: 0.873847 005055/005400, loss: 0.064459, avg_loss: 0.873081 005060/005400, loss: 0.112303, avg_loss: 0.872322 005065/005400, loss: 0.071940, avg_loss: 0.871558 005070/005400, loss: 0.088598, avg_loss: 0.870804 005075/005400, loss: 0.090358, avg_loss: 0.870038 005080/005400, loss: 0.068572, avg_loss: 0.869276 005085/005400, loss: 0.068552, avg_loss: 0.868498 005090/005400, loss: 0.140135, avg_loss: 0.867759 005095/005400, loss: 0.076569, avg_loss: 0.866984 005100/005400, loss: 0.098298, avg_loss: 0.866226 005105/005400, loss: 0.108042, avg_loss: 0.865460 005110/005400, loss: 0.072785, avg_loss: 0.864705 005115/005400, loss: 0.155046, avg_loss: 0.863964 005120/005400, loss: 0.225429, avg_loss: 0.863240 005125/005400, loss: 0.089123, avg_loss: 0.862489 005130/005400, loss: 0.062908, avg_loss: 0.861729 005135/005400, loss: 0.050400, avg_loss: 0.860973 005140/005400, loss: 0.051159, avg_loss: 0.860225 005145/005400, loss: 0.092017, avg_loss: 0.859488 005150/005400, loss: 0.119527, avg_loss: 0.858754 005155/005400, loss: 0.089991, avg_loss: 0.858025 005160/005400, loss: 0.093003, avg_loss: 0.857288 005165/005400, loss: 0.096349, avg_loss: 0.856544 005170/005400, loss: 0.070699, avg_loss: 0.855824 005175/005400, loss: 0.061141, avg_loss: 0.855089 005180/005400, loss: 0.117543, avg_loss: 0.854368 005185/005400, loss: 0.070801, avg_loss: 0.853643 005190/005400, loss: 0.110298, avg_loss: 0.852924 005195/005400, loss: 0.114286, avg_loss: 0.852209 005200/005400, loss: 0.129566, avg_loss: 0.851483 005205/005400, loss: 0.101796, avg_loss: 0.850774 005210/005400, loss: 0.125870, avg_loss: 0.850059 005215/005400, loss: 0.049415, avg_loss: 0.849329 005220/005400, loss: 0.107189, avg_loss: 0.848612 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 28, step 5220/5400: {'pearson': 
0.8540466796613693, 'spearmanr': 0.850937622804088} 005225/005400, loss: 0.080942, avg_loss: 0.847872 005230/005400, loss: 0.102161, avg_loss: 0.847163 005235/005400, loss: 0.082529, avg_loss: 0.846440 005240/005400, loss: 0.105809, avg_loss: 0.845731 005245/005400, loss: 0.117093, avg_loss: 0.845006 005250/005400, loss: 0.106933, avg_loss: 0.844305 005255/005400, loss: 0.074675, avg_loss: 0.843584 005260/005400, loss: 0.102407, avg_loss: 0.842881 005265/005400, loss: 0.148522, avg_loss: 0.842175 005270/005400, loss: 0.087407, avg_loss: 0.841448 005275/005400, loss: 0.098112, avg_loss: 0.840739 005280/005400, loss: 0.092396, avg_loss: 0.840018 005285/005400, loss: 0.062919, avg_loss: 0.839300 005290/005400, loss: 0.132550, avg_loss: 0.838632 005295/005400, loss: 0.145091, avg_loss: 0.837944 005300/005400, loss: 0.118631, avg_loss: 0.837256 005305/005400, loss: 0.056487, avg_loss: 0.836545 005310/005400, loss: 0.103461, avg_loss: 0.835856 005315/005400, loss: 0.112280, avg_loss: 0.835183 005320/005400, loss: 0.037065, avg_loss: 0.834495 005325/005400, loss: 0.102541, avg_loss: 0.833812 005330/005400, loss: 0.052560, avg_loss: 0.833121 005335/005400, loss: 0.118150, avg_loss: 0.832437 005340/005400, loss: 0.093599, avg_loss: 0.831748 005345/005400, loss: 0.057692, avg_loss: 0.831051 005350/005400, loss: 0.083881, avg_loss: 0.830350 005355/005400, loss: 0.092801, avg_loss: 0.829662 005360/005400, loss: 0.109509, avg_loss: 0.828983 005365/005400, loss: 0.126566, avg_loss: 0.828295 005370/005400, loss: 0.090441, avg_loss: 0.827624 005375/005400, loss: 0.098362, avg_loss: 0.826956 005380/005400, loss: 0.086417, avg_loss: 0.826275 005385/005400, loss: 0.090084, avg_loss: 0.825580 005390/005400, loss: 0.089639, avg_loss: 0.824919 005395/005400, loss: 0.112607, avg_loss: 0.824232 005400/005400, loss: 0.079185, avg_loss: 0.823571 ***** Running dev evaluation ***** Num examples = 1500 Instantaneous batch size per device = 32 epoch 29, step 5400/5400: {'pearson': 
0.8466150052031443, 'spearmanr': 0.845214209063919}
***** Running train evaluation *****
Num examples = 5749
Instantaneous batch size per device = 32
Train Dataset Result: {'pearson': 0.9873720770623174, 'spearmanr': 0.987580466183614}
***** Running dev evaluation *****
Num examples = 1500
Instantaneous batch size per device = 32
Dev Dataset Result: {'pearson': 0.8466150052031443, 'spearmanr': 0.845214209063919}
Training time 0:04:24