=================================================================
GEOLIP VISION ENCODER - FROM SCRATCH
ViT: 6L/384d/6h, patch16
196 patches + CLS → 128-d output
Device: cuda
=================================================================

Loading soup...
Soup: mAP=0.837 CV_target=0.2731
train: loaded cached targets (118,287)
val: loaded cached targets (5,000)
Caching train images (118,287)...
Resolving data files: 100% 39/39 [00:00<00:00, 5057.75it/s]
Downloading data: 100% 39/39 [04:55<00:00, 7.45s/files]
default/train/0002.parquet: 100% 509M/509M [00:09<00:00, 69.4MB/s]
default/train/0003.parquet: 100% 502M/502M [00:03<00:00, 298MB/s]
default/train/0004.parquet: 100% 507M/507M [00:10<00:00, 88.0MB/s]
default/train/0005.parquet: 100% 499M/499M [00:04<00:00, 95.4MB/s]
default/train/0006.parquet: 100% 510M/510M [00:09<00:00, 73.4MB/s]
default/train/0007.parquet: 100% 502M/502M [00:06<00:00, 47.9MB/s]
default/train/0008.parquet: 100% 514M/514M [00:09<00:00, 90.8MB/s]
default/train/0009.parquet: 100% 509M/509M [00:06<00:00, 111MB/s]
default/train/0010.parquet: 100% 509M/509M [00:07<00:00, 89.7MB/s]
default/train/0011.parquet: 100% 505M/505M [00:05<00:00, 70.6MB/s]
default/train/0012.parquet: 100% 507M/507M [00:06<00:00, 87.5MB/s]
default/train/0013.parquet: 100% 502M/502M [00:09<00:00, 59.5MB/s]
default/train/0014.parquet: 100% 504M/504M [00:09<00:00, 70.8MB/s]
default/train/0015.parquet: 100% 514M/514M [00:07<00:00, 122MB/s]
default/train/0016.parquet: 100% 507M/507M [00:07<00:00, 95.1MB/s]
default/train/0017.parquet: 100% 509M/509M [00:09<00:00, 89.6MB/s]
default/train/0018.parquet: 100% 504M/504M [00:06<00:00, 63.2MB/s]
default/train/0019.parquet: 100% 511M/511M [00:10<00:00, 83.7MB/s]
default/train/0020.parquet: 100% 510M/510M [00:10<00:00, 72.5MB/s]
default/train/0021.parquet: 100% 504M/504M [00:09<00:00, 77.3MB/s]
default/train/0022.parquet: 100% 507M/507M [00:10<00:00, 89.6MB/s]
default/train/0023.parquet: 100% 511M/511M [00:10<00:00, 65.3MB/s]
default/train/0024.parquet: 100% 505M/505M [00:09<00:00, 78.0MB/s]
default/train/0025.parquet: 100% 503M/503M [00:04<00:00, 196MB/s]
default/train/0026.parquet: 100% 508M/508M [00:05<00:00, 121MB/s]
default/train/0027.parquet: 100% 508M/508M [00:06<00:00, 93.1MB/s]
default/train/0028.parquet: 100% 507M/507M [00:05<00:00, 122MB/s]
default/train/0029.parquet: 100% 510M/510M [00:07<00:00, 75.8MB/s]
default/train/0030.parquet: 100% 505M/505M [00:08<00:00, 71.4MB/s]
default/train/0031.parquet: 100% 502M/502M [00:04<00:00, 168MB/s]
default/train/0032.parquet: 100% 502M/502M [00:02<00:00, 321MB/s]
default/train/0033.parquet: 100% 508M/508M [00:07<00:00, 86.3MB/s]
default/train/0034.parquet: 100% 504M/504M [00:07<00:00, 78.1MB/s]
default/train/0035.parquet: 100% 499M/499M [00:16<00:00, 101MB/s]
default/train/0036.parquet: 100% 507M/507M [00:10<00:00, 78.6MB/s]
default/train/0037.parquet: 100% 501M/501M [00:09<00:00, 106MB/s]
default/train/0038.parquet: 100% 79.2M/79.2M [00:01<00:00, 173MB/s]
default/val/0000.parquet: 100% 504M/504M [00:04<00:00, 128MB/s]
default/val/0001.parquet: 100% 311M/311M [00:03<00:00, 165MB/s]
Generating train split: 118287/0 [01:49<00:00, 1378.35 examples/s]
Generating validation split: 5000/0 [00:05<00:00, 617.41 examples/s]
Loading dataset shards: 100% 39/39 [00:05<00:00, 8.83it/s]
Caching train: 100%|██████████| 118287/118287 [13:03<00:00, 151.05it/s]
Cached 118287/118287 images
Saved: cached_train_images.pt (35611 MB)
Caching val images (5,000)...
Resolving data files: 100% 39/39 [00:00<00:00, 4857.40it/s]
Caching val: 100%|██████████| 5000/5000 [00:33<00:00, 148.88it/s]
Cached 5000/5000 images
Saved: cached_val_images.pt (1505 MB)

=================================================================
BUILD ENCODER
=================================================================
Architecture: 6L/384d/6h, patch16
Input: 224×224 → 196 patches
Output: 128-d (on hypersphere)
Parameters: 11,216,768

=================================================================
TRAINING
20 epochs, lr=0.0003, batch=48
Losses: InfoNCE + MSE + CV + BCE + Procrustes alignment
CV target: 0.2731
Images: train=118,287 val=5,000 (cached as tensors)
=================================================================

E 1/20 train: 100%|██████████| 2465/2465 [02:44<00:00, 14.97batch/s, cos=0.258, loss=2.6911, nce_acc=0.339, ordered=1]
E1 train: 165s loss=2.6891 nce=2.2529 mse=0.0120 bce=0.1963 nce_acc=0.340
E1 val: mAP=0.151 F1=0.162 R@1=0.032 cos=0.325 cv=0.2663 anchors=95/256 seen=5000/5000 ★
E 2/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.32batch/s, cos=0.368, loss=1.7954, nce_acc=0.553, ordered=1]
E2 train: 161s loss=1.7948 nce=1.4297 mse=0.0099 bce=0.1473 nce_acc=0.553
E2 val: mAP=0.206 F1=0.197 R@1=0.062 cos=0.390 cv=0.2552 anchors=99/256 seen=5000/5000 ★
E 3/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.37batch/s, cos=0.416, loss=1.4860, nce_acc=0.641, ordered=1]
E3 train: 160s loss=1.4854 nce=1.1484 mse=0.0092 bce=0.1338 nce_acc=0.641
E3 val: mAP=0.246 F1=0.244 R@1=0.091 cos=0.427 cv=0.2234 anchors=98/256 seen=5000/5000 ★
E 4/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.448, loss=1.2913, nce_acc=0.695, ordered=1]
E4 train: 160s loss=1.2910 nce=0.9727 mse=0.0087 bce=0.1265 nce_acc=0.695
E4 val: mAP=0.272 F1=0.266 R@1=0.113 cos=0.453 cv=0.2078 anchors=99/256 seen=5000/5000 ★
E 5/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.475, loss=1.1334, nce_acc=0.743, ordered=1]
E5 train: 160s loss=1.1331 nce=0.8303 mse=0.0083 bce=0.1205 nce_acc=0.743
E5 val: mAP=0.296 F1=0.292 R@1=0.139 cos=0.473 cv=0.2133 anchors=98/256 seen=5000/5000 ★
E 6/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.499, loss=1.0005, nce_acc=0.784, ordered=1]
E6 train: 158s loss=1.0003 nce=0.7111 mse=0.0079 bce=0.1158 nce_acc=0.784
E6 val: mAP=0.317 F1=0.311 R@1=0.164 cos=0.495 cv=0.1835 anchors=98/256 seen=5000/5000 ★
E 7/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.60batch/s, cos=0.520, loss=0.8947, nce_acc=0.815, ordered=1]
E7 train: 158s loss=0.8943 nce=0.6172 mse=0.0075 bce=0.1115 nce_acc=0.815
E7 val: mAP=0.337 F1=0.335 R@1=0.190 cos=0.513 cv=0.1809 anchors=96/256 seen=5000/5000 ★
E 8/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.539, loss=0.8030, nce_acc=0.842, ordered=1]
E8 train: 158s loss=0.8028 nce=0.5365 mse=0.0072 bce=0.1076 nce_acc=0.843
E8 val: mAP=0.344 F1=0.331 R@1=0.207 cos=0.523 cv=0.1779 anchors=95/256 seen=5000/5000 ★
E 9/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.58batch/s, cos=0.557, loss=0.7229, nce_acc=0.866, ordered=1]
E9 train: 158s loss=0.7228 nce=0.4665 mse=0.0070 bce=0.1041 nce_acc=0.866
E9 val: mAP=0.361 F1=0.349 R@1=0.218 cos=0.537 cv=0.1764 anchors=95/256 seen=5000/5000 ★
E10/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.51batch/s, cos=0.574, loss=0.6538, nce_acc=0.887, ordered=1]
E10 train: 159s loss=0.6538 nce=0.4070 mse=0.0067 bce=0.1009 nce_acc=0.887
E10 val: mAP=0.380 F1=0.361 R@1=0.254 cos=0.557 cv=0.1699 anchors=96/256 seen=5000/5000 ★
E11/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.54batch/s, cos=0.589, loss=0.5929, nce_acc=0.905, ordered=1]
E11 train: 159s loss=0.5928 nce=0.3545 mse=0.0065 bce=0.0978 nce_acc=0.905
E11 val: mAP=0.387 F1=0.377 R@1=0.265 cos=0.564 cv=0.1497 anchors=95/256 seen=5000/5000 ★
E12/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.55batch/s, cos=0.604, loss=0.5372, nce_acc=0.920, ordered=1]
E12 train: 158s loss=0.5372 nce=0.3073 mse=0.0062 bce=0.0948 nce_acc=0.920
E12 val: mAP=0.400 F1=0.382 R@1=0.276 cos=0.573 cv=0.1639 anchors=95/256 seen=5000/5000 ★
E13/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.60batch/s, cos=0.617, loss=0.4917, nce_acc=0.933, ordered=1]
E13 train: 158s loss=0.4917 nce=0.2693 mse=0.0060 bce=0.0920 nce_acc=0.933
E13 val: mAP=0.408 F1=0.392 R@1=0.291 cos=0.582 cv=0.1615 anchors=95/256 seen=5000/5000 ★
E14/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.629, loss=0.4502, nce_acc=0.945, ordered=1]
E14 train: 158s loss=0.4501 nce=0.2347 mse=0.0058 bce=0.0895 nce_acc=0.945
E14 val: mAP=0.413 F1=0.403 R@1=0.304 cos=0.586 cv=0.1594 anchors=95/256 seen=5000/5000 ★
E15/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.640, loss=0.4169, nce_acc=0.954, ordered=1]
E15 train: 158s loss=0.4168 nce=0.2075 mse=0.0057 bce=0.0873 nce_acc=0.954
E15 val: mAP=0.418 F1=0.403 R@1=0.307 cos=0.591 cv=0.1607 anchors=94/256 seen=5000/5000 ★
E16/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.62batch/s, cos=0.649, loss=0.3909, nce_acc=0.961, ordered=1]
E16 train: 158s loss=0.3908 nce=0.1866 mse=0.0055 bce=0.0854 nce_acc=0.961
E16 val: mAP=0.422 F1=0.411 R@1=0.321 cos=0.595 cv=0.1495 anchors=95/256 seen=5000/5000 ★
E17/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.656, loss=0.3717, nce_acc=0.966, ordered=1]
E17 train: 158s loss=0.3716 nce=0.1715 mse=0.0054 bce=0.0838 nce_acc=0.966
E17 val: mAP=0.426 F1=0.417 R@1=0.321 cos=0.597 cv=0.1420 anchors=94/256 seen=5000/5000 ★
E18/20 train: 100%|██████████| 2465/2465 [02:39<00:00, 15.43batch/s, cos=0.661, loss=0.3579, nce_acc=0.969, ordered=1]
E18 train: 160s loss=0.3579 nce=0.1607 mse=0.0053 bce=0.0826 nce_acc=0.969
E18 val: mAP=0.429 F1=0.416 R@1=0.325 cos=0.599 cv=0.1375 anchors=94/256 seen=5000/5000 ★
E19/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.664, loss=0.3494, nce_acc=0.971, ordered=1]
E19 train: 158s loss=0.3494 nce=0.1539 mse=0.0053 bce=0.0820 nce_acc=0.971
E19 val: mAP=0.429 F1=0.420 R@1=0.325 cos=0.600 cv=0.1426 anchors=94/256 seen=5000/5000 ★
E20/20 train: 100%|██████████| 2465/2465 [02:36<00:00, 15.77batch/s, cos=0.665, loss=0.3456, nce_acc=0.972, ordered=1]
E20 train: 156s loss=0.3455 nce=0.1510 mse=0.0052 bce=0.0816 nce_acc=0.972
E20 val: mAP=0.429 F1=0.418 R@1=0.323 cos=0.599 cv=0.1570 anchors=94/256 seen=5000/5000

Best mAP: 0.429
Encoder: 11,216,768 params (from scratch)
Checkpoints saved every epoch in checkpoints/
Tensorboard: runs/geolip_vit_encoder

=================================================================
DONE
=================================================================