=================================================================
GEOLIP VISION ENCODER — FROM SCRATCH
ViT: 6L/384d/6h, patch16
196 patches + CLS → 128-d output
Device: cuda
=================================================================
Loading soup...
Soup: mAP=0.837 CV_target=0.2731
train: loaded cached targets (118,287)
val: loaded cached targets (5,000)
Caching train images (118,287)...
Resolving data files: 100%  39/39 [00:00<00:00, 5057.75it/s]
Downloading data: 100%  39/39 [04:55<00:00,  7.45s/files]
default/train/0002.parquet: 100%  509M/509M [00:09<00:00, 69.4MB/s]
default/train/0003.parquet: 100%  502M/502M [00:03<00:00, 298MB/s]
default/train/0004.parquet: 100%  507M/507M [00:10<00:00, 88.0MB/s]
default/train/0005.parquet: 100%  499M/499M [00:04<00:00, 95.4MB/s]
default/train/0006.parquet: 100%  510M/510M [00:09<00:00, 73.4MB/s]
default/train/0007.parquet: 100%  502M/502M [00:06<00:00, 47.9MB/s]
default/train/0008.parquet: 100%  514M/514M [00:09<00:00, 90.8MB/s]
default/train/0009.parquet: 100%  509M/509M [00:06<00:00, 111MB/s]
default/train/0010.parquet: 100%  509M/509M [00:07<00:00, 89.7MB/s]
default/train/0011.parquet: 100%  505M/505M [00:05<00:00, 70.6MB/s]
default/train/0012.parquet: 100%  507M/507M [00:06<00:00, 87.5MB/s]
default/train/0013.parquet: 100%  502M/502M [00:09<00:00, 59.5MB/s]
default/train/0014.parquet: 100%  504M/504M [00:09<00:00, 70.8MB/s]
default/train/0015.parquet: 100%  514M/514M [00:07<00:00, 122MB/s]
default/train/0016.parquet: 100%  507M/507M [00:07<00:00, 95.1MB/s]
default/train/0017.parquet: 100%  509M/509M [00:09<00:00, 89.6MB/s]
default/train/0018.parquet: 100%  504M/504M [00:06<00:00, 63.2MB/s]
default/train/0019.parquet: 100%  511M/511M [00:10<00:00, 83.7MB/s]
default/train/0020.parquet: 100%  510M/510M [00:10<00:00, 72.5MB/s]
default/train/0021.parquet: 100%  504M/504M [00:09<00:00, 77.3MB/s]
default/train/0022.parquet: 100%  507M/507M [00:10<00:00, 89.6MB/s]
default/train/0023.parquet: 100%  511M/511M [00:10<00:00, 65.3MB/s]
default/train/0024.parquet: 100%  505M/505M [00:09<00:00, 78.0MB/s]
default/train/0025.parquet: 100%  503M/503M [00:04<00:00, 196MB/s]
default/train/0026.parquet: 100%  508M/508M [00:05<00:00, 121MB/s]
default/train/0027.parquet: 100%  508M/508M [00:06<00:00, 93.1MB/s]
default/train/0028.parquet: 100%  507M/507M [00:05<00:00, 122MB/s]
default/train/0029.parquet: 100%  510M/510M [00:07<00:00, 75.8MB/s]
default/train/0030.parquet: 100%  505M/505M [00:08<00:00, 71.4MB/s]
default/train/0031.parquet: 100%  502M/502M [00:04<00:00, 168MB/s]
default/train/0032.parquet: 100%  502M/502M [00:02<00:00, 321MB/s]
default/train/0033.parquet: 100%  508M/508M [00:07<00:00, 86.3MB/s]
default/train/0034.parquet: 100%  504M/504M [00:07<00:00, 78.1MB/s]
default/train/0035.parquet: 100%  499M/499M [00:16<00:00, 101MB/s]
default/train/0036.parquet: 100%  507M/507M [00:10<00:00, 78.6MB/s]
default/train/0037.parquet: 100%  501M/501M [00:09<00:00, 106MB/s]
default/train/0038.parquet: 100%  79.2M/79.2M [00:01<00:00, 173MB/s]
default/val/0000.parquet: 100%  504M/504M [00:04<00:00, 128MB/s]
default/val/0001.parquet: 100%  311M/311M [00:03<00:00, 165MB/s]
Generating train split:   118287/0 [01:49<00:00, 1378.35 examples/s]
Generating validation split:   5000/0 [00:05<00:00, 617.41 examples/s]
Loading dataset shards: 100%  39/39 [00:05<00:00,  8.83it/s]
Caching train: 100%|██████████| 118287/118287 [13:03<00:00, 151.05it/s]
Cached 118287/118287 images
Saved: cached_train_images.pt (35611 MB)
Caching val images (5,000)...
Resolving data files: 100%  39/39 [00:00<00:00, 4857.40it/s]
Caching val: 100%|██████████| 5000/5000 [00:33<00:00, 148.88it/s]
Cached 5000/5000 images
Saved: cached_val_images.pt (1505 MB)
=================================================================
BUILD ENCODER
=================================================================
Architecture: 6L/384d/6h, patch16
Input: 224×224 → 196 patches
Output: 128-d (on hypersphere)
Parameters: 11,216,768
=================================================================
TRAINING
20 epochs, lr=0.0003, batch=48
Losses: InfoNCE + MSE + CV + BCE + Procrustes alignment
CV target: 0.2731
Images: train=118,287 val=5,000 (cached as tensors)
=================================================================
E 1/20 train: 100%|██████████| 2465/2465 [02:44<00:00, 14.97batch/s, cos=0.258, loss=2.6911, nce_acc=0.339, ordered=1]
E1 train: 165s loss=2.6891 nce=2.2529 mse=0.0120 bce=0.1963 nce_acc=0.340
E1 val: mAP=0.151 F1=0.162 R@1=0.032 cos=0.325 cv=0.2663 anchors=95/256 seen=5000/5000 ★
E 2/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.32batch/s, cos=0.368, loss=1.7954, nce_acc=0.553, ordered=1]
E2 train: 161s loss=1.7948 nce=1.4297 mse=0.0099 bce=0.1473 nce_acc=0.553
E2 val: mAP=0.206 F1=0.197 R@1=0.062 cos=0.390 cv=0.2552 anchors=99/256 seen=5000/5000 ★
E 3/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.37batch/s, cos=0.416, loss=1.4860, nce_acc=0.641, ordered=1]
E3 train: 160s loss=1.4854 nce=1.1484 mse=0.0092 bce=0.1338 nce_acc=0.641
E3 val: mAP=0.246 F1=0.244 R@1=0.091 cos=0.427 cv=0.2234 anchors=98/256 seen=5000/5000 ★
E 4/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.448, loss=1.2913, nce_acc=0.695, ordered=1]
E4 train: 160s loss=1.2910 nce=0.9727 mse=0.0087 bce=0.1265 nce_acc=0.695
E4 val: mAP=0.272 F1=0.266 R@1=0.113 cos=0.453 cv=0.2078 anchors=99/256 seen=5000/5000 ★
E 5/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.475, loss=1.1334, nce_acc=0.743, ordered=1]
E5 train: 160s loss=1.1331 nce=0.8303 mse=0.0083 bce=0.1205 nce_acc=0.743
E5 val: mAP=0.296 F1=0.292 R@1=0.139 cos=0.473 cv=0.2133 anchors=98/256 seen=5000/5000 ★
E 6/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.499, loss=1.0005, nce_acc=0.784, ordered=1]
E6 train: 158s loss=1.0003 nce=0.7111 mse=0.0079 bce=0.1158 nce_acc=0.784
E6 val: mAP=0.317 F1=0.311 R@1=0.164 cos=0.495 cv=0.1835 anchors=98/256 seen=5000/5000 ★
E 7/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.60batch/s, cos=0.520, loss=0.8947, nce_acc=0.815, ordered=1]
E7 train: 158s loss=0.8943 nce=0.6172 mse=0.0075 bce=0.1115 nce_acc=0.815
E7 val: mAP=0.337 F1=0.335 R@1=0.190 cos=0.513 cv=0.1809 anchors=96/256 seen=5000/5000 ★
E 8/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.539, loss=0.8030, nce_acc=0.842, ordered=1]
E8 train: 158s loss=0.8028 nce=0.5365 mse=0.0072 bce=0.1076 nce_acc=0.843
E8 val: mAP=0.344 F1=0.331 R@1=0.207 cos=0.523 cv=0.1779 anchors=95/256 seen=5000/5000 ★
E 9/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.58batch/s, cos=0.557, loss=0.7229, nce_acc=0.866, ordered=1]
E9 train: 158s loss=0.7228 nce=0.4665 mse=0.0070 bce=0.1041 nce_acc=0.866
E9 val: mAP=0.361 F1=0.349 R@1=0.218 cos=0.537 cv=0.1764 anchors=95/256 seen=5000/5000 ★
E10/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.51batch/s, cos=0.574, loss=0.6538, nce_acc=0.887, ordered=1]
E10 train: 159s loss=0.6538 nce=0.4070 mse=0.0067 bce=0.1009 nce_acc=0.887
E10 val: mAP=0.380 F1=0.361 R@1=0.254 cos=0.557 cv=0.1699 anchors=96/256 seen=5000/5000 ★
E11/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.54batch/s, cos=0.589, loss=0.5929, nce_acc=0.905, ordered=1]
E11 train: 159s loss=0.5928 nce=0.3545 mse=0.0065 bce=0.0978 nce_acc=0.905
E11 val: mAP=0.387 F1=0.377 R@1=0.265 cos=0.564 cv=0.1497 anchors=95/256 seen=5000/5000 ★
E12/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.55batch/s, cos=0.604, loss=0.5372, nce_acc=0.920, ordered=1]
E12 train: 158s loss=0.5372 nce=0.3073 mse=0.0062 bce=0.0948 nce_acc=0.920
E12 val: mAP=0.400 F1=0.382 R@1=0.276 cos=0.573 cv=0.1639 anchors=95/256 seen=5000/5000 ★
E13/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.60batch/s, cos=0.617, loss=0.4917, nce_acc=0.933, ordered=1]
E13 train: 158s loss=0.4917 nce=0.2693 mse=0.0060 bce=0.0920 nce_acc=0.933
E13 val: mAP=0.408 F1=0.392 R@1=0.291 cos=0.582 cv=0.1615 anchors=95/256 seen=5000/5000 ★
E14/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.629, loss=0.4502, nce_acc=0.945, ordered=1]
E14 train: 158s loss=0.4501 nce=0.2347 mse=0.0058 bce=0.0895 nce_acc=0.945
E14 val: mAP=0.413 F1=0.403 R@1=0.304 cos=0.586 cv=0.1594 anchors=95/256 seen=5000/5000 ★
E15/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.640, loss=0.4169, nce_acc=0.954, ordered=1]
E15 train: 158s loss=0.4168 nce=0.2075 mse=0.0057 bce=0.0873 nce_acc=0.954
E15 val: mAP=0.418 F1=0.403 R@1=0.307 cos=0.591 cv=0.1607 anchors=94/256 seen=5000/5000 ★
E16/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.62batch/s, cos=0.649, loss=0.3909, nce_acc=0.961, ordered=1]
E16 train: 158s loss=0.3908 nce=0.1866 mse=0.0055 bce=0.0854 nce_acc=0.961
E16 val: mAP=0.422 F1=0.411 R@1=0.321 cos=0.595 cv=0.1495 anchors=95/256 seen=5000/5000 ★
E17/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.656, loss=0.3717, nce_acc=0.966, ordered=1]
E17 train: 158s loss=0.3716 nce=0.1715 mse=0.0054 bce=0.0838 nce_acc=0.966
E17 val: mAP=0.426 F1=0.417 R@1=0.321 cos=0.597 cv=0.1420 anchors=94/256 seen=5000/5000 ★
E18/20 train: 100%|██████████| 2465/2465 [02:39<00:00, 15.43batch/s, cos=0.661, loss=0.3579, nce_acc=0.969, ordered=1]
E18 train: 160s loss=0.3579 nce=0.1607 mse=0.0053 bce=0.0826 nce_acc=0.969
E18 val: mAP=0.429 F1=0.416 R@1=0.325 cos=0.599 cv=0.1375 anchors=94/256 seen=5000/5000 ★
E19/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.664, loss=0.3494, nce_acc=0.971, ordered=1]
E19 train: 158s loss=0.3494 nce=0.1539 mse=0.0053 bce=0.0820 nce_acc=0.971
E19 val: mAP=0.429 F1=0.420 R@1=0.325 cos=0.600 cv=0.1426 anchors=94/256 seen=5000/5000 ★
E20/20 train: 100%|██████████| 2465/2465 [02:36<00:00, 15.77batch/s, cos=0.665, loss=0.3456, nce_acc=0.972, ordered=1]
E20 train: 156s loss=0.3455 nce=0.1510 mse=0.0052 bce=0.0816 nce_acc=0.972
E20 val: mAP=0.429 F1=0.418 R@1=0.323 cos=0.599 cv=0.1570 anchors=94/256 seen=5000/5000
Best mAP: 0.429
Encoder: 11,216,768 params (from scratch)
Checkpoints saved every epoch in checkpoints/
Tensorboard: runs/geolip_vit_encoder
=================================================================
DONE
=================================================================