AbstractPhil/geolip-vit-base-x3

@@ -2,6 +2,68 @@
 license: apache-2.0
 ---
+# Experiment 2.5 Update: COCO convergence is slow but steady.
+```
+=================================================================
+GEOLIP VISION ENCODER — FROM SCRATCH
+  ViT: 6L/384d/6h, patch16
+  196 patches + CLS → 128-d output
+  Device: cuda
+=================================================================
+  Loading soup...
+  Soup: mAP=0.837 CV_target=0.2731
+  train: loaded cached targets (118,287)
+  val: loaded cached targets (5,000)
+  Caching train images (118,287)...
+=================================================================
+BUILD ENCODER
+=================================================================
+  Architecture: 6L/384d/6h, patch16
+  Input: 224×224 → 196 patches
+  Output: 128-d (on hypersphere)
+  Parameters: 11,216,768
+=================================================================
+TRAINING
+  20 epochs, lr=0.0003, batch=48
+  Losses: InfoNCE + MSE + CV + BCE + Procrustes alignment
+  CV target: 0.2731
+  Images: train=118,287 val=5,000 (cached as tensors)
+=================================================================
+E 1/20 train: 100%|██████████| 2465/2465 [02:44<00:00, 14.97batch/s, cos=0.258, loss=2.6911, nce_acc=0.339, ordered=1]
+  E1 train: 165s loss=2.6891 nce=2.2529 mse=0.0120 bce=0.1963 nce_acc=0.340
+  E1 val:   mAP=0.151 F1=0.162 R@1=0.032 cos=0.325 cv=0.2663 anchors=95/256 seen=5000/5000 ★
+E 2/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.32batch/s, cos=0.368, loss=1.7954, nce_acc=0.553, ordered=1]
+  E2 train: 161s loss=1.7948 nce=1.4297 mse=0.0099 bce=0.1473 nce_acc=0.553
+  E2 val:   mAP=0.206 F1=0.197 R@1=0.062 cos=0.390 cv=0.2552 anchors=99/256 seen=5000/5000 ★
+E 3/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.37batch/s, cos=0.416, loss=1.4860, nce_acc=0.641, ordered=1]
+  E3 train: 160s loss=1.4854 nce=1.1484 mse=0.0092 bce=0.1338 nce_acc=0.641
+  E3 val:   mAP=0.246 F1=0.244 R@1=0.091 cos=0.427 cv=0.2234 anchors=98/256 seen=5000/5000 ★
+E 4/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.448, loss=1.2913, nce_acc=0.695, ordered=1]
+  E4 train: 160s loss=1.2910 nce=0.9727 mse=0.0087 bce=0.1265 nce_acc=0.695
+  E4 val:   mAP=0.272 F1=0.266 R@1=0.113 cos=0.453 cv=0.2078 anchors=99/256 seen=5000/5000 ★
+E 5/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.475, loss=1.1334, nce_acc=0.743, ordered=1]
+  E5 train: 160s loss=1.1331 nce=0.8303 mse=0.0083 bce=0.1205 nce_acc=0.743
+  E5 val:   mAP=0.296 F1=0.292 R@1=0.139 cos=0.473 cv=0.2133 anchors=98/256 seen=5000/5000 ★
+E 6/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.499, loss=1.0005, nce_acc=0.784, ordered=1]
+  E6 train: 158s loss=1.0003 nce=0.7111 mse=0.0079 bce=0.1158 nce_acc=0.784
+  E6 val:   mAP=0.317 F1=0.311 R@1=0.164 cos=0.495 cv=0.1835 anchors=98/256 seen=5000/5000 ★
+E 7/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.60batch/s, cos=0.520, loss=0.8947, nce_acc=0.815, ordered=1]
+  E7 train: 158s loss=0.8943 nce=0.6172 mse=0.0075 bce=0.1115 nce_acc=0.815
+  E7 val:   mAP=0.337 F1=0.335 R@1=0.190 cos=0.513 cv=0.1809 anchors=96/256 seen=5000/5000 ★
+E 8/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.539, loss=0.8030, nce_acc=0.842, ordered=1]
+  E8 train: 158s loss=0.8028 nce=0.5365 mse=0.0072 bce=0.1076 nce_acc=0.843
+  E8 val:   mAP=0.344 F1=0.331 R@1=0.207 cos=0.523 cv=0.1779 anchors=95/256 seen=5000/5000 ★
+E 9/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.58batch/s, cos=0.557, loss=0.7229, nce_acc=0.866, ordered=1]
+  E9 train: 158s loss=0.7228 nce=0.4665 mse=0.0070 bce=0.1041 nce_acc=0.866
+  E9 val:   mAP=0.361 F1=0.349 R@1=0.218 cos=0.537 cv=0.1764 anchors=95/256 seen=5000/5000 ★
+E10/20 train:  36%|███▌      | 892/2465 [00:57<01:40, 15.69batch/s, cos=0.572, loss=0.6548, nce_acc=0.887, ordered=1]
+```
 # Experiment 2.5:
 The Xavier-aligned and Procrustes embedding array attached to a standard patch16 subset should suffice.
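
The header figures in the training log above can be cross-checked with a little arithmetic. The sketch below is not part of the training script; it copies the image size, patch size, dataset size, and batch size from the log and recomputes the patch count and steps per epoch. The variable names are illustrative, and the rounding-up of the step count assumes the final partial batch is kept rather than dropped.

```python
import math

# Values copied from the training log above.
image_size = 224        # "Input: 224x224"
patch_size = 16         # "patch16"
train_images = 118_287  # "train=118,287"
batch_size = 48         # "batch=48"

# A 224x224 image cut into 16x16 patches forms a 14x14 grid.
patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2
num_tokens = num_patches + 1  # patches plus the CLS token

# Assumption: the last, smaller batch is kept (e.g. no drop_last),
# so the step count rounds up.
steps_per_epoch = math.ceil(train_images / batch_size)

print(num_patches, num_tokens, steps_per_epoch)  # 196 197 2465
```

Both results agree with the log: 196 patches plus CLS, and 2465 batches per epoch in the progress bars.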