---
license: apache-2.0
---

# Experiment 2.5 Update: COCO convergence is slow but steady.

```
=================================================================
GEOLIP VISION ENCODER – FROM SCRATCH
ViT: 6L/384d/6h, patch16
196 patches + CLS → 128-d output
Device: cuda
=================================================================

Loading soup...
Soup: mAP=0.837 CV_target=0.2731
train: loaded cached targets (118,287)
val: loaded cached targets (5,000)
Caching train images (118,287)...

=================================================================
BUILD ENCODER
=================================================================
Architecture: 6L/384d/6h, patch16
Input: 224×224 → 196 patches
Output: 128-d (on hypersphere)
Parameters: 11,216,768

=================================================================
TRAINING
20 epochs, lr=0.0003, batch=48
Losses: InfoNCE + MSE + CV + BCE + Procrustes alignment
CV target: 0.2731
Images: train=118,287 val=5,000 (cached as tensors)
=================================================================
E 1/20 train: 100%|██████████| 2465/2465 [02:44<00:00, 14.97batch/s, cos=0.258, loss=2.6911, nce_acc=0.339, ordered=1]
E1 train: 165s loss=2.6891 nce=2.2529 mse=0.0120 bce=0.1963 nce_acc=0.340
E1 val: mAP=0.151 F1=0.162 R@1=0.032 cos=0.325 cv=0.2663 anchors=95/256 seen=5000/5000 ✓
E 2/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.32batch/s, cos=0.368, loss=1.7954, nce_acc=0.553, ordered=1]
E2 train: 161s loss=1.7948 nce=1.4297 mse=0.0099 bce=0.1473 nce_acc=0.553
E2 val: mAP=0.206 F1=0.197 R@1=0.062 cos=0.390 cv=0.2552 anchors=99/256 seen=5000/5000 ✓
E 3/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.37batch/s, cos=0.416, loss=1.4860, nce_acc=0.641, ordered=1]
E3 train: 160s loss=1.4854 nce=1.1484 mse=0.0092 bce=0.1338 nce_acc=0.641
E3 val: mAP=0.246 F1=0.244 R@1=0.091 cos=0.427 cv=0.2234 anchors=98/256 seen=5000/5000 ✓
E 4/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.448, loss=1.2913, nce_acc=0.695, ordered=1]
E4 train: 160s loss=1.2910 nce=0.9727 mse=0.0087 bce=0.1265 nce_acc=0.695
E4 val: mAP=0.272 F1=0.266 R@1=0.113 cos=0.453 cv=0.2078 anchors=99/256 seen=5000/5000 ✓
E 5/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.475, loss=1.1334, nce_acc=0.743, ordered=1]
E5 train: 160s loss=1.1331 nce=0.8303 mse=0.0083 bce=0.1205 nce_acc=0.743
E5 val: mAP=0.296 F1=0.292 R@1=0.139 cos=0.473 cv=0.2133 anchors=98/256 seen=5000/5000 ✓
E 6/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.499, loss=1.0005, nce_acc=0.784, ordered=1]
E6 train: 158s loss=1.0003 nce=0.7111 mse=0.0079 bce=0.1158 nce_acc=0.784
E6 val: mAP=0.317 F1=0.311 R@1=0.164 cos=0.495 cv=0.1835 anchors=98/256 seen=5000/5000 ✓
E 7/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.60batch/s, cos=0.520, loss=0.8947, nce_acc=0.815, ordered=1]
E7 train: 158s loss=0.8943 nce=0.6172 mse=0.0075 bce=0.1115 nce_acc=0.815
E7 val: mAP=0.337 F1=0.335 R@1=0.190 cos=0.513 cv=0.1809 anchors=96/256 seen=5000/5000 ✓
E 8/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.539, loss=0.8030, nce_acc=0.842, ordered=1]
E8 train: 158s loss=0.8028 nce=0.5365 mse=0.0072 bce=0.1076 nce_acc=0.843
E8 val: mAP=0.344 F1=0.331 R@1=0.207 cos=0.523 cv=0.1779 anchors=95/256 seen=5000/5000 ✓
E 9/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.58batch/s, cos=0.557, loss=0.7229, nce_acc=0.866, ordered=1]
E9 train: 158s loss=0.7228 nce=0.4665 mse=0.0070 bce=0.1041 nce_acc=0.866
E9 val: mAP=0.361 F1=0.349 R@1=0.218 cos=0.537 cv=0.1764 anchors=95/256 seen=5000/5000 ✓
E10/20 train:  36%|███▌      | 892/2465 [00:57<01:40, 15.69batch/s, cos=0.572, loss=0.6548, nce_acc=0.887, ordered=1]
```

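The logged parameter count can be sanity-checked by hand. The sketch below assumes a standard pre-norm ViT with learned positional embeddings, one CLS token, MLP ratio 4, biases on every linear layer and a final LayerNorm; none of these details are stated in the log. Under those assumptions a single linear 384→128 head gives 11,068,160 parameters, which is 148,608 short of the logged 11,216,768. That gap equals exactly one extra 384×384 linear layer plus one LayerNorm, which would be consistent with (for example) a two-layer projection head, but this is an inference from the arithmetic, not something the log confirms. `vit_params` and its `head_hidden` argument are hypothetical names.

```python
# Hypothetical parameter-count check for a plain ViT encoder.
# Assumptions (not confirmed by the log): pre-norm blocks, learned positional
# embeddings, one CLS token, MLP ratio 4, biases everywhere, final LayerNorm.

def linear(d_in: int, d_out: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return d_in * d_out + d_out


def layernorm(d: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * d


def vit_params(depth=6, dim=384, img=224, patch=16, in_ch=3,
               mlp_ratio=4, out_dim=128, head_hidden=None) -> int:
    num_patches = (img // patch) ** 2            # 224 / 16 = 14, so 14 * 14 = 196
    total = linear(in_ch * patch * patch, dim)   # patch embedding (stride-16 conv == linear)
    total += dim                                 # CLS token
    total += (num_patches + 1) * dim             # positional embeddings incl. CLS
    block = (layernorm(dim) + linear(dim, 3 * dim) + linear(dim, dim)    # attention
             + layernorm(dim) + linear(dim, mlp_ratio * dim)
             + linear(mlp_ratio * dim, dim))                              # MLP
    total += depth * block
    total += layernorm(dim)                      # final norm
    if head_hidden is None:
        total += linear(dim, out_dim)            # single projection to the output space
    else:
        # hypothetical two-layer head: Linear -> LayerNorm -> Linear
        total += (linear(dim, head_hidden) + layernorm(head_hidden)
                  + linear(head_hidden, out_dim))
    return total


print(f"{vit_params():,}")                 # 11,068,160
print(f"{vit_params(head_hidden=384):,}")  # 11,216,768
```

Treating the patch-embedding convolution as a linear map over flattened 16×16×3 patches gives the same count, since a 16×16 kernel with stride 16 has exactly those weights.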
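The `nce` and `nce_acc` columns point to an InfoNCE objective between each image embedding and its cached soup target, with outputs living on the unit hypersphere. The log does not give the temperature, whether the loss is symmetric, or how targets are paired, so the following is a minimal sketch that assumes one-directional image→target InfoNCE over the batch, a hypothetical temperature of 0.07, and `nce_acc` read as the fraction of rows whose highest-scoring target is their own.

```python
import math


def l2_normalize(v):
    """Project a vector onto the unit hypersphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce(images, targets, temperature=0.07):
    """One-directional InfoNCE: row i should pick target i among all batch targets.

    Returns (mean loss, top-1 accuracy). temperature=0.07 is an assumed value.
    """
    imgs = [l2_normalize(v) for v in images]
    tgts = [l2_normalize(v) for v in targets]
    total_loss, correct = 0.0, 0
    for i, a in enumerate(imgs):
        # cosine similarities divided by the temperature act as logits
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in tgts]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total_loss += log_sum_exp - logits[i]  # cross-entropy with label i
        correct += int(max(range(len(logits)), key=logits.__getitem__) == i)
    n = len(imgs)
    return total_loss / n, correct / n


batch_targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
noisy_images = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]
loss, acc = info_nce(noisy_images, batch_targets)
print(f"loss={loss:.4f} nce_acc={acc:.3f}")
```

Each image is scored against every target in the batch, so the other 47 targets in a batch of 48 act as negatives; that is why accuracy and loss both depend on batch size.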
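The log tracks a `cv` value against `CV_target=0.2731` (read from the soup) and lists CV among the losses, but it does not say what CV is measured over. Assuming CV means a coefficient of variation (standard deviation divided by mean) of some per-batch quantity, a penalty pulling it toward the target could be as simple as the squared deviation below. Pairwise distances between embeddings are used purely as a placeholder statistic, and `cv_penalty` is a hypothetical name.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (values assumed positive)."""
    return statistics.pstdev(values) / statistics.fmean(values)


def pairwise_distances(embeddings):
    """Euclidean distances between all unordered pairs (placeholder statistic)."""
    out = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            out.append(math.dist(embeddings[i], embeddings[j]))
    return out


def cv_penalty(embeddings, cv_target=0.2731):
    """Squared gap between the batch CV and the target CV taken from the log."""
    cv = coefficient_of_variation(pairwise_distances(embeddings))
    return (cv - cv_target) ** 2, cv


penalty, cv = cv_penalty([[0.0, 0.0], [1.0, 0.0], [0.5, math.sqrt(3) / 2]])
print(f"cv={cv:.4f} penalty={penalty:.4f}")
```

A perfectly regular configuration (all distances equal) has CV 0, so the penalty pushes toward a specific amount of spread rather than toward uniformity.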
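The validation lines report `mAP` on COCO's multi-label task. The log does not say how it is computed; a common definition, sketched here as a generic example rather than the experiment's evaluation code, is non-interpolated average precision per class (mean precision at the rank of each positive) averaged over classes, skipping classes with no positives.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class: mean precision at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else None


def mean_average_precision(score_matrix, label_matrix):
    """score_matrix[i][c] and label_matrix[i][c] hold image i's score/label for class c."""
    aps = []
    for c in range(len(score_matrix[0])):
        ap = average_precision([row[c] for row in score_matrix],
                               [row[c] for row in label_matrix])
        if ap is not None:  # classes without any positive are skipped
            aps.append(ap)
    return sum(aps) / len(aps)


scores = [[0.9, 0.1], [0.8, 0.4], [0.7, 0.3], [0.6, 0.2]]
labels = [[1, 0], [0, 1], [1, 0], [0, 0]]
print(f"mAP={mean_average_precision(scores, labels):.4f}")
```

Here class 0 has AP (1/1 + 2/3) / 2 = 5/6 and class 1 has AP 1, so the mean is 11/12.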
# Experiment 2.5:
The Xavier-aligned and Procrustes embedding array attached to a standard patch16 subset should suffice.

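The note above is terse. One reading, offered as an assumption rather than the author's recipe, is an embedding array initialized with Xavier (Glorot) uniform initialization and then rotated onto reference embeddings with orthogonal Procrustes: the orthogonal matrix R minimizing ||XR - Y||_F, which is UVᵀ from the SVD of XᵀY. To stay dependency-free, this sketch obtains that orthogonal factor with a Newton-Schulz iteration instead of an explicit SVD. All helper names are hypothetical, and the toy data (20 points in 3-D, rotated by a known matrix) stands in for real embeddings.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, gain=1.0, rng=random):
    """Glorot/Xavier uniform init: U(-a, a) with a = gain * sqrt(6 / (fan_in + fan_out))."""
    a = gain * math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_out)] for _ in range(fan_in)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def orthogonal_procrustes(X, Y, iters=100):
    """Orthogonal R minimizing ||X R - Y||_F for row-wise point sets X, Y (n x d).

    R is the orthogonal polar factor of M = X^T Y (equal to U V^T from its SVD),
    computed with the Newton-Schulz iteration R <- R (3I - R^T R) / 2.
    Assumes M is full rank.
    """
    M = matmul(transpose(X), Y)
    fro = math.sqrt(sum(v * v for row in M for v in row))
    R = [[v / fro for v in row] for row in M]  # singular values now lie in (0, 1]
    d = len(R)
    for _ in range(iters):
        RtR = matmul(transpose(R), R)
        correction = [[(3.0 if i == j else 0.0) - RtR[i][j] for j in range(d)]
                      for i in range(d)]
        R = [[0.5 * v for v in row] for row in matmul(R, correction)]
    return R


rng = random.Random(0)
theta = 0.7
R_true = [[math.cos(theta), -math.sin(theta), 0.0],
          [math.sin(theta), math.cos(theta), 0.0],
          [0.0, 0.0, 1.0]]
X = xavier_uniform(20, 3, rng=rng)  # toy "embedding array" from Xavier init
Y = matmul(X, R_true)               # reference = exact rotation of X
R = orthogonal_procrustes(X, Y)
err = max(abs(R[i][j] - R_true[i][j]) for i in range(3) for j in range(3))
print(f"max |R - R_true| = {err:.2e}")
```

Because R is constrained to be orthogonal, the alignment preserves all inner products and norms of the array, so unit-norm (hypersphere) embeddings stay on the hypersphere after rotation.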