=================================================================
GEOLIP VISION ENCODER - FROM SCRATCH
  ViT: 6L/384d/6h, patch16
  196 patches + CLS → 128-d output
  Device: cuda
=================================================================

  Loading soup...
  Soup: mAP=0.837 CV_target=0.2731
  train: loaded cached targets (118,287)
  val: loaded cached targets (5,000)
  Caching train images (118,287)...
Resolving data files: 100%
 39/39 [00:00<00:00, 5057.75it/s]
Downloading data: 100%
 39/39 [04:55<00:00,  7.45s/files]
default/train/0002.parquet: 100%
 509M/509M [00:09<00:00, 69.4MB/s]
default/train/0003.parquet: 100%
 502M/502M [00:03<00:00, 298MB/s]
default/train/0004.parquet: 100%
 507M/507M [00:10<00:00, 88.0MB/s]
default/train/0005.parquet: 100%
 499M/499M [00:04<00:00, 95.4MB/s]
default/train/0006.parquet: 100%
 510M/510M [00:09<00:00, 73.4MB/s]
default/train/0007.parquet: 100%
 502M/502M [00:06<00:00, 47.9MB/s]
default/train/0008.parquet: 100%
 514M/514M [00:09<00:00, 90.8MB/s]
default/train/0009.parquet: 100%
 509M/509M [00:06<00:00, 111MB/s]
default/train/0010.parquet: 100%
 509M/509M [00:07<00:00, 89.7MB/s]
default/train/0011.parquet: 100%
 505M/505M [00:05<00:00, 70.6MB/s]
default/train/0012.parquet: 100%
 507M/507M [00:06<00:00, 87.5MB/s]
default/train/0013.parquet: 100%
 502M/502M [00:09<00:00, 59.5MB/s]
default/train/0014.parquet: 100%
 504M/504M [00:09<00:00, 70.8MB/s]
default/train/0015.parquet: 100%
 514M/514M [00:07<00:00, 122MB/s]
default/train/0016.parquet: 100%
 507M/507M [00:07<00:00, 95.1MB/s]
default/train/0017.parquet: 100%
 509M/509M [00:09<00:00, 89.6MB/s]
default/train/0018.parquet: 100%
 504M/504M [00:06<00:00, 63.2MB/s]
default/train/0019.parquet: 100%
 511M/511M [00:10<00:00, 83.7MB/s]
default/train/0020.parquet: 100%
 510M/510M [00:10<00:00, 72.5MB/s]
default/train/0021.parquet: 100%
 504M/504M [00:09<00:00, 77.3MB/s]
default/train/0022.parquet: 100%
 507M/507M [00:10<00:00, 89.6MB/s]
default/train/0023.parquet: 100%
 511M/511M [00:10<00:00, 65.3MB/s]
default/train/0024.parquet: 100%
 505M/505M [00:09<00:00, 78.0MB/s]
default/train/0025.parquet: 100%
 503M/503M [00:04<00:00, 196MB/s]
default/train/0026.parquet: 100%
 508M/508M [00:05<00:00, 121MB/s]
default/train/0027.parquet: 100%
 508M/508M [00:06<00:00, 93.1MB/s]
default/train/0028.parquet: 100%
 507M/507M [00:05<00:00, 122MB/s]
default/train/0029.parquet: 100%
 510M/510M [00:07<00:00, 75.8MB/s]
default/train/0030.parquet: 100%
 505M/505M [00:08<00:00, 71.4MB/s]
default/train/0031.parquet: 100%
 502M/502M [00:04<00:00, 168MB/s]
default/train/0032.parquet: 100%
 502M/502M [00:02<00:00, 321MB/s]
default/train/0033.parquet: 100%
 508M/508M [00:07<00:00, 86.3MB/s]
default/train/0034.parquet: 100%
 504M/504M [00:07<00:00, 78.1MB/s]
default/train/0035.parquet: 100%
 499M/499M [00:16<00:00, 101MB/s]
default/train/0036.parquet: 100%
 507M/507M [00:10<00:00, 78.6MB/s]
default/train/0037.parquet: 100%
 501M/501M [00:09<00:00, 106MB/s]
default/train/0038.parquet: 100%
 79.2M/79.2M [00:01<00:00, 173MB/s]
default/val/0000.parquet: 100%
 504M/504M [00:04<00:00, 128MB/s]
default/val/0001.parquet: 100%
 311M/311M [00:03<00:00, 165MB/s]
Generating train split: 
 118287/0 [01:49<00:00, 1378.35 examples/s]
Generating validation split: 
 5000/0 [00:05<00:00, 617.41 examples/s]
Loading dataset shards: 100%
 39/39 [00:05<00:00,  8.83it/s]
  Caching train: 100%|██████████| 118287/118287 [13:03<00:00, 151.05it/s]
  Cached 118287/118287 images
  Saved: cached_train_images.pt (35611 MB)
  Caching val images (5,000)...
Resolving data files: 100%
 39/39 [00:00<00:00, 4857.40it/s]
  Caching val: 100%|██████████| 5000/5000 [00:33<00:00, 148.88it/s]
  Cached 5000/5000 images
  Saved: cached_val_images.pt (1505 MB)

=================================================================
BUILD ENCODER
=================================================================
  Architecture: 6L/384d/6h, patch16
  Input: 224×224 → 196 patches
  Output: 128-d (on hypersphere)
  Parameters: 11,216,768
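
The patch and token counts reported above follow from the patch geometry. A minimal Python sketch of that arithmetic, assuming square non-overlapping patches and a single CLS token; the function name `vit_token_counts` is hypothetical and does not come from the training script:

```python
def vit_token_counts(image_size: int = 224, patch_size: int = 16) -> tuple[int, int]:
    """Return (num_patches, num_tokens) for a square image cut into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size   # 224 // 16 = 14 patches per side
    num_patches = per_side * per_side     # 14 * 14 = 196 patches
    num_tokens = num_patches + 1          # plus one CLS token (assumption)
    return num_patches, num_tokens


patches, tokens = vit_token_counts()
print(patches, tokens)  # 196 197
```

With the logged settings this gives 196 patches and, under the CLS assumption, a sequence of 197 tokens per image.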

=================================================================
TRAINING
  20 epochs, lr=0.0003, batch=48
  Losses: InfoNCE + MSE + CV + BCE + Procrustes alignment
  CV target: 0.2731
  Images: train=118,287 val=5,000 (cached as tensors)
=================================================================
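
The 2465 batches per epoch in the progress bars are consistent with 118,287 training images at batch size 48 when the final partial batch is kept. A small Python sketch of that check; the helper `batches_per_epoch` and its `drop_last` flag are illustrative assumptions, not names taken from the training script:

```python
def batches_per_epoch(num_images: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches needed to cover num_images once."""
    if drop_last:
        # Discard the final partial batch.
        return num_images // batch_size
    # Ceiling division: keep the final partial batch.
    return -(-num_images // batch_size)


print(batches_per_epoch(118_287, 48))                  # 2465
print(batches_per_epoch(118_287, 48, drop_last=True))  # 2464
```

Since the log shows 2465 rather than 2464, the last batch of 15 images appears to be included in each epoch.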
E 1/20 train: 100%|██████████| 2465/2465 [02:44<00:00, 14.97batch/s, cos=0.258, loss=2.6911, nce_acc=0.339, ordered=1]
  E1 train: 165s loss=2.6891 nce=2.2529 mse=0.0120 bce=0.1963 nce_acc=0.340
  E1 val:   mAP=0.151 F1=0.162 R@1=0.032 cos=0.325 cv=0.2663 anchors=95/256 seen=5000/5000 ★
E 2/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.32batch/s, cos=0.368, loss=1.7954, nce_acc=0.553, ordered=1]
  E2 train: 161s loss=1.7948 nce=1.4297 mse=0.0099 bce=0.1473 nce_acc=0.553
  E2 val:   mAP=0.206 F1=0.197 R@1=0.062 cos=0.390 cv=0.2552 anchors=99/256 seen=5000/5000 ★
E 3/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.37batch/s, cos=0.416, loss=1.4860, nce_acc=0.641, ordered=1]
  E3 train: 160s loss=1.4854 nce=1.1484 mse=0.0092 bce=0.1338 nce_acc=0.641
  E3 val:   mAP=0.246 F1=0.244 R@1=0.091 cos=0.427 cv=0.2234 anchors=98/256 seen=5000/5000 ★
E 4/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.448, loss=1.2913, nce_acc=0.695, ordered=1]
  E4 train: 160s loss=1.2910 nce=0.9727 mse=0.0087 bce=0.1265 nce_acc=0.695
  E4 val:   mAP=0.272 F1=0.266 R@1=0.113 cos=0.453 cv=0.2078 anchors=99/256 seen=5000/5000 ★
E 5/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.475, loss=1.1334, nce_acc=0.743, ordered=1]
  E5 train: 160s loss=1.1331 nce=0.8303 mse=0.0083 bce=0.1205 nce_acc=0.743
  E5 val:   mAP=0.296 F1=0.292 R@1=0.139 cos=0.473 cv=0.2133 anchors=98/256 seen=5000/5000 ★
E 6/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.499, loss=1.0005, nce_acc=0.784, ordered=1]
  E6 train: 158s loss=1.0003 nce=0.7111 mse=0.0079 bce=0.1158 nce_acc=0.784
  E6 val:   mAP=0.317 F1=0.311 R@1=0.164 cos=0.495 cv=0.1835 anchors=98/256 seen=5000/5000 ★
E 7/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.60batch/s, cos=0.520, loss=0.8947, nce_acc=0.815, ordered=1]
  E7 train: 158s loss=0.8943 nce=0.6172 mse=0.0075 bce=0.1115 nce_acc=0.815
  E7 val:   mAP=0.337 F1=0.335 R@1=0.190 cos=0.513 cv=0.1809 anchors=96/256 seen=5000/5000 ★
E 8/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.539, loss=0.8030, nce_acc=0.842, ordered=1]
  E8 train: 158s loss=0.8028 nce=0.5365 mse=0.0072 bce=0.1076 nce_acc=0.843
  E8 val:   mAP=0.344 F1=0.331 R@1=0.207 cos=0.523 cv=0.1779 anchors=95/256 seen=5000/5000 ★
E 9/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.58batch/s, cos=0.557, loss=0.7229, nce_acc=0.866, ordered=1]
  E9 train: 158s loss=0.7228 nce=0.4665 mse=0.0070 bce=0.1041 nce_acc=0.866
  E9 val:   mAP=0.361 F1=0.349 R@1=0.218 cos=0.537 cv=0.1764 anchors=95/256 seen=5000/5000 ★
E10/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.51batch/s, cos=0.574, loss=0.6538, nce_acc=0.887, ordered=1]
  E10 train: 159s loss=0.6538 nce=0.4070 mse=0.0067 bce=0.1009 nce_acc=0.887
  E10 val:   mAP=0.380 F1=0.361 R@1=0.254 cos=0.557 cv=0.1699 anchors=96/256 seen=5000/5000 ★
E11/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.54batch/s, cos=0.589, loss=0.5929, nce_acc=0.905, ordered=1]
  E11 train: 159s loss=0.5928 nce=0.3545 mse=0.0065 bce=0.0978 nce_acc=0.905
  E11 val:   mAP=0.387 F1=0.377 R@1=0.265 cos=0.564 cv=0.1497 anchors=95/256 seen=5000/5000 ★
E12/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.55batch/s, cos=0.604, loss=0.5372, nce_acc=0.920, ordered=1]
  E12 train: 158s loss=0.5372 nce=0.3073 mse=0.0062 bce=0.0948 nce_acc=0.920
  E12 val:   mAP=0.400 F1=0.382 R@1=0.276 cos=0.573 cv=0.1639 anchors=95/256 seen=5000/5000 ★
E13/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.60batch/s, cos=0.617, loss=0.4917, nce_acc=0.933, ordered=1]
  E13 train: 158s loss=0.4917 nce=0.2693 mse=0.0060 bce=0.0920 nce_acc=0.933
  E13 val:   mAP=0.408 F1=0.392 R@1=0.291 cos=0.582 cv=0.1615 anchors=95/256 seen=5000/5000 ★
E14/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.629, loss=0.4502, nce_acc=0.945, ordered=1]
  E14 train: 158s loss=0.4501 nce=0.2347 mse=0.0058 bce=0.0895 nce_acc=0.945
  E14 val:   mAP=0.413 F1=0.403 R@1=0.304 cos=0.586 cv=0.1594 anchors=95/256 seen=5000/5000 ★
E15/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.640, loss=0.4169, nce_acc=0.954, ordered=1]
  E15 train: 158s loss=0.4168 nce=0.2075 mse=0.0057 bce=0.0873 nce_acc=0.954
  E15 val:   mAP=0.418 F1=0.403 R@1=0.307 cos=0.591 cv=0.1607 anchors=94/256 seen=5000/5000 ★
E16/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.62batch/s, cos=0.649, loss=0.3909, nce_acc=0.961, ordered=1]
  E16 train: 158s loss=0.3908 nce=0.1866 mse=0.0055 bce=0.0854 nce_acc=0.961
  E16 val:   mAP=0.422 F1=0.411 R@1=0.321 cos=0.595 cv=0.1495 anchors=95/256 seen=5000/5000 ★
E17/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.656, loss=0.3717, nce_acc=0.966, ordered=1]
  E17 train: 158s loss=0.3716 nce=0.1715 mse=0.0054 bce=0.0838 nce_acc=0.966
  E17 val:   mAP=0.426 F1=0.417 R@1=0.321 cos=0.597 cv=0.1420 anchors=94/256 seen=5000/5000 ★
E18/20 train: 100%|██████████| 2465/2465 [02:39<00:00, 15.43batch/s, cos=0.661, loss=0.3579, nce_acc=0.969, ordered=1]
  E18 train: 160s loss=0.3579 nce=0.1607 mse=0.0053 bce=0.0826 nce_acc=0.969
  E18 val:   mAP=0.429 F1=0.416 R@1=0.325 cos=0.599 cv=0.1375 anchors=94/256 seen=5000/5000 ★
E19/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.664, loss=0.3494, nce_acc=0.971, ordered=1]
  E19 train: 158s loss=0.3494 nce=0.1539 mse=0.0053 bce=0.0820 nce_acc=0.971
  E19 val:   mAP=0.429 F1=0.420 R@1=0.325 cos=0.600 cv=0.1426 anchors=94/256 seen=5000/5000 ★
E20/20 train: 100%|██████████| 2465/2465 [02:36<00:00, 15.77batch/s, cos=0.665, loss=0.3456, nce_acc=0.972, ordered=1]
  E20 train: 156s loss=0.3455 nce=0.1510 mse=0.0052 bce=0.0816 nce_acc=0.972
  E20 val:   mAP=0.429 F1=0.418 R@1=0.323 cos=0.599 cv=0.1570 anchors=94/256 seen=5000/5000

  Best mAP: 0.429
  Encoder: 11,216,768 params (from scratch)
  Checkpoints saved every epoch in checkpoints/
  Tensorboard: runs/geolip_vit_encoder

=================================================================
DONE
=================================================================
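
The per-epoch validation lines in this log have a regular `key=value` layout, so they can be pulled into a table for plotting or comparison. A minimal Python sketch, assuming the line format shown in this log; the names `VAL_LINE` and `parse_val_metrics` are hypothetical and are not part of the training script:

```python
import re

# Matches lines such as:
#   E18 val:   mAP=0.429 F1=0.416 R@1=0.325 cos=0.599 cv=0.1375 ...
VAL_LINE = re.compile(
    r"E(?P<epoch>\d+) val:\s+mAP=(?P<mAP>[\d.]+) F1=(?P<F1>[\d.]+) "
    r"R@1=(?P<R1>[\d.]+) cos=(?P<cos>[\d.]+) cv=(?P<cv>[\d.]+)"
)


def parse_val_metrics(log_text: str) -> list[dict]:
    """Return one dict per validation line found in log_text."""
    rows = []
    for m in VAL_LINE.finditer(log_text):
        row = {"epoch": int(m["epoch"])}
        row.update({k: float(m[k]) for k in ("mAP", "F1", "R1", "cos", "cv")})
        rows.append(row)
    return rows


# Three validation lines copied from the log above (trailing marker omitted).
SAMPLE = """\
  E18 val:   mAP=0.429 F1=0.416 R@1=0.325 cos=0.599 cv=0.1375 anchors=94/256 seen=5000/5000
  E19 val:   mAP=0.429 F1=0.420 R@1=0.325 cos=0.600 cv=0.1426 anchors=94/256 seen=5000/5000
  E20 val:   mAP=0.429 F1=0.418 R@1=0.323 cos=0.599 cv=0.1570 anchors=94/256 seen=5000/5000
"""

rows = parse_val_metrics(SAMPLE)
best = max(rows, key=lambda r: r["mAP"])
print(len(rows), best["mAP"])
```

The logged mAP values are rounded to three decimals, so epochs 18 to 20 tie at 0.429 here; the best-epoch marker in the log presumably compares unrounded values, which this sketch cannot recover.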