---
license: apache-2.0
---

# Experiment 2.5

The Xavier-aligned and Procrustes embedding array attached to a standard patch16 subset should suffice.

I'll be training this like CaptionBERT, but with a twist: the soup expert is the alignment bank for this one, and I trained it first instead of later. The alignment and R@1 are nearly perfect, so it should be cohesive enough through the chain of conceptualization to coalesce through the implications. Whether the actual patches will learn from the embedding and encoding spectrum, and how quickly I can make them learn, is another story.

The output of this encoder is a 128-dimensional enriched representational lookup plane on a hypersphere. This is more than enough information to house access to any data route that exists. The dimensional spectrum of a 5D object is so expansive and so enriched that the entire spectrum of this shape requires specific curation of its behavior. This is what most of the mechanisms are tasked with overall: pruning the effect of rigidity indifference preservation on the hypersphere-represented structure. In other words, those 128 dimensions represent more information than I could express in words.

# Experiment 2: 95/256 anchors survive, emergent geometric structure formed.

R@1 = 97.1%: not quite, but getting there.

Experiment 2 was successful enough to push harder in this direction. The anchor collapse says the model doesn't need all those anchors. It started grabbing at more anchors by the end, which means the system was aligned and then started growing further under a constraint that I was unaware of. This drift curve needs to be controlled.

Direct anchored emergence while training is risky. The bank itself survived so well because it was anchored post-training, which gave added cohesion and complexity association for which I have yet to discover a runtime training process. I will be analyzing the emergence to preserve the anchoring.

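
A minimal sketch of how anchor survival could be measured from a batch of embeddings, for anyone reproducing the `anchors=N/256` column in the log. The function name `anchor_usage` and the definition of an "active" anchor (nearest anchor by cosine similarity for at least one embedding) are assumptions for illustration; the criterion used by the actual training script is not stated here.

```python
import numpy as np

def anchor_usage(embeddings, anchors):
    """Return (active_anchor_count, assignment_entropy_in_nats).

    Assumption: an anchor is "active" if it is the nearest anchor,
    by cosine similarity, for at least one embedding.
    """
    # L2-normalize so dot products are cosine similarities on the hypersphere.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)

    # Hard-assign each embedding to its most similar anchor.
    nearest = np.argmax(e @ a.T, axis=1)
    counts = np.bincount(nearest, minlength=len(a))

    # Entropy of the assignment distribution over the anchors that are used.
    p = counts[counts > 0] / counts.sum()
    entropy = float(-np.sum(p * np.log(p)))
    return int(np.count_nonzero(counts)), entropy

# Hypothetical shapes matching this card: 256 anchors in 128 dimensions.
rng = np.random.default_rng(0)
demo_anchors = rng.normal(size=(256, 128))
demo_embeddings = rng.normal(size=(1000, 128))
active, entropy = anchor_usage(demo_embeddings, demo_anchors)
print(f"anchors={active}/256 entropy={entropy:.4f}")
```

An entropy of zero with a single active anchor corresponds to total collapse, while an entropy near `log(256)` would mean the anchors are used about evenly.
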
```
=================================================================
PHASE 5: TRAINING
20 epochs, lr=0.001, CV target=0.2731
=================================================================
E 1: mAP=0.788 F1=0.731 R@1=0.971 cos=0.806 cv=0.1213 anchors=226/256 nce=0.999 loss=0.1676 ★
E 2: mAP=0.803 F1=0.742 R@1=0.971 cos=0.809 cv=0.1178 anchors=200/256 nce=0.999 loss=0.1459 ★
E 3: mAP=0.810 F1=0.735 R@1=0.973 cos=0.808 cv=0.1197 anchors=161/256 nce=0.999 loss=0.1431 ★
E 4: mAP=0.817 F1=0.752 R@1=0.971 cos=0.811 cv=0.1262 anchors=131/256 nce=0.999 loss=0.1404 ★
E 5: mAP=0.823 F1=0.755 R@1=0.971 cos=0.812 cv=0.1232 anchors=113/256 nce=0.999 loss=0.1389 ★
E 6: mAP=0.825 F1=0.755 R@1=0.972 cos=0.815 cv=0.1105 anchors=104/256 nce=0.999 loss=0.1379 ★
E 7: mAP=0.827 F1=0.767 R@1=0.970 cos=0.814 cv=0.1125 anchors=101/256 nce=0.999 loss=0.1369 ★
E 8: mAP=0.829 F1=0.763 R@1=0.971 cos=0.815 cv=0.1239 anchors=99/256 nce=0.999 loss=0.1361 ★
E 9: mAP=0.832 F1=0.764 R@1=0.972 cos=0.815 cv=0.1164 anchors=98/256 nce=0.999 loss=0.1355 ★
E10: mAP=0.833 F1=0.765 R@1=0.968 cos=0.814 cv=0.1166 anchors=99/256 nce=0.999 loss=0.1345 ★
E11: mAP=0.834 F1=0.763 R@1=0.971 cos=0.814 cv=0.1214 anchors=98/256 nce=0.999 loss=0.1346 ★
E12: mAP=0.833 F1=0.764 R@1=0.973 cos=0.813 cv=0.1200 anchors=95/256 nce=0.999 loss=0.1343
E13: mAP=0.836 F1=0.761 R@1=0.972 cos=0.813 cv=0.1081 anchors=94/256 nce=0.999 loss=0.1338 ★
E14: mAP=0.836 F1=0.772 R@1=0.973 cos=0.812 cv=0.1170 anchors=95/256 nce=0.999 loss=0.1334
E15: mAP=0.835 F1=0.774 R@1=0.970 cos=0.812 cv=0.1223 anchors=95/256 nce=0.999 loss=0.1338
E16: mAP=0.837 F1=0.777 R@1=0.968 cos=0.812 cv=0.1225 anchors=96/256 nce=1.000 loss=0.1339 ★
E17: mAP=0.834 F1=0.772 R@1=0.973 cos=0.811 cv=0.1089 anchors=95/256 nce=0.999 loss=0.1327
E18: mAP=0.834 F1=0.770 R@1=0.973 cos=0.812 cv=0.1156 anchors=95/256 nce=0.999 loss=0.1321
E19: mAP=0.834 F1=0.773 R@1=0.970 cos=0.811 cv=0.1224 anchors=96/256 nce=0.999 loss=0.1328
E20: mAP=0.835 F1=0.770 R@1=0.971 cos=0.812 cv=0.1159 anchors=96/256 nce=0.999 loss=0.1328

Best mAP: 0.837
CV target: 0.2731
```

# Experiment 1: Total collapse.

The three models did not conform and the patchwork did not learn. The objectives are not correct. Training defaulted to one anchor and used none of the others. The memory bank solves this problem through queue assessment with the InfoNCE hub processing, but this model uses a different form of anchoring that did not work. THE ENTIRE MODEL became the anchor, instead of the anchor points within the model.

I'm thinking there wasn't enough scattering, so I'll try some additional tweaks.

## Post

```
Active anchors: 1/256 (0.4%)
Every single image → anchor 65
Anchor entropy: 0.0000
Anchors within cos>0.5 per image: 1.0
Nearest anchor dist: 0.016 — next nearest: 0.665
Effective dim: 23.6/128
Top-20 SVs explain 99.2%
Self-sim off-diag: 0.969
Expert uniqueness: 0.0008–0.0011
```

There is only one active anchor, which is essentially CLS. The uniqueness collapsed. The distance is fine; the entropy is dead. This is a shortcut bypass, so additional nonlinearity must be added.

## Assessment

Without the centered Procrustes loss, the same result happened. The collapse forms around one of the earlier anchors, around the outside midpoint where all three models are simultaneously rotating around a point, which is not the direct center. This point has noise, invalidity, incorrect association, and additional problems based on the attention mechanisms internal to the models queried.

## Hypothesis based on research

The Procrustes alignment must align centerwise, and it must be defined to exact specifications.
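
To make "centerwise" concrete, here is a minimal sketch of a centered orthogonal Procrustes alignment between two embedding sets. It is a standard textbook formulation, not necessarily the loss used in these experiments. The function name `centered_procrustes`, the NumPy implementation, and the choice to align one model's embeddings to a single reference set are all assumptions for illustration.

```python
import numpy as np

def centered_procrustes(source, target):
    """Find the orthogonal map that best aligns `source` to `target`
    after subtracting each set's centroid.

    Returns (rotation, source_mean, target_mean) so that
    (source - source_mean) @ rotation + target_mean approximates target
    in the least-squares sense.
    """
    source_mean = source.mean(axis=0)
    target_mean = target.mean(axis=0)

    # Center both sets so the rotation pivots around the true centroid
    # rather than an arbitrary off-center point.
    s = source - source_mean
    t = target - target_mean

    # Orthogonal Procrustes solution: with s.T @ t = U S Vt, take R = U @ Vt.
    u, _, vt = np.linalg.svd(s.T @ t)
    rotation = u @ vt
    return rotation, source_mean, target_mean

# Hypothetical usage: align one model's 128-d embeddings to a reference set.
rng = np.random.default_rng(0)
reference = rng.normal(size=(512, 128))
other_model = rng.normal(size=(512, 128))
rot, mu_s, mu_t = centered_procrustes(other_model, reference)
aligned = (other_model - mu_s) @ rot + mu_t
```

With three models, one option under the same assumption is to pick one as the reference and align the other two to it, so that all rotations share the same center.
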