AbstractPhil committed on
Commit
68eead2
·
verified ·
1 Parent(s): d9e7232

Update README.md

Files changed (1)
  1. README.md +4 -0
README.md CHANGED
@@ -25,6 +25,10 @@ in the smaller model by direct implicit learning.
 
  geolip-vit-x3 must learn to predict the experts, which means it can never get a full picture of anything outside of its own tools.
 
+ This model is exceptionally small, absurdly small even by ViT standards, because even at this size it is too much. The model cannot
+ overfit if it uses every tool at the expense; it will train indefinitely unless a cascade overflow happens, a math continuity corruption
+ occurs, or the substructure collapses to a simpler shortcut-centric behavior that would require scrambling.
+
  The anchors are strong enough and tuned to the experts, the external losses are tuned to teach the expert responses, the expert data is used
33
  as loss methods of attenuation, and the structure conforms to those losses specifically because it's required to teach the model tobe
34
  standalone and compliant without requiring the experts later.
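
The README text in this hunk describes a small model that learns to predict expert responses during training so that it can run without the experts afterwards. A minimal sketch of that general idea follows. Everything in it is a hypothetical stand-in: the two toy "experts", the linear student, and the squared-error loss are illustrative assumptions, not the geolip-vit-x3 architecture, its anchors, or its actual losses.

```python
import random

# Hypothetical frozen "experts": fixed functions the student must imitate.
def expert_a(x):
    return 2.0 * x[0] - 1.0 * x[1]

def expert_b(x):
    return 0.5 * x[0] + 3.0 * x[1]

EXPERTS = [expert_a, expert_b]

def student_predict(weights, x):
    """Linear student: one weight row per expert output it predicts."""
    return [w[0] * x[0] + w[1] * x[1] for w in weights]

def train_student(steps=2000, lr=0.05, seed=0):
    """Fit the student to the expert responses with plain SGD."""
    rng = random.Random(seed)
    weights = [[0.0, 0.0] for _ in EXPERTS]
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        # Experts are consulted only here, as training targets.
        targets = [expert(x) for expert in EXPERTS]
        preds = student_predict(weights, x)
        for k, (pred, target) in enumerate(zip(preds, targets)):
            err = pred - target
            # Gradient of 0.5 * err**2 with respect to weights[k][i] is err * x[i].
            for i in range(2):
                weights[k][i] -= lr * err * x[i]
    return weights

if __name__ == "__main__":
    trained = train_student()
    sample = [0.3, -0.7]
    # Inference uses the student alone; no expert is called.
    print(student_predict(trained, sample))
```

After training, only `student_predict` is needed at inference time, which mirrors the stated goal of a model that is standalone without requiring the experts later.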