Update README.md
README.md
|
@@ -25,6 +25,10 @@ in the smaller model by direct implicit learning.
| 25 |
|
| 26 |
geolip-vit-x3 must learn to predict the experts, which means it can never get a full picture of anything outside of its own tools.
|
| 27 |
|
| 28 |
+
This model is exceptionally small, absurdly small even by ViT standards. This is because even this size is too much. The model cannot
|
| 29 |
+
overfit if it uses every tool at its disposal; it will train indefinitely unless a cascade overflow happens, a math continuity corruption
|
| 30 |
+
occurs, or the substructure collapses to a simpler shortcut-centric behavior that would require scrambling.
|
| 31 |
+
|
| 32 |
The anchors are strong enough and tuned to the experts, the external losses are tuned to teach the expert responses, the expert data is used
|
| 33 |
as loss methods of attenuation, and the structure conforms to those losses specifically because it's required to teach the model to be
|
| 34 |
standalone and compliant without requiring the experts later.
|
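As a rough illustration of the idea in the paragraph above, and not the geolip-vit-x3 training code, a small student can be pushed toward the responses of frozen experts by adding an expert-matching term to its task loss. Everything in this sketch is an assumption made for illustration: the function names `mse`, `expert_matching_loss`, and `total_loss`, the weight `alpha`, the choice of mean squared error, and the representation of expert outputs as plain lists keyed by expert name.

```python
def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def expert_matching_loss(student_preds, expert_outputs):
    """Average, over experts, of the MSE between the student's prediction
    of each expert and that expert's (frozen) output.

    Both arguments are dicts mapping a hypothetical expert name to a vector.
    """
    if student_preds.keys() != expert_outputs.keys():
        raise ValueError("student must predict exactly the available experts")
    per_expert = [
        mse(student_preds[name], expert_outputs[name])
        for name in expert_outputs
    ]
    return sum(per_expert) / len(per_expert)


def total_loss(task_loss, student_preds, expert_outputs, alpha=1.0):
    """Task loss plus a weighted expert-matching term (alpha is hypothetical)."""
    return task_loss + alpha * expert_matching_loss(student_preds, expert_outputs)
```

Because the experts appear only inside the loss, nothing in the student's forward pass depends on them, which is one way to read "standalone and compliant without requiring the experts later".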