---
name: ❌ Refute a prediction
about: TAF prediction contradicted by empirical measurement
title: '[Refute] '
labels: refuted
---
Hash of analysis being refuted
#__________ ← paste the hash from the original issue's title
Original issue
Link: #__
TAF prediction
What TAF said:
- Verdict: __
- Key number: __ (e.g. d_horizon = 47781)
My empirical measurement
What actually happened:
- Verdict observed: __
- Key number measured: __ (e.g. NIAH collapse at L=12K, well before predicted ceiling)
- Magnitude of disagreement: __ (% or absolute)
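If it helps to fill in the magnitude consistently, here is a minimal sketch for computing both the absolute and the percentage disagreement. The function name `disagreement` is hypothetical and not part of TAF. The example numbers reuse the placeholders above, and it is an assumption that the predicted and measured quantities are directly comparable.

```python
def disagreement(predicted: float, measured: float) -> tuple[float, float]:
    """Return (absolute, percent) disagreement, relative to the prediction.

    Negative values mean the measurement fell below the prediction.
    """
    if predicted == 0:
        raise ValueError("percent disagreement is undefined for a zero prediction")
    absolute = measured - predicted
    percent = 100.0 * absolute / abs(predicted)
    return absolute, percent


# Illustrative values only: predicted d_horizon = 47781, collapse observed at L = 12000.
abs_diff, pct_diff = disagreement(predicted=47781, measured=12000)
print(f"absolute: {abs_diff:+.0f}, percent: {pct_diff:+.1f}%")
```

Reporting both numbers avoids ambiguity when the predicted value is very small or very large.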
Setup
- Hardware: __
- Software: __ (versions matter!)
- Random seed(s) tried: __
- Number of trials: __
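One possible way to capture the setup fields is a short script whose output can be pasted into the list above. This is a sketch under assumptions: `describe_setup` is a hypothetical helper, and the package names `torch` and `transformers` are only examples of what a measurement might depend on. It uses the standard library only and does not detect GPUs, so accelerator details still need to be added by hand.

```python
import json
import platform
import sys
from importlib import metadata


def describe_setup(packages, seeds, trials):
    """Collect the environment details requested in the Setup section."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "machine": platform.machine(),   # CPU architecture only, not the GPU model
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "packages": versions,
        "seeds": list(seeds),
        "trials": trials,
    }


# Example package names are assumptions; list whatever your measurement used.
setup = describe_setup(["torch", "transformers"], seeds=[0, 1, 2], trials=3)
print(json.dumps(setup, indent=2))
```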
Method
Provide enough detail for a third party to reproduce the result:
# Step-by-step commands
# Or full Python script
Hypothesis on why TAF was wrong
- Out-of-regime (e.g. extrapolation beyond validity zone)
- Architecture-specific quirk not captured in formulas
- Model has unusual training data
- Bug in TAF formulas
- Other: __
Detailed thoughts:
Suggested update to TAF
If applicable, what should the framework do differently?
- Update validity bounds for this recipe
- Add a caveat for this architecture family
- Withdraw the prediction (move to NR-X in paper appendix)
- No change needed (this is a known edge case)