name: Refute a prediction
about: TAF prediction contradicted by empirical measurement
title: '[Refute] '
labels: refuted

Hash of analysis being refuted

#__________ ← paste the hash from the original issue's title

Original issue

Link: #__

TAF prediction

What TAF said:

  • Verdict: __
  • Key number: __ (e.g. d_horizon = 47781)

My empirical measurement

What actually happened:

  • Verdict observed: __
  • Key number measured: __ (e.g. NIAH collapse at L=12K, well before predicted ceiling)
  • Magnitude of disagreement: __ (% or absolute)
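One way to fill in the magnitude field is to report both forms. This is a minimal sketch: the helper name `disagreement` is hypothetical and not part of TAF, it assumes the prediction and the measurement are in the same units, and it reuses the illustrative numbers from the examples above with 12K read as 12000.

```python
def disagreement(predicted: float, measured: float) -> tuple[float, float]:
    """Return (absolute, percent) disagreement, relative to the prediction."""
    if predicted == 0:
        raise ValueError("percent disagreement is undefined for a zero prediction")
    absolute = measured - predicted
    percent = 100.0 * absolute / abs(predicted)
    return absolute, percent


# Illustrative values only: predicted d_horizon = 47781,
# observed collapse at L = 12K (assumed here to mean 12000).
abs_diff, pct_diff = disagreement(47781, 12000)
print(f"absolute: {abs_diff:+.0f}, percent: {pct_diff:+.1f}%")
```

A negative sign means the measured value fell below the prediction.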

Setup

  • Hardware: __
  • Software: __ (versions matter!)
  • Random seed(s) tried: __
  • Number of trials: __
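The setup fields can be collected with a short script rather than typed by hand. The sketch below uses only the Python standard library. The function name `describe_setup` and the package names `torch` and `transformers` are assumptions for illustration; substitute whatever your run actually depends on. It does not detect GPUs, so add accelerator details manually.

```python
import json
import platform
import sys
from importlib import metadata


def describe_setup(packages, seeds, trials):
    """Collect hardware, software, seed, and trial info for the report."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),
        },
        "software": {
            "python": sys.version.split()[0],
            "os": platform.platform(),
            "packages": versions,
        },
        "seeds": list(seeds),
        "trials": trials,
    }


# Package names below are placeholders, not a statement of TAF's dependencies.
report = describe_setup(["torch", "transformers"], seeds=[0, 1, 2], trials=3)
print(json.dumps(report, indent=2))
```

Paste the printed JSON under Setup, then add any hardware details the script cannot see.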

Method

Provide enough detail for a third party to reproduce the result:

# Step-by-step commands
# Or full Python script

Hypothesis on why TAF was wrong

  • Out-of-regime (e.g. extrapolation beyond validity zone)
  • Architecture-specific quirk not captured in formulas
  • Model has unusual training data
  • Bug in TAF formulas
  • Other: __

Detailed thoughts:

Suggested update to TAF

If applicable, what should the framework do differently?

  • Update validity bounds for this recipe
  • Add a caveat for this architecture family
  • Withdraw the prediction (move to NR-X in paper appendix)
  • No change needed (this is a known edge case)