Transformer diagnostic in your browser. Free. Unlimited. Auditable.
All computation happens locally; your data never leaves this page.
TAF Agent User Manual
What does it do?
Predicts the practical viability of any transformer LLM before you spend GPU time or money.
Answers questions like "will this model work at L=32K?" or "should I train a custom model or use an API?" using
deterministic Python formulas (TAF, the Thermodynamic Attention Framework).
How to use: 2 modes
Ask in plain English (default): type your question, and the in-browser LLM picks
the right recipe and runs it. Best for casual exploration.
Pick recipe + form: select a recipe manually, fill in the parameters, and run it.
Best when you want full control or know exactly what you need.
The 5 available recipes
X-1 Custom training vs API: compares the cost of training your own model with paying for API access.
Try: "Should I train an 8B custom model or use GPT-4o for 50M tokens/month?"
Answer: YES (train custom) / NO (use API), with the break-even point in months.
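The break-even figure in an answer like this comes down to simple arithmetic. Below is a minimal sketch, assuming a one-off training cost and flat per-million-token prices for API access and for self-hosting. The function name and every dollar figure are hypothetical illustrations, not TAF's actual X-1 formula or real vendor pricing.

```python
import math

def break_even_months(training_cost_usd: float,
                      api_cost_per_mtok: float,
                      self_host_cost_per_mtok: float,
                      mtok_per_month: float) -> float:
    """Months until a one-off training cost is repaid by cheaper per-token serving.

    Returns math.inf when self-hosting is not cheaper per token (it never breaks even).
    """
    monthly_savings = (api_cost_per_mtok - self_host_cost_per_mtok) * mtok_per_month
    if monthly_savings <= 0:
        return math.inf
    return training_cost_usd / monthly_savings

# Hypothetical inputs: 50M tokens/month = 50 Mtok/month.
months = break_even_months(training_cost_usd=20_000, api_cost_per_mtok=5.0,
                           self_host_cost_per_mtok=1.0, mtok_per_month=50)
print(f"break-even after {months:.1f} months")
```

With these made-up numbers, $20,000 / ((5 - 1) × 50) gives 100 months. A YES/NO verdict then depends on whether that horizon is acceptable.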
X-2 Long Context Viability: predicts whether a model reliably serves a target context length.
Try: "Will Meta-Llama-3-8B handle 32000 tokens for retrieval?"
Chains: γ_Padé → decomposition → d_horizon → NIAH ceiling → hallucination → KV memory.
Verdict: YES / DEGRADED / NO with mitigation if needed.
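The last link in this chain, KV memory, can be estimated with the standard KV-cache size formula: one K and one V tensor per layer, per KV head, per token. Below is a minimal sketch, assuming fp16/bf16 storage (2 bytes per element) and batch size 1. TAF's own step may account for additional overheads that are not modeled here.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    """KV-cache size in bytes: 2 (K and V) x layers x KV heads x head_dim x tokens x batch x bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Llama-3-8B shape from its config.json: 32 layers, 8 KV heads (GQA), head_dim = 4096 / 32 = 128.
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=32_000)
# The same model if all 32 attention heads kept their own K/V (no GQA).
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=32_000)
print(f"GQA: {gqa / 1e9:.2f} GB, MHA-equivalent: {mha / 1e9:.2f} GB")  # GQA: 4.19 GB, MHA-equivalent: 16.78 GB
```

The 4x gap between the two results is the GQA memory saving described under "Common parameters explained".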
X-3 Budget pre-flight: given a $ budget, what model is feasible to train?
Try: "I have $5000, what model can I train?"
Answer: GO / TINY-MODEL / MEMORY-LIMITED with concrete N (params) and D (tokens).
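One common way to turn a dollar budget into concrete N and D is the Chinchilla-style heuristic: training compute C ≈ 6·N·D with roughly 20 tokens per parameter. Below is a minimal sketch that uses this heuristic as a stand-in. The GPU price, peak FLOP/s, and MFU (model FLOPs utilization) values are hypothetical inputs. TAF's own X-3 logic, including its TINY-MODEL and MEMORY-LIMITED checks, is not reproduced here.

```python
import math

def chinchilla_preflight(budget_usd: float, usd_per_gpu_hour: float,
                         peak_flops_per_gpu: float, mfu: float,
                         tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Return (N params, D tokens) for a compute-optimal run under C ~= 6 * N * D."""
    gpu_hours = budget_usd / usd_per_gpu_hour
    compute = gpu_hours * 3600 * peak_flops_per_gpu * mfu  # total training FLOPs
    # With D = k * N, C = 6 * k * N**2, so N = sqrt(C / (6 * k)).
    n_params = math.sqrt(compute / (6 * tokens_per_param))
    return n_params, tokens_per_param * n_params

# Hypothetical inputs: $5000 budget, $2/GPU-hour, 989 TFLOP/s peak, 40% MFU.
n, d = chinchilla_preflight(5000, 2.0, 989e12, 0.40)
print(f"N ~ {n / 1e9:.2f}B params, D ~ {d / 1e9:.0f}B tokens")
```

The heuristic only sizes the run by compute. Whether the resulting model fits in GPU memory during training is a separate check.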
X-5 Hardware selection: which GPU should I use to serve at a target throughput?
Try: "Cheapest hardware to serve Llama-3-8B at 10M tokens/day"
Answer: best GPU, its $/Mtok, and capacity vs target.
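The $/Mtok and capacity figures follow from an hourly price and a sustained throughput. Below is a minimal sketch, assuming one GPU per deployment running at full utilization. The option names, prices, and throughputs are hypothetical placeholders, not benchmarks of real hardware or TAF's X-5 data.

```python
from dataclasses import dataclass

@dataclass
class GpuOption:
    name: str
    usd_per_hour: float
    tokens_per_sec: float  # sustained serving throughput for the target model (assumed)

def usd_per_mtok(gpu: GpuOption) -> float:
    """Cost to generate one million tokens at full utilization."""
    mtok_per_hour = gpu.tokens_per_sec * 3600 / 1e6
    return gpu.usd_per_hour / mtok_per_hour

def cheapest_meeting_target(options: list[GpuOption], target_tokens_per_day: float):
    """Return (gpu, capacity/target ratio) for the lowest-$/Mtok option that covers the target, else None."""
    feasible = []
    for gpu in options:
        ratio = gpu.tokens_per_sec * 86_400 / target_tokens_per_day
        if ratio >= 1.0:
            feasible.append((usd_per_mtok(gpu), gpu, ratio))
    if not feasible:
        return None
    _, best, ratio = min(feasible, key=lambda t: t[0])  # compare on cost only
    return best, ratio

options = [
    GpuOption("gpu_small", usd_per_hour=0.80, tokens_per_sec=100),
    GpuOption("gpu_large", usd_per_hour=2.50, tokens_per_sec=600),
]
best, ratio = cheapest_meeting_target(options, target_tokens_per_day=10_000_000)
print(f"{best.name}: ${usd_per_mtok(best):.2f}/Mtok, {ratio:.1f}x target capacity")
```

In this made-up example, the cheaper-per-hour option falls short of 10M tokens/day (8.64M), so the larger one wins even though it costs more per hour.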
X-19 KV Compression decision: should I use soft decay, a hard cutoff, or literature methods?
Try: "How to compress KV cache for Qwen2.5-7B at 32K?"
Answer: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.
Adding new models
Preset list: 11 curated popular models. Just select one from the dropdown.
HF Hub fetch: paste any model id (e.g. Qwen/Qwen2.5-32B-Instruct) and
click Fetch. The browser downloads config.json directly from Hugging Face and
fills in the form. Works for any public model.
Manual: fill the form fields directly with values from the model card.
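The HF Hub fetch can be approximated outside the browser. Below is a minimal sketch that downloads a public model's config.json from the Hub's standard `resolve/main` URL and maps common keys onto the form fields. Several choices here are assumptions that can differ by architecture: the helper names, the key mapping, the 10000 default for a missing `rope_theta`, and the SWA rule (a truthy `sliding_window`, unless `use_sliding_window` is false).

```python
import json
import urllib.request

def fetch_config(model_id: str) -> dict:
    """Download config.json for a public Hugging Face model (needs network; gated models need auth)."""
    url = f"https://huggingface.co/{model_id}/resolve/main/config.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def config_to_form(cfg: dict) -> dict:
    """Map common config.json keys onto the form fields (hypothetical mapping)."""
    n_heads = cfg["num_attention_heads"]
    n_kv = cfg.get("num_key_value_heads") or n_heads  # missing means plain multi-head attention
    has_swa = bool(cfg.get("sliding_window")) and bool(cfg.get("use_sliding_window", True))
    return {
        "theta": cfg.get("rope_theta", 10000.0),
        "T_train": cfg["max_position_embeddings"],
        "n_attention_heads": n_heads,
        "n_kv_heads": n_kv,
        "uses_gqa": n_kv < n_heads,
        "has_SWA": has_swa,
    }

# Illustrative config fragment; a real call would be config_to_form(fetch_config("<org>/<model>")).
example = {"num_attention_heads": 28, "num_key_value_heads": 4, "rope_theta": 1000000.0,
           "max_position_embeddings": 32768, "sliding_window": 131072, "use_sliding_window": False}
print(config_to_form(example))
```

n_params is not part of most config.json files, so in this sketch it would still come from the model card.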
The audit chain
Every result shows the full Computation Chain: each formula step with its inputs,
output, and interpretation. Click any step to expand it. Cited section numbers (§26.1, §19.1, etc.) refer
to the underlying paper, where each formula is derived.
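Each step in such a chain pairs a formula's inputs, its output, an interpretation, and a paper section. Below is a minimal sketch of that record as a data structure. The class names, fields, and render format are hypothetical illustrations, not the tool's internal representation. The section string is a placeholder and does not reflect the paper's actual mapping.

```python
from dataclasses import dataclass, field

@dataclass
class ChainStep:
    name: str
    section: str           # paper section cited for the derivation (placeholder here)
    inputs: dict
    output: float
    interpretation: str

@dataclass
class ComputationChain:
    steps: list[ChainStep] = field(default_factory=list)

    def add(self, step: ChainStep) -> float:
        """Record a step and pass its output on to the next formula."""
        self.steps.append(step)
        return step.output

    def render(self) -> str:
        lines = []
        for i, s in enumerate(self.steps, start=1):
            args = ", ".join(f"{k}={v}" for k, v in s.inputs.items())
            lines.append(f"{i}. {s.name} ({s.section}): {args} -> {s.output:g} [{s.interpretation}]")
        return "\n".join(lines)

chain = ComputationChain()
kv_gb = chain.add(ChainStep(
    name="KV memory",
    section="§n.m",  # placeholder, not a real section reference
    inputs={"n_layers": 32, "n_kv_heads": 8, "head_dim": 128, "T_eval": 32000},
    output=2 * 32 * 8 * 128 * 32000 * 2 / 1e9,
    interpretation="GB of bf16 KV cache at batch size 1",
))
print(chain.render())
```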
The plain-English answer
After the deterministic chain runs, an in-browser LLM (Qwen2.5-0.5B, ~350MB cached after first load)
synthesizes a plain-English summary. The numbers above are always correct (deterministic Python);
the synthesis is LLM-generated, so verify it against the chain if in doubt.
Common parameters explained
θ (rope_theta): RoPE base frequency. Higher = more long-range capacity.
Typical: 10000 (early), 500000 (Llama-3), 1000000 (Qwen2.5).
T_train: the maximum context the model was trained on, taken from max_position_embeddings.
T_eval: your target inference context length. The key knob.
n_kv_heads < n_attention_heads: model uses GQA (Grouped Query Attention).
Reduces KV memory but pushes γ toward Hagedorn.
has_SWA: model uses Sliding Window Attention (Mistral, gemma-2).
n_params: total parameter count. Threshold ~400M for induction-head emergence.
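A standard way to see why a higher θ gives more long-range capacity: in RoPE, dimension pair i rotates with frequency θ^(−2i/d), so its wavelength is 2π·θ^(2i/d) tokens. Below is a minimal sketch, assuming head_dim = 128 (as in Llama-3-8B). It illustrates the standard RoPE frequency schedule only, not TAF's γ or d_horizon formulas.

```python
import math

def rope_wavelengths(theta: float, head_dim: int) -> list[float]:
    """Wavelength (in tokens) of each RoPE dimension pair i: 2*pi * theta**(2i/d)."""
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]

# Typical bases listed above: early models, Llama-3, Qwen2.5.
for theta in (10_000, 500_000, 1_000_000):
    longest = max(rope_wavelengths(theta, head_dim=128))
    print(f"theta={theta:>9,}: longest wavelength ~ {longest:,.0f} tokens")
```

The slowest-rotating pairs set the longest positional scale the encoding can distinguish. Raising θ stretches that scale.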
What to look for in verdicts
YES / GO: proceed with confidence; the numbers support the choice.
DEGRADED / TINY-MODEL: works, but with caveats; read the action.
NO / MEMORY-LIMITED: don't proceed as-is; a mitigation is provided.
Privacy
Everything runs in your browser. No telemetry, no analytics, no data sent anywhere. Even the LLM
runs locally via WebGPU/WebAssembly. Your model IDs and questions never leave this page.
Mode: two ways to use the tool. Ask: type a free-form question and the browser LLM picks the right recipe. Recipe: manual selection with full form control.
Same result either way; pick whichever fits your style.