🔬 TAF Agent

Transformer diagnostic in your browser. Free. Unlimited. Auditable.

All computation happens locally — your data never leaves this page.

📘 TAF Agent — User Manual

What does it do?

Predicts the practical viability of a transformer LLM before you spend GPU time or money. It answers questions like "will this model work at L=32K?" or "should I train a custom model or use an API?" using deterministic Python formulas from TAF, the Thermodynamic Attention Framework.

How to use — 2 modes

💬 Ask in plain English (default): type your question, the in-browser LLM picks the right recipe and runs it. Best for casual exploration.

📋 Pick recipe + form: select a recipe manually, fill the parameters, run. Best when you want full control or know exactly what you need.

The 5 recipes available

X-1 Custom training vs API — compares cost of training your own model vs paying for API access.

Try: "Should I train an 8B custom model or use GPT-4o for 50M tokens/month?"
Answer types: YES (custom) / NO (API) with break-even months.
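
The break-even idea can be illustrated with simple arithmetic. The sketch below is not TAF's actual X-1 formula: the function name, the cost model (a one-time training cost plus flat monthly hosting, versus a per-token API price), the 24-month horizon, and every dollar figure are assumptions chosen for illustration.

```python
import math

def break_even_months(train_cost_usd, hosting_usd_per_month,
                      tokens_per_month, api_usd_per_mtok):
    """Months until a custom model becomes cheaper than paying for an API.

    Returns math.inf when hosting alone costs at least as much per month
    as the API, because the one-time training cost is never recovered.
    """
    api_monthly = tokens_per_month / 1e6 * api_usd_per_mtok
    monthly_saving = api_monthly - hosting_usd_per_month
    if monthly_saving <= 0:
        return math.inf
    return train_cost_usd / monthly_saving

# Hypothetical inputs: 50M tokens/month, $10 per million API tokens,
# $20,000 to train, $300/month to host.
months = break_even_months(20_000, 300, 50_000_000, 10.0)

# Assumed decision rule: go custom only if it pays back within 24 months.
verdict = "YES (custom)" if months < 24 else "NO (API)"
print(months, verdict)
```

With these made-up inputs the API bill is $500/month, hosting saves $200/month, and the training cost takes 100 months to recover, so the sketch reports NO (API).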

X-2 Long Context Viability — predicts if a model serves a target context length reliably.

Try: "Will Meta-Llama-3-8B handle 32000 tokens for retrieval?"
Chains: γ_Padé → decomposition → d_horizon → NIAH ceiling → hallucination → KV memory.
Verdict: YES / DEGRADED / NO with mitigation if needed.
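
The last link in the chain, KV memory, can be estimated with the standard cache-size formula. This is a minimal sketch that assumes an uncompressed fp16 cache and batch size 1; whether TAF uses exactly this expression is an assumption, and the earlier steps (γ_Padé, d_horizon, NIAH ceiling) are defined in the paper and not reproduced here. The function name is illustrative.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Size of an uncompressed KV cache in bytes.

    The leading factor 2 accounts for storing one key tensor and one
    value tensor per layer.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)

# Llama-3-8B shape: 32 layers, 32 query heads, 8 KV heads, head_dim 128.
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=32_000)
# Same model if every query head had its own KV head (no GQA).
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=32_000)

print(f"GQA cache: {gqa / 2**30:.3f} GiB")
print(f"MHA cache: {mha / 2**30:.3f} GiB")
```

At 32,000 tokens this comes to roughly 3.9 GiB with 8 KV heads versus roughly 15.6 GiB with 32, which is the memory saving that GQA buys.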

X-3 Budget pre-flight: given a dollar budget, which model is feasible to train?

Try: "I have $5000, what model can I train?"
Answer: GO / TINY-MODEL / MEMORY-LIMITED with concrete N (params) and D (tokens).
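
To show how a budget can turn into concrete N and D, here is a sketch built on two widely used rules of thumb: training compute C ≈ 6·N·D, and a compute-optimal ratio of about 20 tokens per parameter. Both are assumptions for illustration, not confirmed to be the formulas X-3 uses. The GPU price, peak FLOP/s, and utilization (MFU) values are hypothetical.

```python
import math

def compute_optimal_size(budget_usd, gpu_usd_per_hour, gpu_peak_flops,
                         mfu=0.4, tokens_per_param=20.0):
    """Return (N params, D tokens) affordable under a dollar budget.

    Assumes C = 6 * N * D and D = tokens_per_param * N, which gives
    N = sqrt(C / (6 * tokens_per_param)).
    """
    gpu_seconds = budget_usd / gpu_usd_per_hour * 3600
    total_flops = gpu_seconds * gpu_peak_flops * mfu
    n_params = math.sqrt(total_flops / (6 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Hypothetical inputs: $5000 budget, $2/GPU-hour, 1e15 peak FLOP/s, 40% MFU.
n, d = compute_optimal_size(5000, 2.0, 1e15)
print(f"N = {n:.2e} params, D = {d:.2e} tokens")
```

The result is only as good as the assumed utilization and price; a memory check on the chosen hardware would still be needed before calling it GO.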

X-5 Hardware selection — which GPU should I use to serve at target throughput?

Try: "Cheapest hardware to serve Llama-3-8B at 10M tokens/day"
Answer: best GPU + $/Mtok + capacity vs target.
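
The $/Mtok and capacity figures reduce to unit conversion. The sketch below assumes each GPU runs at full utilization with a known sustained throughput; the catalog entries (`gpu_a`, `gpu_b`), their prices, and their throughputs are made up, and the selection rule (lowest hourly cost that meets the target) is an assumption rather than the recipe's actual logic.

```python
import math

def usd_per_mtok(gpu_usd_per_hour, tokens_per_second):
    """Cost in dollars per million tokens at full utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1e6

def gpus_needed(target_tokens_per_day, tokens_per_second):
    """Smallest number of GPUs whose daily capacity covers the target."""
    capacity_per_day = tokens_per_second * 86_400
    return math.ceil(target_tokens_per_day / capacity_per_day)

# Hypothetical catalog: names, prices, and throughputs are illustrative.
catalog = {
    "gpu_a": {"usd_per_hour": 1.0, "tokens_per_second": 500},
    "gpu_b": {"usd_per_hour": 2.5, "tokens_per_second": 2000},
}
target = 10_000_000  # tokens per day

def hourly_cost(name):
    spec = catalog[name]
    return gpus_needed(target, spec["tokens_per_second"]) * spec["usd_per_hour"]

best = min(catalog, key=hourly_cost)
spec = catalog[best]
print(best, round(usd_per_mtok(spec["usd_per_hour"], spec["tokens_per_second"]), 3))
```

Note that the cheapest option for a small target is not necessarily the one with the lowest $/Mtok: an idle fast GPU still bills by the hour.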

X-19 KV Compression decision — should I use soft decay, hard cutoff, or literature methods?

Try: "How to compress KV cache for Qwen2.5-7B at 32K?"
Answer: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.

Adding new models

Preset list: 11 curated popular models. Select one from the dropdown.
HF Hub fetch: paste any model id (e.g. Qwen/Qwen2.5-32B-Instruct) and click 📥 Fetch. The browser downloads config.json directly from Hugging Face and fills the form. Works for any public model.
Manual: fill the form fields directly with values from the model card.
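
For readers who want to reproduce the HF Hub fetch outside the browser, the following sketch downloads config.json from the public `resolve` URL and maps common keys to the form fields. The helper names and the mapping are assumptions: in particular, the default `rope_theta` of 10000 and the sliding-window heuristic are guesses about how a form could be filled, not the tool's actual code, and configs for some architectures use different key names.

```python
import json
import urllib.request

def fetch_config(model_id, revision="main"):
    """Download config.json for a public Hugging Face model id."""
    url = f"https://huggingface.co/{model_id}/resolve/{revision}/config.json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def form_fields(config):
    """Map a config dict to the form fields (assumed mapping)."""
    n_heads = config["num_attention_heads"]
    # Heuristic: a window size is set and not explicitly disabled.
    has_swa = (config.get("sliding_window") is not None
               and config.get("use_sliding_window", True))
    return {
        "rope_theta": config.get("rope_theta", 10000.0),  # assumed default
        "T_train": config["max_position_embeddings"],
        "n_attention_heads": n_heads,
        # Without the key, assume plain multi-head attention.
        "n_kv_heads": config.get("num_key_value_heads", n_heads),
        "has_SWA": bool(has_swa),
    }

# Offline example with a hand-written config dict (values are illustrative).
example = {
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "max_position_embeddings": 8192,
    "rope_theta": 500000.0,
}
fields = form_fields(example)
print(fields)
```

To use a live model, call `form_fields(fetch_config("Qwen/Qwen2.5-32B-Instruct"))`; gated or private models would need an access token, which this sketch does not handle.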

The audit chain

Every result shows the full Computation Chain: each formula step with its inputs, output, and interpretation. Click any step to expand. Cited section numbers (§26.1, §19.1, etc.) refer to the underlying paper for the derivation.
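
One way to picture a chain entry is as a small record holding exactly the fields named above. The class and function names below are hypothetical and not taken from the tool's source; the section value is a placeholder, since the mapping from formulas to paper sections is not given here.

```python
from dataclasses import dataclass

@dataclass
class ChainStep:
    name: str            # formula name shown in the chain
    section: str         # paper section the formula is derived in
    inputs: dict         # named inputs to the formula
    output: float        # computed value
    interpretation: str  # one-line reading of the value

def render_chain(steps):
    """Format steps as a numbered, auditable text list."""
    lines = []
    for i, step in enumerate(steps, start=1):
        args = ", ".join(f"{k}={v}" for k, v in step.inputs.items())
        lines.append(f"{i}. {step.name} ({step.section}): "
                     f"{args} -> {step.output} | {step.interpretation}")
    return "\n".join(lines)

# Illustrative step; the section label is a placeholder, not a real citation.
steps = [
    ChainStep(
        name="KV memory",
        section="section placeholder",
        inputs={"n_layers": 32, "n_kv_heads": 8, "head_dim": 128, "T_eval": 32000},
        output=3.90625,
        interpretation="GiB of fp16 cache at batch size 1",
    ),
]
text = render_chain(steps)
print(text)
```

Keeping inputs alongside each output is what makes a result checkable by hand: any step can be recomputed without rerunning the rest of the chain.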

The plain-English answer

After the deterministic chain runs, an in-browser LLM (Qwen2.5-0.5B, ~350MB cached after first load) synthesizes a plain-English summary. The numbers in the chain come from deterministic Python and are reproducible; the summary is LLM-generated, so verify it against the chain if in doubt.

Common parameters explained

θ (rope_theta): RoPE base frequency. Higher = more long-range capacity. Typical: 10000 (early), 500000 (Llama-3), 1000000 (Qwen2.5).
T_train: max context the model was trained on. From max_position_embeddings.
T_eval: your target inference context length. The key knob.
n_kv_heads < n_attention_heads: model uses GQA (Grouped Query Attention). Reduces KV memory but pushes γ toward Hagedorn.
has_SWA: model uses Sliding Window Attention (Mistral, gemma-2).
n_params: total parameter count. Threshold ~400M for induction-head emergence.
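
Two of these parameters lend themselves to a quick numeric check. The sketch below uses the standard RoPE frequency schedule to show how a larger θ stretches the longest positional wavelength, and computes the GQA group size from the two head counts. Reading "long-range capacity" as the longest wavelength is a simplification for illustration, not the paper's definition; the function names and `head_dim=128` are assumptions.

```python
import math

def rope_wavelengths(theta, head_dim):
    """Wavelength (in positions) of each RoPE rotation pair.

    Standard RoPE: pair i rotates at frequency theta ** (-2i / head_dim),
    so its wavelength is 2 * pi * theta ** (2i / head_dim).
    """
    return [2 * math.pi * theta ** (2 * i / head_dim)
            for i in range(head_dim // 2)]

def gqa_group_size(n_attention_heads, n_kv_heads):
    """Number of query heads sharing each KV head (1 means no GQA)."""
    return n_attention_heads // n_kv_heads

for theta in (10_000, 500_000, 1_000_000):
    longest = rope_wavelengths(theta, head_dim=128)[-1]
    print(f"theta={theta:>9,}: longest wavelength ~ {longest:,.0f} positions")

print("GQA group size for 32 query heads, 8 KV heads:", gqa_group_size(32, 8))
```

The shortest wavelength is always 2π positions regardless of θ; only the slow-rotating pairs change, which is why θ mainly affects long-range behavior.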

What to look for in verdicts

YES / GO — proceed with confidence; numbers support the choice.
DEGRADED / TINY-MODEL — works but with caveats; read the action.
NO / MEMORY-LIMITED — don't proceed as-is; mitigation provided.

Privacy

Everything runs in your browser. No telemetry, no analytics. Even the LLM runs locally via WebGPU/WebAssembly. Your questions never leave this page; a model id is sent to Hugging Face only when you use HF Hub fetch to download its config.json.

Source & paper

Source code: github.com/karlesmarin/tafagent
Paper: Marin 2026 — Transformer Thermodynamics (arXiv forthcoming)

🎯 Mode

Two ways to use the tool.
Ask: free-form question, browser LLM picks the right recipe.
Recipe: manual selection with full form control.
Same result either way — pick whichever fits your style.

Type a free-form question (e.g. "Will Llama-3-8B work at 32K context?"). The in-browser LLM picks the right recipe and runs it.

❓ Your question

📋 Recipe

🎯 Inputs

📊 Verdict

🔍 Computation Chain

💬 Plain-English Answer
