🔬 TAF Agent

Transformer diagnostic in your browser. Free. Unlimited. Auditable.

All computation happens locally: your data never leaves this page.

📘 TAF Agent: User Manual

What does it do?

Predicts the practical viability of any transformer LLM before you spend GPU time or money. Answers questions like "will this model work at L=32K?" or "should I train custom or use API?" using deterministic Python formulas (TAF: Thermodynamic Attention Framework).

How to use: 2 modes

💬 Ask in plain English (default): type your question, and the in-browser LLM picks the right recipe and runs it. Best for casual exploration.

📋 Pick recipe + form: select a recipe manually, fill in the parameters, and run. Best when you want full control or know exactly what you need.

The 5 recipes available

X-1 Custom training vs API: compares the cost of training your own model with the cost of paying for API access.

Try: "Should I train an 8B custom model or use GPT-4o for 50M tokens/month?"
Answer types: YES (custom) / NO (API) with break-even months.
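The break-even idea behind X-1 can be illustrated with a minimal sketch. This is not the recipe's actual formula: the function name, the cost model (one-off training cost plus a flat monthly hosting cost), and every number in the usage lines are hypothetical placeholders, not real prices.

```python
import math

def break_even_months(training_cost_usd, self_host_usd_per_month,
                      api_usd_per_mtok, tokens_per_month):
    """Months until cumulative API spend exceeds the custom-model spend.

    Returns None when the API is cheaper every month, so custom never pays off.
    All inputs are assumptions supplied by the caller.
    """
    api_usd_per_month = api_usd_per_mtok * tokens_per_month / 1e6
    monthly_saving = api_usd_per_month - self_host_usd_per_month
    if monthly_saving <= 0:
        return None  # corresponds to a "NO (API)" style answer
    return math.ceil(training_cost_usd / monthly_saving)

# Hypothetical inputs: $20,000 to train, $300/month to host,
# $10 per million API tokens, 50M tokens per month.
print(break_even_months(20_000, 300, 10, 50_000_000))  # 100
print(break_even_months(20_000, 600, 10, 50_000_000))  # None
```

With these made-up inputs the API costs $500 per month, so the custom model saves $200 per month and needs 100 months to recover its training cost.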

X-2 Long Context Viability: predicts whether a model serves a target context length reliably.

Try: "Will Meta-Llama-3-8B handle 32000 tokens for retrieval?"
Chains: γ_Padé → decomposition → d_horizon → NIAH ceiling → hallucination → KV memory.
Verdict: YES / DEGRADED / NO with mitigation if needed.
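The last link of this chain, KV memory, can be approximated with the standard KV-cache size formula. The sketch below is an assumption about what such a step looks like, not the tool's own code; the function name is hypothetical. The layer and head counts are taken from the public Meta-Llama-3-8B configuration, and 2 bytes per value assumes an fp16/bf16 cache.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Size of the key/value cache for one forward context.

    The leading 2 counts one key tensor and one value tensor per layer.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)

# Meta-Llama-3-8B: 32 layers, 8 KV heads (grouped-query attention), head_dim 128.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=32_000)
print(f"{size / 2**30:.2f} GiB")  # 3.91 GiB
```

This covers only the cache for a single sequence; model weights and activations come on top of it.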

X-3 Budget pre-flight: given a $ budget, what model is feasible to train?

Try: "I have $5000, what model can I train?"
Answer: GO / TINY-MODEL / MEMORY-LIMITED with concrete N (params) and D (tokens).
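To give a feel for how a budget can translate into N and D, here is a rough sketch using two widely used approximations: training compute C ≈ 6·N·D and the Chinchilla-style heuristic D ≈ 20·N. Whether X-3 uses these exact relations is an assumption; the function name, GPU price, peak throughput, and utilization (MFU) below are hypothetical round numbers.

```python
import math

def compute_optimal_size(budget_usd, usd_per_gpu_hour, peak_flops_per_s,
                         mfu=0.4, tokens_per_param=20.0):
    """Return (n_params, n_tokens) affordable under a dollar budget.

    Assumes C = 6 * N * D and D = tokens_per_param * N,
    which gives N = sqrt(C / (6 * tokens_per_param)).
    """
    gpu_seconds = budget_usd / usd_per_gpu_hour * 3600
    total_flops = gpu_seconds * peak_flops_per_s * mfu
    n_params = math.sqrt(total_flops / (6 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Hypothetical inputs: $5000 budget, $2 per GPU-hour, 1e15 peak FLOP/s, 40% MFU.
n, d = compute_optimal_size(5000, 2.0, 1e15)
print(f"N = {n:.2e} params, D = {d:.2e} tokens")
```

The result is very sensitive to the assumed price and utilization, so treat it as an order-of-magnitude estimate only.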

X-5 Hardware selection: which GPU should I use to serve at a target throughput?

Try: "Cheapest hardware to serve Llama-3-8B at 10M tokens/day"
Answer: best GPU + $/Mtok + capacity vs target.
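The two quantities in this answer, $/Mtok and capacity vs target, reduce to simple arithmetic once a GPU's hourly price and sustained throughput are known. The sketch below assumes exactly that; the function names and the price and throughput figures are hypothetical, not measurements of any real GPU.

```python
def usd_per_mtok(usd_per_gpu_hour, tokens_per_second):
    """Serving cost per million tokens at full utilization."""
    mtok_per_hour = tokens_per_second * 3600 / 1e6
    return usd_per_gpu_hour / mtok_per_hour

def capacity_ratio(tokens_per_second, target_tokens_per_day):
    """Daily capacity divided by the daily target (>= 1 means one GPU suffices)."""
    return tokens_per_second * 86_400 / target_tokens_per_day

# Hypothetical GPU: $1.80 per hour, 1000 tokens/s sustained; target 10M tokens/day.
print(round(usd_per_mtok(1.80, 1000), 2))          # 0.5
print(round(capacity_ratio(1000, 10_000_000), 2))  # 8.64
```

A target of 10M tokens/day is only about 116 tokens/s on average, so in this made-up case a single GPU has roughly 8.6 times the required capacity.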

X-19 KV Compression decision: should I use soft decay, hard cutoff, or literature methods?

Try: "How to compress KV cache for Qwen2.5-7B at 32K?"
Answer: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.

Adding new models

The audit chain

Every result shows the full Computation Chain: each formula step with its inputs, output, and interpretation. Click any step to expand it. The cited section numbers (§26.1, §19.1, etc.) refer to the underlying paper, where each formula is derived.
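One way to picture a chain step is as a small record holding the fields named above. The sketch below is purely illustrative: the class, its field names, the rendering format, and the example step (including its section label "A.1") are hypothetical and not taken from the tool's source.

```python
from dataclasses import dataclass

@dataclass
class ChainStep:
    name: str            # formula name shown in the chain
    section: str         # paper section the formula is derived in
    inputs: dict         # named inputs to the formula
    output: float        # computed value
    interpretation: str  # short human-readable reading of the value

def render_chain(steps):
    """Format steps as numbered lines, one per formula application."""
    lines = []
    for i, step in enumerate(steps, start=1):
        args = ", ".join(f"{k}={v}" for k, v in step.inputs.items())
        lines.append(f"{i}. {step.name} (§{step.section}): {args} -> "
                     f"{step.output} [{step.interpretation}]")
    return "\n".join(lines)

# Hypothetical step; values are placeholders.
demo = [ChainStep("kv_memory_gib", "A.1", {"seq_len": 32000}, 3.91, "fits in 24 GB")]
print(render_chain(demo))
```

Keeping inputs and output together per step is what makes a result auditable: each number can be recomputed by hand from the line that produced it.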

The plain-English answer

After the deterministic chain runs, an in-browser LLM (Qwen2.5-0.5B, ~350MB cached after first load) synthesizes a plain-English summary. The numbers in the chain come from deterministic Python and are reproducible; the summary is LLM-generated, so verify it against the chain if in doubt.

Common parameters explained

What to look for in verdicts

Privacy

Everything runs in your browser. No telemetry, no analytics, no data sent anywhere. Even the LLM runs locally via WebGPU/WebAssembly. Your model IDs and questions never leave this page.

Source & paper

Source code: github.com/karlesmarin/tafagent
Paper: Marin 2026, Transformer Thermodynamics (arXiv forthcoming)

โณ Loading Python runtime...

🎯 Mode: two ways to use the tool.
Ask: free-form question, browser LLM picks the right recipe.
Recipe: manual selection with full form control.
Same result either way, so pick whichever fits your style.
