<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>TAF Agent — Transformer Diagnostic in your Browser</title>
<meta name="description" content="Predict transformer LLM behaviour from config alone. Free, unlimited, runs entirely in your browser." />
<link rel="stylesheet" href="style.css" />
<script src="https://cdn.jsdelivr.net/pyodide/v0.26.4/full/pyodide.js"></script>
</head>
<body>
<header>
<h1>🔬 TAF Agent</h1>
<p class="tagline">
Transformer diagnostic in your browser. <strong>Free. Unlimited. Auditable.</strong>
</p>
<p class="subtle">
All computation happens locally — your data never leaves this page.
</p>
<p style="margin-top: 0.75rem;">
<button id="help-btn" type="button">📘 Help & examples</button>
</p>
</header>
<!-- Help modal -->
<div id="help-modal">
<div class="help-content">
<button class="help-close" id="help-close">×</button>
<h2>📘 TAF Agent — User Manual</h2>
<h3>What does it do?</h3>
<p>Predicts the <strong>practical viability</strong> of a transformer LLM from its config, <em>before you spend GPU time or money</em>.
It answers questions like "will this model work at L=32K?" or "should I train a custom model or use an API?" using
deterministic Python formulas (TAF: Thermodynamic Attention Framework).</p>
<h3>How to use — 2 modes</h3>
<p><strong>💬 Ask in plain English</strong> (default): type your question and the in-browser LLM picks
the right recipe and runs it. Best for casual exploration.</p>
<p><strong>📋 Pick recipe + fill form</strong>: select a recipe manually, fill in the parameters, and run it.
Best when you want full control or know exactly what you need.</p>
<h3>The 5 recipes available</h3>
<p><strong>X-1 Custom training vs API</strong>: compares the cost of training your own model with the cost of paying for API access.</p>
<div class="help-example">
Try: <em>"Should I train an 8B custom model or use GPT-4o for 50M tokens/month?"</em><br>
Answer: YES (custom) / NO (API), with the break-even point in months.
</div>
<p><strong>X-2 Long Context Viability</strong>: predicts whether a model can reliably serve a target context length.</p>
<div class="help-example">
Try: <em>"Will Meta-Llama-3-8B handle 32000 tokens for retrieval?"</em><br>
Chains: γ_Padé → decomposition → d_horizon → NIAH ceiling → hallucination → KV memory.<br>
Verdict: YES / DEGRADED / NO, with a mitigation when needed.
</div>
<p><strong>X-3 Budget pre-flight</strong>: given a $ budget, what model can you feasibly train?</p>
<div class="help-example">
Try: <em>"I have $5000, what model can I train?"</em><br>
Answer: GO / TINY-MODEL / MEMORY-LIMITED, with concrete N (parameters) and D (training tokens).
</div>
<p><strong>X-5 Hardware selection</strong>: which GPU should I use to serve a target throughput?</p>
<div class="help-example">
Try: <em>"Cheapest hardware to serve Llama-3-8B at 10M tokens/day"</em><br>
Answer: best GPU, cost in $/Mtok, and capacity vs. target.
</div>
<p><strong>X-19 KV Compression decision</strong>: should I use soft decay, a hard cutoff, or methods from the literature?</p>
<div class="help-example">
Try: <em>"How to compress KV cache for Qwen2.5-7B at 32K?"</em><br>
Answer: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.
</div>
<h3>Adding new models</h3>
<ul>
<li><strong>Preset list</strong>: a curated set of 11 popular models. Just select one from the dropdown.</li>
<li><strong>HF Hub fetch</strong>: paste any model id (e.g. <code>Qwen/Qwen2.5-32B-Instruct</code>) and
click 📥 Fetch. The browser downloads the model's <code>config.json</code> directly from Hugging Face
and fills in the form. Works for public, non-gated models.</li>
<li><strong>Manual</strong>: fill in the form fields directly, using values from the model card.</li>
</ul>
<h3>The audit chain</h3>
<p>Every result shows the full <strong>Computation Chain</strong>: each formula step with its inputs,
output, and interpretation. Click any step to expand it. Section citations (§26.1, §19.1, etc.) point
to the derivations in the underlying paper.</p>
<h3>The plain-English answer</h3>
<p>After the deterministic chain runs, an in-browser LLM (Qwen2.5-0.5B, ~350 MB, cached after the first load)
synthesizes a plain-English summary. The numbers in the chain come from <em>deterministic</em> Python and are
reproducible; the summary is LLM-generated and can contain mistakes, so verify it against the chain if in doubt.</p>
<h3>Common parameters explained</h3>
<ul>
<li><strong>θ (rope_theta)</strong>: RoPE base frequency. Higher values give more long-range capacity.
Typical: 10000 (original RoPE, Llama-2), 500000 (Llama-3), 1000000 (Qwen2.5).</li>
<li><strong>T_train</strong>: maximum context length the model was trained on, read from <code>max_position_embeddings</code>.</li>
<li><strong>T_eval</strong>: <em>your target</em> inference context length. The key knob.</li>
<li><strong>n_kv_heads &lt; n_attention_heads</strong>: the model uses GQA (Grouped-Query Attention),
which reduces KV memory but pushes γ toward Hagedorn.</li>
<li><strong>has_SWA</strong>: the model uses Sliding Window Attention (e.g. Mistral, Gemma 2).</li>
<li><strong>n_params</strong>: total parameter count. Threshold ~400M for induction-head emergence.</li>
</ul>
<h3>What to look for in verdicts</h3>
<ul>
<li><strong style="color:#3fb950;">YES / GO</strong> — proceed with confidence; numbers support the choice.</li>
<li><strong style="color:#d29922;">DEGRADED / TINY-MODEL</strong> — works but with caveats; read the action.</li>
<li><strong style="color:#f85149;">NO / MEMORY-LIMITED</strong> — don't proceed as-is; mitigation provided.</li>
</ul>
<h3>Privacy</h3>
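<p>Everything runs in your browser. No telemetry, no analytics. Even the LLM runs locally via WebGPU/WebAssembly.
Your questions never leave this page. The only outbound requests are downloads of the app, the Python runtime and
the LLM weights, plus, if you use 📥 Fetch, the <code>config.json</code> request to Hugging Face, which includes the model id.</p>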
<h3>Source & paper</h3>
<p>Source code: <a href="https://github.com/karlesmarin/tafagent" target="_blank">github.com/karlesmarin/tafagent</a><br>
Paper: <em>Marin 2026 — Transformer Thermodynamics</em> (arXiv forthcoming)</p>
</div>
</div>
<main>
<!-- Status -->
<section id="status-bar"><div id="status">⏳ Loading Python runtime...</div></section>
<!-- Mode toggle -->
<section id="mode-section">
<h2>🎯 Mode <span class="info"><span class="tooltip"><strong>Two ways to use the tool</strong>.<br>
<strong>Ask</strong>: free-form question, browser LLM picks the right recipe.<br>
<strong>Recipe</strong>: manual selection with full form control.<br>
Same result either way — pick whichever fits your style.
</span></span></h2>
<div class="mode-tabs">
<button class="mode-btn active" data-mode="ask">💬 Ask in plain English</button>
<button class="mode-btn" data-mode="recipe">📋 Pick recipe + fill form</button>
</div>
<p id="mode-desc" class="recipe-desc">
Type a free-form question (e.g. "Will Llama-3-8B work at 32K context?"). The
in-browser LLM picks the right recipe and runs it.
</p>
</section>
<!-- Free-form question (mode=ask) -->
<section id="ask-section">
<h2>❓ Your question</h2>
<textarea id="question" rows="3" placeholder="e.g. Will Mistral-7B handle 16K NIAH retrieval? Or: I have $5,000, what model can I train? Or: Cheapest GPU to serve Llama-70B at 100M tokens/day?"></textarea>
<div style="display:flex; gap:0.5rem; margin-top:0.5rem; flex-wrap:wrap;">
<button id="ask-btn" disabled>🚀 Analyze</button>
<button id="example-btn" type="button" class="secondary">💡 Try an example</button>
</div>
</section>
<!-- Recipe selector (mode=recipe) -->
<section id="recipe-section" style="display:none;">
<h2>📋 Recipe</h2>
<select id="recipe-select" disabled>
<option value="">— select a recipe —</option>
</select>
<p id="recipe-desc-display" class="recipe-desc"></p>
</section>
<!-- Form (mode=recipe) -->
<section id="form-section" style="display:none;">
<h2>🎯 Inputs</h2>
<div class="form-row">
<label for="preset">Preset model:</label>
<select id="preset" disabled>
<option value="">— select to autofill —</option>
</select>
</div>
<div class="form-row">
<label for="hf-id">Or any HF model:</label>
<input type="text" id="hf-id" placeholder="e.g. Qwen/Qwen2.5-32B-Instruct" style="flex:1;" />
<button id="hf-fetch-btn" type="button" class="secondary">📥 Fetch</button>
</div>
<div id="hf-status" class="subtle" style="margin: -0.5rem 0 1rem; min-height:1.2em;"></div>
<!-- Dynamic form fields based on recipe -->
<div id="dynamic-form" class="form-grid"></div>
<button id="run-btn" disabled>🚀 Analyze</button>
</section>
<!-- Output -->
<section id="output-section" style="display:none;">
<h2>📊 Verdict</h2>
<div id="verdict-box"></div>
<h2>🔍 Computation Chain</h2>
<p class="subtle">Every number below is computed by deterministic Python. Click a step to expand it.</p>
<div id="chain-box"></div>
<h2 id="answer-header" style="display:none;">💬 Plain-English Answer</h2>
<div id="answer-box" style="display:none;"></div>
</section>
</main>
<footer>
<p>
© 2026 Carles Marin · Apache-2.0 ·
<a href="https://github.com/karlesmarin/tafagent" target="_blank">Source on GitHub</a>
</p>
<p class="subtle">
Computation: Pyodide (Python in browser) · Synthesis: WebLLM (Llama-3.2-1B local) · Hosting: GitHub Pages
</p>
</footer>
<script type="module" src="js/main.js"></script>
</body>
</html>