	<!DOCTYPE html>
	<html lang="en">
	<head>
	<meta charset="UTF-8" />
	<meta name="viewport" content="width=device-width, initial-scale=1.0" />
	<title>TAF Agent — Transformer Diagnostic in your Browser</title>
	<meta name="description" content="Predict transformer LLM behaviour from config alone. Free, unlimited, runs entirely in your browser." />
	<link rel="stylesheet" href="style.css" />
	<script src="https://cdn.jsdelivr.net/pyodide/v0.26.4/full/pyodide.js"></script>
	</head>
	<body>
	<header>
	<h1>🔬 TAF Agent</h1>
	<p class="tagline">
	Transformer diagnostic in your browser. <strong>Free. Unlimited. Auditable.</strong>
	</p>
	<p class="subtle">
	All computation happens locally — your data never leaves this page.
	</p>
	<p style="margin-top: 0.75rem;">
	<button id="help-btn" type="button">📘 Help &amp; examples</button>
	</p>
	</header>

	<!-- Help modal -->
	<div id="help-modal">
	<div class="help-content">
	<button class="help-close" id="help-close" type="button" aria-label="Close help">×</button>
	<h2>📘 TAF Agent — User Manual</h2>

	<h3>What does it do?</h3>
	<p>Predicts the <strong>practical viability</strong> of any transformer LLM <em>before you spend GPU time or money</em>.
	Answers questions like "Will this model work at L=32K?" or "Should I train a custom model or use an API?" using
	deterministic Python formulas (TAF: Thermodynamic Attention Framework).</p>

	<h3>How to use — 2 modes</h3>
	<p><strong>💬 Ask in plain English</strong> (default): type your question, and the in-browser LLM picks
	the right recipe and runs it. Best for casual exploration.</p>
	<p><strong>📋 Pick recipe + form</strong>: select a recipe manually, fill the parameters, run.
	Best when you want full control or know exactly what you need.</p>

	<h3>The 5 recipes available</h3>

	<p><strong>X-1 Custom training vs API</strong> — compares cost of training your own model vs paying for API access.</p>
	<div class="help-example">
	Try: <em>"Should I train an 8B custom model or use GPT-4o for 50M tokens/month?"</em><br>
	Answer types: YES (custom) / NO (API) with break-even months.
	</div>

	<p><strong>X-2 Long Context Viability</strong> — predicts if a model serves a target context length reliably.</p>
	<div class="help-example">
	Try: <em>"Will Meta-Llama-3-8B handle 32000 tokens for retrieval?"</em><br>
	Chains: γ_Padé → decomposition → d_horizon → NIAH ceiling → hallucination → KV memory.<br>
	Verdict: YES / DEGRADED / NO with mitigation if needed.
	</div>

	<p><strong>X-3 Budget pre-flight</strong> — given $ budget, what model is feasible to train?</p>
	<div class="help-example">
	Try: <em>"I have $5000, what model can I train?"</em><br>
	Answer: GO / TINY-MODEL / MEMORY-LIMITED with concrete N (params) and D (tokens).
	</div>

	<p><strong>X-5 Hardware selection</strong> — which GPU should I use to serve at target throughput?</p>
	<div class="help-example">
	Try: <em>"Cheapest hardware to serve Llama-3-8B at 10M tokens/day"</em><br>
	Answer: best GPU + $/Mtok + capacity vs target.
	</div>

	<p><strong>X-19 KV Compression decision</strong> — should I use soft decay, hard cutoff, or literature methods?</p>
	<div class="help-example">
	Try: <em>"How to compress KV cache for Qwen2.5-7B at 32K?"</em><br>
	Answer: USE SOFT DECAY / USE D_f CUTOFF / USE LITERATURE METHODS / USE HARD T_train.
	</div>

	<h3>Adding new models</h3>
	<ul>
	<li><strong>Preset list</strong>: a curated list of 11 popular models. Select one from the dropdown.</li>
	<li><strong>HF Hub fetch</strong>: paste any model id (e.g. <code>Qwen/Qwen2.5-32B-Instruct</code>) and
	click 📥 Fetch. The browser downloads <code>config.json</code> directly from Hugging Face and
	fills the form. Works for any public model.</li>
	<li><strong>Manual</strong>: fill the form fields directly with values from the model card.</li>
	</ul>
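	<!--
	Developer note: a minimal sketch of how the "HF Hub fetch" step described above could
	map a downloaded config.json onto the form fields. The helper names (configUrl,
	extractFields, fetchModelFields), the fallback values, and the sliding-window heuristic
	are assumptions for illustration; they are not taken from js/main.js. The keys read are
	the usual Transformers config names.

```javascript
// Hypothetical helpers; names and defaults are illustrative assumptions.
function configUrl(modelId) {
  // Public Hub repos serve raw files under /resolve/<revision>/<path>.
  return `https://huggingface.co/${modelId}/resolve/main/config.json`;
}

function extractFields(cfg) {
  const nHeads = cfg.num_attention_heads;
  // Configs without num_key_value_heads use plain multi-head attention.
  const nKvHeads = cfg.num_key_value_heads ?? nHeads;
  return {
    rope_theta: cfg.rope_theta ?? 10000, // assumed fallback: classic RoPE base
    T_train: cfg.max_position_embeddings,
    n_attention_heads: nHeads,
    n_kv_heads: nKvHeads,
    uses_gqa: nKvHeads < nHeads,
    // Heuristic (assumption): a window is set and not explicitly switched off.
    has_SWA: cfg.sliding_window != null && cfg.use_sliding_window !== false,
  };
}

async function fetchModelFields(modelId) {
  const res = await fetch(configUrl(modelId));
  if (!res.ok) throw new Error(`config.json fetch failed: HTTP ${res.status}`);
  return extractFields(await res.json());
}
```

	Keeping extractFields free of network access means the mapping can be checked against
	a hand-written config object without touching the Hub.
	-->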

	<h3>The audit chain</h3>
	<p>Every result shows the full <strong>Computation Chain</strong> — each formula step with its inputs,
	output, and interpretation. Click any step to expand. The cited section numbers (§26.1, §19.1, etc.) refer
	to the underlying paper for the derivation.</p>

	<h3>The plain-English answer</h3>
	<p>After the deterministic chain runs, an in-browser LLM (Qwen2.5-0.5B, ~350MB cached after first load)
	synthesizes a plain-English summary. The numbers above are <em>reproducible</em> (deterministic Python);
	the synthesis is LLM-generated, so verify it against the chain if in doubt.</p>

	<h3>Common parameters explained</h3>
	<ul>
	<li><strong>θ (rope_theta)</strong>: RoPE base frequency. Higher = more long-range capacity.
	Typical: 10000 (early), 500000 (Llama-3), 1000000 (Qwen2.5).</li>
	<li><strong>T_train</strong>: max context the model was trained on. From <code>max_position_embeddings</code>.</li>
	<li><strong>T_eval</strong>: <em>your target</em> inference context length. The key knob.</li>
	<li><strong>n_kv_heads &lt; n_attention_heads</strong>: model uses GQA (Grouped Query Attention).
	Reduces KV memory but pushes γ toward Hagedorn.</li>
	<li><strong>has_SWA</strong>: model uses Sliding Window Attention (Mistral, gemma-2).</li>
	<li><strong>n_params</strong>: total parameter count. Threshold ~400M for induction-head emergence.</li>
	</ul>
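	<!--
	Developer note: the GQA item above says that fewer KV heads reduce KV memory. A minimal
	sketch of the usual KV-cache size estimate, assuming fp16 storage (2 bytes per element)
	and batch size 1. The function and parameter names are hypothetical, and this is the
	textbook estimate, not necessarily the formula used in the TAF recipes.

```javascript
// Illustrative only: standard KV-cache size estimate, not TAF's own code.
function kvCacheBytes({ nLayers, nKvHeads, headDim, tEval, bytesPerElem = 2, batch = 1 }) {
  // Factor 2: one key tensor and one value tensor per layer.
  return 2 * nLayers * nKvHeads * headDim * tEval * bytesPerElem * batch;
}

const GIB = 2 ** 30;
// Llama-3-8B-like shape: 32 layers, 32 query heads, 8 KV heads, head_dim 128.
const gqa = kvCacheBytes({ nLayers: 32, nKvHeads: 8, headDim: 128, tEval: 32768 });
// Same shape with one KV head per query head (no GQA).
const mha = kvCacheBytes({ nLayers: 32, nKvHeads: 32, headDim: 128, tEval: 32768 });
console.log(gqa / GIB, mha / GIB); // prints: 4 16
```

	At T_eval = 32768 the 8-KV-head layout needs a quarter of the cache of the 32-head
	layout, which is the memory saving the parameter list refers to.
	-->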

	<h3>What to look for in verdicts</h3>
	<ul>
	<li><strong style="color:#3fb950;">YES / GO</strong> — proceed with confidence; numbers support the choice.</li>
	<li><strong style="color:#d29922;">DEGRADED / TINY-MODEL</strong> — works but with caveats; read the action.</li>
	<li><strong style="color:#f85149;">NO / MEMORY-LIMITED</strong> — don't proceed as-is; mitigation provided.</li>
	</ul>

	<h3>Privacy</h3>
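	<p>Everything runs in your browser. No telemetry, no analytics. Even the LLM runs locally via
	WebGPU/WebAssembly, so your questions and form inputs are never uploaded. The only network requests are
	downloads of the runtime and model files and, when you use HF Hub fetch, the <code>config.json</code>
	request for the model id you entered.</p>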

	<h3>Source &amp; paper</h3>
	<p>Source code: <a href="https://github.com/karlesmarin/tafagent" target="_blank" rel="noopener">github.com/karlesmarin/tafagent</a><br>
	Paper: <em>Marin 2026 — Transformer Thermodynamics</em> (arXiv forthcoming)</p>
	</div>
	</div>

	<main>
	<!-- Status -->
	<section id="status-bar"><div id="status">⏳ Loading Python runtime...</div></section>

	<!-- Mode toggle -->
	<section id="mode-section">
	<h2>🎯 Mode <span class="info"><span class="tooltip"><strong>Two ways to use the tool</strong>.<br>
	<strong>Ask</strong>: free-form question, browser LLM picks the right recipe.<br>
	<strong>Recipe</strong>: manual selection with full form control.<br>
	Same result either way — pick whichever fits your style.
	</span></span></h2>
	<div class="mode-tabs">
	<button class="mode-btn active" data-mode="ask">💬 Ask in plain English</button>
	<button class="mode-btn" data-mode="recipe">📋 Pick recipe + fill form</button>
	</div>
	<p id="mode-desc" class="recipe-desc">
	Type a free-form question (e.g. "Will Llama-3-8B work at 32K context?"). The
	in-browser LLM picks the right recipe and runs it.
	</p>
	</section>

	<!-- Free-form question (mode=ask) -->
	<section id="ask-section">
	<h2>❓ Your question</h2>
	<textarea id="question" rows="3" placeholder="e.g. Will Mistral-7B handle 16K NIAH retrieval? Or: I have $5,000, what model can I train? Or: Cheapest GPU to serve Llama-70B at 100M tokens/day?"></textarea>
	<div style="display:flex; gap:0.5rem; margin-top:0.5rem; flex-wrap:wrap;">
	<button id="ask-btn" disabled>🚀 Analyze</button>
	<button id="example-btn" type="button" class="secondary">💡 Try an example</button>
	</div>
	</section>

	<!-- Recipe selector (mode=recipe) -->
	<section id="recipe-section" style="display:none;">
	<h2>📋 Recipe</h2>
	<select id="recipe-select" disabled>
	<option value="">— select a recipe —</option>
	</select>
	<p id="recipe-desc-display" class="recipe-desc"></p>
	</section>

	<!-- Form (mode=recipe) -->
	<section id="form-section" style="display:none;">
	<h2>🎯 Inputs</h2>

	<div class="form-row">
	<label for="preset">Preset model:</label>
	<select id="preset" disabled>
	<option value="">— select to autofill —</option>
	</select>
	</div>

	<div class="form-row">
	<label for="hf-id">Or any HF model:</label>
	<input type="text" id="hf-id" placeholder="e.g. Qwen/Qwen2.5-32B-Instruct" style="flex:1;" />
	<button id="hf-fetch-btn" type="button" class="secondary">📥 Fetch</button>
	</div>
	<div id="hf-status" class="subtle" style="margin: -0.5rem 0 1rem; min-height:1.2em;"></div>

	<!-- Dynamic form fields based on recipe -->
	<div id="dynamic-form" class="form-grid"></div>

	<button id="run-btn" disabled>🚀 Analyze</button>
	</section>

	<!-- Output -->
	<section id="output-section" style="display:none;">
	<h2>📊 Verdict</h2>
	<div id="verdict-box"></div>

	<h2>🔍 Computation Chain</h2>
	<p class="subtle">Every number below is computed by deterministic Python code. Click a step to expand.</p>
	<div id="chain-box"></div>

	<h2 id="answer-header" style="display:none;">💬 Plain-English Answer</h2>
	<div id="answer-box" style="display:none;"></div>
	</section>
	</main>

	<footer>
	<p>
	© 2026 Carles Marin · Apache-2.0 ·
	<a href="https://github.com/karlesmarin/tafagent" target="_blank">Source on GitHub</a>
	</p>
	<p class="subtle">
	Computation: Pyodide (Python in browser) · Synthesis: WebLLM (in-browser LLM) · Hosting: GitHub Pages
	</p>
	</footer>

	<script type="module" src="js/main.js"></script>
	</body>
	</html>