Spaces:

karlexmarin
/

taf-agent

Running

App Files Files Community

taf-agent / hf-post-announcement.md

karlexmarin

docs: rename paper "Transformer Thermodynamics" → "Predicting How Transformers Attend"

1a6c909 about 2 months ago

preview code

raw

history blame

3.22 kB

	# 🔬 TAF Agent — predict transformer LLM viability before you spend GPU/$

	Just shipped TAF Agent, a free browser-based diagnostic tool for transformer LLMs.
	No server, no auth, no cost. Runs entirely in your browser.

	🌐 Try it: https://huggingface.co/spaces/karlexmarin/taf-agent
	📦 Source: https://github.com/karlesmarin/tafagent
	📄 Paper: [Predicting How Transformers Attend](https://zenodo.org/records/19826343)

	## What it answers

	- Will Llama-3-8B serve 32K context with NIAH retrieval? ← X-2 recipe
	- Should I train custom or use GPT-4o for 50M tokens/month? ← X-1 recipe
	- I have $5K — what model can I afford to train? ← X-3 recipe
	- Cheapest GPU to serve Llama-70B at 100M tokens/day? ← X-5 recipe
	- Soft KV decay or hard cutoff at 32K? ← X-19 recipe

	5 cross-section recipes, 5 UI modes, 4 languages (EN/ES/FR/ZH).

	## Why it's different from "ask ChatGPT"

	Every number is deterministic Python (the TAF formulas — closed-form, derivable
	from RoPE aliasing geometry). No hallucination. The synthesis LLM only reads the
	chain and writes plain English; it doesn't invent values.

	The full computation chain is auditable per click — every step shows formula,
	inputs, output, paper section reference.

	## Architecture coverage

	✓ RoPE-MHA · ✓ RoPE-GQA · ✓ ALiBi · ✓ AbsPE · ✓ SWA · ✓ SSM
	✓ Any HuggingFace public model (paste model id, fetch config.json, profile)

	## How it stays free + unlimited

	- Static HTML/JS on GitHub Pages (unlimited bandwidth)
	- Python computation in your browser via Pyodide
	- Plain-English synthesis via WebLLM (Qwen2.5-0.5B local, your GPU)
	- Configs fetched directly from HF Hub
	- Your data never leaves your browser

	If 1 user or 1M users hit it, our cost stays at $0/month.

	## Built by an independent researcher

	No funding, no team, no GPUs beyond a single consumer card. Built with the
	help of large language models as research instruments. Open source. Apache-2.0.

	The tool exists because the paper it complements needed a way for any reader
	to check the framework's predictions on their own model in seconds.

	## Looking for

	- 🧪 Falsifications: run TAF Agent on a model where you have real
	measurements. If our verdict disagrees, please open a [refutation issue](https://github.com/karlesmarin/tafagent-registry/issues/new?template=refutation.md).
	- 🌐 Translations: 4 languages so far. Add yours via PR (`js/i18n.js`).
	- 💡 New recipes: we shipped 5 of 20 candidate recipes from the paper.
	Propose more in the [registry](https://github.com/karlesmarin/tafagent-registry).
	- ➕ Model presets: 11 popular models curated. Add yours.

	## What this is NOT

	- Not a benchmark (we predict from config, don't measure)
	- Not a leaderboard (no ranking, just per-model viability)
	- Not a replacement for actual evaluation — prediction before measurement
	- Not a vendor pitch — there's nothing to buy, ever

	The point is to give the community a free, auditable, falsifiable lens for
	evaluating transformer LLMs before spending compute on them.

	If you find it useful even once, that's enough.

	#transformer #llm #rope #diagnostic #free #opensource