---
title: TAF Agent
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: static
pinned: true
license: apache-2.0
short_description: Test ANY transformer LLM before you spend GPU/$. Free. Auditable.
tags:
  - transformer
  - llm
  - diagnostic
  - rope
  - kv-cache
  - long-context
  - gguf
  - llama-cpp
  - ollama
  - yarn
  - quantization
  - viability
  - thermodynamics
  - free
  - browser
  - webgpu
language:
  - en
  - es
  - fr
  - zh
---

# 🔬 TAF Agent

> **Test ANY transformer LLM before you spend GPU/$.**
> Free. Unlimited. Auditable. Runs entirely in your browser.

---

## What it does

Predicts the practical viability of any transformer LLM from its config alone:

- Will Llama-3-8B serve 32K context with NIAH retrieval?
- Should I train a custom 7B model or use GPT-4o for 50M tokens/month?
- What is the cheapest GPU to serve Llama-70B at 100M tokens/day?
- Which KV compression strategy fits my model's γ profile?

25 modes, 4 languages (EN/ES/FR/ZH), 100% in-browser.

## Why it's different

- **Truly free**: no server, no auth, no rate limits. Compute runs in YOUR browser.
- **Auditable**: every number comes from deterministic Python (TAF formulas; see the paper). No hallucinated values: the LLM only synthesises the computed results and never invents numbers.
- **Falsifiable**: 23 paper predictions are tracked publicly, each with its verification status.
- **Community-first**: submit your analyses to a public registry and debate them there.
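
As a rough illustration of the config-only arithmetic behind a question like "Will Llama-3-8B serve 32K context?", here is a minimal sketch that estimates KV-cache memory from `config.json` fields. It uses the standard K/V sizing formula, not TAF's own γ/d_horizon formulas. The `AttnConfig` class and `kv_cache_bytes` helper are hypothetical names, and the sketch assumes `head_dim = hidden_size / num_attention_heads`, which does not hold for every model. The Llama-3-8B values (32 layers, 32 query heads, 8 KV heads, hidden size 4096) follow its public config.

```python
"""Config-only KV-cache sizing sketch (standard arithmetic, not TAF's formulas)."""

from dataclasses import dataclass


@dataclass
class AttnConfig:
    # Hypothetical container; field names mirror HuggingFace config.json keys
    # for Llama-style models.
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int
    hidden_size: int


def kv_cache_bytes(cfg: AttnConfig, context_len: int, bytes_per_elem: float = 2.0) -> int:
    """Bytes needed to cache K and V for one sequence of `context_len` tokens."""
    # Assumption: head_dim is derived, not read from an explicit config key.
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    # Factor 2 = one K and one V tensor per layer; with GQA only the
    # num_key_value_heads heads are cached, not every query head.
    elems = 2 * cfg.num_hidden_layers * cfg.num_key_value_heads * head_dim * context_len
    return int(elems * bytes_per_elem)


if __name__ == "__main__":
    llama3_8b = AttnConfig(
        num_hidden_layers=32,
        num_attention_heads=32,
        num_key_value_heads=8,
        hidden_size=4096,
    )
    size = kv_cache_bytes(llama3_8b, context_len=32_768)  # fp16 = 2 bytes/elem
    print(f"{size / 2**30:.1f} GiB")  # prints "4.0 GiB"
```

For Llama-3-8B at 32K tokens in fp16 this gives 4.0 GiB per sequence. A number like this only says whether the cache fits in memory; whether retrieval still works at that length is the separate "fits ≠ works" question the TAF modes are meant to answer.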
## Architecture coverage

✓ RoPE-MHA · ✓ RoPE-GQA · ✓ ALiBi · ✓ AbsPE · ✓ SWA · ✓ SSM · ✓ Any public HuggingFace model

## Modes (25)

**Core**

- **📇 Profile**: paste a model ID → all 5 recipes are scored at once, producing a TAF Card
- **🆚 Compare**: 2-3 models side by side · **🔍 Inspector**: raw config.json · **💬 Ask**: free-form · **📋 Recipe**: manual

**🆕 v0.9: "fits ≠ works" pack**

- **🧡 YaRN Planner**: target context → exact `rope_scaling` config, plus a γ/d_horizon verdict on whether quality holds
- **🧊 GGUF Bridge**: paste a GGUF repo → reads the `.gguf` header (via HTTP Range requests, no full download) and compares every quant's γ-shift
- **🚀 Launch Flags**: model + GPU + context → exact `llama.cpp`/Ollama command (`-ngl`, `-c`, `--no-mmap`, KV-type), plus a wasted-context warning

**Anti-bullshit pack (v0.7–v0.8)**

- 🪟 Unmask · 📜 Chat-template · 🎯 Arena CI · 🧪 Contamination · ⚖️ Quant · 🔀 Drift · 🔍 NIAH→Reason
- 📈 Saturation · 📋 JSON CoT · 🔧 PEFT Lint · 🔍 Cache Diff · 🔬 Spec-Decode · 🌍 Token Tax · 🎯 LongScore · 🧭 Solutions · 🩺 Diagnose CLI · 📊 Phase diagram

## Underlying paper

[Marin 2026: Predicting How Transformers Attend](https://zenodo.org/records/20314038)

## Source

[github.com/karlesmarin/tafagent](https://github.com/karlesmarin/tafagent)

## Public registry

[tafagent-registry](https://github.com/karlesmarin/tafagent-registry): community-submitted analyses

## Citation

```bibtex
@misc{marin2026tafagent,
  author = {Marin, Carles},
  title  = {{TAF Agent}: Browser-Based Transformer Diagnostic Tool},
  year   = {2026},
  url    = {https://huggingface.co/spaces/karlexmarin/taf-agent},
}
```

## Acknowledgements

Built by an independent researcher with the help of LLMs as research instruments. Not affiliated with any model vendor.
The tool would not exist without the open-weights commons (Meta, Mistral, Qwen, EleutherAI, AI2, BigScience, TII, DeepSeek, Microsoft, Google DeepMind, Anthropic), the Pyodide + WebLLM projects, and HuggingFace for hosting models, datasets, and now this Space.