Spaces:

karlexmarin
/

taf-agent

Running

App Files Files Community

taf-agent / hf-discussion-templates.md

karlexmarin

docs: rename paper "Transformer Thermodynamics" → "Predicting How Transformers Attend"

1a6c909 about 2 months ago

preview code

raw

history blame

6.75 kB

HF Discussion templates

Post in Community → Discussions → New Discussion of each model. Use a variant — don't copy-paste identical text (looks like spam).

Template 1 — Llama-3-8B / Llama-3.3-70B-Instruct

Title: TAF Agent: I built a free browser tool that predicts this model's long-context viability

Body:

Hi! I built TAF Agent, a free in-browser diagnostic for transformer LLMs.

I used it on this model and the prediction was:
[paste your X-2 verdict here, e.g. "YES at 32K with 33% margin, but DEGRADED at 64K"]

You can verify on your own model in 30s:
https://huggingface.co/spaces/karlexmarin/taf-agent
→ Profile mode → paste this model's id → Generate

Curious if anyone has measured NIAH retrieval on this model at long
contexts and if the predictions match. Falsifications welcome:
https://github.com/karlesmarin/tafagent-registry/issues

Built solo by an independent researcher; open source Apache-2.0;
$0/month forever (browser-side compute).

Template 2 — Mistral-7B / Mistral-Small-3.1

Title: Tested this model in TAF Agent — interesting result on KV compression

Body:

Hey, I built a small browser tool that predicts viability of transformer
LLMs from their config. Ran it on this model:

X-2 (long context): [your verdict]
X-19 (KV compression): [your verdict — soft decay applies?]

The interesting part is that γ_Padé = [value] places this model in the
[Phase A / Phase B / borderline] regime per the underlying paper
(Marin 2026, "Predicting How Transformers Attend").

Try it: https://huggingface.co/spaces/karlexmarin/taf-agent

If you've measured this model empirically at long context and the
prediction is wrong, I'd love to know — refutations are first-class
citizens here:
https://github.com/karlesmarin/tafagent-registry/issues

Template 3 — Qwen2.5-7B / Qwen2.5-32B / Qwen3

Title: Free browser diagnostic for transformer viability — ran on Qwen2.5

Body:

Built TAF Agent — a browser tool that predicts practical viability of
transformer LLMs (long-context, KV compression, hardware fit, etc.) from
config alone.

Ran it on this model. Quick observations:
- γ_Padé(T=32K) = [value] → [Phase classification]
- d_horizon = [value]
- For NIAH retrieval at 32K: [verdict]

Qwen2.5 has interesting design choices (high rope_theta, low n_kv) that
the framework analyzes nicely.

Tool URL: https://huggingface.co/spaces/karlexmarin/taf-agent
Source: https://github.com/karlesmarin/tafagent

If you've actually measured long-context retrieval on this model and the
prediction is off, please open a falsification issue:
https://github.com/karlesmarin/tafagent-registry

Template 4 — Phi-3-mini / Phi-4

Title: TAF Agent diagnostic for this model

Body:

Tried this model in TAF Agent (browser-based viability diagnostic):

- Architecture class: [classification]
- Long-context verdict at [your target T]: [verdict]
- KV compression strategy: [recommendation]

This is a small/edge-friendly model — TAF identifies that it's well-suited
for [your context range].

Try it on your own deployment scenario:
https://huggingface.co/spaces/karlexmarin/taf-agent

100% browser-side, no auth, no rate limits, no cost.

Template 5 — gemma-2-9b-it / gemma-2-27b-it

Title: Gemma's SWA architecture in TAF Agent — interesting Δγ signature

Body:

Built a browser diagnostic for transformer LLMs. Gemma family is
interesting because of the alternating SWA pattern.

Per the underlying framework (Marin 2026, "Predicting How Transformers Attend"),
SWA gives a distinctive Δγ ≈ +0.5 signature visible in attention
fingerprinting.

For this specific model:
- Architecture detected: [class]
- Verdict at [your T]: [verdict]
- KV compression recommendation: [strategy]

Tool: https://huggingface.co/spaces/karlexmarin/taf-agent

Can be useful before deployment to predict context-length behavior.

Template 6 — SmolLM2-1.7B / Llama-3.2-1B (small models)

Title: TAF Agent works on small models too — good for edge inference planning

Body:

Built a free browser diagnostic for transformer LLMs. Just ran it on
this small model.

For edge / mobile / browser inference, the relevant questions are
different (latency-sensitive, memory-constrained). TAF Agent's hardware
recipe (X-5) gives concrete tok/s + $/Mtok numbers across consumer GPUs
and Apple Silicon.

For this model: [verdict on edge feasibility]

Tool: https://huggingface.co/spaces/karlexmarin/taf-agent

(Bonus: the tool ITSELF runs in browser via WebLLM with a small model.
So if you want to see how a 1B Instruct model handles tool-use synthesis,
it's the synthesis LLM by default.)

Template 7 — DeepSeek-V3 / DeepSeek-V2-Lite

Title: DeepSeek architecture analyzed in TAF Agent

Body:

DeepSeek's MLA (Multi-head Latent Attention) is interesting — TAF Agent
classifies it under the GQA-like family for first-order analysis,
though MLA itself isn't natively in the framework yet.

Ran X-2 on this model: [verdict]
Ran X-1 (custom vs API): [verdict given DeepSeek's pricing]

URL: https://huggingface.co/spaces/karlexmarin/taf-agent

DeepSeek's API pricing makes interesting math for cost recipes — the
break-even calculations show very different results vs frontier US APIs.

Source: https://github.com/karlesmarin/tafagent

Tips para postear sin parecer spam

Personaliza — cada post menciona algo específico del modelo
Aporta valor — no solo "look at my tool", sino observación concreta del análisis
Pide feedback genuino — preguntas, falsificaciones, confirmaciones
Espacia los posts — no postees los 8 en 10 minutos. Uno cada 2-3h
Responde si comentan — engagement real, no fire-and-forget
No prometas lo que no es — no es benchmark, no es leaderboard
Reconoce los limites del tool — humildad

En qué ORDEN recomiendo postear

Día 1:

HF Posts announcement (template separado)
1-2 model discussions (empezar con SmolLM2 o phi-3 — comunidad menos competitiva)

Día 2-3:

2-3 más (Llama-3-8B, Mistral, Qwen)

Semana 1+:

Engage con comentarios
Submit ANALYSIS results del registry como proof
Ir respondiendo dudas

Si alguien refuta la predicción

¡Genial! Eso es exactamente lo que queremos para validar el framework.

Respuesta tipo:

"Thanks for the falsification — please open an issue in the registry with your setup details so it's permanently logged. The framework is designed to be falsifiable; refutations help us bound validity zones better."

Link: https://github.com/karlesmarin/tafagent-registry/issues/new?template=refutation.md