	# HF Discussion templates

	Post each one on the model's page: Community → Discussions → New Discussion.
	Adapt a variant for each model; don't copy-paste identical text (it looks like spam).

	---

	## Template 1 — Llama-3-8B / Llama-3.3-70B-Instruct

	Title: TAF Agent: I built a free browser tool that predicts this model's long-context viability

	Body:
	```
	Hi! I built TAF Agent, a free in-browser diagnostic for transformer LLMs.

	I used it on this model and the prediction was:
	[paste your X-2 verdict here, e.g. "YES at 32K with 33% margin, but DEGRADED at 64K"]

	You can run the same check on your own model in about 30 seconds:
	https://huggingface.co/spaces/karlexmarin/taf-agent
	→ Profile mode → paste this model's id → Generate

	I'm curious whether anyone has measured NIAH retrieval on this model at long
	context, and whether the results match the prediction. Falsifications welcome:
	https://github.com/karlesmarin/tafagent-registry/issues

	Built solo by an independent researcher; open source (Apache-2.0);
	$0/month forever, since all compute runs in your browser.
	```

	---

	## Template 2 — Mistral-7B / Mistral-Small-3.1

	Title: Tested this model in TAF Agent — interesting result on KV compression

	Body:
	```
	Hey, I built a small browser tool that predicts the viability of transformer
	LLMs from their config. I ran it on this model:

	X-2 (long context): [your verdict]
	X-19 (KV compression): [your verdict — soft decay applies?]

	The interesting part is that γ_Padé = [value] places this model in the
	[Phase A / Phase B / borderline] regime per the underlying paper
	(Marin 2026, "Predicting How Transformers Attend").

	Try it: https://huggingface.co/spaces/karlexmarin/taf-agent

	If you've measured this model empirically at long context and the
	prediction is wrong, I'd love to know — refutations are first-class
	citizens here:
	https://github.com/karlesmarin/tafagent-registry/issues
	```

	---

	## Template 3 — Qwen2.5-7B / Qwen2.5-32B / Qwen3

	Title: Free browser diagnostic for transformer viability — ran on Qwen2.5

	Body:
	```
	Built TAF Agent — a browser tool that predicts practical viability of
	transformer LLMs (long-context, KV compression, hardware fit, etc.) from
	config alone.

	Ran it on this model. Quick observations:
	- γ_Padé(T=32K) = [value] → [Phase classification]
	- d_horizon = [value]
	- For NIAH retrieval at 32K: [verdict]

	Qwen2.5 makes some interesting design choices (high rope_theta, few KV
	heads, i.e. low n_kv) that show up clearly in the framework's analysis.

	Tool URL: https://huggingface.co/spaces/karlexmarin/taf-agent
	Source: https://github.com/karlesmarin/tafagent

	If you've actually measured long-context retrieval on this model and the
	prediction is off, please open a falsification issue:
	https://github.com/karlesmarin/tafagent-registry
	```
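The "low n_kv" point in the Qwen template matters mostly because KV-cache memory grows linearly with the number of key/value heads. Below is a minimal sketch of the standard estimate. The layer and head counts are illustrative placeholders for a 7B-class GQA shape, not values read from any specific config, and this is not necessarily how TAF Agent computes it.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2,
                   batch_size: int = 1) -> int:
    """KV-cache size for a standard MHA/GQA decoder.

    Every layer stores one key vector and one value vector (hence the 2)
    per KV head, per token, per sequence in the batch.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem * batch_size)


def gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Illustrative shape only; read real values from the model's config.json
    # (num_hidden_layers, num_key_value_heads, hidden_size / num_attention_heads).
    n_layers, n_q_heads, head_dim, ctx = 28, 28, 128, 32_768
    for n_kv in (n_q_heads, 4):  # plain MHA vs. GQA with 4 shared KV heads
        size = kv_cache_bytes(n_layers, n_kv, head_dim, ctx)  # fp16 = 2 bytes
        print(f"n_kv={n_kv:>2}: {gib(size):.2f} GiB at {ctx} tokens")
```

With these placeholder numbers, the cache shrinks from 12.25 GiB to 1.75 GiB at 32K tokens. That 7x saving comes entirely from sharing KV heads.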

	---

	## Template 4 — Phi-3-mini / Phi-4

	Title: TAF Agent diagnostic for this model

	Body:
	```
	Tried this model in TAF Agent (browser-based viability diagnostic):

	- Architecture class: [classification]
	- Long-context verdict at [your target T]: [verdict]
	- KV compression strategy: [recommendation]

	This is a small, edge-friendly model; TAF rates it as well suited
	for [your context range].

	Try it on your own deployment scenario:
	https://huggingface.co/spaces/karlexmarin/taf-agent

	100% browser-side, no auth, no rate limits, no cost.
	```

	---

	## Template 5 — gemma-2-9b-it / gemma-2-27b-it

	Title: Gemma's SWA architecture in TAF Agent — interesting Δγ signature

	Body:
	```
	I built a browser diagnostic for transformer LLMs. The Gemma family is
	interesting because of its alternating sliding-window (SWA) / global attention pattern.

	Per the underlying framework (Marin 2026, "Predicting How Transformers Attend"),
	SWA gives a distinctive Δγ ≈ +0.5 signature visible in attention
	fingerprinting.

	For this specific model:
	- Architecture detected: [class]
	- Verdict at [your T]: [verdict]
	- KV compression recommendation: [strategy]

	Tool: https://huggingface.co/spaces/karlexmarin/taf-agent

	It can be useful before deployment, to predict context-length behavior.
	```
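For readers less familiar with SWA: in a sliding-window layer each token attends only to a fixed number of recent positions, while a global layer sees the whole causal prefix, and Gemma 2 alternates the two. Here is a minimal pure-Python sketch of both masks. The toy sizes and the choice that even-indexed layers are the local ones are assumptions for illustration, so check the model's config and modeling code for the real window and layout. The sketch says nothing about the Δγ signature itself, which comes from the framework.

```python
from typing import List, Optional


def attention_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    window=None -> ordinary causal (global) attention.
    window=W    -> sliding-window attention: only the W most recent positions,
                   including q itself, are visible.
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def layer_windows(n_layers: int, window: int) -> List[Optional[int]]:
    """Hypothetical alternating layout: even layers local (SWA), odd layers global."""
    return [window if i % 2 == 0 else None for i in range(n_layers)]


def keys_seen_by_last_token(seq_len: int, window: Optional[int]) -> int:
    """How many key positions the final token can attend to."""
    return sum(attention_mask(seq_len, window)[-1])


if __name__ == "__main__":
    seq_len, window = 8, 3  # toy sizes so the result is easy to read
    for i, w in enumerate(layer_windows(4, window)):
        kind = "global" if w is None else f"SWA(w={w})"
        print(f"layer {i} {kind}: last token sees "
              f"{keys_seen_by_last_token(seq_len, w)} of {seq_len} keys")
```

In the local layers the last token sees only 3 of 8 positions, while in the global layers it sees all 8. At real context lengths this is also why SWA layers keep a KV cache capped at the window size.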

	---

	## Template 6 — SmolLM2-1.7B / Llama-3.2-1B (small models)

	Title: TAF Agent works on small models too — good for edge inference planning

	Body:
	```
	Built a free browser diagnostic for transformer LLMs. Just ran it on
	this small model.

	For edge / mobile / browser inference, the relevant questions are
	different: the setting is latency-sensitive and memory-constrained.
	TAF Agent's hardware recipe (X-5) gives concrete tok/s and $/Mtok
	estimates across consumer GPUs and Apple Silicon.

	For this model: [verdict on edge feasibility]

	Tool: https://huggingface.co/spaces/karlexmarin/taf-agent

	(Bonus: the tool itself runs a small model in your browser via WebLLM.
	Its default synthesis LLM is a 1B Instruct model, so you can see firsthand
	how a model this size handles tool-use synthesis.)
	```
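The kind of tok/s and $/Mtok figures the X-5 template mentions can be sanity-checked by hand. A common first-order estimate treats single-stream decoding as memory-bandwidth bound, which is an assumption here and not necessarily the model X-5 uses. The bandwidth and hourly price below are placeholders, not measured figures for any real GPU.

```python
def decode_tok_per_s(n_params: float, bytes_per_param: float,
                     mem_bandwidth_gb_s: float, efficiency: float = 1.0) -> float:
    """Rough upper bound on single-stream (batch 1) decode speed.

    Assumes decoding is memory-bandwidth bound: each new token reads all
    weights once, so tok/s ~= bandwidth / weight bytes. `efficiency` in (0, 1]
    is a fudge factor for real-world overheads (KV reads, kernels, etc.).
    """
    weight_bytes = n_params * bytes_per_param
    return efficiency * mem_bandwidth_gb_s * 1e9 / weight_bytes


def usd_per_mtok(hourly_cost_usd: float, tok_per_s: float) -> float:
    """Cost per million generated tokens on hardware billed per hour."""
    tokens_per_hour = tok_per_s * 3600
    return hourly_cost_usd / tokens_per_hour * 1e6


if __name__ == "__main__":
    # Placeholder inputs: a 1.7B model in fp16 on a hypothetical 1000 GB/s GPU
    # rented at a hypothetical $0.50/hour. Substitute real specs and prices.
    tps = decode_tok_per_s(n_params=1.7e9, bytes_per_param=2, mem_bandwidth_gb_s=1000)
    print(f"~{tps:.0f} tok/s upper bound, ~${usd_per_mtok(0.50, tps):.2f}/Mtok")
```

With the placeholder inputs this gives roughly 294 tok/s and about $0.47/Mtok. Quantizing to fewer bytes per parameter raises the tok/s bound proportionally, which is why small quantized models suit edge devices.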

	---

	## Template 7 — DeepSeek-V3 / DeepSeek-V2-Lite

	Title: DeepSeek architecture analyzed in TAF Agent

	Body:
	```
	DeepSeek's MLA (Multi-head Latent Attention) is interesting: for first-order
	analysis, TAF Agent classifies it in the GQA-like family, although the
	framework doesn't model MLA natively yet.

	Ran X-2 on this model: [verdict]
	Ran X-1 (custom vs API): [verdict given DeepSeek's pricing]

	URL: https://huggingface.co/spaces/karlexmarin/taf-agent

	DeepSeek's API pricing makes for interesting cost-recipe math: the
	break-even calculations come out very differently than against frontier US APIs.

	Source: https://github.com/karlesmarin/tafagent
	```
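To first order, the custom-vs-API break-even behind X-1 comes down to one line of arithmetic. The minimal sketch below assumes self-hosting costs a fixed monthly amount plus a per-token marginal cost, compared against a flat API price per million tokens. The prices in the example are placeholders, not DeepSeek's or anyone's actual rates, and TAF Agent's own recipe may include more terms.

```python
import math


def breakeven_mtok_per_month(fixed_monthly_usd: float,
                             self_host_usd_per_mtok: float,
                             api_usd_per_mtok: float) -> float:
    """Monthly volume (millions of tokens) above which self-hosting is cheaper.

    self-hosted cost = fixed_monthly_usd + self_host_usd_per_mtok * volume
    API cost         = api_usd_per_mtok * volume
    The two are equal at volume = fixed / (api_rate - self_host_rate).
    """
    saving_per_mtok = api_usd_per_mtok - self_host_usd_per_mtok
    if saving_per_mtok <= 0:
        # The API is no more expensive per token, so self-hosting never catches up.
        return math.inf
    return fixed_monthly_usd / saving_per_mtok


if __name__ == "__main__":
    # Placeholder prices, not real quotes: $500/month of hardware with
    # $0.20/Mtok marginal cost, versus a pricier and a cheaper API.
    for api_price in (10.0, 1.0):
        volume = breakeven_mtok_per_month(500, 0.20, api_price)
        print(f"API at ${api_price}/Mtok -> break-even at {volume:.0f} Mtok/month")
```

A cheaper API pushes the break-even volume up sharply, from about 51 to 625 Mtok/month with these placeholders. That is the effect the template alludes to when comparing against frontier US APIs.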

	---

	## Tips for posting without looking like spam

	1. Personalize: each post mentions something specific to that model
	2. Add value: not just "look at my tool", but a concrete observation from the analysis
	3. Ask for genuine feedback: questions, falsifications, confirmations
	4. Space out the posts: don't post all 8 within 10 minutes. One every 2-3 h
	5. Reply when people comment: real engagement, not fire-and-forget
	6. Don't promise what it isn't: it's not a benchmark, not a leaderboard
	7. Acknowledge the tool's limits: humility

	## Recommended posting ORDER

	Day 1:
	- HF Posts announcement (separate template)
	- 1-2 model discussions (start with SmolLM2 or Phi-3; less competitive communities)

	Days 2-3:
	- 2-3 more (Llama-3-8B, Mistral, Qwen)

	Week 1+:
	- Engage with comments
	- Submit ANALYSIS results from the registry as evidence
	- Keep answering questions

	## If someone refutes the prediction

	Great! That's exactly what we want in order to validate the framework.

	Sample reply:
	> "Thanks for the falsification — please open an issue in the registry with your
	> setup details so it's permanently logged. The framework is designed to be
	> falsifiable; refutations help us bound validity zones better."

	Link: https://github.com/karlesmarin/tafagent-registry/issues/new?template=refutation.md