HF Discussion templates
Post under Community → Discussions → New Discussion on each model's page. Use a different variant for each model; identical copy-pasted text looks like spam.
Template 1 — Llama-3-8B / Llama-3.3-70B-Instruct
Title: TAF Agent: I built a free browser tool that predicts this model's long-context viability
Body:
Hi! I built TAF Agent, a free in-browser diagnostic for transformer LLMs.
I used it on this model and the prediction was:
[paste your X-2 verdict here, e.g. "YES at 32K with 33% margin, but DEGRADED at 64K"]
You can verify on your own model in 30s:
https://huggingface.co/spaces/karlexmarin/taf-agent
→ Profile mode → paste this model's id → Generate
Curious if anyone has measured NIAH retrieval on this model at long
contexts and if the predictions match. Falsifications welcome:
https://github.com/karlesmarin/tafagent-registry/issues
Built solo by an independent researcher; open source Apache-2.0;
$0/month forever (browser-side compute).
Template 2 — Mistral-7B / Mistral-Small-3.1
Title: Tested this model in TAF Agent — interesting result on KV compression
Body:
Hey, I built a small browser tool that predicts viability of transformer
LLMs from their config. Ran it on this model:
X-2 (long context): [your verdict]
X-19 (KV compression): [your verdict — soft decay applies?]
The interesting part is that γ_Padé = [value] places this model in the
[Phase A / Phase B / borderline] regime per the underlying paper
(Marin 2026, "Predicting How Transformers Attend").
Try it: https://huggingface.co/spaces/karlexmarin/taf-agent
If you've measured this model empirically at long context and the
prediction is wrong, I'd love to know — refutations are first-class
citizens here:
https://github.com/karlesmarin/tafagent-registry/issues
Template 3 — Qwen2.5-7B / Qwen2.5-32B / Qwen3
Title: Free browser diagnostic for transformer viability — ran on Qwen2.5
Body:
Built TAF Agent — a browser tool that predicts practical viability of
transformer LLMs (long-context, KV compression, hardware fit, etc.) from
config alone.
Ran it on this model. Quick observations:
- γ_Padé(T=32K) = [value] → [Phase classification]
- d_horizon = [value]
- For NIAH retrieval at 32K: [verdict]
Qwen2.5 has interesting design choices (high rope_theta, low n_kv) that
the framework analyzes nicely.
Tool URL: https://huggingface.co/spaces/karlexmarin/taf-agent
Source: https://github.com/karlesmarin/tafagent
If you've actually measured long-context retrieval on this model and the
prediction is off, please open a falsification issue:
https://github.com/karlesmarin/tafagent-registry
Template 4 — Phi-3-mini / Phi-4
Title: TAF Agent diagnostic for this model
Body:
Tried this model in TAF Agent (browser-based viability diagnostic):
- Architecture class: [classification]
- Long-context verdict at [your target T]: [verdict]
- KV compression strategy: [recommendation]
This is a small/edge-friendly model — TAF identifies that it's well-suited
for [your context range].
Try it on your own deployment scenario:
https://huggingface.co/spaces/karlexmarin/taf-agent
100% browser-side, no auth, no rate limits, no cost.
Template 5 — gemma-2-9b-it / gemma-2-27b-it
Title: Gemma's SWA architecture in TAF Agent — interesting Δγ signature
Body:
Built a browser diagnostic for transformer LLMs. Gemma family is
interesting because of the alternating SWA pattern.
Per the underlying framework (Marin 2026, "Predicting How Transformers Attend"),
SWA gives a distinctive Δγ ≈ +0.5 signature visible in attention
fingerprinting.
For this specific model:
- Architecture detected: [class]
- Verdict at [your T]: [verdict]
- KV compression recommendation: [strategy]
Tool: https://huggingface.co/spaces/karlexmarin/taf-agent
It can be useful for predicting context-length behavior before deployment.
Template 6 — SmolLM2-1.7B / Llama-3.2-1B (small models)
Title: TAF Agent works on small models too — good for edge inference planning
Body:
Built a free browser diagnostic for transformer LLMs. Just ran it on
this small model.
For edge / mobile / browser inference, the relevant questions are
different (latency-sensitive, memory-constrained). TAF Agent's hardware
recipe (X-5) gives concrete tok/s + $/Mtok numbers across consumer GPUs
and Apple Silicon.
For this model: [verdict on edge feasibility]
Tool: https://huggingface.co/spaces/karlexmarin/taf-agent
(Bonus: the tool itself runs in the browser via WebLLM, and a 1B Instruct
model is its default synthesis LLM, so you can see how a model of this
size handles tool-use synthesis.)
Template 7 — DeepSeek-V3 / DeepSeek-V2-Lite
Title: DeepSeek architecture analyzed in TAF Agent
Body:
DeepSeek's MLA (Multi-head Latent Attention) is interesting — TAF Agent
classifies it under the GQA-like family for first-order analysis,
though MLA itself isn't natively in the framework yet.
Ran X-2 on this model: [verdict]
Ran X-1 (custom vs API): [verdict given DeepSeek's pricing]
URL: https://huggingface.co/spaces/karlexmarin/taf-agent
DeepSeek's API pricing makes interesting math for cost recipes — the
break-even calculations show very different results vs frontier US APIs.
Source: https://github.com/karlesmarin/tafagent
Tips for posting without looking like spam
- Personalize: each post should mention something specific to the model
- Add value: not just "look at my tool", but a concrete observation from the analysis
- Ask for genuine feedback: questions, falsifications, confirmations
- Space out the posts: don't post all 8 within 10 minutes; post one every 2-3 hours
- Reply when people comment: real engagement, not fire-and-forget
- Don't overpromise: it is not a benchmark and not a leaderboard
- Acknowledge the tool's limits: stay humble
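Before posting, a quick local check can catch two common slips: a bracketed placeholder left unfilled, and two drafts that are nearly identical. The sketch below is a hypothetical helper, not part of TAF Agent. The function names and the 0.9 similarity threshold are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Matches bracketed placeholders such as "[your verdict]" or "[value]".
# Note: this would also flag markdown link text like "[docs](url)".
PLACEHOLDER = re.compile(r"\[[^\[\]]+\]")


def unfilled_placeholders(draft: str) -> list[str]:
    """Return every bracketed placeholder still present in a draft."""
    return PLACEHOLDER.findall(draft)


def near_duplicates(drafts: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str]]:
    """Return pairs of draft names whose text similarity meets the threshold."""
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(drafts.items(), 2):
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        if SequenceMatcher(None, text_a, text_b).ratio() >= threshold:
            pairs.append((name_a, name_b))
    return pairs
```

If either function returns a non-empty list, revise the draft before posting. The threshold is a judgment call, so tune it against your own drafts.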
Recommended posting order
Day 1:
- HF Posts announcement (separate template)
- 1-2 model discussions (start with SmolLM2 or Phi-3, whose communities are less competitive)
Days 2-3:
- 2-3 more (Llama-3-8B, Mistral, Qwen)
Week 1 onward:
- Engage with comments
- Submit ANALYSIS results from the registry as proof
- Keep answering questions
If someone refutes the prediction
Great! That is exactly what we want in order to validate the framework.
Suggested reply:
"Thanks for the falsification — please open an issue in the registry with your setup details so it's permanently logged. The framework is designed to be falsifiable; refutations help us bound validity zones better."
Link: https://github.com/karlesmarin/tafagent-registry/issues/new?template=refutation.md
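To save the reporter a step, the link above can be extended with a prefilled title. GitHub's new-issue page accepts `template` and `title` query parameters. The sketch below is a minimal illustration: the function name and the title wording are assumptions, not a registry convention.

```python
from urllib.parse import urlencode

REGISTRY_NEW_ISSUE = "https://github.com/karlesmarin/tafagent-registry/issues/new"


def refutation_issue_url(model_id: str, context_length: int) -> str:
    """Build a new-issue link that preselects refutation.md and prefills a title."""
    params = {
        "template": "refutation.md",
        # Title wording is an assumption; adapt it to the registry's conventions.
        "title": f"Refutation: {model_id} at T={context_length}",
    }
    # urlencode percent-escapes reserved characters and encodes spaces as "+".
    return f"{REGISTRY_NEW_ISSUE}?{urlencode(params)}"
```

For example, `refutation_issue_url("meta-llama/Meta-Llama-3-8B", 32768)` produces a link you can paste directly into your reply.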