v0.8.2 JSON CoT-aware Linter — anti-bullshit pack #8
Constrained-decoding engines (llguidance, Outlines, SGLang grammars)
emit JSON properties in the order your schema declares them. If a
schema places `answer` before `reasoning`, the model commits to the
final answer first and the rationale that follows can only justify
what was already committed — defeating Chain-of-Thought entirely.
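
A minimal illustration of the ordering problem, assuming the engine walks `properties` in declaration order as described above. Both schemas and the helper name `emissionOrder` are hypothetical examples, not taken from the repo.

```javascript
// Hypothetical schemas for illustration only.
const antiPattern = JSON.parse(`{
  "type": "object",
  "properties": {
    "answer":    { "type": "string" },
    "reasoning": { "type": "string" }
  }
}`);

const goodOrder = JSON.parse(`{
  "type": "object",
  "properties": {
    "reasoning": { "type": "string" },
    "answer":    { "type": "string" }
  }
}`);

// JSON.parse keeps non-integer string keys in source order, so Object.keys
// mirrors the order a schema-following decoder would emit the fields in.
function emissionOrder(schema) {
  return Object.keys(schema.properties);
}

console.log(emissionOrder(antiPattern)); // [ 'answer', 'reasoning' ]
console.log(emissionOrder(goodOrder));   // [ 'reasoning', 'answer' ]
```

In the first schema the answer tokens are generated before any rationale exists; in the second the rationale is already in context when the answer is decoded.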
📋 JSON CoT Linter (16th mode):
- Paste any JSON Schema or example response object
- Linter classifies each field as reasoning / answer / other via
name patterns (reason|think|thought|cot|chain.of.thought|analysis|
explanation|rationale|… vs answer|result|verdict|final_answer|…)
- Verdict codes: good_order / anti_pattern / missing_reasoning /
missing_answer / no_cot_fields / invalid_json / non_object / empty
- Suggested-fix block emits a reordered schema (reasoning → other →
answer) with `required[]` mirrored to match — copy back into prompt
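
The classification, verdict, and reorder steps above could be sketched roughly as follows. This is not the contents of `js/json_cot_linter.js`: the function names (`classifyField`, `lintJsonCot`, `reorderSchema`), the "reasoning wins ties" rule, and the "first answer field precedes first reasoning field" test are assumptions, and the regexes cover only the patterns listed above.

```javascript
// Subset of the name patterns listed above; the real list is longer.
const REASONING_RE = /reason|think|thought|cot|chain.of.thought|analysis|explanation|rationale/i;
const ANSWER_RE = /answer|result|verdict|final_answer/i;

// Assumption: a name matching both patterns counts as reasoning.
function classifyField(name) {
  if (REASONING_RE.test(name)) return "reasoning";
  if (ANSWER_RE.test(name)) return "answer";
  return "other";
}

// Returns a verdict code plus the classified fields; no human-readable strings.
function lintJsonCot(text) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch (err) {
    return { code: "invalid_json", fields: [] };
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { code: "non_object", fields: [] };
  }
  // Treat the input as a JSON Schema when it carries a `properties` object,
  // otherwise as an example response object.
  const p = parsed.properties;
  const isSchema = p !== null && typeof p === "object" && !Array.isArray(p);
  const source = isSchema ? p : parsed;
  const fields = Object.keys(source).map((name) => ({ name, role: classifyField(name) }));
  if (fields.length === 0) return { code: "empty", fields };

  const firstReasoning = fields.findIndex((f) => f.role === "reasoning");
  const firstAnswer = fields.findIndex((f) => f.role === "answer");
  let code;
  if (firstReasoning === -1 && firstAnswer === -1) code = "no_cot_fields";
  else if (firstReasoning === -1) code = "missing_reasoning";
  else if (firstAnswer === -1) code = "missing_answer";
  else code = firstAnswer < firstReasoning ? "anti_pattern" : "good_order";
  return { code, fields };
}

// Rebuilds a schema with properties ordered reasoning -> other -> answer and
// `required` (if present) sorted the same way. Array sort is stable, so fields
// with the same role keep their original relative order.
function reorderSchema(schema) {
  const rank = { reasoning: 0, other: 1, answer: 2 };
  const byRole = (a, b) => rank[classifyField(a)] - rank[classifyField(b)];
  const properties = {};
  for (const name of Object.keys(schema.properties).sort(byRole)) {
    properties[name] = schema.properties[name];
  }
  const fixed = { ...schema, properties };
  if (Array.isArray(schema.required)) fixed.required = [...schema.required].sort(byRole);
  return fixed;
}
```

A roundtrip check in the spirit of the verification notes: lint an inverted schema, apply `reorderSchema`, then lint `JSON.stringify(fixed)` again and expect `good_order`.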
Pure logic in `js/json_cot_linter.js` (codes + params, no human
strings); main.js renders with i18n. 39 i18n keys × 4 langs (EN/ES/FR/
ZH) = 156 keys, parity clean. Solutions Hub `structured_outputs` pain
upgraded from `null` → `📋 JSON CoT-aware Linter` (planned: → covered).
Help modal v0.8.2 entry + Inventory anti-bullshit-pack list updated +
task tile "⚙️ Set up an eval correctly" gains the new mode button.
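
The codes + params split can be pictured like this. It is a sketch under assumptions: `t`, `renderVerdict`, and `missingKeys` are hypothetical helper names, and the table is a two-language excerpt of strings that appear in this commit's `js/i18n.js` diff.

```javascript
// Two-language excerpt; the real table has 39 keys per language.
const TRANSLATIONS = {
  en: {
    "cot.verdict.invalid_json": "❌ Invalid JSON",
    "cot.field_count": "{n} fields",
  },
  es: {
    "cot.verdict.invalid_json": "❌ JSON inválido",
    "cot.field_count": "{n} campos",
  },
};

// Looks up a key for a language (falling back to English, then to the key
// itself) and substitutes {name} placeholders from params.
function t(lang, key, params = {}) {
  const template = TRANSLATIONS[lang]?.[key] ?? TRANSLATIONS.en[key] ?? key;
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in params ? String(params[name]) : match
  );
}

// The linter hands over only a code and params; every visible string is
// resolved here, on the rendering side.
function renderVerdict(lang, result) {
  const verdict = t(lang, `cot.verdict.${result.code}`);
  const count = t(lang, "cot.field_count", result.params);
  return `${verdict} (${count})`;
}

// Parity check: lists, per language, the keys present in the base language
// but absent from that language. An empty object means parity is clean.
function missingKeys(translations, base = "en") {
  const baseKeys = Object.keys(translations[base]);
  const gaps = {};
  for (const [lang, table] of Object.entries(translations)) {
    const missing = baseKeys.filter((key) => !(key in table));
    if (missing.length > 0) gaps[lang] = missing;
  }
  return gaps;
}

console.log(renderVerdict("es", { code: "invalid_json", params: { n: 0 } }));
// ❌ JSON inválido (0 campos)
```

Keeping the linter free of human strings means a new language only touches the translation table, and the same verdict object can be asserted on in tests without matching localized text.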
Source citations:
- https://collinwilkins.com/articles/structured-output (the bug)
- https://github.com/guidance-ai/jsonschemabench (10K real schemas)
- https://github.com/guidance-ai/llguidance (constrained decoder)
Verified: 10/10 lint cases + reorder roundtrip + headless e2e (tab
present, section toggles, bad/good examples render verdict + fields +
suggested fix, manual paste detects anti-pattern, invalid JSON shows
error). 17 mode tabs total.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- data/solutions_hub.json +1 -1
- index.html +28 -0
- js/i18n.js +168 -0
- js/json_cot_linter.js +203 -0
- js/main.js +172 -1
|
@@ -183,7 +183,7 @@
|
|
| 183 |
"id": "structured_outputs",
|
| 184 |
"category": "setup",
|
| 185 |
"pain": "JSON schema engines fail silently; CoT models commit to answer before reasoning.",
|
| 186 |
-
"tafagent_mode":
|
| 187 |
"external_tools": [
|
| 188 |
{"name": "llguidance (constrained decoding)", "url": "https://github.com/guidance-ai/llguidance", "type": "tool"},
|
| 189 |
{"name": "Outlines", "url": "https://github.com/dottxt-ai/outlines", "type": "tool"},
|
|
|
|
| 183 |
"id": "structured_outputs",
|
| 184 |
"category": "setup",
|
| 185 |
"pain": "JSON schema engines fail silently; CoT models commit to answer before reasoning.",
|
| 186 |
+
"tafagent_mode": "📋 JSON CoT-aware Linter",
|
| 187 |
"external_tools": [
|
| 188 |
{"name": "llguidance (constrained decoding)", "url": "https://github.com/guidance-ai/llguidance", "type": "tool"},
|
| 189 |
{"name": "Outlines", "url": "https://github.com/dottxt-ai/outlines", "type": "tool"},
|
|
@@ -216,6 +216,9 @@
|
|
| 216 |
<p><strong data-i18n="help.v08.saturation.title">📈 Benchmark Saturation Detector</strong></p>
|
| 217 |
<p data-i18n="help.v08.saturation.body">MMLU is saturated (top 88-94%), AIME 2025 saturated within months of release, HumanEval near-saturated. Pick any benchmark and the tool returns top-3 frontier scores, spread, mean, and a verdict — saturated / near-saturated / discriminative — plus a recommended replacement (e.g. MMLU → MMLU-Pro / GPQA / HLE). Live fetch from DemandSphere AI Frontier Tracker (CC BY-NC 4.0) when reachable; baked 2026-05-05 snapshot when not. <em>Use case</em>: before you cite '92% on MMLU' or design an eval, check whether the benchmark still discriminates anything.</p>
|
| 218 |
|
| 219 |
<p><strong data-i18n="help.v081.hub.title">🧭 Solutions Hub</strong></p>
|
| 220 |
<p data-i18n="help.v081.hub.body">tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'</p>
|
| 221 |
|
|
@@ -328,6 +331,7 @@
|
|
| 328 |
<li data-i18n="inv.v07.drift"><strong>🔀 Drift</strong> — bug or noise? Predict max admissible gap between two evals</li>
|
| 329 |
<li data-i18n="inv.v07.niah"><strong>🔍 NIAH→Reason</strong> — does your "128k context" actually reason there, or just retrieve?</li>
|
| 330 |
<li data-i18n="inv.v08.saturation"><strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?</li>
|
| 331 |
<li data-i18n="inv.v081.hub"><strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.</li>
|
| 332 |
</ul>
|
| 333 |
</details>
|
|
@@ -399,6 +403,7 @@
|
|
| 399 |
<div class="tile-modes">
|
| 400 |
<button data-mode-link="template" data-i18n="modes.template">📜 Chat-template</button>
|
| 401 |
<button data-mode-link="diagnose" data-i18n="modes.diagnose">🩺 Diagnose CLI</button>
|
| 402 |
</div>
|
| 403 |
</div>
|
| 404 |
<div class="task-tile">
|
|
@@ -455,6 +460,7 @@
|
|
| 455 |
<button class="mode-btn" data-mode="drift" role="tab" aria-selected="false" data-i18n="modes.drift">🔀 Drift</button>
|
| 456 |
<button class="mode-btn" data-mode="niah" role="tab" aria-selected="false" data-i18n="modes.niah">🔍 NIAH→Reason</button>
|
| 457 |
<button class="mode-btn" data-mode="saturation" role="tab" aria-selected="false" data-i18n="modes.saturation">📈 Saturation</button>
|
| 458 |
<button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
|
| 459 |
</div>
|
| 460 |
<p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
|
|
@@ -1004,6 +1010,28 @@
|
|
| 1004 |
</section>
|
| 1005 |
|
| 1006 |
<!-- Solutions Hub — integrator portal (v0.8.1) -->
|
| 1007 |
<section id="hub-section" style="display:none;">
|
| 1008 |
<h2><span data-i18n="hub.title">🧭 Solutions Hub</span>
|
| 1009 |
<span class="info"><span class="tooltip" data-i18n="hub.tip">
|
|
|
|
| 216 |
<p><strong data-i18n="help.v08.saturation.title">📈 Benchmark Saturation Detector</strong></p>
|
| 217 |
<p data-i18n="help.v08.saturation.body">MMLU is saturated (top 88-94%), AIME 2025 saturated within months of release, HumanEval near-saturated. Pick any benchmark and the tool returns top-3 frontier scores, spread, mean, and a verdict — saturated / near-saturated / discriminative — plus a recommended replacement (e.g. MMLU → MMLU-Pro / GPQA / HLE). Live fetch from DemandSphere AI Frontier Tracker (CC BY-NC 4.0) when reachable; baked 2026-05-05 snapshot when not. <em>Use case</em>: before you cite '92% on MMLU' or design an eval, check whether the benchmark still discriminates anything.</p>
|
| 218 |
|
| 219 |
+
<p><strong data-i18n="help.v082.cot.title">📋 JSON CoT-aware Linter</strong></p>
|
| 220 |
+
<p data-i18n="help.v082.cot.body">Constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in the order your schema declares them. If you write <code>{ answer, reasoning }</code> the model commits to <code>answer</code> first and CoT collapses into post-hoc justification. Paste any schema (or example response) — the linter classifies each field as <em>reasoning</em>, <em>answer</em>, or <em>other</em>, flags the ordering, and emits a reordered fix you can copy back. <em>Use case</em>: 'My CoT prompt works in plaintext but degrades under JSON mode' → run linter, find the inverted order, fix.</p>
|
| 221 |
+
|
| 222 |
<p><strong data-i18n="help.v081.hub.title">🧭 Solutions Hub</strong></p>
|
| 223 |
<p data-i18n="help.v081.hub.body">tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'</p>
|
| 224 |
|
|
|
|
| 331 |
<li data-i18n="inv.v07.drift"><strong>🔀 Drift</strong> — bug or noise? Predict max admissible gap between two evals</li>
|
| 332 |
<li data-i18n="inv.v07.niah"><strong>🔍 NIAH→Reason</strong> — does your "128k context" actually reason there, or just retrieve?</li>
|
| 333 |
<li data-i18n="inv.v08.saturation"><strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?</li>
|
| 334 |
+
<li data-i18n="inv.v082.cot"><strong>📋 JSON CoT</strong> — lints structured-output schemas for the answer-before-reasoning anti-pattern that silently breaks Chain-of-Thought.</li>
|
| 335 |
<li data-i18n="inv.v081.hub"><strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.</li>
|
| 336 |
</ul>
|
| 337 |
</details>
|
|
|
|
| 403 |
<div class="tile-modes">
|
| 404 |
<button data-mode-link="template" data-i18n="modes.template">📜 Chat-template</button>
|
| 405 |
<button data-mode-link="diagnose" data-i18n="modes.diagnose">🩺 Diagnose CLI</button>
|
| 406 |
+
<button data-mode-link="cot" data-i18n="modes.cot">📋 JSON CoT</button>
|
| 407 |
</div>
|
| 408 |
</div>
|
| 409 |
<div class="task-tile">
|
|
|
|
| 460 |
<button class="mode-btn" data-mode="drift" role="tab" aria-selected="false" data-i18n="modes.drift">🔀 Drift</button>
|
| 461 |
<button class="mode-btn" data-mode="niah" role="tab" aria-selected="false" data-i18n="modes.niah">🔍 NIAH→Reason</button>
|
| 462 |
<button class="mode-btn" data-mode="saturation" role="tab" aria-selected="false" data-i18n="modes.saturation">📈 Saturation</button>
|
| 463 |
+
<button class="mode-btn" data-mode="cot" role="tab" aria-selected="false" data-i18n="modes.cot">📋 JSON CoT</button>
|
| 464 |
<button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
|
| 465 |
</div>
|
| 466 |
<p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
|
|
|
|
| 1010 |
</section>
|
| 1011 |
|
| 1012 |
<!-- Solutions Hub — integrator portal (v0.8.1) -->
|
| 1013 |
+
<!-- JSON CoT-aware Linter (mode=cot, v0.8.2 anti-bullshit pack #8) -->
|
| 1014 |
+
<section id="cot-section" style="display:none;">
|
| 1015 |
+
<h2><span data-i18n="cot.title">📋 JSON CoT-aware Linter</span>
|
| 1016 |
+
<span class="info"><span class="tooltip" data-i18n="cot.tip">
|
| 1017 |
+
<strong>Why this matters</strong>: constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in schema order. If your schema places <code>answer</code> before <code>reasoning</code>, the model commits to a final answer first and only then writes the rationale to justify it — defeating Chain-of-Thought entirely. Paste a JSON Schema (or example object) and the linter flags the ordering.
|
| 1018 |
+
</span></span>
|
| 1019 |
+
</h2>
|
| 1020 |
+
<p class="recipe-desc" data-i18n="cot.desc">
|
| 1021 |
+
<strong>Reasoning before answer, always.</strong> Paste a JSON Schema or example response object — the linter reports whether reasoning fields come before answer fields and suggests a fix.
|
| 1022 |
+
</p>
|
| 1023 |
+
<div class="form-row">
|
| 1024 |
+
<textarea id="cot-input" rows="10" style="width:100%;font-family:monospace;font-size:0.9em;" data-i18n-placeholder="cot.input.placeholder" placeholder='{ "type": "object", "properties": { "answer": {"type": "string"}, "reasoning": {"type": "string"} } }'></textarea>
|
| 1025 |
+
</div>
|
| 1026 |
+
<div class="form-row">
|
| 1027 |
+
<button type="button" id="cot-lint-btn" data-i18n="cot.lint_btn">🔍 Lint</button>
|
| 1028 |
+
<button type="button" id="cot-example-good-btn" class="secondary" data-i18n="cot.example_good_btn">↳ Example: good order</button>
|
| 1029 |
+
<button type="button" id="cot-example-bad-btn" class="secondary" data-i18n="cot.example_bad_btn">↳ Example: anti-pattern</button>
|
| 1030 |
+
</div>
|
| 1031 |
+
<p id="cot-status" class="recipe-desc" style="font-size:0.92em;"></p>
|
| 1032 |
+
<div id="cot-output" style="margin-top: 1em;"></div>
|
| 1033 |
+
</section>
|
| 1034 |
+
|
| 1035 |
<section id="hub-section" style="display:none;">
|
| 1036 |
<h2><span data-i18n="hub.title">🧭 Solutions Hub</span>
|
| 1037 |
<span class="info"><span class="tooltip" data-i18n="hub.tip">
|
|
@@ -503,6 +503,48 @@ export const TRANSLATIONS = {
|
|
| 503 |
"help.v08.saturation.title": "📈 Benchmark Saturation Detector",
|
| 504 |
"help.v08.saturation.body": "MMLU is saturated (88-94% top), AIME 2025 saturated within months of release, HumanEval near-saturated. Pick any benchmark and the tool returns top-3 frontier scores, spread, mean, and a verdict — saturated / near-saturated / discriminative — plus a recommended replacement (e.g. MMLU → MMLU-Pro / GPQA / HLE). Live fetch from DemandSphere AI Frontier Tracker (CC BY-NC 4.0) when reachable; baked 2026-05-05 snapshot when not. <em>Use case</em>: before you cite '92% on MMLU' or design an eval, check whether the benchmark still discriminates anything.",
|
| 505 |
"inv.v08.saturation": "<strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?",
|
| 506 |
"inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
|
| 507 |
"help.v081.hub.title": "🧭 Solutions Hub",
|
| 508 |
"help.v081.hub.body": "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
|
|
@@ -1465,6 +1507,48 @@ export const TRANSLATIONS = {
|
|
| 1465 |
"help.v08.saturation.title": "📈 Detector de saturación de benchmarks",
|
| 1466 |
"help.v08.saturation.body": "MMLU está saturado (top 88-94%), AIME 2025 saturó a los pocos meses de salir, HumanEval near-saturated. Elige cualquier benchmark y la herramienta retorna top-3 frontier scores, spread, media, y un veredicto — saturated / near-saturated / discriminative — más un reemplazo recomendado (ej. MMLU → MMLU-Pro / GPQA / HLE). Fetch en vivo desde DemandSphere AI Frontier Tracker (CC BY-NC 4.0) cuando llega; snapshot baked 2026-05-05 cuando no. <em>Caso de uso</em>: antes de citar '92% en MMLU' o diseñar una eval, verifica si el benchmark aún discrimina algo.",
|
| 1467 |
"inv.v08.saturation": "<strong>📈 Saturation</strong> — ¿sigue siendo útil tu benchmark, o están todos los frontiers empatados arriba?",
|
| 1468 |
"inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
|
| 1469 |
"help.v081.hub.title": "🧭 Solutions Hub",
|
| 1470 |
"help.v081.hub.body": "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
|
|
@@ -2291,6 +2375,48 @@ export const TRANSLATIONS = {
|
|
| 2291 |
"help.v08.saturation.title": "📈 Détecteur de saturation des benchmarks",
|
| 2292 |
"help.v08.saturation.body": "MMLU est saturé (top 88-94%), AIME 2025 saturé en quelques mois après sa sortie, HumanEval presque saturé. Choisissez un benchmark et l'outil retourne top-3 frontier scores, spread, moyenne, et un verdict — saturated / near-saturated / discriminative — plus un remplacement recommandé (ex. MMLU → MMLU-Pro / GPQA / HLE). Fetch en direct depuis DemandSphere AI Frontier Tracker (CC BY-NC 4.0) si accessible ; snapshot baked 2026-05-05 sinon. <em>Cas d'usage</em> : avant de citer '92% sur MMLU' ou de concevoir une eval, vérifiez si le benchmark discrimine encore quelque chose.",
|
| 2293 |
"inv.v08.saturation": "<strong>📈 Saturation</strong> — votre benchmark est-il encore utile, ou tous les frontiers sont-ils à égalité au sommet ?",
|
| 2294 |
"inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
|
| 2295 |
"help.v081.hub.title": "🧭 Solutions Hub",
|
| 2296 |
"help.v081.hub.body": "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
|
|
@@ -3117,6 +3243,48 @@ export const TRANSLATIONS = {
|
|
| 3117 |
"help.v08.saturation.title": "📈 Benchmark 饱和度检测器",
|
| 3118 |
"help.v08.saturation.body": "MMLU 已饱和(top 88-94%),AIME 2025 上线几个月就饱和,HumanEval 接近饱和。选任何 benchmark,工具返回 top-3 frontier 分数、spread、平均,以及判定 — saturated / near-saturated / discriminative — 加上推荐替代品(例如 MMLU → MMLU-Pro / GPQA / HLE)。可达时从 DemandSphere AI Frontier Tracker(CC BY-NC 4.0)实时 fetch;不可达时使用 2026-05-05 的 baked 快照。<em>用例</em>:在引用\"92% on MMLU\"或设计 eval 之前,检查 benchmark 是否仍能区分任何东西。",
|
| 3119 |
"inv.v08.saturation": "<strong>📈 Saturation</strong> — 你的 benchmark 还有用吗,还是所有 frontier 都在顶部并列?",
|
| 3120 |
"inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
|
| 3121 |
"help.v081.hub.title": "🧭 Solutions Hub",
|
| 3122 |
"help.v081.hub.body": "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别(评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性),每个映射到(a)解决它的 tafagent 模式(若存在),以及(b)社区已信任的最佳外部工具(RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等)。搜索框匹配 pain、场景和工具名称。<em>用例</em>:'我有问题 X — tafagent 解决它吗,如果不,谁解决?'",
|
|
|
|
| 503 |
"help.v08.saturation.title": "📈 Benchmark Saturation Detector",
|
| 504 |
"help.v08.saturation.body": "MMLU is saturated (88-94% top), AIME 2025 saturated within months of release, HumanEval near-saturated. Pick any benchmark and the tool returns top-3 frontier scores, spread, mean, and a verdict — saturated / near-saturated / discriminative — plus a recommended replacement (e.g. MMLU → MMLU-Pro / GPQA / HLE). Live fetch from DemandSphere AI Frontier Tracker (CC BY-NC 4.0) when reachable; baked 2026-05-05 snapshot when not. <em>Use case</em>: before you cite '92% on MMLU' or design an eval, check whether the benchmark still discriminates anything.",
|
| 505 |
"inv.v08.saturation": "<strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?",
|
| 506 |
+
|
| 507 |
+
// v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
|
| 508 |
+
"modes.cot": "📋 JSON CoT",
|
| 509 |
+
"mode_desc.cot": "Lints a JSON Schema (or example response object) for the answer-before-reasoning anti-pattern. Constrained-decoding engines emit fields in property order — if `answer` comes before `reasoning`, CoT is defeated.",
|
| 510 |
+
"cot.title": "📋 JSON CoT-aware Linter",
|
| 511 |
+
"cot.tip": "Constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in schema order. If your schema places `answer` before `reasoning`, the model commits to a final answer first and only then writes the rationale to justify it — defeating Chain-of-Thought entirely. Paste a JSON Schema (or example object) and the linter flags the ordering.",
|
| 512 |
+
"cot.desc": "<strong>Reasoning before answer, always.</strong> Paste a JSON Schema or example response object — the linter reports whether reasoning fields come before answer fields and suggests a fix.",
|
| 513 |
+
"cot.input.placeholder": "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
|
| 514 |
+
"cot.lint_btn": "🔍 Lint",
|
| 515 |
+
"cot.example_good_btn": "↳ Example: good order",
|
| 516 |
+
"cot.example_bad_btn": "↳ Example: anti-pattern",
|
| 517 |
+
"cot.status.done": "✅ {verdict}",
|
| 518 |
+
"cot.col.field": "Field",
|
| 519 |
+
"cot.col.type": "Role",
|
| 520 |
+
"cot.field.reasoning": "reasoning",
|
| 521 |
+
"cot.field.answer": "answer",
|
| 522 |
+
"cot.field.other": "—",
|
| 523 |
+
"cot.field_count": "{n} fields",
|
| 524 |
+
"cot.verdict.good_order": "✅ Good order — reasoning before answer",
|
| 525 |
+
"cot.verdict.anti_pattern": "❌ Anti-pattern — answer before reasoning",
|
| 526 |
+
"cot.verdict.missing_reasoning": "⚠ Missing reasoning field",
|
| 527 |
+
"cot.verdict.missing_answer": "ℹ No answer-like field detected",
|
| 528 |
+
"cot.verdict.no_cot_fields": "ℹ No reasoning/answer fields detected",
|
| 529 |
+
"cot.verdict.invalid_json": "❌ Invalid JSON",
|
| 530 |
+
"cot.verdict.non_object": "ℹ Top-level value is not an object",
|
| 531 |
+
"cot.verdict.empty_fields": "ℹ No fields to analyse",
|
| 532 |
+
"cot.explain.good_order": "Constrained decoding will emit the rationale first, so the model can think before committing. Chain-of-Thought stays honest.",
|
| 533 |
+
"cot.explain.anti_pattern": "The model is forced to emit the answer field first; any reasoning that follows can only justify what was already committed. Reorder so reasoning-like fields come before answer-like fields.",
|
| 534 |
+
"cot.explain.missing_reasoning": "An answer field is present but no reasoning field. If you want CoT, add a `reasoning` (or `chain_of_thought`, `analysis`, …) field <em>before</em> the answer.",
|
| 535 |
+
"cot.explain.missing_answer": "A reasoning field is present but no obvious answer field. Make sure the schema actually requires the model to commit a final value.",
|
| 536 |
+
"cot.explain.no_cot_fields": "Object has fields, but none look reasoning- or answer-like by name. The linter is conservative — if the schema is intentional, ignore. Otherwise add explicit reasoning/answer fields.",
|
| 537 |
+
"cot.hint.non_object": "Top-level must be a JSON object (`{ … }`) or a JSON Schema with `properties`.",
|
| 538 |
+
"cot.hint.empty_fields": "No fields detected. Paste a JSON Schema, an example response, or click an example button below the textarea.",
|
| 539 |
+
"cot.suggested_fix.title": "✓ Suggested fix",
|
| 540 |
+
"cot.suggested_fix.desc": "Reordered properties — reasoning fields first, then any context fields, then answer fields. `required[]` (if present) is mirrored to match.",
|
| 541 |
+
"cot.suggested_fix.copy": "📋 Copy",
|
| 542 |
+
"cot.suggested_fix.copied": "✓ Copied",
|
| 543 |
+
"cot.attribution": "Refs:",
|
| 544 |
+
"inv.v082.cot": "<strong>📋 JSON CoT</strong> — lints structured-output schemas for the answer-before-reasoning anti-pattern that silently breaks Chain-of-Thought.",
|
| 545 |
+
"help.v082.cot.title": "📋 JSON CoT-aware Linter",
|
| 546 |
+
"help.v082.cot.body": "Constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in the order your schema declares them. If you write <code>{ answer, reasoning }</code> the model commits to <code>answer</code> first and CoT collapses into post-hoc justification. Paste any schema (or example response) — the linter classifies each field as <em>reasoning</em>, <em>answer</em>, or <em>other</em>, flags the ordering, and emits a reordered fix you can copy back. <em>Use case</em>: 'My CoT prompt works in plaintext but degrades under JSON mode' → run linter, find the inverted order, fix.",
|
| 547 |
+
|
| 548 |
"inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
|
| 549 |
"help.v081.hub.title": "🧭 Solutions Hub",
|
| 550 |
"help.v081.hub.body": "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
|
|
|
|
| 1507 |
"help.v08.saturation.title": "📈 Detector de saturación de benchmarks",
|
| 1508 |
"help.v08.saturation.body": "MMLU está saturado (top 88-94%), AIME 2025 saturó a los pocos meses de salir, HumanEval near-saturated. Elige cualquier benchmark y la herramienta retorna top-3 frontier scores, spread, media, y un veredicto — saturated / near-saturated / discriminative — más un reemplazo recomendado (ej. MMLU → MMLU-Pro / GPQA / HLE). Fetch en vivo desde DemandSphere AI Frontier Tracker (CC BY-NC 4.0) cuando llega; snapshot baked 2026-05-05 cuando no. <em>Caso de uso</em>: antes de citar '92% en MMLU' o diseñar una eval, verifica si el benchmark aún discrimina algo.",
|
| 1509 |
"inv.v08.saturation": "<strong>📈 Saturation</strong> — ¿sigue siendo útil tu benchmark, o están todos los frontiers empatados arriba?",
|
| 1510 |
+
|
| 1511 |
+
// v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
|
| 1512 |
+
"modes.cot": "📋 JSON CoT",
|
| 1513 |
+
"mode_desc.cot": "Lintea un JSON Schema (o ejemplo de respuesta) buscando el anti-patrón respuesta-antes-de-razonamiento. Los motores de constrained decoding emiten campos en el orden del schema — si `answer` va antes que `reasoning`, el CoT se rompe.",
|
| 1514 |
+
"cot.title": "📋 Linter JSON con consciencia CoT",
|
| 1515 |
+
"cot.tip": "Los motores de constrained decoding (llguidance, Outlines, gramáticas SGLang) emiten propiedades JSON en el orden del schema. Si tu schema pone `answer` antes de `reasoning`, el modelo se compromete con la respuesta final primero y solo después escribe el razonamiento para justificarla — rompiendo Chain-of-Thought por completo. Pega un JSON Schema (o objeto de ejemplo) y el linter señala el ordenamiento.",
|
| 1516 |
+
"cot.desc": "<strong>Razonamiento antes que respuesta, siempre.</strong> Pega un JSON Schema o un objeto de respuesta de ejemplo — el linter dice si los campos de razonamiento van antes que los de respuesta y propone una corrección.",
|
| 1517 |
+
"cot.input.placeholder": "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
|
| 1518 |
+
"cot.lint_btn": "🔍 Lintear",
|
| 1519 |
+
"cot.example_good_btn": "↳ Ejemplo: orden correcto",
|
| 1520 |
+
"cot.example_bad_btn": "↳ Ejemplo: anti-patrón",
|
| 1521 |
+
"cot.status.done": "✅ {verdict}",
|
| 1522 |
+
"cot.col.field": "Campo",
|
| 1523 |
+
"cot.col.type": "Rol",
|
| 1524 |
+
"cot.field.reasoning": "razonamiento",
|
| 1525 |
+
"cot.field.answer": "respuesta",
|
| 1526 |
+
"cot.field.other": "—",
|
| 1527 |
+
"cot.field_count": "{n} campos",
|
| 1528 |
+
"cot.verdict.good_order": "✅ Orden correcto — razonamiento antes que respuesta",
|
| 1529 |
+
"cot.verdict.anti_pattern": "❌ Anti-patrón — respuesta antes que razonamiento",
|
| 1530 |
+
"cot.verdict.missing_reasoning": "⚠ Falta campo de razonamiento",
|
| 1531 |
+
"cot.verdict.missing_answer": "ℹ No se detecta campo tipo respuesta",
|
| 1532 |
+
"cot.verdict.no_cot_fields": "ℹ Sin campos de razonamiento/respuesta detectados",
|
| 1533 |
+
"cot.verdict.invalid_json": "❌ JSON inválido",
|
| 1534 |
+
"cot.verdict.non_object": "ℹ El valor superior no es un objeto",
|
| 1535 |
+
"cot.verdict.empty_fields": "ℹ Sin campos para analizar",
|
| 1536 |
+
"cot.explain.good_order": "El constrained decoding emitirá el razonamiento primero, así el modelo puede pensar antes de comprometerse. Chain-of-Thought se mantiene honesto.",
|
| 1537 |
+
"cot.explain.anti_pattern": "El modelo se ve forzado a emitir el campo de respuesta primero; cualquier razonamiento posterior solo justifica lo ya comprometido. Reordena para que los campos tipo razonamiento vayan antes que los tipo respuesta.",
|
| 1538 |
+
"cot.explain.missing_reasoning": "Hay un campo de respuesta pero ningún campo de razonamiento. Si quieres CoT, añade un campo `reasoning` (o `chain_of_thought`, `analysis`, …) <em>antes</em> de la respuesta.",
|
| 1539 |
+
"cot.explain.missing_answer": "Hay un campo de razonamiento pero ningún campo de respuesta evidente. Asegúrate de que el schema realmente exija al modelo comprometer un valor final.",
|
| 1540 |
+
"cot.explain.no_cot_fields": "El objeto tiene campos pero ninguno se ve como razonamiento o respuesta por su nombre. El linter es conservador — si el schema es intencional, ignóralo. Si no, añade campos explícitos de razonamiento/respuesta.",
|
| 1541 |
+
"cot.hint.non_object": "El valor de nivel superior debe ser un objeto JSON (`{ … }`) o un JSON Schema con `properties`.",
|
| 1542 |
+
"cot.hint.empty_fields": "Sin campos detectados. Pega un JSON Schema, una respuesta de ejemplo, o pulsa un botón de ejemplo bajo el textarea.",
|
| 1543 |
+
"cot.suggested_fix.title": "✓ Corrección sugerida",
|
| 1544 |
+
"cot.suggested_fix.desc": "Propiedades reordenadas — campos de razonamiento primero, luego cualquier campo de contexto, luego los de respuesta. `required[]` (si existe) se reordena igual.",
|
| 1545 |
+
"cot.suggested_fix.copy": "📋 Copiar",
|
| 1546 |
+
"cot.suggested_fix.copied": "✓ Copiado",
|
| 1547 |
+
"cot.attribution": "Referencias:",
|
| 1548 |
+
"inv.v082.cot": "<strong>📋 JSON CoT</strong> — lintea schemas de structured outputs buscando el anti-patrón respuesta-antes-de-razonamiento que silenciosamente rompe Chain-of-Thought.",
|
| 1549 |
+
"help.v082.cot.title": "📋 Linter JSON con consciencia CoT",
|
| 1550 |
+
"help.v082.cot.body": "Los motores de constrained decoding (llguidance, Outlines, gramáticas SGLang) emiten propiedades JSON en el orden que declara tu schema. Si escribes <code>{ answer, reasoning }</code> el modelo se compromete con <code>answer</code> primero y el CoT se reduce a justificación post-hoc. Pega cualquier schema (o respuesta de ejemplo) — el linter clasifica cada campo como <em>razonamiento</em>, <em>respuesta</em> u <em>otro</em>, señala el ordenamiento, y emite una corrección reordenada para copiar de vuelta. <em>Caso de uso</em>: 'Mi prompt CoT funciona en texto pero degrada en modo JSON' → ejecuta linter, encuentra el orden invertido, corrige.",
|
| 1551 |
+
|
| 1552 |
"inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
|
| 1553 |
"help.v081.hub.title": "🧭 Solutions Hub",
|
| 1554 |
"help.v081.hub.body": "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
|
|
|
|
| 2375 |
"help.v08.saturation.title": "📈 Détecteur de saturation des benchmarks",
|
| 2376 |
"help.v08.saturation.body": "MMLU est saturé (top 88-94%), AIME 2025 saturé en quelques mois après sa sortie, HumanEval presque saturé. Choisissez un benchmark et l'outil retourne top-3 frontier scores, spread, moyenne, et un verdict — saturated / near-saturated / discriminative — plus un remplacement recommandé (ex. MMLU → MMLU-Pro / GPQA / HLE). Fetch en direct depuis DemandSphere AI Frontier Tracker (CC BY-NC 4.0) si accessible ; snapshot baked 2026-05-05 sinon. <em>Cas d'usage</em> : avant de citer '92% sur MMLU' ou de concevoir une eval, vérifiez si le benchmark discrimine encore quelque chose.",
|
| 2377 |
"inv.v08.saturation": "<strong>📈 Saturation</strong> — votre benchmark est-il encore utile, ou tous les frontiers sont-ils à égalité au sommet ?",
|
| 2378 |
+
|
| 2379 |
+
// v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
|
| 2380 |
+
"modes.cot": "📋 JSON CoT",
|
| 2381 |
+
"mode_desc.cot": "Linte un JSON Schema (ou un objet de réponse exemple) à la recherche de l'anti-pattern réponse-avant-raisonnement. Les moteurs de décodage contraint émettent les champs dans l'ordre du schema — si `answer` précède `reasoning`, la chaîne de pensée est cassée.",
|
| 2382 |
+
"cot.title": "📋 Linter JSON conscient de CoT",
|
| 2383 |
+
"cot.tip": "Les moteurs de décodage contraint (llguidance, Outlines, grammaires SGLang) émettent les propriétés JSON dans l'ordre du schema. Si votre schema place `answer` avant `reasoning`, le modèle s'engage sur la réponse finale en premier et n'écrit la justification qu'ensuite — détruisant la Chaîne de Pensée. Collez un JSON Schema (ou un objet exemple) et le linter signale l'ordre.",
|
| 2384 |
+
"cot.desc": "<strong>Le raisonnement avant la réponse, toujours.</strong> Collez un JSON Schema ou un objet de réponse exemple — le linter rapporte si les champs de raisonnement viennent avant ceux de réponse et propose une correction.",
|
| 2385 |
+
"cot.input.placeholder": "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
|
| 2386 |
+
"cot.lint_btn": "🔍 Linter",
|
| 2387 |
+
"cot.example_good_btn": "↳ Exemple : ordre correct",
|
| 2388 |
+
"cot.example_bad_btn": "↳ Exemple : anti-pattern",
|
| 2389 |
+
"cot.status.done": "✅ {verdict}",
|
| 2390 |
+
"cot.col.field": "Champ",
|
| 2391 |
+
"cot.col.type": "Rôle",
|
| 2392 |
+
"cot.field.reasoning": "raisonnement",
|
| 2393 |
+
"cot.field.answer": "réponse",
|
| 2394 |
+
"cot.field.other": "—",
|
| 2395 |
+
"cot.field_count": "{n} champs",
|
| 2396 |
+
"cot.verdict.good_order": "✅ Bon ordre — raisonnement avant réponse",
|
| 2397 |
+
"cot.verdict.anti_pattern": "❌ Anti-pattern — réponse avant raisonnement",
|
| 2398 |
+
"cot.verdict.missing_reasoning": "⚠ Champ de raisonnement manquant",
|
| 2399 |
+
"cot.verdict.missing_answer": "ℹ Aucun champ type réponse détecté",
|
| 2400 |
+
"cot.verdict.no_cot_fields": "ℹ Aucun champ raisonnement/réponse détecté",
|
| 2401 |
+
"cot.verdict.invalid_json": "❌ JSON invalide",
|
| 2402 |
+
"cot.verdict.non_object": "ℹ La valeur de premier niveau n'est pas un objet",
|
| 2403 |
+
"cot.verdict.empty_fields": "ℹ Aucun champ à analyser",
|
| 2404 |
+
"cot.explain.good_order": "Le décodage contraint émettra le raisonnement en premier, le modèle peut donc réfléchir avant de s'engager. La Chaîne de Pensée reste honnête.",
|
| 2405 |
+
"cot.explain.anti_pattern": "Le modèle est forcé d'émettre le champ de réponse en premier ; tout raisonnement qui suit ne fait que justifier ce qui est déjà engagé. Réordonnez pour que les champs raisonnement viennent avant les champs réponse.",
|
| 2406 |
+
"cot.explain.missing_reasoning": "Un champ de réponse est présent mais aucun champ de raisonnement. Si vous voulez du CoT, ajoutez un champ `reasoning` (ou `chain_of_thought`, `analysis`, …) <em>avant</em> la réponse.",
|
| 2407 |
+
"cot.explain.missing_answer": "Un champ de raisonnement est présent mais aucun champ de réponse évident. Vérifiez que le schema force réellement le modèle à s'engager sur une valeur finale.",
|
| 2408 |
+
"cot.explain.no_cot_fields": "L'objet a des champs, mais aucun ne ressemble à du raisonnement ou de la réponse par son nom. Le linter est conservateur — si le schema est intentionnel, ignorez. Sinon ajoutez des champs explicites raisonnement/réponse.",
|
| 2409 |
+
"cot.hint.non_object": "La valeur de premier niveau doit être un objet JSON (`{ … }`) ou un JSON Schema avec `properties`.",
|
| 2410 |
+
"cot.hint.empty_fields": "Aucun champ détecté. Collez un JSON Schema, une réponse exemple, ou cliquez un bouton d'exemple sous le textarea.",
|
| 2411 |
+
"cot.suggested_fix.title": "✓ Correction suggérée",
|
| 2412 |
+
"cot.suggested_fix.desc": "Propriétés réordonnées — champs raisonnement d'abord, puis tout champ de contexte, puis les champs réponse. `required[]` (s'il existe) est réordonné en correspondance.",
|
| 2413 |
+
"cot.suggested_fix.copy": "📋 Copier",
|
| 2414 |
+
"cot.suggested_fix.copied": "✓ Copié",
|
| 2415 |
+
"cot.attribution": "Réfs :",
|
| 2416 |
+
"inv.v082.cot": "<strong>📋 JSON CoT</strong> — linte les schemas de structured outputs à la recherche de l'anti-pattern réponse-avant-raisonnement qui casse silencieusement la Chaîne de Pensée.",
|
| 2417 |
+
"help.v082.cot.title": "📋 Linter JSON conscient de CoT",
|
| 2418 |
+
"help.v082.cot.body": "Les moteurs de décodage contraint (llguidance, Outlines, grammaires SGLang) émettent les propriétés JSON dans l'ordre que votre schema déclare. Si vous écrivez <code>{ answer, reasoning }</code> le modèle s'engage sur <code>answer</code> en premier et le CoT se réduit à une justification a posteriori. Collez n'importe quel schema (ou réponse exemple) — le linter classe chaque champ comme <em>raisonnement</em>, <em>réponse</em> ou <em>autre</em>, signale l'ordre, et émet une correction réordonnée à copier. <em>Cas d'usage</em> : 'Mon prompt CoT marche en texte brut mais dégrade en mode JSON' → lancez le linter, trouvez l'ordre inversé, corrigez.",
|
| 2419 |
+
|
| 2420 |
"inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
|
| 2421 |
"help.v081.hub.title": "🧭 Solutions Hub",
|
| 2422 |
"help.v081.hub.body": "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
|
|
|
|
| 3243 |
"help.v08.saturation.title": "📈 Benchmark 饱和度检测器",
|
| 3244 |
"help.v08.saturation.body": "MMLU 已饱和(top 88-94%),AIME 2025 上线几个月就饱和,HumanEval 接近饱和。选任何 benchmark,工具返回 top-3 frontier 分数、spread、平均,以及判定 — saturated / near-saturated / discriminative — 加上推荐替代品(例如 MMLU → MMLU-Pro / GPQA / HLE)。可达时从 DemandSphere AI Frontier Tracker(CC BY-NC 4.0)实时 fetch;不可达时使用 2026-05-05 的 baked 快照。<em>用例</em>:在引用\"92% on MMLU\"或设计 eval 之前,检查 benchmark 是否仍能区分任何东西。",
|
| 3245 |
"inv.v08.saturation": "<strong>📈 Saturation</strong> — 你的 benchmark 还有用吗,还是所有 frontier 都在顶部并列?",
|
| 3246 |
+
|
| 3247 |
+
// v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
|
| 3248 |
+
"modes.cot": "📋 JSON CoT",
|
| 3249 |
+
"mode_desc.cot": "对 JSON Schema(或示例响应对象)进行 linting,查找『答案在推理之前』的反模式。约束解码引擎按 schema 顺序输出字段——如果 `answer` 在 `reasoning` 之前,CoT 就被破坏了。",
|
| 3250 |
+
"cot.title": "📋 JSON CoT 感知 Linter",
|
| 3251 |
+
"cot.tip": "约束解码引擎(llguidance、Outlines、SGLang 语法)按 schema 顺序输出 JSON 属性。如果 schema 把 `answer` 放在 `reasoning` 之前,模型会先承诺最终答案,然后才写理由来证明它——彻底破坏 Chain-of-Thought。粘贴 JSON Schema(或示例对象),linter 会标记顺序问题。",
|
| 3252 |
+
"cot.desc": "<strong>推理永远先于答案。</strong> 粘贴 JSON Schema 或示例响应对象——linter 报告推理字段是否在答案字段之前,并提出修复建议。",
|
| 3253 |
+
"cot.input.placeholder": "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
|
| 3254 |
+
"cot.lint_btn": "🔍 Lint",
|
| 3255 |
+
"cot.example_good_btn": "↳ 示例:正确顺序",
|
| 3256 |
+
"cot.example_bad_btn": "↳ 示例:反模式",
|
| 3257 |
+
"cot.status.done": "✅ {verdict}",
|
| 3258 |
+
"cot.col.field": "字段",
|
| 3259 |
+
"cot.col.type": "角色",
|
| 3260 |
+
"cot.field.reasoning": "推理",
|
| 3261 |
+
"cot.field.answer": "答案",
|
| 3262 |
+
"cot.field.other": "—",
|
| 3263 |
+
"cot.field_count": "{n} 个字段",
|
| 3264 |
+
"cot.verdict.good_order": "✅ 顺序正确——推理在答案之前",
|
| 3265 |
+
"cot.verdict.anti_pattern": "❌ 反模式——答案在推理之前",
|
| 3266 |
+
"cot.verdict.missing_reasoning": "⚠ 缺少推理字段",
|
| 3267 |
+
"cot.verdict.missing_answer": "ℹ 未检测到答案类字段",
|
| 3268 |
+
"cot.verdict.no_cot_fields": "ℹ 未检测到推理/答案字段",
|
| 3269 |
+
"cot.verdict.invalid_json": "❌ JSON 无效",
|
| 3270 |
+
"cot.verdict.non_object": "ℹ 顶层值不是对象",
|
| 3271 |
+
"cot.verdict.empty_fields": "ℹ 没有可分析的字段",
|
| 3272 |
+
"cot.explain.good_order": "约束解码会先输出推理,所以模型可以在承诺之前思考。Chain-of-Thought 保持诚实。",
|
| 3273 |
+
"cot.explain.anti_pattern": "模型被迫先输出答案字段;之后的任何推理只能为已承诺的内容辩护。重新排序,使推理类字段在答案类字段之前。",
|
| 3274 |
+
"cot.explain.missing_reasoning": "存在答案字段但没有推理字段。如果你想要 CoT,在答案<em>之前</em>添加 `reasoning`(或 `chain_of_thought`、`analysis`…)字段。",
|
| 3275 |
+
"cot.explain.missing_answer": "存在推理字段但没有明显的答案字段。确保 schema 实际上要求模型承诺一个最终值。",
|
| 3276 |
+
"cot.explain.no_cot_fields": "对象有字段但都不像推理或答案(按名称)。Linter 保守——如果 schema 是有意的,可以忽略。否则添加显式的推理/答案字段。",
|
| 3277 |
+
"cot.hint.non_object": "顶层值必须是 JSON 对象(`{ … }`)或带 `properties` 的 JSON Schema。",
|
| 3278 |
+
"cot.hint.empty_fields": "未检测到字段。粘贴 JSON Schema、示例响应,或点击 textarea 下方的示例按钮。",
|
| 3279 |
+
"cot.suggested_fix.title": "✓ 建议修复",
|
| 3280 |
+
"cot.suggested_fix.desc": "属性已重新排序——推理字段优先,然后是任何上下文字段,最后是答案字段。`required[]`(如果存在)也镜像同步。",
|
| 3281 |
+
"cot.suggested_fix.copy": "📋 复制",
|
| 3282 |
+
"cot.suggested_fix.copied": "✓ 已复制",
|
| 3283 |
+
"cot.attribution": "参考:",
|
| 3284 |
+
"inv.v082.cot": "<strong>📋 JSON CoT</strong> — 对 structured outputs schema 进行 linting,查找悄悄破坏 Chain-of-Thought 的『答案在推理之前』反模式。",
|
| 3285 |
+
"help.v082.cot.title": "📋 JSON CoT 感知 Linter",
|
| 3286 |
+
"help.v082.cot.body": "约束解码引擎(llguidance、Outlines、SGLang 语法)按 schema 声明的顺序输出 JSON 属性。如果你写 <code>{ answer, reasoning }</code>,模型先承诺 <code>answer</code>,CoT 就退化为事后辩护。粘贴任意 schema(或示例响应)——linter 把每个字段分类为<em>推理</em>、<em>答案</em>或<em>其他</em>,标记顺序,并输出可复制回去的重排修复。<em>用例</em>:『我的 CoT 提示在纯文本中正常但在 JSON 模式下退化』→ 运行 linter,找到颠倒的顺序,修复。",
|
| 3287 |
+
|
| 3288 |
"inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
|
| 3289 |
"help.v081.hub.title": "🧭 Solutions Hub",
|
| 3290 |
"help.v081.hub.body": "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别(评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性),每个映射到(a)解决它的 tafagent 模式(若存在),以及(b)社区已信任的最佳外部工具(RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等)。搜索框匹配 pain、场景和工具名称。<em>用例</em>:'我有问题 X — tafagent 解决它吗,如果不,谁解决?'",
|
|
@@ -0,0 +1,203 @@
|
| 1 |
+
// JSON CoT-aware Linter (v0.8.2 anti-bullshit pack #8)
|
| 2 |
+
//
|
| 3 |
+
// Pain (Solutions Hub `structured_outputs`): constrained decoding emits
|
| 4 |
+
// keys in schema property order, so a schema that places `answer`
|
| 5 |
+
// before `reasoning` fails silently. The model must commit to a final
|
| 6 |
+
// answer first and can only write the rationale afterwards to justify
|
| 7 |
+
// it, which defeats Chain-of-Thought. Nothing errors, so the problem
|
| 8 |
+
// goes unnoticed.
|
| 9 |
+
//
|
| 10 |
+
// Source citations:
|
| 11 |
+
// - https://collinwilkins.com/articles/structured-output (field
|
| 12 |
+
// ordering anti-pattern explained)
|
| 13 |
+
// - JSONSchemaBench (10K real schemas) — most are not CoT-aware
|
| 14 |
+
// - llguidance / Outlines / SGLang grammars — all respect property order
|
| 15 |
+
//
|
| 16 |
+
// Pure logic — no human strings. Returns codes+params; main.js does
|
| 17 |
+
// the i18n lookup.
|
| 18 |
+
|
| 19 |
+
// Heuristic field classifiers. Tested against real schemas + examples
|
| 20 |
+
// in the smoke harness; conservative on `other` to avoid mislabeling
|
| 21 |
+
// ambiguous fields (e.g. a `score` could be either reasoning-side or
|
| 22 |
+
// answer-side, but lexically it patterns as answer-side and the
|
| 23 |
+
// false-anti-pattern cost is only "review the schema", which is fine).
|
| 24 |
+
const REASONING_PATTERNS = [
|
| 25 |
+
/reason/i,
|
| 26 |
+
/think/i,
|
| 27 |
+
/thought/i,
|
| 28 |
+
/\bcot\b/i,
|
| 29 |
+
/chain.of.thought/i,
|
| 30 |
+
/analysis/i,
|
| 31 |
+
/\bexplanation\b/i,
|
| 32 |
+
/rationale/i,
|
| 33 |
+
/step.by.step/i,
|
| 34 |
+
/scratchpad/i,
|
| 35 |
+
/justif/i,
|
| 36 |
+
/deliberat/i,
|
| 37 |
+
/\bplan\b/i,
|
| 38 |
+
/\bwhy\b/i,
|
| 39 |
+
];
|
| 40 |
+
|
| 41 |
+
const ANSWER_PATTERNS = [
|
| 42 |
+
/^answer$/i,
|
| 43 |
+
/^result$/i,
|
| 44 |
+
/^output$/i,
|
| 45 |
+
/^response$/i,
|
| 46 |
+
/^final/i,
|
| 47 |
+
/^verdict$/i,
|
| 48 |
+
/^decision$/i,
|
| 49 |
+
/^prediction$/i,
|
| 50 |
+
/^conclusion$/i,
|
| 51 |
+
/^value$/i,
|
| 52 |
+
/^score$/i,
|
| 53 |
+
/^classif/i,
|
| 54 |
+
/^label$/i,
|
| 55 |
+
/^choice$/i,
|
| 56 |
+
/^selected/i,
|
| 57 |
+
];
|
| 58 |
+
|
| 59 |
+
export function classifyFieldName(name) {
|
| 60 |
+
if (typeof name !== "string" || !name) return "other";
|
| 61 |
+
for (const pat of REASONING_PATTERNS) {
|
| 62 |
+
if (pat.test(name)) return "reasoning";
|
| 63 |
+
}
|
| 64 |
+
for (const pat of ANSWER_PATTERNS) {
|
| 65 |
+
if (pat.test(name)) return "answer";
|
| 66 |
+
}
|
| 67 |
+
return "other";
|
| 68 |
+
}
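A minimal sketch of how the name-based classification behaves, using a trimmed-down copy of the pattern lists above (the names `REASONING`, `ANSWER`, and `classify` are illustrative, not part of the module). It shows two properties that follow from the regexes as written: reasoning patterns are tried first, and `\b` does not treat `_` as a word boundary.

```javascript
// Illustrative subset of the pattern lists; not the full module.
const REASONING = [/reason/i, /think/i, /\bcot\b/i, /rationale/i, /\bplan\b/i];
const ANSWER = [/^answer$/i, /^final/i, /^score$/i, /^label$/i];

function classify(name) {
  if (typeof name !== "string" || !name) return "other";
  // Reasoning patterns win when both lists would match.
  if (REASONING.some(p => p.test(name))) return "reasoning";
  if (ANSWER.some(p => p.test(name))) return "answer";
  return "other";
}

const samples = [
  "reasoning",       // plain reasoning field
  "final_answer",    // matches /^final/
  "final_reasoning", // /reason/ is tried before /^final/
  "cot",             // matches /\bcot\b/
  "cot_trace",       // "_" is a word character, so /\bcot\b/ does not match
  "answer_text",     // /^answer$/ is anchored at both ends
  "finish_reason",   // substring match on /reason/
];
const classified = Object.fromEntries(samples.map(n => [n, classify(n)]));
console.log(classified);
```

Under these patterns `cot_trace` and `answer_text` fall into `other`, while `finish_reason` is labeled as reasoning. Whether those outcomes are desirable depends on the schemas being linted.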
|
| 69 |
+
|
| 70 |
+
// Decide whether `parsed` is a JSON Schema (detected by an object-valued
|
| 71 |
+
// `properties` key; `$schema` and `type` are not checked) or a plain
|
| 72 |
+
// example object. Both keep key order in modern JS (ES2015+ preserves
|
| 73 |
+
// insertion order for non-integer string keys; integer-like keys sort
|
| 74 |
+
// first), and constrained decoders honor that order.
|
| 75 |
+
function extractFieldOrder(parsed) {
|
| 76 |
+
if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
|
| 77 |
+
return { kind: "non_object", fields: [] };
|
| 78 |
+
}
|
| 79 |
+
// Schema form
|
| 80 |
+
if (parsed.properties && typeof parsed.properties === "object") {
|
| 81 |
+
return { kind: "schema", fields: Object.keys(parsed.properties) };
|
| 82 |
+
}
|
| 83 |
+
// Example object form
|
| 84 |
+
return { kind: "example", fields: Object.keys(parsed) };
|
| 85 |
+
}
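The key-order assumption can be checked directly. This small sketch (variable names are illustrative) parses one object with ordinary string keys and one with integer-like keys, then reads the order back with `Object.keys`, the same call the function above relies on.

```javascript
// Ordinary string keys: JSON.parse + Object.keys preserves source order.
const schemaText = '{"properties": {"answer": {}, "reasoning": {}}}';
const schemaKeys = Object.keys(JSON.parse(schemaText).properties);
console.log(schemaKeys); // ["answer", "reasoning"]

// Integer-like keys are moved to the front in ascending numeric order,
// so the order seen here no longer matches the pasted text.
const mixedText = '{"b": 1, "2": 1, "a": 1, "1": 1}';
const mixedKeys = Object.keys(JSON.parse(mixedText));
console.log(mixedKeys); // ["1", "2", "b", "a"]
```

For typical schemas with descriptive field names this makes no difference. An object keyed by numeric strings would be reported in an order that differs from what was pasted.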
|
| 86 |
+
|
| 87 |
+
function buildFieldAnnotations(fields) {
|
| 88 |
+
return fields.map((name, idx) => ({
|
| 89 |
+
name,
|
| 90 |
+
idx,
|
| 91 |
+
type: classifyFieldName(name),
|
| 92 |
+
}));
|
| 93 |
+
}
|
| 94 |
+
|
| 95 |
+
function suggestReorder(annotations) {
|
| 96 |
+
// Strategy: keep relative order within each type bucket, but emit
|
| 97 |
+
// reasoning fields first, then `other`, then answer fields. That
|
| 98 |
+
// way CoT runs first, the model can reference any context fields,
|
| 99 |
+
// and the answer comes last (constrained decoding commits the
|
| 100 |
+
// answer after the rationale).
|
| 101 |
+
const reasoning = annotations.filter(a => a.type === "reasoning").map(a => a.name);
|
| 102 |
+
const other = annotations.filter(a => a.type === "other").map(a => a.name);
|
| 103 |
+
const answer = annotations.filter(a => a.type === "answer").map(a => a.name);
|
| 104 |
+
return [...reasoning, ...other, ...answer];
|
| 105 |
+
}
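The reorder strategy is a stable three-way partition. A self-contained sketch with hand-written annotations (the field names and the helper `stableReorder` are made up for illustration) shows that relative order inside each bucket is kept:

```javascript
// Stable partition: reasoning first, then other, then answer.
function stableReorder(annotations) {
  const pick = type =>
    annotations.filter(a => a.type === type).map(a => a.name);
  return [...pick("reasoning"), ...pick("other"), ...pick("answer")];
}

const exampleAnnotations = [
  { name: "answer", type: "answer" },
  { name: "question_id", type: "other" },
  { name: "reasoning", type: "reasoning" },
  { name: "confidence", type: "other" },
  { name: "score", type: "answer" },
];
const reordered = stableReorder(exampleAnnotations);
console.log(reordered);
// ["reasoning", "question_id", "confidence", "answer", "score"]
```

`question_id` still precedes `confidence`, and `answer` still precedes `score`; only the buckets move.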
|
| 106 |
+
|
| 107 |
+
// Public entry point. `text` is the user-pasted JSON Schema or example.
|
| 108 |
+
// Returns { code, params } where `code` is one of:
|
| 109 |
+
// - invalid_json
|
| 110 |
+
// - non_object
|
| 111 |
+
// - empty_fields
|
| 112 |
+
// - good_order (reasoning before answer — CoT honored)
|
| 113 |
+
// - anti_pattern (answer before reasoning — model commits early)
|
| 114 |
+
// - missing_reasoning (answer-like fields present, no reasoning)
|
| 115 |
+
// - missing_answer (reasoning fields present, no answer-like field)
|
| 116 |
+
// - no_cot_fields (object has fields but none look reasoning/answer)
|
| 117 |
+
export function lintJsonCot(text) {
|
| 118 |
+
if (typeof text !== "string" || !text.trim()) {
|
| 119 |
+
return { code: "empty_fields", params: { reason: "empty_input" } };
|
| 120 |
+
}
|
| 121 |
+
let parsed;
|
| 122 |
+
try {
|
| 123 |
+
parsed = JSON.parse(text);
|
| 124 |
+
} catch (e) {
|
| 125 |
+
return {
|
| 126 |
+
code: "invalid_json",
|
| 127 |
+
params: { error: String(e && e.message || e).slice(0, 200) },
|
| 128 |
+
};
|
| 129 |
+
}
|
| 130 |
+
const { kind, fields } = extractFieldOrder(parsed);
|
| 131 |
+
if (kind === "non_object") {
|
| 132 |
+
return { code: "non_object", params: { kind: Array.isArray(parsed) ? "array" : typeof parsed } };
|
| 133 |
+
}
|
| 134 |
+
if (fields.length === 0) {
|
| 135 |
+
return { code: "empty_fields", params: { kind } };
|
| 136 |
+
}
|
| 137 |
+
|
| 138 |
+
const annotations = buildFieldAnnotations(fields);
|
| 139 |
+
const reasoningIdx = annotations.findIndex(a => a.type === "reasoning");
|
| 140 |
+
const answerIdx = annotations.findIndex(a => a.type === "answer");
|
| 141 |
+
const hasReasoning = reasoningIdx !== -1;
|
| 142 |
+
const hasAnswer = answerIdx !== -1;
|
| 143 |
+
|
| 144 |
+
const baseParams = {
|
| 145 |
+
kind,
|
| 146 |
+
fields: annotations,
|
| 147 |
+
field_count: annotations.length,
|
| 148 |
+
reasoning_idx: hasReasoning ? reasoningIdx : null,
|
| 149 |
+
answer_idx: hasAnswer ? answerIdx : null,
|
| 150 |
+
suggested_order: suggestReorder(annotations),
|
| 151 |
+
};
|
| 152 |
+
|
| 153 |
+
if (!hasReasoning && !hasAnswer) {
|
| 154 |
+
return { code: "no_cot_fields", params: baseParams };
|
| 155 |
+
}
|
| 156 |
+
if (hasReasoning && !hasAnswer) {
|
| 157 |
+
return { code: "missing_answer", params: baseParams };
|
| 158 |
+
}
|
| 159 |
+
if (!hasReasoning && hasAnswer) {
|
| 160 |
+
return { code: "missing_reasoning", params: baseParams };
|
| 161 |
+
}
|
| 162 |
+
// Both present — order is decisive.
|
| 163 |
+
if (reasoningIdx < answerIdx) {
|
| 164 |
+
return { code: "good_order", params: baseParams };
|
| 165 |
+
}
|
| 166 |
+
return { code: "anti_pattern", params: baseParams };
|
| 167 |
+
}
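The verdict logic after classification reduces to comparing the first reasoning index with the first answer index. The sketch below restates that decision table over a plain array of role strings (`verdictFor` is an illustrative name, and parsing and classification are left out):

```javascript
// Decision table over already-classified roles, in field order.
function verdictFor(types) {
  const r = types.indexOf("reasoning"); // first reasoning field
  const a = types.indexOf("answer");    // first answer field
  if (r === -1 && a === -1) return "no_cot_fields";
  if (a === -1) return "missing_answer";
  if (r === -1) return "missing_reasoning";
  return r < a ? "good_order" : "anti_pattern";
}

const verdicts = {
  antiPattern: verdictFor(["answer", "other", "reasoning"]),
  goodOrder: verdictFor(["reasoning", "other", "answer"]),
  // Only first occurrences are compared, so a reasoning field that
  // follows the answer is not flagged when an earlier one exists.
  trailingReasoning: verdictFor(["reasoning", "answer", "reasoning"]),
  noCot: verdictFor(["other", "other"]),
};
console.log(verdicts);
```

The `trailingReasoning` case comes out as `good_order`. That matches the first-index comparison above, and it means a post-hoc `explanation` field after the answer goes unreported whenever some reasoning field comes first.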
|
| 168 |
+
|
| 169 |
+
// Build a properties-reordered JSON string preserving the original
|
| 170 |
+
// shape (schema vs example). Used by the UI to show "suggested fix".
|
| 171 |
+
export function reorderJsonText(text, suggestedOrder) {
|
| 172 |
+
let parsed;
|
| 173 |
+
try { parsed = JSON.parse(text); }
|
| 174 |
+
catch { return null; }
|
| 175 |
+
if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) return null;
|
| 176 |
+
|
| 177 |
+
// Reorder properties within a plain object preserving values.
|
| 178 |
+
const reorderObj = (obj, order) => {
|
| 179 |
+
const out = {};
|
| 180 |
+
// First emit suggested keys that exist on the object.
|
| 181 |
+
for (const k of order) {
|
| 182 |
+
if (Object.prototype.hasOwnProperty.call(obj, k)) out[k] = obj[k];
|
| 183 |
+
}
|
| 184 |
+
// Then any keys not in the suggested order (defensive: keeps unknowns).
|
| 185 |
+
for (const k of Object.keys(obj)) {
|
| 186 |
+
if (!Object.prototype.hasOwnProperty.call(out, k)) out[k] = obj[k];
|
| 187 |
+
}
|
| 188 |
+
return out;
|
| 189 |
+
};
|
| 190 |
+
|
| 191 |
+
if (parsed.properties && typeof parsed.properties === "object") {
|
| 192 |
+
parsed.properties = reorderObj(parsed.properties, suggestedOrder);
|
| 193 |
+
// If `required` array exists, mirror suggested order so generators
|
| 194 |
+
// that emit fields in `required[]` order also benefit. Keep only
|
| 195 |
+
// the keys originally present in `required`.
|
| 196 |
+
if (Array.isArray(parsed.required)) {
|
| 197 |
+
const wasRequired = new Set(parsed.required);
|
| 198 |
+
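// Keys listed in `required` but absent from `properties` are appended, not dropped.
parsed.required = [...suggestedOrder.filter(k => wasRequired.has(k)), ...parsed.required.filter(k => !suggestedOrder.includes(k))];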
|
| 199 |
+
}
|
| 200 |
+
return JSON.stringify(parsed, null, 2);
|
| 201 |
+
}
|
| 202 |
+
return JSON.stringify(reorderObj(parsed, suggestedOrder), null, 2);
|
| 203 |
+
}
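A standalone sketch of the schema reorder, written without mutation so it can be run in isolation. The helpers `reorderKeys` and `reorderSchema` and the `trace_id` entry are hypothetical. The sketch appends any `required` entry that is missing from the suggested order rather than dropping it; that is a choice made here for illustration.

```javascript
// Emit keys in `order` first, then any remaining keys in original order.
function reorderKeys(obj, order) {
  const has = (o, k) => Object.prototype.hasOwnProperty.call(o, k);
  const out = {};
  for (const k of order) if (has(obj, k)) out[k] = obj[k];
  for (const k of Object.keys(obj)) if (!has(out, k)) out[k] = obj[k];
  return out;
}

// Return a reordered copy of the schema; the input is left untouched.
function reorderSchema(schema, order) {
  const fixed = { ...schema, properties: reorderKeys(schema.properties, order) };
  if (Array.isArray(schema.required)) {
    fixed.required = [
      ...order.filter(k => schema.required.includes(k)),
      ...schema.required.filter(k => !order.includes(k)),
    ];
  }
  return fixed;
}

const inputSchema = {
  type: "object",
  properties: { answer: { type: "string" }, reasoning: { type: "string" } },
  required: ["answer", "reasoning", "trace_id"], // trace_id: not in properties
};
const fixedSchema = reorderSchema(inputSchema, ["reasoning", "answer"]);
console.log(JSON.stringify(fixedSchema, null, 2));
```

Here `properties` and `required` both end up with `reasoning` ahead of `answer`, and `trace_id` stays in `required` at the end.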
|
|
@@ -27,6 +27,7 @@ import {
|
|
| 27 |
loadHub, listCategories, listEntries, searchEntries,
|
| 28 |
hubStats, getCategoryMeta,
|
| 29 |
} from "./solutions_hub.js";
|
|
|
|
| 30 |
|
| 31 |
// Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
|
| 32 |
// Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
|
|
@@ -216,6 +217,7 @@ document.addEventListener("click", (e) => {
|
|
| 216 |
template: "template-section", arena: "arena-section", contam: "contam-section",
|
| 217 |
quant: "quant-section", drift: "drift-section", niah: "niah-section",
|
| 218 |
saturation: "saturation-section",
|
|
|
|
| 219 |
hub: "hub-section",
|
| 220 |
}[targetMode];
|
| 221 |
if (sectionId) {
|
|
@@ -241,7 +243,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
|
|
| 241 |
"diagnose-section", "phase-section", "unmask-section",
|
| 242 |
"template-section", "arena-section", "contam-section",
|
| 243 |
"quant-section", "drift-section", "niah-section",
|
| 244 |
-
"saturation-section", "hub-section"].forEach(id => {
|
| 245 |
const el = $(id);
|
| 246 |
if (el) el.style.display = "none";
|
| 247 |
});
|
|
@@ -253,6 +255,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
|
|
| 253 |
template: "template-section", arena: "arena-section", contam: "contam-section",
|
| 254 |
quant: "quant-section", drift: "drift-section", niah: "niah-section",
|
| 255 |
saturation: "saturation-section",
|
|
|
|
| 256 |
hub: "hub-section",
|
| 257 |
};
|
| 258 |
const sectionId = sectionMap[mode];
|
|
@@ -260,6 +263,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
|
|
| 260 |
$("mode-desc").textContent = t(`mode_desc.${mode}`) || "";
|
| 261 |
if (mode === "phase") initPhaseDiagram();
|
| 262 |
if (mode === "saturation") initSaturation();
|
|
|
|
| 263 |
if (mode === "hub") initHub();
|
| 264 |
});
|
| 265 |
});
|
|
@@ -3384,6 +3388,173 @@ $("hub-clear-btn")?.addEventListener("click", () => {
|
|
| 3384 |
renderHubAll();
|
| 3385 |
});
|
| 3386 |
|
| 3387 |
// ════════════════════════════════════════════════════════════════════
|
| 3388 |
// Bootstrap
|
| 3389 |
// ════════════════════════════════════════════════════════════════════
|
|
|
|
| 27 |
loadHub, listCategories, listEntries, searchEntries,
|
| 28 |
hubStats, getCategoryMeta,
|
| 29 |
} from "./solutions_hub.js";
|
| 30 |
+
import { lintJsonCot, reorderJsonText, classifyFieldName } from "./json_cot_linter.js";
|
| 31 |
|
| 32 |
// Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
|
| 33 |
// Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
|
|
|
|
| 217 |
template: "template-section", arena: "arena-section", contam: "contam-section",
|
| 218 |
quant: "quant-section", drift: "drift-section", niah: "niah-section",
|
| 219 |
saturation: "saturation-section",
|
| 220 |
+
cot: "cot-section",
|
| 221 |
hub: "hub-section",
|
| 222 |
}[targetMode];
|
| 223 |
if (sectionId) {
|
|
|
|
| 243 |
"diagnose-section", "phase-section", "unmask-section",
|
| 244 |
"template-section", "arena-section", "contam-section",
|
| 245 |
"quant-section", "drift-section", "niah-section",
|
| 246 |
+
"saturation-section", "cot-section", "hub-section"].forEach(id => {
|
| 247 |
const el = $(id);
|
| 248 |
if (el) el.style.display = "none";
|
| 249 |
});
|
|
|
|
| 255 |
template: "template-section", arena: "arena-section", contam: "contam-section",
|
| 256 |
quant: "quant-section", drift: "drift-section", niah: "niah-section",
|
| 257 |
saturation: "saturation-section",
|
| 258 |
+
cot: "cot-section",
|
| 259 |
hub: "hub-section",
|
| 260 |
};
|
| 261 |
const sectionId = sectionMap[mode];
|
|
|
|
| 263 |
$("mode-desc").textContent = t(`mode_desc.${mode}`) || "";
|
| 264 |
if (mode === "phase") initPhaseDiagram();
|
| 265 |
if (mode === "saturation") initSaturation();
|
| 266 |
+
if (mode === "cot") initCot();
|
| 267 |
if (mode === "hub") initHub();
|
| 268 |
});
|
| 269 |
});
|
|
|
|
| 3388 |
renderHubAll();
|
| 3389 |
});
|
| 3390 |
|
| 3391 |
+
// ════════════════════════════════════════════════════════════════════
|
| 3392 |
+
// 📋 JSON CoT-aware Linter (v0.8.2 anti-bullshit pack #8)
|
| 3393 |
+
// ════════════════════════════════════════════════════════════════════
|
| 3394 |
+
const COT_FIELD_TYPE_BADGE = {
|
| 3395 |
+
reasoning: "🧠",
|
| 3396 |
+
answer: "🎯",
|
| 3397 |
+
other: "·",
|
| 3398 |
+
};
|
| 3399 |
+
|
| 3400 |
+
const COT_VERDICT_BADGE_BG = {
|
| 3401 |
+
good_order: "#3fb950", // green
|
| 3402 |
+
anti_pattern: "#f85149", // red
|
| 3403 |
+
missing_reasoning: "#d29922", // amber
|
| 3404 |
+
missing_answer: "#d29922", // amber
|
| 3405 |
+
no_cot_fields: "#8b949e", // gray
|
| 3406 |
+
non_object: "#8b949e",
|
| 3407 |
+
empty_fields: "#8b949e",
|
| 3408 |
+
invalid_json: "#f85149", // red
|
| 3409 |
+
};
|
| 3410 |
+
|
| 3411 |
+
let __cotInited = false;
|
| 3412 |
+
|
| 3413 |
+
function initCot() {
|
| 3414 |
+
if (__cotInited) return;
|
| 3415 |
+
__cotInited = true;
|
| 3416 |
+
// No-op (no async data); placeholder kept for symmetry with other modes.
|
| 3417 |
+
}
|
| 3418 |
+
|
| 3419 |
+
function renderCotResult(result, originalText) {
|
| 3420 |
+
const verdict = t(`cot.verdict.${result.code}`) || result.code;
|
| 3421 |
+
const verdictBg = COT_VERDICT_BADGE_BG[result.code] || "#8b949e";
|
| 3422 |
+
const verdictBadge = `<span class="badge" style="background:${verdictBg};">${verdict}</span>`;
|
| 3423 |
+
|
| 3424 |
+
// Failure cases short-circuit: just show the verdict + reason.
|
| 3425 |
+
if (result.code === "invalid_json") {
|
| 3426 |
+
const reason = result.params?.error || "";
|
| 3427 |
+
return `<div class="arena-result">
|
| 3428 |
+
<p style="font-size:1.1em;">${verdictBadge}</p>
|
| 3429 |
+
<pre style="background:#21262d;padding:0.75em;border-radius:4px;color:#f0883e;">${escapeHtml(reason)}</pre>
|
| 3430 |
+
</div>`;
|
| 3431 |
+
}
|
| 3432 |
+
if (result.code === "empty_fields" || result.code === "non_object") {
|
| 3433 |
+
return `<div class="arena-result">
|
| 3434 |
+
<p style="font-size:1.1em;">${verdictBadge}</p>
|
| 3435 |
+
<p class="recipe-desc">${t(`cot.hint.${result.code}`) || ""}</p>
|
| 3436 |
+
</div>`;
|
| 3437 |
+
}
|
| 3438 |
+
|
| 3439 |
+
const fields = result.params?.fields || [];
|
| 3440 |
+
const fieldRows = fields.map(f => {
|
| 3441 |
+
const icon = COT_FIELD_TYPE_BADGE[f.type] || "·";
|
| 3442 |
+
const typeLabel = t(`cot.field.${f.type}`) || f.type;
|
| 3443 |
+
const color = f.type === "reasoning" ? "#3fb950"
|
| 3444 |
+
: f.type === "answer" ? "#f0883e"
|
| 3445 |
+
: "#8b949e";
|
| 3446 |
+
return `<tr>
|
| 3447 |
+
<td style="text-align:right;color:#8b949e;">${f.idx}</td>
|
| 3448 |
+
<td><code>${escapeHtml(f.name)}</code></td>
|
| 3449 |
+
<td><span style="color:${color};">${icon} ${typeLabel}</span></td>
|
| 3450 |
+
</tr>`;
|
| 3451 |
+
}).join("");
|
| 3452 |
+
const fieldTable = `
|
| 3453 |
+
<table class="lean-table" style="margin-top:0.5em;">
|
| 3454 |
+
<thead><tr>
|
| 3455 |
+
<th>#</th>
|
| 3456 |
+
<th data-i18n="cot.col.field">Field</th>
|
| 3457 |
+
<th data-i18n="cot.col.type">Type</th>
|
| 3458 |
+
</tr></thead>
|
| 3459 |
+
<tbody>${fieldRows}</tbody>
|
| 3460 |
+
</table>
|
| 3461 |
+
`;
|
| 3462 |
+
|
| 3463 |
+
// Suggested-fix block — only when there's a meaningful reorder.
|
| 3464 |
+
let fixBlock = "";
|
| 3465 |
+
if (result.code === "anti_pattern") {
|
| 3466 |
+
const suggested = result.params?.suggested_order || [];
|
| 3467 |
+
const fixed = reorderJsonText(originalText, suggested);
|
| 3468 |
+
if (fixed) {
|
| 3469 |
+
fixBlock = `
|
| 3470 |
+
<details open style="margin-top:1em;">
|
| 3471 |
+
<summary style="cursor:pointer;color:#3fb950;">
|
| 3472 |
+
<strong>${t("cot.suggested_fix.title") || "✓ Suggested fix"}</strong>
|
| 3473 |
+
</summary>
|
| 3474 |
+
<p class="recipe-desc">${t("cot.suggested_fix.desc") || ""}</p>
|
| 3475 |
+
<pre style="background:#0d1117;padding:0.75em;border-radius:4px;overflow-x:auto;"><code>${escapeHtml(fixed)}</code></pre>
|
| 3476 |
+
<button type="button" class="secondary" onclick="navigator.clipboard.writeText(this.previousElementSibling.textContent).then(()=>{this.textContent='${t("cot.suggested_fix.copied") || "✓ Copied"}';setTimeout(()=>{this.textContent='${t("cot.suggested_fix.copy") || "📋 Copy"}';},1500);})">${t("cot.suggested_fix.copy") || "📋 Copy"}</button>
|
| 3477 |
+
</details>
|
| 3478 |
+
`;
|
| 3479 |
+
}
|
| 3480 |
+
}
|
| 3481 |
+
|
| 3482 |
+
// Verdict explainer
|
| 3483 |
+
const explainer = t(`cot.explain.${result.code}`) || "";
|
| 3484 |
+
const explainerBlock = explainer
|
| 3485 |
+
? `<p class="recipe-desc">${explainer}</p>`
|
| 3486 |
+
: "";

  // Source attribution footer
  const attribution = `
    <p class="recipe-desc subtle" style="font-size:0.82em;margin-top:1em;">
      ${t("cot.attribution") || ""}
      <a href="https://collinwilkins.com/articles/structured-output" target="_blank" rel="noopener noreferrer">collinwilkins.com</a> ·
      <a href="https://github.com/guidance-ai/jsonschemabench" target="_blank" rel="noopener noreferrer">JSONSchemaBench</a> ·
      <a href="https://github.com/guidance-ai/llguidance" target="_blank" rel="noopener noreferrer">llguidance</a>
    </p>
  `;

  // Read the count defensively, matching the optional chaining used for
  // suggested_order above.
  const fieldCount = result.params?.field_count ?? 0;
  return `<div class="arena-result">
    <p style="font-size:1.1em;">${verdictBadge}
      <span class="subtle" style="font-size:0.9em;">(${tFmt("cot.field_count", { n: fieldCount }) || `${fieldCount} fields`})</span>
    </p>
    ${explainerBlock}
    ${fieldTable}
    ${fixBlock}
    ${attribution}
  </div>`;
}

// Lint the pasted text, render the result, and update the status line.
function runCotLint() {
  const text = $("cot-input")?.value || "";
  const result = lintJsonCot(text);
  const output = $("cot-output");
  const status = $("cot-status");
  // Guard the targets the same way the input is guarded, so a missing
  // element does not throw.
  if (output) output.innerHTML = renderCotResult(result, text);
  if (status) {
    status.textContent = tFmt("cot.status.done", {
      verdict: t(`cot.verdict.${result.code}`) || result.code,
    });
  }
}
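
// Illustrative sketch only, not the code in js/json_cot_linter.js.
// It shows roughly how an ordering check of this kind could classify fields
// by name and pick a verdict code. Every name prefixed with "sketch" or
// "SKETCH" is hypothetical, and the two regexes are a shortened version of
// the name patterns listed in the commit message. The real lintJsonCot may
// differ in patterns, params, and edge-case handling.

```javascript
// Hypothetical name patterns (assumption: a trimmed subset of the real ones).
const SKETCH_REASONING_RE = /reason|think|thought|cot|chain.of.thought|analysis|explanation|rationale/i;
const SKETCH_ANSWER_RE = /answer|result|verdict/i;

// Classify one field name as "reasoning", "answer", or "other".
function sketchClassifyField(name) {
  if (SKETCH_REASONING_RE.test(name)) return "reasoning";
  if (SKETCH_ANSWER_RE.test(name)) return "answer";
  return "other";
}

// Return { code, params } for a JSON Schema or an example response object.
function sketchLintOrder(text) {
  if (!text.trim()) return { code: "empty", params: {} };
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch {
    return { code: "invalid_json", params: {} };
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { code: "non_object", params: {} };
  }
  // A schema declares its fields under `properties`; otherwise treat the
  // object's own keys as the fields.
  const isSchema = parsed.properties !== null && typeof parsed.properties === "object";
  const names = Object.keys(isSchema ? parsed.properties : parsed);
  const kinds = names.map(sketchClassifyField);
  const firstAnswer = kinds.indexOf("answer");
  const lastReasoning = kinds.lastIndexOf("reasoning");

  let code;
  if (firstAnswer === -1 && lastReasoning === -1) code = "no_cot_fields";
  else if (lastReasoning === -1) code = "missing_reasoning";
  else if (firstAnswer === -1) code = "missing_answer";
  else if (firstAnswer < lastReasoning) code = "anti_pattern";
  else code = "good_order";

  // Suggested order: reasoning fields, then other fields, then answer fields.
  const byKind = (kind) => names.filter((_, i) => kinds[i] === kind);
  const suggested_order = [...byKind("reasoning"), ...byKind("other"), ...byKind("answer")];
  return { code, params: { field_count: names.length, suggested_order } };
}
```

// The sketch flags an answer field declared before any reasoning field,
// which is the case the suggested-fix block above is rendered for.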

// Example schema in the recommended order: reasoning is declared before answer.
const COT_EXAMPLE_GOOD = JSON.stringify({
  type: "object",
  properties: {
    reasoning: {
      type: "string",
      description: "Step-by-step rationale before committing to an answer.",
    },
    answer: {
      type: "string",
      description: "Final answer, derived from the reasoning above.",
    },
  },
  required: ["reasoning", "answer"],
}, null, 2);

// Example schema showing the anti-pattern: the answer field is declared
// before the reasoning field.
const COT_EXAMPLE_BAD = JSON.stringify({
  type: "object",
  properties: {
    final_answer: {
      type: "string",
      description: "The model's final answer.",
    },
    chain_of_thought: {
      type: "string",
      description: "Justification for the answer above.",
    },
  },
  required: ["final_answer", "chain_of_thought"],
}, null, 2);
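
// Illustrative sketch only: one way a reorder step could turn the bad
// example into the suggested fix (properties rebuilt in the suggested order,
// with required[] mirrored to match). sketchReorderSchemaText is a
// hypothetical name; the real reorderJsonText is not shown in this diff and
// may behave differently.

```javascript
// Rebuild `properties` (or the object's own keys) in the given order.
// Returns pretty-printed JSON text, or null when the input is not a JSON object.
function sketchReorderSchemaText(text, order) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch {
    return null;
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return null;
  }
  const isSchema = parsed.properties !== null && typeof parsed.properties === "object";
  const source = isSchema ? parsed.properties : parsed;
  const has = (name) => Object.prototype.hasOwnProperty.call(source, name);

  // Names listed in `order` come first; anything not mentioned keeps its
  // original relative position at the end.
  const names = [
    ...order.filter(has),
    ...Object.keys(source).filter((name) => !order.includes(name)),
  ];
  const reordered = {};
  for (const name of names) reordered[name] = source[name];
  if (!isSchema) return JSON.stringify(reordered, null, 2);

  const out = { ...parsed, properties: reordered };
  if (Array.isArray(parsed.required)) {
    // Mirror the new property order in required[].
    out.required = names.filter((name) => parsed.required.includes(name));
  }
  return JSON.stringify(out, null, 2);
}
```

// Caveat: JavaScript objects always list integer-like keys first, so this
// approach cannot reorder fields named "0", "1", and so on.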

$("cot-lint-btn")?.addEventListener("click", runCotLint);
$("cot-example-good-btn")?.addEventListener("click", () => {
  $("cot-input").value = COT_EXAMPLE_GOOD;
  runCotLint();
});
$("cot-example-bad-btn")?.addEventListener("click", () => {
  $("cot-input").value = COT_EXAMPLE_BAD;
  runCotLint();
});

// ════════════════════════════════════════════════════════════════════
// Bootstrap
// ════════════════════════════════════════════════════════════════════