karlexmarin Claude Opus 4.7 (1M context) committed on
Commit b0f51d4 · 1 Parent(s): fbf3edc

v0.8.2 JSON CoT-aware Linter — anti-bullshit pack #8


Constrained-decoding engines (llguidance, Outlines, SGLang grammars)
emit JSON properties in the order your schema declares them. If a
schema places `answer` before `reasoning`, the model commits to the
final answer first and the rationale that follows can only justify
what was already committed — defeating Chain-of-Thought entirely.
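
To make the ordering concrete, here is a minimal sketch. The two schemas and the `emissionOrder` helper are invented for illustration and are not part of this commit. It only shows that the declared property order survives `JSON.parse`, which is the order a schema-order decoder would then follow.

```javascript
// Illustrative schemas (assumptions, not taken from the repository).
const badSchema =
  '{"type":"object","properties":{"answer":{"type":"string"},"reasoning":{"type":"string"}}}';
const goodSchema =
  '{"type":"object","properties":{"reasoning":{"type":"string"},"answer":{"type":"string"}}}';

// Hypothetical helper: returns property names in declaration order.
// JSON.parse keeps insertion order for non-integer string keys.
function emissionOrder(schemaText) {
  return Object.keys(JSON.parse(schemaText).properties);
}

console.log(emissionOrder(badSchema));  // [ 'answer', 'reasoning' ]
console.log(emissionOrder(goodSchema)); // [ 'reasoning', 'answer' ]
```

With the first schema the answer tokens are generated before any rationale exists; with the second the rationale is already in context when the answer is decoded.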

📋 JSON CoT Linter (16th mode):
- Paste any JSON Schema or example response object
- Linter classifies each field as reasoning / answer / other via
name patterns (reason|think|thought|cot|chain.of.thought|analysis|
explanation|rationale|… vs answer|result|verdict|final_answer|…)
- Verdict codes: good_order / anti_pattern / missing_reasoning /
missing_answer / no_cot_fields / invalid_json / non_object / empty
- Suggested-fix block emits a reordered schema (reasoning → other →
answer) with `required[]` mirrored to match — copy back into prompt
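
The classification, verdict, and reorder steps above can be sketched roughly as follows. This is not the contents of `js/json_cot_linter.js`: the function names (`classifyField`, `lintCot`, `reorderSchema`) are hypothetical, the regexes use only the name patterns spelled out above (the real list is truncated with `…`), and both the tie-breaking rule (reasoning wins) and the ordering rule (first answer-like field vs. first reasoning-like field) are assumptions.

```javascript
// Hypothetical sketch; only the patterns listed in the commit message are used.
const REASONING_RE = /reason|think|thought|cot|chain.of.thought|analysis|explanation|rationale/i;
const ANSWER_RE = /answer|result|verdict|final_answer/i;

function classifyField(name) {
  if (REASONING_RE.test(name)) return "reasoning"; // assumption: reasoning wins ties
  if (ANSWER_RE.test(name)) return "answer";
  return "other";
}

// Returns a verdict code plus per-field roles; no human-readable strings.
function lintCot(text) {
  let parsed;
  try {
    parsed = JSON.parse(text);
  } catch {
    return { code: "invalid_json", fields: [] };
  }
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { code: "non_object", fields: [] };
  }
  // A JSON Schema carries its fields under `properties`; an example object is used directly.
  const props = parsed.properties;
  const isSchema = props !== null && typeof props === "object" && !Array.isArray(props);
  const fields = Object.keys(isSchema ? props : parsed)
    .map((name) => ({ name, role: classifyField(name) }));
  if (fields.length === 0) return { code: "empty", fields };

  const firstReasoning = fields.findIndex((f) => f.role === "reasoning");
  const firstAnswer = fields.findIndex((f) => f.role === "answer");
  let code;
  if (firstReasoning === -1 && firstAnswer === -1) code = "no_cot_fields";
  else if (firstReasoning === -1) code = "missing_reasoning";
  else if (firstAnswer === -1) code = "missing_answer";
  else code = firstAnswer < firstReasoning ? "anti_pattern" : "good_order";
  return { code, fields };
}

// Reorders a schema's properties as reasoning → other → answer and mirrors `required`.
function reorderSchema(schema) {
  const rank = { reasoning: 0, other: 1, answer: 2 };
  // Array.prototype.sort is stable, so fields keep their relative order within a group.
  const ordered = Object.keys(schema.properties)
    .sort((a, b) => rank[classifyField(a)] - rank[classifyField(b)]);
  const fixed = {
    ...schema,
    properties: Object.fromEntries(ordered.map((n) => [n, schema.properties[n]])),
  };
  if (Array.isArray(schema.required)) {
    // Names not found in `properties` are kept and sorted last.
    const pos = (n) => (ordered.indexOf(n) === -1 ? ordered.length : ordered.indexOf(n));
    fixed.required = [...schema.required].sort((a, b) => pos(a) - pos(b));
  }
  return fixed;
}
```

For example, linting a schema whose properties are `answer`, `confidence`, `reasoning` yields `anti_pattern`, and `reorderSchema` returns the same schema with the properties in the order `reasoning`, `confidence`, `answer`; linting that result yields `good_order`.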

Pure logic lives in `js/json_cot_linter.js` (it returns codes + params, no
human strings); main.js renders the result with i18n. 39 i18n keys × 4 langs
(EN/ES/FR/ZH) = 156 keys, parity clean. In the Solutions Hub, the
`tafagent_mode` of the `structured_outputs` pain goes from `null` to
`📋 JSON CoT-aware Linter` (planned: → covered). The Help modal gains a
v0.8.2 entry, the Inventory anti-bullshit-pack list is updated, and the
task tile "⚙️ Set up an eval correctly" gains the new mode button.
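
The split between codes and rendered strings might look like the sketch below. It assumes `TRANSLATIONS` is keyed by language code and that `{name}` placeholders are filled from a params object; the `t`, `renderVerdict`, and `missingKeys` helpers are hypothetical and only a handful of the keys from this commit are included.

```javascript
// Hypothetical miniature of the translations table (assumed to be keyed by language).
const TRANSLATIONS = {
  en: {
    "cot.field_count": "{n} fields",
    "cot.verdict.invalid_json": "❌ Invalid JSON",
  },
  es: {
    "cot.field_count": "{n} campos",
    "cot.verdict.invalid_json": "❌ JSON inválido",
  },
};

// Looks up a key, falls back to English and then to the key itself,
// and substitutes {name} placeholders from `params`.
function t(lang, key, params = {}) {
  const template = TRANSLATIONS[lang]?.[key] ?? TRANSLATIONS.en[key] ?? key;
  return template.replace(/\{(\w+)\}/g, (match, name) =>
    name in params ? String(params[name]) : match
  );
}

// The linter returns only a code; the UI layer turns it into text.
function renderVerdict(lang, result) {
  return t(lang, `cot.verdict.${result.code}`, result.params);
}

// Parity check: keys present in `base` but absent from `other`.
function missingKeys(base, other) {
  return Object.keys(TRANSLATIONS[base]).filter((k) => !(k in TRANSLATIONS[other]));
}

console.log(renderVerdict("es", { code: "invalid_json" })); // ❌ JSON inválido
console.log(t("en", "cot.field_count", { n: 2 }));          // 2 fields
console.log(missingKeys("en", "es"));                       // []
```

Keeping the linter free of human strings means a new language only touches the translations table, and a parity check of this kind can be run per language pair.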

Source citations:
- https://collinwilkins.com/articles/structured-output (the bug)
- https://github.com/guidance-ai/jsonschemabench (10K real schemas)
- https://github.com/guidance-ai/llguidance (constrained decoder)

Verified: 10/10 lint cases + reorder roundtrip + headless e2e (tab
present, section toggles, bad/good examples render verdict + fields +
suggested fix, manual paste detects anti-pattern, invalid JSON shows
error). 17 mode tabs total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (5)
  1. data/solutions_hub.json +1 -1
  2. index.html +28 -0
  3. js/i18n.js +168 -0
  4. js/json_cot_linter.js +203 -0
  5. js/main.js +172 -1
data/solutions_hub.json CHANGED
@@ -183,7 +183,7 @@
  "id": "structured_outputs",
  "category": "setup",
  "pain": "JSON schema engines fail silently; CoT models commit to answer before reasoning.",
- "tafagent_mode": null,
+ "tafagent_mode": "📋 JSON CoT-aware Linter",
  "external_tools": [
  {"name": "llguidance (constrained decoding)", "url": "https://github.com/guidance-ai/llguidance", "type": "tool"},
  {"name": "Outlines", "url": "https://github.com/dottxt-ai/outlines", "type": "tool"},
index.html CHANGED
@@ -216,6 +216,9 @@
  <p><strong data-i18n="help.v08.saturation.title">📈 Benchmark Saturation Detector</strong></p>
  <p data-i18n="help.v08.saturation.body">MMLU is saturated (top 88-94%), AIME 2025 saturated within months of release, HumanEval near-saturated. Pick any benchmark and the tool returns top-3 frontier scores, spread, mean, and a verdict — saturated / near-saturated / discriminative — plus a recommended replacement (e.g. MMLU → MMLU-Pro / GPQA / HLE). Live fetch from DemandSphere AI Frontier Tracker (CC BY-NC 4.0) when reachable; baked 2026-05-05 snapshot when not. <em>Use case</em>: before you cite '92% on MMLU' or design an eval, check whether the benchmark still discriminates anything.</p>
 
+ <p><strong data-i18n="help.v082.cot.title">📋 JSON CoT-aware Linter</strong></p>
+ <p data-i18n="help.v082.cot.body">Constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in the order your schema declares them. If you write <code>{ answer, reasoning }</code> the model commits to <code>answer</code> first and CoT collapses into post-hoc justification. Paste any schema (or example response) — the linter classifies each field as <em>reasoning</em>, <em>answer</em>, or <em>other</em>, flags the ordering, and emits a reordered fix you can copy back. <em>Use case</em>: 'My CoT prompt works in plaintext but degrades under JSON mode' → run linter, find the inverted order, fix.</p>
+
  <p><strong data-i18n="help.v081.hub.title">🧭 Solutions Hub</strong></p>
  <p data-i18n="help.v081.hub.body">tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'</p>
 
@@ -328,6 +331,7 @@
  <li data-i18n="inv.v07.drift"><strong>🔀 Drift</strong> — bug or noise? Predict max admissible gap between two evals</li>
  <li data-i18n="inv.v07.niah"><strong>🔍 NIAH→Reason</strong> — does your "128k context" actually reason there, or just retrieve?</li>
  <li data-i18n="inv.v08.saturation"><strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?</li>
+ <li data-i18n="inv.v082.cot"><strong>📋 JSON CoT</strong> — lints structured-output schemas for the answer-before-reasoning anti-pattern that silently breaks Chain-of-Thought.</li>
  <li data-i18n="inv.v081.hub"><strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.</li>
  </ul>
  </details>
@@ -399,6 +403,7 @@
  <div class="tile-modes">
  <button data-mode-link="template" data-i18n="modes.template">📜 Chat-template</button>
  <button data-mode-link="diagnose" data-i18n="modes.diagnose">🩺 Diagnose CLI</button>
+ <button data-mode-link="cot" data-i18n="modes.cot">📋 JSON CoT</button>
  </div>
  </div>
  <div class="task-tile">
@@ -455,6 +460,7 @@
  <button class="mode-btn" data-mode="drift" role="tab" aria-selected="false" data-i18n="modes.drift">🔀 Drift</button>
  <button class="mode-btn" data-mode="niah" role="tab" aria-selected="false" data-i18n="modes.niah">🔍 NIAH→Reason</button>
  <button class="mode-btn" data-mode="saturation" role="tab" aria-selected="false" data-i18n="modes.saturation">📈 Saturation</button>
+ <button class="mode-btn" data-mode="cot" role="tab" aria-selected="false" data-i18n="modes.cot">📋 JSON CoT</button>
  <button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
  </div>
  <p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
@@ -1004,6 +1010,28 @@
  </section>
 
  <!-- Solutions Hub — integrator portal (v0.8.1) -->
+ <!-- JSON CoT-aware Linter (mode=cot, v0.8.2 anti-bullshit pack #8) -->
+ <section id="cot-section" style="display:none;">
+ <h2><span data-i18n="cot.title">📋 JSON CoT-aware Linter</span>
+ <span class="info"><span class="tooltip" data-i18n="cot.tip">
+ <strong>Why this matters</strong>: constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in schema order. If your schema places <code>answer</code> before <code>reasoning</code>, the model commits to a final answer first and only then writes the rationale to justify it — defeating Chain-of-Thought entirely. Paste a JSON Schema (or example object) and the linter flags the ordering.
+ </span></span>
+ </h2>
+ <p class="recipe-desc" data-i18n="cot.desc">
+ <strong>Reasoning before answer, always.</strong> Paste a JSON Schema or example response object — the linter reports whether reasoning fields come before answer fields and suggests a fix.
+ </p>
+ <div class="form-row">
+ <textarea id="cot-input" rows="10" style="width:100%;font-family:monospace;font-size:0.9em;" data-i18n-placeholder="cot.input.placeholder" placeholder='{ "type": "object", "properties": { "answer": {"type": "string"}, "reasoning": {"type": "string"} } }'></textarea>
+ </div>
+ <div class="form-row">
+ <button type="button" id="cot-lint-btn" data-i18n="cot.lint_btn">🔍 Lint</button>
+ <button type="button" id="cot-example-good-btn" class="secondary" data-i18n="cot.example_good_btn">↳ Example: good order</button>
+ <button type="button" id="cot-example-bad-btn" class="secondary" data-i18n="cot.example_bad_btn">↳ Example: anti-pattern</button>
+ </div>
+ <p id="cot-status" class="recipe-desc" style="font-size:0.92em;"></p>
+ <div id="cot-output" style="margin-top: 1em;"></div>
+ </section>
+
  <section id="hub-section" style="display:none;">
  <h2><span data-i18n="hub.title">🧭 Solutions Hub</span>
  <span class="info"><span class="tooltip" data-i18n="hub.tip">
js/i18n.js CHANGED
@@ -503,6 +503,48 @@ export const TRANSLATIONS = {
503
  "help.v08.saturation.title": "📈 Benchmark Saturation Detector",
504
  "help.v08.saturation.body": "MMLU is saturated (88-94% top), AIME 2025 saturated within months of release, HumanEval near-saturated. Pick any benchmark and the tool returns top-3 frontier scores, spread, mean, and a verdict — saturated / near-saturated / discriminative — plus a recommended replacement (e.g. MMLU → MMLU-Pro / GPQA / HLE). Live fetch from DemandSphere AI Frontier Tracker (CC BY-NC 4.0) when reachable; baked 2026-05-05 snapshot when not. <em>Use case</em>: before you cite '92% on MMLU' or design an eval, check whether the benchmark still discriminates anything.",
505
  "inv.v08.saturation": "<strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?",
506
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
507
  "help.v081.hub.title": "🧭 Solutions Hub",
508
  "help.v081.hub.body": "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
@@ -1465,6 +1507,48 @@ export const TRANSLATIONS = {
1465
  "help.v08.saturation.title": "📈 Detector de saturación de benchmarks",
1466
  "help.v08.saturation.body": "MMLU está saturado (top 88-94%), AIME 2025 saturó a los pocos meses de salir, HumanEval near-saturated. Elige cualquier benchmark y la herramienta retorna top-3 frontier scores, spread, media, y un veredicto — saturated / near-saturated / discriminative — más un reemplazo recomendado (ej. MMLU → MMLU-Pro / GPQA / HLE). Fetch en vivo desde DemandSphere AI Frontier Tracker (CC BY-NC 4.0) cuando llega; snapshot baked 2026-05-05 cuando no. <em>Caso de uso</em>: antes de citar '92% en MMLU' o diseñar una eval, verifica si el benchmark aún discrimina algo.",
1467
  "inv.v08.saturation": "<strong>📈 Saturation</strong> — ¿sigue siendo útil tu benchmark, o están todos los frontiers empatados arriba?",
1468
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
1469
  "help.v081.hub.title": "🧭 Solutions Hub",
1470
  "help.v081.hub.body": "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
@@ -2291,6 +2375,48 @@ export const TRANSLATIONS = {
2291
  "help.v08.saturation.title": "📈 Détecteur de saturation des benchmarks",
2292
  "help.v08.saturation.body": "MMLU est saturé (top 88-94%), AIME 2025 saturé en quelques mois après sa sortie, HumanEval presque saturé. Choisissez un benchmark et l'outil retourne top-3 frontier scores, spread, moyenne, et un verdict — saturated / near-saturated / discriminative — plus un remplacement recommandé (ex. MMLU → MMLU-Pro / GPQA / HLE). Fetch en direct depuis DemandSphere AI Frontier Tracker (CC BY-NC 4.0) si accessible ; snapshot baked 2026-05-05 sinon. <em>Cas d'usage</em> : avant de citer '92% sur MMLU' ou de concevoir une eval, vérifiez si le benchmark discrimine encore quelque chose.",
2293
  "inv.v08.saturation": "<strong>📈 Saturation</strong> — votre benchmark est-il encore utile, ou tous les frontiers sont-ils à égalité au sommet ?",
2294
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
2295
  "help.v081.hub.title": "🧭 Solutions Hub",
2296
  "help.v081.hub.body": "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
@@ -3117,6 +3243,48 @@ export const TRANSLATIONS = {
3117
  "help.v08.saturation.title": "📈 Benchmark 饱和度检测器",
3118
  "help.v08.saturation.body": "MMLU 已饱和(top 88-94%),AIME 2025 上线几个月就饱和,HumanEval 接近饱和。选任何 benchmark,工具返回 top-3 frontier 分数、spread、平均,以及判定 — saturated / near-saturated / discriminative — 加上推荐替代品(例如 MMLU → MMLU-Pro / GPQA / HLE)。可达时从 DemandSphere AI Frontier Tracker(CC BY-NC 4.0)实时 fetch;不可达时使用 2026-05-05 的 baked 快照。<em>用例</em>:在引用\"92% on MMLU\"或设计 eval 之前,检查 benchmark 是否仍能区分任何东西。",
3119
  "inv.v08.saturation": "<strong>📈 Saturation</strong> — 你的 benchmark 还有用吗,还是所有 frontier 都在顶部并列?",
3120
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
3121
  "help.v081.hub.title": "🧭 Solutions Hub",
3122
  "help.v081.hub.body": "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别(评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性),每个映射到(a)解决它的 tafagent 模式(若存在),以及(b)社区已信任的最佳外部工具(RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等)。搜索框匹配 pain、场景和工具名称。<em>用例</em>:'我有问题 X — tafagent 解决它吗,如果不,谁解决?'",
 
503
  "help.v08.saturation.title": "📈 Benchmark Saturation Detector",
504
  "help.v08.saturation.body": "MMLU is saturated (88-94% top), AIME 2025 saturated within months of release, HumanEval near-saturated. Pick any benchmark and the tool returns top-3 frontier scores, spread, mean, and a verdict — saturated / near-saturated / discriminative — plus a recommended replacement (e.g. MMLU → MMLU-Pro / GPQA / HLE). Live fetch from DemandSphere AI Frontier Tracker (CC BY-NC 4.0) when reachable; baked 2026-05-05 snapshot when not. <em>Use case</em>: before you cite '92% on MMLU' or design an eval, check whether the benchmark still discriminates anything.",
505
  "inv.v08.saturation": "<strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?",
506
+
507
+ // v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
508
+ "modes.cot": "📋 JSON CoT",
509
+ "mode_desc.cot": "Lints a JSON Schema (or example response object) for the answer-before-reasoning anti-pattern. Constrained-decoding engines emit fields in property order — if `answer` comes before `reasoning`, CoT is defeated.",
510
+ "cot.title": "📋 JSON CoT-aware Linter",
511
+ "cot.tip": "Constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in schema order. If your schema places `answer` before `reasoning`, the model commits to a final answer first and only then writes the rationale to justify it — defeating Chain-of-Thought entirely. Paste a JSON Schema (or example object) and the linter flags the ordering.",
512
+ "cot.desc": "<strong>Reasoning before answer, always.</strong> Paste a JSON Schema or example response object — the linter reports whether reasoning fields come before answer fields and suggests a fix.",
513
+ "cot.input.placeholder": "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
514
+ "cot.lint_btn": "🔍 Lint",
515
+ "cot.example_good_btn": "↳ Example: good order",
516
+ "cot.example_bad_btn": "↳ Example: anti-pattern",
517
+ "cot.status.done": "✅ {verdict}",
518
+ "cot.col.field": "Field",
519
+ "cot.col.type": "Role",
520
+ "cot.field.reasoning": "reasoning",
521
+ "cot.field.answer": "answer",
522
+ "cot.field.other": "—",
523
+ "cot.field_count": "{n} fields",
524
+ "cot.verdict.good_order": "✅ Good order — reasoning before answer",
525
+ "cot.verdict.anti_pattern": "❌ Anti-pattern — answer before reasoning",
526
+ "cot.verdict.missing_reasoning": "⚠ Missing reasoning field",
527
+ "cot.verdict.missing_answer": "ℹ No answer-like field detected",
528
+ "cot.verdict.no_cot_fields": "ℹ No reasoning/answer fields detected",
529
+ "cot.verdict.invalid_json": "❌ Invalid JSON",
530
+ "cot.verdict.non_object": "ℹ Top-level value is not an object",
531
+ "cot.verdict.empty_fields": "ℹ No fields to analyse",
532
+ "cot.explain.good_order": "Constrained decoding will emit the rationale first, so the model can think before committing. Chain-of-Thought stays honest.",
533
+ "cot.explain.anti_pattern": "The model is forced to emit the answer field first; any reasoning that follows can only justify what was already committed. Reorder so reasoning-like fields come before answer-like fields.",
534
+ "cot.explain.missing_reasoning": "An answer field is present but no reasoning field. If you want CoT, add a `reasoning` (or `chain_of_thought`, `analysis`, …) field <em>before</em> the answer.",
535
+ "cot.explain.missing_answer": "A reasoning field is present but no obvious answer field. Make sure the schema actually requires the model to commit a final value.",
536
+ "cot.explain.no_cot_fields": "Object has fields, but none look reasoning- or answer-like by name. The linter is conservative — if the schema is intentional, ignore. Otherwise add explicit reasoning/answer fields.",
537
+ "cot.hint.non_object": "Top-level must be a JSON object (`{ … }`) or a JSON Schema with `properties`.",
538
+ "cot.hint.empty_fields": "No fields detected. Paste a JSON Schema, an example response, or click an example button below the textarea.",
539
+ "cot.suggested_fix.title": "✓ Suggested fix",
540
+ "cot.suggested_fix.desc": "Reordered properties — reasoning fields first, then any context fields, then answer fields. `required[]` (if present) is mirrored to match.",
541
+ "cot.suggested_fix.copy": "📋 Copy",
542
+ "cot.suggested_fix.copied": "✓ Copied",
543
+ "cot.attribution": "Refs:",
544
+ "inv.v082.cot": "<strong>📋 JSON CoT</strong> — lints structured-output schemas for the answer-before-reasoning anti-pattern that silently breaks Chain-of-Thought.",
545
+ "help.v082.cot.title": "📋 JSON CoT-aware Linter",
546
+ "help.v082.cot.body": "Constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in the order your schema declares them. If you write <code>{ answer, reasoning }</code> the model commits to <code>answer</code> first and CoT collapses into post-hoc justification. Paste any schema (or example response) — the linter classifies each field as <em>reasoning</em>, <em>answer</em>, or <em>other</em>, flags the ordering, and emits a reordered fix you can copy back. <em>Use case</em>: 'My CoT prompt works in plaintext but degrades under JSON mode' → run linter, find the inverted order, fix.",
547
+
548
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
549
  "help.v081.hub.title": "🧭 Solutions Hub",
550
  "help.v081.hub.body": "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
 
1507
  "help.v08.saturation.title": "📈 Detector de saturación de benchmarks",
1508
  "help.v08.saturation.body": "MMLU está saturado (top 88-94%), AIME 2025 saturó a los pocos meses de salir, HumanEval near-saturated. Elige cualquier benchmark y la herramienta retorna top-3 frontier scores, spread, media, y un veredicto — saturated / near-saturated / discriminative — más un reemplazo recomendado (ej. MMLU → MMLU-Pro / GPQA / HLE). Fetch en vivo desde DemandSphere AI Frontier Tracker (CC BY-NC 4.0) cuando llega; snapshot baked 2026-05-05 cuando no. <em>Caso de uso</em>: antes de citar '92% en MMLU' o diseñar una eval, verifica si el benchmark aún discrimina algo.",
1509
  "inv.v08.saturation": "<strong>📈 Saturation</strong> — ¿sigue siendo útil tu benchmark, o están todos los frontiers empatados arriba?",
1510
+
1511
+ // v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
1512
+ "modes.cot": "📋 JSON CoT",
1513
+ "mode_desc.cot": "Lintea un JSON Schema (o ejemplo de respuesta) buscando el anti-patrón respuesta-antes-de-razonamiento. Los motores de constrained decoding emiten campos en el orden del schema — si `answer` va antes que `reasoning`, el CoT se rompe.",
1514
+ "cot.title": "📋 Linter JSON con consciencia CoT",
1515
+ "cot.tip": "Los motores de constrained decoding (llguidance, Outlines, gramáticas SGLang) emiten propiedades JSON en el orden del schema. Si tu schema pone `answer` antes de `reasoning`, el modelo se compromete con la respuesta final primero y solo después escribe el razonamiento para justificarla — rompiendo Chain-of-Thought por completo. Pega un JSON Schema (o objeto de ejemplo) y el linter señala el ordenamiento.",
1516
+ "cot.desc": "<strong>Razonamiento antes que respuesta, siempre.</strong> Pega un JSON Schema o un objeto de respuesta de ejemplo — el linter dice si los campos de razonamiento van antes que los de respuesta y propone una corrección.",
1517
+ "cot.input.placeholder": "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
1518
+ "cot.lint_btn": "🔍 Lintear",
1519
+ "cot.example_good_btn": "↳ Ejemplo: orden correcto",
1520
+ "cot.example_bad_btn": "↳ Ejemplo: anti-patrón",
1521
+ "cot.status.done": "✅ {verdict}",
1522
+ "cot.col.field": "Campo",
1523
+ "cot.col.type": "Rol",
1524
+ "cot.field.reasoning": "razonamiento",
1525
+ "cot.field.answer": "respuesta",
1526
+ "cot.field.other": "—",
1527
+ "cot.field_count": "{n} campos",
1528
+ "cot.verdict.good_order": "✅ Orden correcto — razonamiento antes que respuesta",
1529
+ "cot.verdict.anti_pattern": "❌ Anti-patrón — respuesta antes que razonamiento",
1530
+ "cot.verdict.missing_reasoning": "⚠ Falta campo de razonamiento",
1531
+ "cot.verdict.missing_answer": "ℹ No se detecta campo tipo respuesta",
1532
+ "cot.verdict.no_cot_fields": "ℹ Sin campos de razonamiento/respuesta detectados",
1533
+ "cot.verdict.invalid_json": "❌ JSON inválido",
1534
+ "cot.verdict.non_object": "ℹ El valor superior no es un objeto",
1535
+ "cot.verdict.empty_fields": "ℹ Sin campos para analizar",
1536
+ "cot.explain.good_order": "El constrained decoding emitirá el razonamiento primero, así el modelo puede pensar antes de comprometerse. Chain-of-Thought se mantiene honesto.",
1537
+ "cot.explain.anti_pattern": "El modelo se ve forzado a emitir el campo de respuesta primero; cualquier razonamiento posterior solo justifica lo ya comprometido. Reordena para que los campos tipo razonamiento vayan antes que los tipo respuesta.",
1538
+ "cot.explain.missing_reasoning": "Hay un campo de respuesta pero ningún campo de razonamiento. Si quieres CoT, añade un campo `reasoning` (o `chain_of_thought`, `analysis`, …) <em>antes</em> de la respuesta.",
1539
+ "cot.explain.missing_answer": "Hay un campo de razonamiento pero ningún campo de respuesta evidente. Asegúrate de que el schema realmente exija al modelo comprometer un valor final.",
1540
+ "cot.explain.no_cot_fields": "El objeto tiene campos pero ninguno se ve como razonamiento o respuesta por su nombre. El linter es conservador — si el schema es intencional, ignóralo. Si no, añade campos explícitos de razonamiento/respuesta.",
1541
+ "cot.hint.non_object": "El valor de nivel superior debe ser un objeto JSON (`{ … }`) o un JSON Schema con `properties`.",
1542
+ "cot.hint.empty_fields": "Sin campos detectados. Pega un JSON Schema, una respuesta de ejemplo, o pulsa un botón de ejemplo bajo el textarea.",
1543
+ "cot.suggested_fix.title": "✓ Corrección sugerida",
1544
+ "cot.suggested_fix.desc": "Propiedades reordenadas — campos de razonamiento primero, luego cualquier campo de contexto, luego los de respuesta. `required[]` (si existe) se reordena igual.",
1545
+ "cot.suggested_fix.copy": "📋 Copiar",
1546
+ "cot.suggested_fix.copied": "✓ Copiado",
1547
+ "cot.attribution": "Referencias:",
1548
+ "inv.v082.cot": "<strong>📋 JSON CoT</strong> — lintea schemas de structured outputs buscando el anti-patrón respuesta-antes-de-razonamiento que silenciosamente rompe Chain-of-Thought.",
1549
+ "help.v082.cot.title": "📋 Linter JSON con consciencia CoT",
1550
+ "help.v082.cot.body": "Los motores de constrained decoding (llguidance, Outlines, gramáticas SGLang) emiten propiedades JSON en el orden que declara tu schema. Si escribes <code>{ answer, reasoning }</code> el modelo se compromete con <code>answer</code> primero y el CoT se reduce a justificación post-hoc. Pega cualquier schema (o respuesta de ejemplo) — el linter clasifica cada campo como <em>razonamiento</em>, <em>respuesta</em> u <em>otro</em>, señala el ordenamiento, y emite una corrección reordenada para copiar de vuelta. <em>Caso de uso</em>: 'Mi prompt CoT funciona en texto pero degrada en modo JSON' → ejecuta linter, encuentra el orden invertido, corrige.",
1551
+
1552
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
1553
  "help.v081.hub.title": "🧭 Solutions Hub",
1554
  "help.v081.hub.body": "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
 
2375
  "help.v08.saturation.title": "📈 Détecteur de saturation des benchmarks",
2376
  "help.v08.saturation.body": "MMLU est saturé (top 88-94%), AIME 2025 saturé en quelques mois après sa sortie, HumanEval presque saturé. Choisissez un benchmark et l'outil retourne top-3 frontier scores, spread, moyenne, et un verdict — saturated / near-saturated / discriminative — plus un remplacement recommandé (ex. MMLU → MMLU-Pro / GPQA / HLE). Fetch en direct depuis DemandSphere AI Frontier Tracker (CC BY-NC 4.0) si accessible ; snapshot baked 2026-05-05 sinon. <em>Cas d'usage</em> : avant de citer '92% sur MMLU' ou de concevoir une eval, vérifiez si le benchmark discrimine encore quelque chose.",
2377
  "inv.v08.saturation": "<strong>📈 Saturation</strong> — votre benchmark est-il encore utile, ou tous les frontiers sont-ils à égalité au sommet ?",
2378
+
2379
+ // v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
2380
+ "modes.cot": "📋 JSON CoT",
2381
+ "mode_desc.cot": "Linte un JSON Schema (ou un objet de réponse exemple) à la recherche de l'anti-pattern réponse-avant-raisonnement. Les moteurs de décodage contraint émettent les champs dans l'ordre du schema — si `answer` précède `reasoning`, la chaîne de pensée est cassée.",
2382
+ "cot.title": "📋 Linter JSON conscient de CoT",
2383
+ "cot.tip": "Les moteurs de décodage contraint (llguidance, Outlines, grammaires SGLang) émettent les propriétés JSON dans l'ordre du schema. Si votre schema place `answer` avant `reasoning`, le modèle s'engage sur la réponse finale en premier et n'écrit la justification qu'ensuite — détruisant la Chaîne de Pensée. Collez un JSON Schema (ou un objet exemple) et le linter signale l'ordre.",
2384
+ "cot.desc": "<strong>Le raisonnement avant la réponse, toujours.</strong> Collez un JSON Schema ou un objet de réponse exemple — le linter rapporte si les champs de raisonnement viennent avant ceux de réponse et propose une correction.",
2385
+ "cot.input.placeholder": "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
2386
+ "cot.lint_btn": "🔍 Linter",
2387
+ "cot.example_good_btn": "↳ Exemple : ordre correct",
2388
+ "cot.example_bad_btn": "↳ Exemple : anti-pattern",
2389
+ "cot.status.done": "✅ {verdict}",
2390
+ "cot.col.field": "Champ",
2391
+ "cot.col.type": "Rôle",
2392
+ "cot.field.reasoning": "raisonnement",
2393
+ "cot.field.answer": "réponse",
2394
+ "cot.field.other": "—",
2395
+ "cot.field_count": "{n} champs",
2396
+ "cot.verdict.good_order": "✅ Bon ordre — raisonnement avant réponse",
2397
+ "cot.verdict.anti_pattern": "❌ Anti-pattern — réponse avant raisonnement",
2398
+ "cot.verdict.missing_reasoning": "⚠ Champ de raisonnement manquant",
2399
+ "cot.verdict.missing_answer": "ℹ Aucun champ type réponse détecté",
2400
+ "cot.verdict.no_cot_fields": "ℹ Aucun champ raisonnement/réponse détecté",
2401
+ "cot.verdict.invalid_json": "❌ JSON invalide",
2402
+ "cot.verdict.non_object": "ℹ La valeur de premier niveau n'est pas un objet",
2403
+ "cot.verdict.empty_fields": "ℹ Aucun champ à analyser",
2404
+ "cot.explain.good_order": "Le décodage contraint émettra le raisonnement en premier, le modèle peut donc réfléchir avant de s'engager. La Chaîne de Pensée reste honnête.",
2405
+ "cot.explain.anti_pattern": "Le modèle est forcé d'émettre le champ de réponse en premier ; tout raisonnement qui suit ne fait que justifier ce qui est déjà engagé. Réordonnez pour que les champs raisonnement viennent avant les champs réponse.",
2406
+ "cot.explain.missing_reasoning": "Un champ de réponse est présent mais aucun champ de raisonnement. Si vous voulez du CoT, ajoutez un champ `reasoning` (ou `chain_of_thought`, `analysis`, …) <em>avant</em> la réponse.",
2407
+ "cot.explain.missing_answer": "Un champ de raisonnement est présent mais aucun champ de réponse évident. Vérifiez que le schema force réellement le modèle à s'engager sur une valeur finale.",
2408
+ "cot.explain.no_cot_fields": "L'objet a des champs, mais aucun ne ressemble à du raisonnement ou de la réponse par son nom. Le linter est conservateur — si le schema est intentionnel, ignorez. Sinon ajoutez des champs explicites raisonnement/réponse.",
2409
+ "cot.hint.non_object": "La valeur de premier niveau doit être un objet JSON (`{ … }`) ou un JSON Schema avec `properties`.",
2410
+ "cot.hint.empty_fields": "Aucun champ détecté. Collez un JSON Schema, une réponse exemple, ou cliquez un bouton d'exemple sous le textarea.",
2411
+ "cot.suggested_fix.title": "✓ Correction suggérée",
2412
+ "cot.suggested_fix.desc": "Propriétés réordonnées — champs raisonnement d'abord, puis tout champ de contexte, puis les champs réponse. `required[]` (s'il existe) est réordonné en correspondance.",
2413
+ "cot.suggested_fix.copy": "📋 Copier",
2414
+ "cot.suggested_fix.copied": "✓ Copié",
2415
+ "cot.attribution": "Réfs :",
2416
+ "inv.v082.cot": "<strong>📋 JSON CoT</strong> — linte les schemas de structured outputs à la recherche de l'anti-pattern réponse-avant-raisonnement qui casse silencieusement la Chaîne de Pensée.",
2417
+ "help.v082.cot.title": "📋 Linter JSON conscient de CoT",
2418
+ "help.v082.cot.body": "Les moteurs de décodage contraint (llguidance, Outlines, grammaires SGLang) émettent les propriétés JSON dans l'ordre que votre schema déclare. Si vous écrivez <code>{ answer, reasoning }</code> le modèle s'engage sur <code>answer</code> en premier et le CoT se réduit à une justification a posteriori. Collez n'importe quel schema (ou réponse exemple) — le linter classe chaque champ comme <em>raisonnement</em>, <em>réponse</em> ou <em>autre</em>, signale l'ordre, et émet une correction réordonnée à copier. <em>Cas d'usage</em> : 'Mon prompt CoT marche en texte brut mais dégrade en mode JSON' → lancez le linter, trouvez l'ordre inversé, corrigez.",
2419
+
2420
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
2421
  "help.v081.hub.title": "🧭 Solutions Hub",
2422
  "help.v081.hub.body": "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
 
3243
  "help.v08.saturation.title": "📈 Benchmark 饱和度检测器",
3244
  "help.v08.saturation.body": "MMLU 已饱和(top 88-94%),AIME 2025 上线几个月就饱和,HumanEval 接近饱和。选任何 benchmark,工具返回 top-3 frontier 分数、spread、平均,以及判定 — saturated / near-saturated / discriminative — 加上推荐替代品(例如 MMLU → MMLU-Pro / GPQA / HLE)。可达时从 DemandSphere AI Frontier Tracker(CC BY-NC 4.0)实时 fetch;不可达时使用 2026-05-05 的 baked 快照。<em>用例</em>:在引用\"92% on MMLU\"或设计 eval 之前,检查 benchmark 是否仍能区分任何东西。",
3245
  "inv.v08.saturation": "<strong>📈 Saturation</strong> — 你的 benchmark 还有用吗,还是所有 frontier 都在顶部并列?",
3246
+
3247
+ // v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
3248
+ "modes.cot": "📋 JSON CoT",
3249
+ "mode_desc.cot": "对 JSON Schema(或示例响应对象)进行 linting,查找『答案在推理之前』的反模式。约束解码引擎按 schema 顺序输出字段——如果 `answer` 在 `reasoning` 之前,CoT 就被破坏了。",
3250
+ "cot.title": "📋 JSON CoT 感知 Linter",
3251
+ "cot.tip": "约束解码引擎(llguidance、Outlines、SGLang 语法)按 schema 顺序输出 JSON 属性。如果 schema 把 `answer` 放在 `reasoning` 之前,模型会先承诺最终答案,然后才写理由来证明它——彻底破坏 Chain-of-Thought。粘贴 JSON Schema(或示例对象),linter 会标记顺序问题。",
3252
+ "cot.desc": "<strong>推理永远先于答案。</strong> 粘贴 JSON Schema 或示例响应对象——linter 报告推理字段是否在答案字段之前,并提出修复建议。",
3253
+ "cot.input.placeholder": "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
3254
+ "cot.lint_btn": "🔍 Lint",
3255
+ "cot.example_good_btn": "↳ 示例:正确顺序",
3256
+ "cot.example_bad_btn": "↳ 示例:反模式",
3257
+ "cot.status.done": "✅ {verdict}",
3258
+ "cot.col.field": "字段",
3259
+ "cot.col.type": "角色",
3260
+ "cot.field.reasoning": "推理",
3261
+ "cot.field.answer": "答案",
3262
+ "cot.field.other": "—",
3263
+ "cot.field_count": "{n} 个字段",
3264
+ "cot.verdict.good_order": "✅ 顺序正确——推理在答案之前",
3265
+ "cot.verdict.anti_pattern": "❌ 反模式——答案在推理之前",
3266
+ "cot.verdict.missing_reasoning": "⚠ 缺少推理字段",
3267
+ "cot.verdict.missing_answer": "ℹ 未检测到答案类字段",
3268
+ "cot.verdict.no_cot_fields": "ℹ 未检测到推理/答案字段",
3269
+ "cot.verdict.invalid_json": "❌ JSON 无效",
3270
+ "cot.verdict.non_object": "ℹ 顶层值不是对象",
3271
+ "cot.verdict.empty_fields": "ℹ 没有可分析的字段",
3272
+ "cot.explain.good_order": "约束解码会先输出推理,所以模型可以在承诺之前思考。Chain-of-Thought 保持诚实。",
3273
+ "cot.explain.anti_pattern": "模型被迫先输出答案字段;之后的任何推理只能为已承诺的内容辩护。重新排序,使推理类字段在答案类字段之前。",
3274
+ "cot.explain.missing_reasoning": "存在答案字段但没有推理字段。如果你想要 CoT,在答案<em>之前</em>添加 `reasoning`(或 `chain_of_thought`、`analysis`…)字段。",
3275
+ "cot.explain.missing_answer": "存在推理字段但没有明显的答案字段。确保 schema 实际上要求模型承诺一个最终值。",
3276
+ "cot.explain.no_cot_fields": "对象有字段但都不像推理或答案(按名称)。Linter 保守——如果 schema 是有意的,可以忽略。否则添加显式的推理/答案字段。",
3277
+ "cot.hint.non_object": "顶层值必须是 JSON 对象(`{ … }`)或带 `properties` 的 JSON Schema。",
3278
+ "cot.hint.empty_fields": "未检测到字段。粘贴 JSON Schema、示例响应,或点击 textarea 下方的示例按钮。",
3279
+ "cot.suggested_fix.title": "✓ 建议修复",
3280
+ "cot.suggested_fix.desc": "属性已重新排序——推理字段优先,然后是任何上下文字段,最后是答案字段。`required[]`(如果存在)也镜像同步。",
3281
+ "cot.suggested_fix.copy": "📋 复制",
3282
+ "cot.suggested_fix.copied": "✓ 已复制",
3283
+ "cot.attribution": "参考:",
3284
+ "inv.v082.cot": "<strong>📋 JSON CoT</strong> — 对 structured outputs schema 进行 linting,查找悄悄破坏 Chain-of-Thought 的『答案在推理之前』反模式。",
3285
+ "help.v082.cot.title": "📋 JSON CoT 感知 Linter",
3286
+ "help.v082.cot.body": "约束解码引擎(llguidance、Outlines、SGLang 语法)按 schema 声明的顺序输出 JSON 属性。如果你写 <code>{ answer, reasoning }</code>,模型先承诺 <code>answer</code>,CoT 就退化为事后辩护。粘贴任意 schema(或示例响应)——linter 把每个字段分类为<em>推理</em>、<em>答案</em>或<em>其他</em>,标记顺序,并输出可复制回去的重排修复。<em>用例</em>:『我的 CoT 提示在纯文本中正常但在 JSON 模式下退化』→ 运行 linter,找到颠倒的顺序,修复。",
3287
+
3288
  "inv.v081.hub": "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
3289
  "help.v081.hub.title": "🧭 Solutions Hub",
3290
  "help.v081.hub.body": "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别(评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性),每个映射到(a)解决它的 tafagent 模式(若存在),以及(b)社区已信任的最佳外部工具(RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等)。搜索框匹配 pain、场景和工具名称。<em>用例</em>:'我有问题 X — tafagent 解决它吗,如果不,谁解决?'",
js/json_cot_linter.js ADDED
@@ -0,0 +1,203 @@
1
+ // JSON CoT-aware Linter (v0.8.2 anti-bullshit pack #8)
2
+ //
3
+ // Pain (Solutions Hub `structured_outputs`): constrained decoding emits
4
+ // keys in schema property order. When a schema places `answer` before
5
+ // `reasoning`, the model must commit to a final answer first, and the
6
+ // rationale that follows can only justify that answer after the fact.
7
+ // Nothing errors, so the failure is silent and Chain-of-Thought is
8
+ // defeated.
9
+ //
10
+ // Source citations:
11
+ // - https://collinwilkins.com/articles/structured-output (field
12
+ // ordering anti-pattern explained)
13
+ // - JSONSchemaBench (10K real schemas) — most are not CoT-aware
14
+ // - llguidance / Outlines / SGLang grammars — all respect property order
15
+ //
16
+ // Pure logic — no human strings. Returns codes+params; main.js does
17
+ // the i18n lookup.
18
+
19
+ // Heuristic field classifiers, tested against real schemas and examples
20
+ // in the smoke harness. Names that match neither list fall back to
21
+ // `other`. Ambiguous names follow their lexical form: `score` could sit
22
+ // on either side but is treated as answer-side, because a false
23
+ // anti-pattern only costs the user a schema review.
24
+ const REASONING_PATTERNS = [
25
+ /reason/i,
26
+ /think/i,
27
+ /thought/i,
28
+ /\bcot\b/i,
29
+ /chain.of.thought/i,
30
+ /analysis/i,
31
+ /\bexplanation\b/i,
32
+ /rationale/i,
33
+ /step.by.step/i,
34
+ /scratchpad/i,
35
+ /justif/i,
36
+ /deliberat/i,
37
+ /\bplan\b/i,
38
+ /\bwhy\b/i,
39
+ ];
40
+
41
+ const ANSWER_PATTERNS = [
42
+ /^answer$/i,
43
+ /^result$/i,
44
+ /^output$/i,
45
+ /^response$/i,
46
+ /^final/i,
47
+ /^verdict$/i,
48
+ /^decision$/i,
49
+ /^prediction$/i,
50
+ /^conclusion$/i,
51
+ /^value$/i,
52
+ /^score$/i,
53
+ /^classif/i,
54
+ /^label$/i,
55
+ /^choice$/i,
56
+ /^selected/i,
57
+ ];
58
+
59
+ export function classifyFieldName(name) {
60
+ if (typeof name !== "string" || !name) return "other";
61
+ for (const pat of REASONING_PATTERNS) {
62
+ if (pat.test(name)) return "reasoning";
63
+ }
64
+ for (const pat of ANSWER_PATTERNS) {
65
+ if (pat.test(name)) return "answer";
66
+ }
67
+ return "other";
68
+ }
69
+
70
+ // Decide whether `parsed` is a JSON Schema (detected by an object-valued
71
+ // `properties` key) or a plain example object. Both keep their key
72
+ // order in modern JS (ES2015+ preserves insertion order for non-integer
73
+ // string keys), and constrained decoders honor that order, so the
74
+ // detection works on either form.
75
+ function extractFieldOrder(parsed) {
76
+ if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
77
+ return { kind: "non_object", fields: [] };
78
+ }
79
+ // Schema form
80
+ if (parsed.properties && typeof parsed.properties === "object") {
81
+ return { kind: "schema", fields: Object.keys(parsed.properties) };
82
+ }
83
+ // Example object form
84
+ return { kind: "example", fields: Object.keys(parsed) };
85
+ }
86
+
87
+ function buildFieldAnnotations(fields) {
88
+ return fields.map((name, idx) => ({
89
+ name,
90
+ idx,
91
+ type: classifyFieldName(name),
92
+ }));
93
+ }
94
+
95
+ function suggestReorder(annotations) {
96
+ // Strategy: keep relative order within each type bucket, but emit
97
+ // reasoning fields first, then `other`, then answer fields. That
98
+ // way CoT runs first, the answer can draw on the reasoning and any
99
+ // context fields, and constrained decoding commits the answer only
100
+ // after the rationale.
101
+ const reasoning = annotations.filter(a => a.type === "reasoning").map(a => a.name);
102
+ const other = annotations.filter(a => a.type === "other").map(a => a.name);
103
+ const answer = annotations.filter(a => a.type === "answer").map(a => a.name);
104
+ return [...reasoning, ...other, ...answer];
105
+ }
106
+
107
+ // Public entry point. `text` is the user-pasted JSON Schema or example.
108
+ // Returns { code, params } where `code` is one of:
109
+ // - invalid_json
110
+ // - non_object
111
+ // - empty_fields
112
+ // - good_order (reasoning before answer — CoT honored)
113
+ // - anti_pattern (answer before reasoning — model commits early)
114
+ // - missing_reasoning (answer-like fields present, no reasoning)
115
+ // - missing_answer (reasoning fields present, no answer-like field)
116
+ // - no_cot_fields (object has fields but none look reasoning/answer)
117
+ export function lintJsonCot(text) {
118
+ if (typeof text !== "string" || !text.trim()) {
119
+ return { code: "empty_fields", params: { reason: "empty_input" } };
120
+ }
121
+ let parsed;
122
+ try {
123
+ parsed = JSON.parse(text);
124
+ } catch (e) {
125
+ return {
126
+ code: "invalid_json",
127
+ params: { error: String(e && e.message || e).slice(0, 200) },
128
+ };
129
+ }
130
+ const { kind, fields } = extractFieldOrder(parsed);
131
+ if (kind === "non_object") {
132
+ return { code: "non_object", params: { kind: Array.isArray(parsed) ? "array" : typeof parsed } };
133
+ }
134
+ if (fields.length === 0) {
135
+ return { code: "empty_fields", params: { kind } };
136
+ }
137
+
138
+ const annotations = buildFieldAnnotations(fields);
139
+ const reasoningIdx = annotations.findIndex(a => a.type === "reasoning");
140
+ const answerIdx = annotations.findIndex(a => a.type === "answer");
141
+ const hasReasoning = reasoningIdx !== -1;
142
+ const hasAnswer = answerIdx !== -1;
143
+
144
+ const baseParams = {
145
+ kind,
146
+ fields: annotations,
147
+ field_count: annotations.length,
148
+ reasoning_idx: hasReasoning ? reasoningIdx : null,
149
+ answer_idx: hasAnswer ? answerIdx : null,
150
+ suggested_order: suggestReorder(annotations),
151
+ };
152
+
153
+ if (!hasReasoning && !hasAnswer) {
154
+ return { code: "no_cot_fields", params: baseParams };
155
+ }
156
+ if (hasReasoning && !hasAnswer) {
157
+ return { code: "missing_answer", params: baseParams };
158
+ }
159
+ if (!hasReasoning && hasAnswer) {
160
+ return { code: "missing_reasoning", params: baseParams };
161
+ }
162
+ // Both present — order is decisive.
163
+ if (reasoningIdx < answerIdx) {
164
+ return { code: "good_order", params: baseParams };
165
+ }
166
+ return { code: "anti_pattern", params: baseParams };
167
+ }
168
+
169
+ // Build a properties-reordered JSON string preserving the original
170
+ // shape (schema vs example). Used by the UI to show "suggested fix".
171
+ export function reorderJsonText(text, suggestedOrder) {
172
+ let parsed;
173
+ try { parsed = JSON.parse(text); }
174
+ catch { return null; }
175
+ if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) return null;
176
+
177
+ // Reorder properties within a plain object preserving values.
178
+ const reorderObj = (obj, order) => {
179
+ const out = {};
180
+ // First emit suggested keys that exist on the object.
181
+ for (const k of order) {
182
+ if (Object.prototype.hasOwnProperty.call(obj, k)) out[k] = obj[k];
183
+ }
184
+ // Then any keys not in the suggested order (defensive: keeps unknowns).
185
+ for (const k of Object.keys(obj)) {
186
+ if (!Object.prototype.hasOwnProperty.call(out, k)) out[k] = obj[k];
187
+ }
188
+ return out;
189
+ };
190
+
191
+ if (parsed.properties && typeof parsed.properties === "object") {
192
+ parsed.properties = reorderObj(parsed.properties, suggestedOrder);
193
+ // If a `required` array exists, mirror the suggested order so generators
194
+ // that emit fields in `required[]` order also benefit. Required names
195
+ // that are not in `properties` are appended unchanged, not dropped.
196
+ if (Array.isArray(parsed.required)) {
197
+ const wasRequired = new Set(parsed.required);
198
+ parsed.required = [...suggestedOrder.filter(k => wasRequired.has(k)), ...parsed.required.filter(k => !suggestedOrder.includes(k))];
199
+ }
200
+ return JSON.stringify(parsed, null, 2);
201
+ }
202
+ return JSON.stringify(reorderObj(parsed, suggestedOrder), null, 2);
203
+ }
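To make the classifier and verdict logic in `js/json_cot_linter.js` easier to follow, here is a condensed, standalone sketch. It copies only a few patterns from the diff, and `classifySketch` and `verdictSketch` are hypothetical stand-ins for `classifyFieldName` and the verdict branch of `lintJsonCot`, not the module's actual exports.

```javascript
// Trimmed copies of a few patterns from the diff; the full lists are longer.
const REASONING_SUBSET = [/reason/i, /\bcot\b/i, /chain.of.thought/i, /\bexplanation\b/i];
const ANSWER_SUBSET = [/^answer$/i, /^final/i, /^score$/i];

// Hypothetical helper mirroring classifyFieldName: reasoning patterns are
// checked first, so a name matching both lists is labeled "reasoning".
function classifySketch(name) {
  if (REASONING_SUBSET.some((p) => p.test(name))) return "reasoning";
  if (ANSWER_SUBSET.some((p) => p.test(name))) return "answer";
  return "other";
}

// Hypothetical helper mirroring the verdict branch of lintJsonCot:
// compare the index of the first reasoning field with the first answer field.
function verdictSketch(fieldNames) {
  const roles = fieldNames.map(classifySketch);
  const r = roles.indexOf("reasoning");
  const a = roles.indexOf("answer");
  if (r === -1 && a === -1) return "no_cot_fields";
  if (a === -1) return "missing_answer";
  if (r === -1) return "missing_reasoning";
  return r < a ? "good_order" : "anti_pattern";
}

console.log(classifySketch("chain_of_thought")); // reasoning
console.log(classifySketch("cot"));              // reasoning
console.log(classifySketch("cot_trace"));        // other
console.log(classifySketch("final_reasoning"));  // reasoning
console.log(verdictSketch(["final_answer", "chain_of_thought"])); // anti_pattern
console.log(verdictSketch(["cot_trace", "answer"]));              // missing_reasoning
```

One consequence worth noting: `_` counts as a word character in JavaScript regular expressions, so `\bcot\b` does not match snake_case names such as `cot_trace`, and a schema using such a name is reported as `missing_reasoning` instead of `good_order`. Whether that is acceptable depends on the naming conventions of the schemas being linted.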
js/main.js CHANGED
@@ -27,6 +27,7 @@ import {
27
  loadHub, listCategories, listEntries, searchEntries,
28
  hubStats, getCategoryMeta,
29
  } from "./solutions_hub.js";
 
30
 
31
  // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
32
  // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
@@ -216,6 +217,7 @@ document.addEventListener("click", (e) => {
216
  template: "template-section", arena: "arena-section", contam: "contam-section",
217
  quant: "quant-section", drift: "drift-section", niah: "niah-section",
218
  saturation: "saturation-section",
 
219
  hub: "hub-section",
220
  }[targetMode];
221
  if (sectionId) {
@@ -241,7 +243,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
241
  "diagnose-section", "phase-section", "unmask-section",
242
  "template-section", "arena-section", "contam-section",
243
  "quant-section", "drift-section", "niah-section",
244
- "saturation-section", "hub-section"].forEach(id => {
245
  const el = $(id);
246
  if (el) el.style.display = "none";
247
  });
@@ -253,6 +255,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
253
  template: "template-section", arena: "arena-section", contam: "contam-section",
254
  quant: "quant-section", drift: "drift-section", niah: "niah-section",
255
  saturation: "saturation-section",
 
256
  hub: "hub-section",
257
  };
258
  const sectionId = sectionMap[mode];
@@ -260,6 +263,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
260
  $("mode-desc").textContent = t(`mode_desc.${mode}`) || "";
261
  if (mode === "phase") initPhaseDiagram();
262
  if (mode === "saturation") initSaturation();
 
263
  if (mode === "hub") initHub();
264
  });
265
  });
@@ -3384,6 +3388,173 @@ $("hub-clear-btn")?.addEventListener("click", () => {
3384
  renderHubAll();
3385
  });
3386
 
3387
  // ════════════════════════════════════════════════════════════════════
3388
  // Bootstrap
3389
  // ════════════════════════════════════════════════════════════════════
 
27
  loadHub, listCategories, listEntries, searchEntries,
28
  hubStats, getCategoryMeta,
29
  } from "./solutions_hub.js";
30
+ import { lintJsonCot, reorderJsonText, classifyFieldName } from "./json_cot_linter.js";
31
 
32
  // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
33
  // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
 
217
  template: "template-section", arena: "arena-section", contam: "contam-section",
218
  quant: "quant-section", drift: "drift-section", niah: "niah-section",
219
  saturation: "saturation-section",
220
+ cot: "cot-section",
221
  hub: "hub-section",
222
  }[targetMode];
223
  if (sectionId) {
 
243
  "diagnose-section", "phase-section", "unmask-section",
244
  "template-section", "arena-section", "contam-section",
245
  "quant-section", "drift-section", "niah-section",
246
+ "saturation-section", "cot-section", "hub-section"].forEach(id => {
247
  const el = $(id);
248
  if (el) el.style.display = "none";
249
  });
 
255
  template: "template-section", arena: "arena-section", contam: "contam-section",
256
  quant: "quant-section", drift: "drift-section", niah: "niah-section",
257
  saturation: "saturation-section",
258
+ cot: "cot-section",
259
  hub: "hub-section",
260
  };
261
  const sectionId = sectionMap[mode];
 
263
  $("mode-desc").textContent = t(`mode_desc.${mode}`) || "";
264
  if (mode === "phase") initPhaseDiagram();
265
  if (mode === "saturation") initSaturation();
266
+ if (mode === "cot") initCot();
267
  if (mode === "hub") initHub();
268
  });
269
  });
 
3388
  renderHubAll();
3389
  });
3390
 
3391
+ // ════════════════════════════════════════════════════════════════════
3392
+ // 📋 JSON CoT-aware Linter (v0.8.2 anti-bullshit pack #8)
3393
+ // ════════════════════════════════════════════════════════════════════
3394
+ const COT_FIELD_TYPE_BADGE = {
3395
+ reasoning: "🧠",
3396
+ answer: "🎯",
3397
+ other: "·",
3398
+ };
3399
+
3400
+ const COT_VERDICT_BADGE_BG = {
3401
+ good_order: "#3fb950", // green
3402
+ anti_pattern: "#f85149", // red
3403
+ missing_reasoning: "#d29922", // amber
3404
+ missing_answer: "#d29922", // amber
3405
+ no_cot_fields: "#8b949e", // gray
3406
+ non_object: "#8b949e",
3407
+ empty_fields: "#8b949e",
3408
+ invalid_json: "#f85149", // red
3409
+ };
3410
+
3411
+ let __cotInited = false;
3412
+
3413
+ function initCot() {
3414
+ if (__cotInited) return;
3415
+ __cotInited = true;
3416
+ // No-op (no async data); placeholder kept for symmetry with other modes.
3417
+ }
3418
+
3419
+ function renderCotResult(result, originalText) {
3420
+ const verdict = t(`cot.verdict.${result.code}`) || result.code;
3421
+ const verdictBg = COT_VERDICT_BADGE_BG[result.code] || "#8b949e";
3422
+ const verdictBadge = `<span class="badge" style="background:${verdictBg};">${verdict}</span>`;
3423
+
3424
+ // Failure cases short-circuit: just show the verdict + reason.
3425
+ if (result.code === "invalid_json") {
3426
+ const reason = result.params?.error || "";
3427
+ return `<div class="arena-result">
3428
+ <p style="font-size:1.1em;">${verdictBadge}</p>
3429
+ <pre style="background:#21262d;padding:0.75em;border-radius:4px;color:#f0883e;">${escapeHtml(reason)}</pre>
3430
+ </div>`;
3431
+ }
3432
+ if (result.code === "empty_fields" || result.code === "non_object") {
3433
+ return `<div class="arena-result">
3434
+ <p style="font-size:1.1em;">${verdictBadge}</p>
3435
+ <p class="recipe-desc">${t(`cot.hint.${result.code}`) || ""}</p>
3436
+ </div>`;
3437
+ }
3438
+
3439
+ const fields = result.params?.fields || [];
3440
+ const fieldRows = fields.map(f => {
3441
+ const icon = COT_FIELD_TYPE_BADGE[f.type] || "·";
3442
+ const typeLabel = t(`cot.field.${f.type}`) || f.type;
3443
+ const color = f.type === "reasoning" ? "#3fb950"
3444
+ : f.type === "answer" ? "#f0883e"
3445
+ : "#8b949e";
3446
+ return `<tr>
3447
+ <td style="text-align:right;color:#8b949e;">${f.idx}</td>
3448
+ <td><code>${escapeHtml(f.name)}</code></td>
3449
+ <td><span style="color:${color};">${icon} ${typeLabel}</span></td>
3450
+ </tr>`;
3451
+ }).join("");
3452
+ const fieldTable = `
3453
+ <table class="lean-table" style="margin-top:0.5em;">
3454
+ <thead><tr>
3455
+ <th>#</th>
3456
+ <th data-i18n="cot.col.field">Field</th>
3457
+ <th data-i18n="cot.col.type">Type</th>
3458
+ </tr></thead>
3459
+ <tbody>${fieldRows}</tbody>
3460
+ </table>
3461
+ `;
3462
+
3463
+ // Suggested-fix block — only when there's a meaningful reorder.
3464
+ let fixBlock = "";
3465
+ if (result.code === "anti_pattern") {
3466
+ const suggested = result.params?.suggested_order || [];
3467
+ const fixed = reorderJsonText(originalText, suggested);
3468
+ if (fixed) {
3469
+ fixBlock = `
3470
+ <details open style="margin-top:1em;">
3471
+ <summary style="cursor:pointer;color:#3fb950;">
3472
+ <strong>${t("cot.suggested_fix.title") || "✓ Suggested fix"}</strong>
3473
+ </summary>
3474
+ <p class="recipe-desc">${t("cot.suggested_fix.desc") || ""}</p>
3475
+ <pre style="background:#0d1117;padding:0.75em;border-radius:4px;overflow-x:auto;"><code>${escapeHtml(fixed)}</code></pre>
3476
+ <button type="button" class="secondary" onclick="navigator.clipboard.writeText(this.previousElementSibling.textContent).then(()=>{this.textContent='${t("cot.suggested_fix.copied") || "✓ Copied"}';setTimeout(()=>{this.textContent='${t("cot.suggested_fix.copy") || "📋 Copy"}';},1500);})">${t("cot.suggested_fix.copy") || "📋 Copy"}</button>
3477
+ </details>
3478
+ `;
3479
+ }
3480
+ }
3481
+
3482
+ // Verdict explainer
3483
+ const explainer = t(`cot.explain.${result.code}`) || "";
3484
+ const explainerBlock = explainer
3485
+ ? `<p class="recipe-desc">${explainer}</p>`
3486
+ : "";
3487
+
3488
+ // Source attribution footer
3489
+ const attribution = `
3490
+ <p class="recipe-desc subtle" style="font-size:0.82em;margin-top:1em;">
3491
+ ${t("cot.attribution") || ""}
3492
+ <a href="https://collinwilkins.com/articles/structured-output" target="_blank" rel="noopener noreferrer">collinwilkins.com</a> ·
3493
+ <a href="https://github.com/guidance-ai/jsonschemabench" target="_blank" rel="noopener noreferrer">JSONSchemaBench</a> ·
3494
+ <a href="https://github.com/guidance-ai/llguidance" target="_blank" rel="noopener noreferrer">llguidance</a>
3495
+ </p>
3496
+ `;
3497
+
3498
+ return `<div class="arena-result">
3499
+ <p style="font-size:1.1em;">${verdictBadge}
3500
+ <span class="subtle" style="font-size:0.9em;">(${tFmt("cot.field_count", { n: result.params.field_count }) || `${result.params.field_count} fields`})</span>
3501
+ </p>
3502
+ ${explainerBlock}
3503
+ ${fieldTable}
3504
+ ${fixBlock}
3505
+ ${attribution}
3506
+ </div>`;
3507
+ }
3508
+
3509
+ function runCotLint() {
3510
+ const text = $("cot-input")?.value || "";
3511
+ const result = lintJsonCot(text);
3512
+ $("cot-output").innerHTML = renderCotResult(result, text);
3513
+ $("cot-status").textContent = tFmt("cot.status.done", {
3514
+ verdict: t(`cot.verdict.${result.code}`) || result.code,
3515
+ });
3516
+ }
3517
+
3518
+ const COT_EXAMPLE_GOOD = JSON.stringify({
3519
+ type: "object",
3520
+ properties: {
3521
+ reasoning: {
3522
+ type: "string",
3523
+ description: "Step-by-step rationale before committing to an answer.",
3524
+ },
3525
+ answer: {
3526
+ type: "string",
3527
+ description: "Final answer, derived from the reasoning above.",
3528
+ },
3529
+ },
3530
+ required: ["reasoning", "answer"],
3531
+ }, null, 2);
3532
+
3533
+ const COT_EXAMPLE_BAD = JSON.stringify({
3534
+ type: "object",
3535
+ properties: {
3536
+ final_answer: {
3537
+ type: "string",
3538
+ description: "The model's final answer.",
3539
+ },
3540
+ chain_of_thought: {
3541
+ type: "string",
3542
+ description: "Justification for the answer above.",
3543
+ },
3544
+ },
3545
+ required: ["final_answer", "chain_of_thought"],
3546
+ }, null, 2);
3547
+
3548
+ $("cot-lint-btn")?.addEventListener("click", runCotLint);
3549
+ $("cot-example-good-btn")?.addEventListener("click", () => {
3550
+ $("cot-input").value = COT_EXAMPLE_GOOD;
3551
+ runCotLint();
3552
+ });
3553
+ $("cot-example-bad-btn")?.addEventListener("click", () => {
3554
+ $("cot-input").value = COT_EXAMPLE_BAD;
3555
+ runCotLint();
3556
+ });
3557
+
3558
  // ════════════════════════════════════════════════════════════════════
3559
  // Bootstrap
3560
  // ════════════════════════════════════════════════════════════════════
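For the anti-pattern example wired to `cot-example-bad-btn`, the suggested-fix block shows a reordered schema. The sketch below reproduces that reorder standalone, assuming a stable three-bucket partition (reasoning, then other, then answer) with `required` mirrored to match. `roleOf` and `reorderSchemaSketch` are hypothetical names, and `roleOf` is a two-pattern stand-in for the real classifier that is sufficient for this one example only.

```javascript
// Hypothetical stand-in for classifyFieldName, covering only this example.
const roleOf = (name) =>
  /chain.of.thought|reason/i.test(name) ? "reasoning"
  : /^final|^answer$/i.test(name) ? "answer"
  : "other";

function reorderSchemaSketch(schema) {
  const names = Object.keys(schema.properties);
  // Stable three-bucket partition: reasoning, then other, then answer.
  const order = ["reasoning", "other", "answer"]
    .flatMap((role) => names.filter((n) => roleOf(n) === role));

  // Rebuild `properties` in the new order without touching the input.
  const properties = {};
  for (const name of order) properties[name] = schema.properties[name];
  const out = { ...schema, properties };

  // Mirror the new order in `required`, keeping only names that were required.
  if (Array.isArray(schema.required)) {
    const wasRequired = new Set(schema.required);
    out.required = order.filter((n) => wasRequired.has(n));
  }
  return out;
}

// Same shape as COT_EXAMPLE_BAD in the diff, with descriptions omitted.
const bad = {
  type: "object",
  properties: {
    final_answer: { type: "string" },
    chain_of_thought: { type: "string" },
  },
  required: ["final_answer", "chain_of_thought"],
};

const fixed = reorderSchemaSketch(bad);
console.log(Object.keys(fixed.properties)); // [ 'chain_of_thought', 'final_answer' ]
console.log(fixed.required);                // [ 'chain_of_thought', 'final_answer' ]
```

Unlike `reorderJsonText` in the diff, which reorders a freshly parsed copy of the pasted text in place, this sketch returns a new object and leaves its input untouched. That is a design choice made here for easier testing, not something the commit requires.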