Spaces:

karlexmarin
/

taf-agent

Running

karlexmarin Claude Opus 4.7 (1M context) commited on May 7

Commit

b0f51d4

1 Parent(s): fbf3edc

v0.8.2 JSON CoT-aware Linter — anti-bullshit pack #8

Constrained-decoding engines (llguidance, Outlines, SGLang grammars)
emit JSON properties in the order your schema declares them. If a
schema places `answer` before `reasoning`, the model commits to the
final answer first and the rationale that follows can only justify
what was already committed — defeating Chain-of-Thought entirely.

📋 JSON CoT Linter (16th mode):
- Paste any JSON Schema or example response object
- Linter classifies each field as reasoning / answer / other via
name patterns (reason|think|thought|cot|chain.of.thought|analysis|
explanation|rationale|… vs answer|result|verdict|final_answer|…)
- Verdict codes: good_order / anti_pattern / missing_reasoning /
missing_answer / no_cot_fields / invalid_json / non_object / empty
- Suggested-fix block emits a reordered schema (reasoning → other →
answer) with `required[]` mirrored to match — copy back into prompt

Pure logic in `js/json_cot_linter.js` (codes + params, no human
strings); main.js renders with i18n. 39 i18n keys × 4 langs (EN/ES/FR/
ZH) = 156 keys, parity clean. Solutions Hub `structured_outputs` pain
upgraded from `null` → `📋 JSON CoT-aware Linter` (planned: → covered).
Help modal v0.8.2 entry + Inventory anti-bullshit-pack list updated +
task tile "⚙️ Set up an eval correctly" gains the new mode button.

Source citations:
- https://collinwilkins.com/articles/structured-output (the bug)
- https://github.com/guidance-ai/jsonschemabench (10K real schemas)
- https://github.com/guidance-ai/llguidance (constrained decoder)

Verified: 10/10 lint cases + reorder roundtrip + headless e2e (tab
present, section toggles, bad/good examples render verdict + fields +
suggested fix, manual paste detects anti-pattern, invalid JSON shows
error). 17 mode tabs total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (5) hide show

data/solutions_hub.json +1 -1
index.html +28 -0
js/i18n.js +168 -0
js/json_cot_linter.js +203 -0
js/main.js +172 -1

data/solutions_hub.json CHANGED Viewed

@@ -183,7 +183,7 @@
       "id": "structured_outputs",
       "category": "setup",
       "pain": "JSON schema engines fail silently; CoT models commit to answer before reasoning.",
-      "tafagent_mode": null,
       "external_tools": [
         {"name": "llguidance (constrained decoding)", "url": "https://github.com/guidance-ai/llguidance", "type": "tool"},
         {"name": "Outlines", "url": "https://github.com/dottxt-ai/outlines", "type": "tool"},

       "id": "structured_outputs",
       "category": "setup",
       "pain": "JSON schema engines fail silently; CoT models commit to answer before reasoning.",
+      "tafagent_mode": "📋 JSON CoT-aware Linter",
       "external_tools": [
         {"name": "llguidance (constrained decoding)", "url": "https://github.com/guidance-ai/llguidance", "type": "tool"},
         {"name": "Outlines", "url": "https://github.com/dottxt-ai/outlines", "type": "tool"},

index.html CHANGED Viewed

@@ -216,6 +216,9 @@
       <p><strong data-i18n="help.v08.saturation.title">📈 Benchmark Saturation Detector</strong></p>
       <p data-i18n="help.v08.saturation.body">MMLU is saturated (top 88-94%), AIME 2025 saturated within months of release, HumanEval near-saturated. Pick any benchmark and the tool returns top-3 frontier scores, spread, mean, and a verdict — saturated / near-saturated / discriminative — plus a recommended replacement (e.g. MMLU → MMLU-Pro / GPQA / HLE). Live fetch from DemandSphere AI Frontier Tracker (CC BY-NC 4.0) when reachable; baked 2026-05-05 snapshot when not. <em>Use case</em>: before you cite '92% on MMLU' or design an eval, check whether the benchmark still discriminates anything.</p>
       <p><strong data-i18n="help.v081.hub.title">🧭 Solutions Hub</strong></p>
       <p data-i18n="help.v081.hub.body">tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'</p>
@@ -328,6 +331,7 @@
             <li data-i18n="inv.v07.drift"><strong>🔀 Drift</strong> — bug or noise? Predict max admissible gap between two evals</li>
             <li data-i18n="inv.v07.niah"><strong>🔍 NIAH→Reason</strong> — does your "128k context" actually reason there, or just retrieve?</li>
             <li data-i18n="inv.v08.saturation"><strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?</li>
             <li data-i18n="inv.v081.hub"><strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.</li>
           </ul>
         </details>
@@ -399,6 +403,7 @@
           <div class="tile-modes">
             <button data-mode-link="template" data-i18n="modes.template">📜 Chat-template</button>
             <button data-mode-link="diagnose" data-i18n="modes.diagnose">🩺 Diagnose CLI</button>
           </div>
         </div>
         <div class="task-tile">
@@ -455,6 +460,7 @@
         <button class="mode-btn" data-mode="drift" role="tab" aria-selected="false" data-i18n="modes.drift">🔀 Drift</button>
         <button class="mode-btn" data-mode="niah" role="tab" aria-selected="false" data-i18n="modes.niah">🔍 NIAH→Reason</button>
         <button class="mode-btn" data-mode="saturation" role="tab" aria-selected="false" data-i18n="modes.saturation">📈 Saturation</button>
         <button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
       </div>
       <p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
@@ -1004,6 +1010,28 @@
     </section>
     <!-- Solutions Hub — integrator portal (v0.8.1) -->
     <section id="hub-section" style="display:none;">
       <h2><span data-i18n="hub.title">🧭 Solutions Hub</span>
         <span class="info"><span class="tooltip" data-i18n="hub.tip">

       <p><strong data-i18n="help.v08.saturation.title">📈 Benchmark Saturation Detector</strong></p>
       <p data-i18n="help.v08.saturation.body">MMLU is saturated (top 88-94%), AIME 2025 saturated within months of release, HumanEval near-saturated. Pick any benchmark and the tool returns top-3 frontier scores, spread, mean, and a verdict — saturated / near-saturated / discriminative — plus a recommended replacement (e.g. MMLU → MMLU-Pro / GPQA / HLE). Live fetch from DemandSphere AI Frontier Tracker (CC BY-NC 4.0) when reachable; baked 2026-05-05 snapshot when not. <em>Use case</em>: before you cite '92% on MMLU' or design an eval, check whether the benchmark still discriminates anything.</p>
+      <p><strong data-i18n="help.v082.cot.title">📋 JSON CoT-aware Linter</strong></p>
+      <p data-i18n="help.v082.cot.body">Constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in the order your schema declares them. If you write <code>{ answer, reasoning }</code> the model commits to <code>answer</code> first and CoT collapses into post-hoc justification. Paste any schema (or example response) — the linter classifies each field as <em>reasoning</em>, <em>answer</em>, or <em>other</em>, flags the ordering, and emits a reordered fix you can copy back. <em>Use case</em>: 'My CoT prompt works in plaintext but degrades under JSON mode' → run linter, find the inverted order, fix.</p>
       <p><strong data-i18n="help.v081.hub.title">🧭 Solutions Hub</strong></p>
       <p data-i18n="help.v081.hub.body">tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'</p>
             <li data-i18n="inv.v07.drift"><strong>🔀 Drift</strong> — bug or noise? Predict max admissible gap between two evals</li>
             <li data-i18n="inv.v07.niah"><strong>🔍 NIAH→Reason</strong> — does your "128k context" actually reason there, or just retrieve?</li>
             <li data-i18n="inv.v08.saturation"><strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?</li>
+            <li data-i18n="inv.v082.cot"><strong>📋 JSON CoT</strong> — lints structured-output schemas for the answer-before-reasoning anti-pattern that silently breaks Chain-of-Thought.</li>
             <li data-i18n="inv.v081.hub"><strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.</li>
           </ul>
         </details>
           <div class="tile-modes">
             <button data-mode-link="template" data-i18n="modes.template">📜 Chat-template</button>
             <button data-mode-link="diagnose" data-i18n="modes.diagnose">🩺 Diagnose CLI</button>
+            <button data-mode-link="cot" data-i18n="modes.cot">📋 JSON CoT</button>
           </div>
         </div>
         <div class="task-tile">
         <button class="mode-btn" data-mode="drift" role="tab" aria-selected="false" data-i18n="modes.drift">🔀 Drift</button>
         <button class="mode-btn" data-mode="niah" role="tab" aria-selected="false" data-i18n="modes.niah">🔍 NIAH→Reason</button>
         <button class="mode-btn" data-mode="saturation" role="tab" aria-selected="false" data-i18n="modes.saturation">📈 Saturation</button>
+        <button class="mode-btn" data-mode="cot" role="tab" aria-selected="false" data-i18n="modes.cot">📋 JSON CoT</button>
         <button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
       </div>
       <p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
     </section>
     <!-- Solutions Hub — integrator portal (v0.8.1) -->
+    <!-- JSON CoT-aware Linter (mode=cot, v0.8.2 anti-bullshit pack #8) -->
+    <section id="cot-section" style="display:none;">
+      <h2><span data-i18n="cot.title">📋 JSON CoT-aware Linter</span>
+        <span class="info"><span class="tooltip" data-i18n="cot.tip">
+          <strong>Why this matters</strong>: constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in schema order. If your schema places <code>answer</code> before <code>reasoning</code>, the model commits to a final answer first and only then writes the rationale to justify it — defeating Chain-of-Thought entirely. Paste a JSON Schema (or example object) and the linter flags the ordering.
+        </span></span>
+      </h2>
+      <p class="recipe-desc" data-i18n="cot.desc">
+        <strong>Reasoning before answer, always.</strong> Paste a JSON Schema or example response object — the linter reports whether reasoning fields come before answer fields and suggests a fix.
+      </p>
+      <div class="form-row">
+        <textarea id="cot-input" rows="10" style="width:100%;font-family:monospace;font-size:0.9em;" data-i18n-placeholder="cot.input.placeholder" placeholder='{ "type": "object", "properties": { "answer": {"type": "string"}, "reasoning": {"type": "string"} } }'></textarea>
+      </div>
+      <div class="form-row">
+        <button type="button" id="cot-lint-btn" data-i18n="cot.lint_btn">🔍 Lint</button>
+        <button type="button" id="cot-example-good-btn" class="secondary" data-i18n="cot.example_good_btn">↳ Example: good order</button>
+        <button type="button" id="cot-example-bad-btn" class="secondary" data-i18n="cot.example_bad_btn">↳ Example: anti-pattern</button>
+      </div>
+      <p id="cot-status" class="recipe-desc" style="font-size:0.92em;"></p>
+      <div id="cot-output" style="margin-top: 1em;"></div>
+    </section>
     <section id="hub-section" style="display:none;">
       <h2><span data-i18n="hub.title">🧭 Solutions Hub</span>
         <span class="info"><span class="tooltip" data-i18n="hub.tip">

js/i18n.js CHANGED Viewed

@@ -503,6 +503,48 @@ export const TRANSLATIONS = {
     "help.v08.saturation.title":   "📈 Benchmark Saturation Detector",
     "help.v08.saturation.body":    "MMLU is saturated (88-94% top), AIME 2025 saturated within months of release, HumanEval near-saturated. Pick any benchmark and the tool returns top-3 frontier scores, spread, mean, and a verdict — saturated / near-saturated / discriminative — plus a recommended replacement (e.g. MMLU → MMLU-Pro / GPQA / HLE). Live fetch from DemandSphere AI Frontier Tracker (CC BY-NC 4.0) when reachable; baked 2026-05-05 snapshot when not. <em>Use case</em>: before you cite '92% on MMLU' or design an eval, check whether the benchmark still discriminates anything.",
     "inv.v08.saturation":          "<strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
@@ -1465,6 +1507,48 @@ export const TRANSLATIONS = {
     "help.v08.saturation.title":   "📈 Detector de saturación de benchmarks",
     "help.v08.saturation.body":    "MMLU está saturado (top 88-94%), AIME 2025 saturó a los pocos meses de salir, HumanEval near-saturated. Elige cualquier benchmark y la herramienta retorna top-3 frontier scores, spread, media, y un veredicto — saturated / near-saturated / discriminative — más un reemplazo recomendado (ej. MMLU → MMLU-Pro / GPQA / HLE). Fetch en vivo desde DemandSphere AI Frontier Tracker (CC BY-NC 4.0) cuando llega; snapshot baked 2026-05-05 cuando no. <em>Caso de uso</em>: antes de citar '92% en MMLU' o diseñar una eval, verifica si el benchmark aún discrimina algo.",
     "inv.v08.saturation":          "<strong>📈 Saturation</strong> — ¿sigue siendo útil tu benchmark, o están todos los frontiers empatados arriba?",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
@@ -2291,6 +2375,48 @@ export const TRANSLATIONS = {
     "help.v08.saturation.title":   "📈 Détecteur de saturation des benchmarks",
     "help.v08.saturation.body":    "MMLU est saturé (top 88-94%), AIME 2025 saturé en quelques mois après sa sortie, HumanEval presque saturé. Choisissez un benchmark et l'outil retourne top-3 frontier scores, spread, moyenne, et un verdict — saturated / near-saturated / discriminative — plus un remplacement recommandé (ex. MMLU → MMLU-Pro / GPQA / HLE). Fetch en direct depuis DemandSphere AI Frontier Tracker (CC BY-NC 4.0) si accessible ; snapshot baked 2026-05-05 sinon. <em>Cas d'usage</em> : avant de citer '92% sur MMLU' ou de concevoir une eval, vérifiez si le benchmark discrimine encore quelque chose.",
     "inv.v08.saturation":          "<strong>📈 Saturation</strong> — votre benchmark est-il encore utile, ou tous les frontiers sont-ils à égalité au sommet ?",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
@@ -3117,6 +3243,48 @@ export const TRANSLATIONS = {
     "help.v08.saturation.title":   "📈 Benchmark 饱和度检测器",
     "help.v08.saturation.body":    "MMLU 已饱和（top 88-94%），AIME 2025 上线几个月就饱和，HumanEval 接近饱和。选任何 benchmark，工具返回 top-3 frontier 分数、spread、平均，以及判定 — saturated / near-saturated / discriminative — 加上推荐替代品（例如 MMLU → MMLU-Pro / GPQA / HLE）。可达时从 DemandSphere AI Frontier Tracker（CC BY-NC 4.0）实时 fetch；不可达时使用 2026-05-05 的 baked 快照。<em>用例</em>：在引用\"92% on MMLU\"或设计 eval 之前，检查 benchmark 是否仍能区分任何东西。",
     "inv.v08.saturation":          "<strong>📈 Saturation</strong> — 你的 benchmark 还有用吗，还是所有 frontier 都在顶部并列？",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别（评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性），每个映射到（a）解决它的 tafagent 模式（若存在），以及（b）社区已信任的最佳外部工具（RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等）。搜索框匹配 pain、场景和工具名称。<em>用例</em>：'我有问题 X — tafagent 解决它吗，如果不，谁解决？'",

     "help.v08.saturation.title":   "📈 Benchmark Saturation Detector",
     "help.v08.saturation.body":    "MMLU is saturated (88-94% top), AIME 2025 saturated within months of release, HumanEval near-saturated. Pick any benchmark and the tool returns top-3 frontier scores, spread, mean, and a verdict — saturated / near-saturated / discriminative — plus a recommended replacement (e.g. MMLU → MMLU-Pro / GPQA / HLE). Live fetch from DemandSphere AI Frontier Tracker (CC BY-NC 4.0) when reachable; baked 2026-05-05 snapshot when not. <em>Use case</em>: before you cite '92% on MMLU' or design an eval, check whether the benchmark still discriminates anything.",
     "inv.v08.saturation":          "<strong>📈 Saturation</strong> — is your benchmark still useful, or are all frontier models tied at the top?",
+    // v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
+    "modes.cot":                   "📋 JSON CoT",
+    "mode_desc.cot":               "Lints a JSON Schema (or example response object) for the answer-before-reasoning anti-pattern. Constrained-decoding engines emit fields in property order — if `answer` comes before `reasoning`, CoT is defeated.",
+    "cot.title":                   "📋 JSON CoT-aware Linter",
+    "cot.tip":                     "Constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in schema order. If your schema places `answer` before `reasoning`, the model commits to a final answer first and only then writes the rationale to justify it — defeating Chain-of-Thought entirely. Paste a JSON Schema (or example object) and the linter flags the ordering.",
+    "cot.desc":                    "<strong>Reasoning before answer, always.</strong> Paste a JSON Schema or example response object — the linter reports whether reasoning fields come before answer fields and suggests a fix.",
+    "cot.input.placeholder":       "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
+    "cot.lint_btn":                "🔍 Lint",
+    "cot.example_good_btn":        "↳ Example: good order",
+    "cot.example_bad_btn":         "↳ Example: anti-pattern",
+    "cot.status.done":             "✅ {verdict}",
+    "cot.col.field":               "Field",
+    "cot.col.type":                "Role",
+    "cot.field.reasoning":         "reasoning",
+    "cot.field.answer":            "answer",
+    "cot.field.other":             "—",
+    "cot.field_count":             "{n} fields",
+    "cot.verdict.good_order":          "✅ Good order — reasoning before answer",
+    "cot.verdict.anti_pattern":        "❌ Anti-pattern — answer before reasoning",
+    "cot.verdict.missing_reasoning":   "⚠ Missing reasoning field",
+    "cot.verdict.missing_answer":      "ℹ No answer-like field detected",
+    "cot.verdict.no_cot_fields":       "ℹ No reasoning/answer fields detected",
+    "cot.verdict.invalid_json":        "❌ Invalid JSON",
+    "cot.verdict.non_object":          "ℹ Top-level value is not an object",
+    "cot.verdict.empty_fields":        "ℹ No fields to analyse",
+    "cot.explain.good_order":          "Constrained decoding will emit the rationale first, so the model can think before committing. Chain-of-Thought stays honest.",
+    "cot.explain.anti_pattern":        "The model is forced to emit the answer field first; any reasoning that follows can only justify what was already committed. Reorder so reasoning-like fields come before answer-like fields.",
+    "cot.explain.missing_reasoning":   "An answer field is present but no reasoning field. If you want CoT, add a `reasoning` (or `chain_of_thought`, `analysis`, …) field <em>before</em> the answer.",
+    "cot.explain.missing_answer":      "A reasoning field is present but no obvious answer field. Make sure the schema actually requires the model to commit a final value.",
+    "cot.explain.no_cot_fields":       "Object has fields, but none look reasoning- or answer-like by name. The linter is conservative — if the schema is intentional, ignore. Otherwise add explicit reasoning/answer fields.",
+    "cot.hint.non_object":             "Top-level must be a JSON object (`{ … }`) or a JSON Schema with `properties`.",
+    "cot.hint.empty_fields":           "No fields detected. Paste a JSON Schema, an example response, or click an example button below the textarea.",
+    "cot.suggested_fix.title":         "✓ Suggested fix",
+    "cot.suggested_fix.desc":          "Reordered properties — reasoning fields first, then any context fields, then answer fields. `required[]` (if present) is mirrored to match.",
+    "cot.suggested_fix.copy":          "📋 Copy",
+    "cot.suggested_fix.copied":        "✓ Copied",
+    "cot.attribution":              "Refs:",
+    "inv.v082.cot":                "<strong>📋 JSON CoT</strong> — lints structured-output schemas for the answer-before-reasoning anti-pattern that silently breaks Chain-of-Thought.",
+    "help.v082.cot.title":         "📋 JSON CoT-aware Linter",
+    "help.v082.cot.body":          "Constrained-decoding engines (llguidance, Outlines, SGLang grammars) emit JSON properties in the order your schema declares them. If you write <code>{ answer, reasoning }</code> the model commits to <code>answer</code> first and CoT collapses into post-hoc justification. Paste any schema (or example response) — the linter classifies each field as <em>reasoning</em>, <em>answer</em>, or <em>other</em>, flags the ordering, and emits a reordered fix you can copy back. <em>Use case</em>: 'My CoT prompt works in plaintext but degrades under JSON mode' → run linter, find the inverted order, fix.",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — every documented pain mapped to a tafagent mode or curated external tool. Don't reinvent — find.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent as integrator, not silo. 30+ pains across 7 categories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), each mapped to (a) the tafagent mode that addresses it, if any, and (b) the best-of-breed external tools the community already trusts (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Search box matches across pain, scenario, and tool name. <em>Use case</em>: 'I have problem X — does tafagent solve it, and if not, who does?'",
     "help.v08.saturation.title":   "📈 Detector de saturación de benchmarks",
     "help.v08.saturation.body":    "MMLU está saturado (top 88-94%), AIME 2025 saturó a los pocos meses de salir, HumanEval near-saturated. Elige cualquier benchmark y la herramienta retorna top-3 frontier scores, spread, media, y un veredicto — saturated / near-saturated / discriminative — más un reemplazo recomendado (ej. MMLU → MMLU-Pro / GPQA / HLE). Fetch en vivo desde DemandSphere AI Frontier Tracker (CC BY-NC 4.0) cuando llega; snapshot baked 2026-05-05 cuando no. <em>Caso de uso</em>: antes de citar '92% en MMLU' o diseñar una eval, verifica si el benchmark aún discrimina algo.",
     "inv.v08.saturation":          "<strong>📈 Saturation</strong> — ¿sigue siendo útil tu benchmark, o están todos los frontiers empatados arriba?",
+    // v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
+    "modes.cot":                   "📋 JSON CoT",
+    "mode_desc.cot":               "Lintea un JSON Schema (o ejemplo de respuesta) buscando el anti-patrón respuesta-antes-de-razonamiento. Los motores de constrained decoding emiten campos en el orden del schema — si `answer` va antes que `reasoning`, el CoT se rompe.",
+    "cot.title":                   "📋 Linter JSON con consciencia CoT",
+    "cot.tip":                     "Los motores de constrained decoding (llguidance, Outlines, gramáticas SGLang) emiten propiedades JSON en el orden del schema. Si tu schema pone `answer` antes de `reasoning`, el modelo se compromete con la respuesta final primero y solo después escribe el razonamiento para justificarla — rompiendo Chain-of-Thought por completo. Pega un JSON Schema (o objeto de ejemplo) y el linter señala el ordenamiento.",
+    "cot.desc":                    "<strong>Razonamiento antes que respuesta, siempre.</strong> Pega un JSON Schema o un objeto de respuesta de ejemplo — el linter dice si los campos de razonamiento van antes que los de respuesta y propone una corrección.",
+    "cot.input.placeholder":       "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
+    "cot.lint_btn":                "🔍 Lintear",
+    "cot.example_good_btn":        "↳ Ejemplo: orden correcto",
+    "cot.example_bad_btn":         "↳ Ejemplo: anti-patrón",
+    "cot.status.done":             "✅ {verdict}",
+    "cot.col.field":               "Campo",
+    "cot.col.type":                "Rol",
+    "cot.field.reasoning":         "razonamiento",
+    "cot.field.answer":            "respuesta",
+    "cot.field.other":             "—",
+    "cot.field_count":             "{n} campos",
+    "cot.verdict.good_order":          "✅ Orden correcto — razonamiento antes que respuesta",
+    "cot.verdict.anti_pattern":        "❌ Anti-patrón — respuesta antes que razonamiento",
+    "cot.verdict.missing_reasoning":   "⚠ Falta campo de razonamiento",
+    "cot.verdict.missing_answer":      "ℹ No se detecta campo tipo respuesta",
+    "cot.verdict.no_cot_fields":       "ℹ Sin campos de razonamiento/respuesta detectados",
+    "cot.verdict.invalid_json":        "❌ JSON inválido",
+    "cot.verdict.non_object":          "ℹ El valor superior no es un objeto",
+    "cot.verdict.empty_fields":        "ℹ Sin campos para analizar",
+    "cot.explain.good_order":          "El constrained decoding emitirá el razonamiento primero, así el modelo puede pensar antes de comprometerse. Chain-of-Thought se mantiene honesto.",
+    "cot.explain.anti_pattern":        "El modelo se ve forzado a emitir el campo de respuesta primero; cualquier razonamiento posterior solo justifica lo ya comprometido. Reordena para que los campos tipo razonamiento vayan antes que los tipo respuesta.",
+    "cot.explain.missing_reasoning":   "Hay un campo de respuesta pero ningún campo de razonamiento. Si quieres CoT, añade un campo `reasoning` (o `chain_of_thought`, `analysis`, …) <em>antes</em> de la respuesta.",
+    "cot.explain.missing_answer":      "Hay un campo de razonamiento pero ningún campo de respuesta evidente. Asegúrate de que el schema realmente exija al modelo comprometer un valor final.",
+    "cot.explain.no_cot_fields":       "El objeto tiene campos pero ninguno se ve como razonamiento o respuesta por su nombre. El linter es conservador — si el schema es intencional, ignóralo. Si no, añade campos explícitos de razonamiento/respuesta.",
+    "cot.hint.non_object":             "El valor de nivel superior debe ser un objeto JSON (`{ … }`) o un JSON Schema con `properties`.",
+    "cot.hint.empty_fields":           "Sin campos detectados. Pega un JSON Schema, una respuesta de ejemplo, o pulsa un botón de ejemplo bajo el textarea.",
+    "cot.suggested_fix.title":         "✓ Corrección sugerida",
+    "cot.suggested_fix.desc":          "Propiedades reordenadas — campos de razonamiento primero, luego cualquier campo de contexto, luego los de respuesta. `required[]` (si existe) se reordena igual.",
+    "cot.suggested_fix.copy":          "📋 Copiar",
+    "cot.suggested_fix.copied":        "✓ Copiado",
+    "cot.attribution":              "Referencias:",
+    "inv.v082.cot":                "<strong>📋 JSON CoT</strong> — lintea schemas de structured outputs buscando el anti-patrón respuesta-antes-de-razonamiento que silenciosamente rompe Chain-of-Thought.",
+    "help.v082.cot.title":         "📋 Linter JSON con consciencia CoT",
+    "help.v082.cot.body":          "Los motores de constrained decoding (llguidance, Outlines, gramáticas SGLang) emiten propiedades JSON en el orden que declara tu schema. Si escribes <code>{ answer, reasoning }</code> el modelo se compromete con <code>answer</code> primero y el CoT se reduce a justificación post-hoc. Pega cualquier schema (o respuesta de ejemplo) — el linter clasifica cada campo como <em>razonamiento</em>, <em>respuesta</em> u <em>otro</em>, señala el ordenamiento, y emite una corrección reordenada para copiar de vuelta. <em>Caso de uso</em>: 'Mi prompt CoT funciona en texto pero degrada en modo JSON' → ejecuta linter, encuentra el orden invertido, corrige.",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — cada pain documentado mapeado a un mode tafagent o herramienta externa curada. No reinventes — encuentra.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent como integrador, no silo. 30+ pains en 7 categorías (eval reliability · diagnósticos · setup · training · retrieval · multimodal · observability), cada uno mapeado a (a) el mode tafagent que lo resuelve, si existe, y (b) las herramientas externas best-of-breed que la comunidad ya usa (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). Caja de búsqueda matchea pain, scenario, y nombre de herramienta. <em>Caso de uso</em>: 'tengo problema X — ¿lo resuelve tafagent, y si no, quién?'",
     "help.v08.saturation.title":   "📈 Détecteur de saturation des benchmarks",
     "help.v08.saturation.body":    "MMLU est saturé (top 88-94%), AIME 2025 saturé en quelques mois après sa sortie, HumanEval presque saturé. Choisissez un benchmark et l'outil retourne top-3 frontier scores, spread, moyenne, et un verdict — saturated / near-saturated / discriminative — plus un remplacement recommandé (ex. MMLU → MMLU-Pro / GPQA / HLE). Fetch en direct depuis DemandSphere AI Frontier Tracker (CC BY-NC 4.0) si accessible ; snapshot baked 2026-05-05 sinon. <em>Cas d'usage</em> : avant de citer '92% sur MMLU' ou de concevoir une eval, vérifiez si le benchmark discrimine encore quelque chose.",
     "inv.v08.saturation":          "<strong>📈 Saturation</strong> — votre benchmark est-il encore utile, ou tous les frontiers sont-ils à égalité au sommet ?",
+    // v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
+    "modes.cot":                   "📋 JSON CoT",
+    "mode_desc.cot":               "Linte un JSON Schema (ou un objet de réponse exemple) à la recherche de l'anti-pattern réponse-avant-raisonnement. Les moteurs de décodage contraint émettent les champs dans l'ordre du schema — si `answer` précède `reasoning`, la chaîne de pensée est cassée.",
+    "cot.title":                   "📋 Linter JSON conscient de CoT",
+    "cot.tip":                     "Les moteurs de décodage contraint (llguidance, Outlines, grammaires SGLang) émettent les propriétés JSON dans l'ordre du schema. Si votre schema place `answer` avant `reasoning`, le modèle s'engage sur la réponse finale en premier et n'écrit la justification qu'ensuite — détruisant la Chaîne de Pensée. Collez un JSON Schema (ou un objet exemple) et le linter signale l'ordre.",
+    "cot.desc":                    "<strong>Le raisonnement avant la réponse, toujours.</strong> Collez un JSON Schema ou un objet de réponse exemple — le linter rapporte si les champs de raisonnement viennent avant ceux de réponse et propose une correction.",
+    "cot.input.placeholder":       "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
+    "cot.lint_btn":                "🔍 Linter",
+    "cot.example_good_btn":        "↳ Exemple : ordre correct",
+    "cot.example_bad_btn":         "↳ Exemple : anti-pattern",
+    "cot.status.done":             "✅ {verdict}",
+    "cot.col.field":               "Champ",
+    "cot.col.type":                "Rôle",
+    "cot.field.reasoning":         "raisonnement",
+    "cot.field.answer":            "réponse",
+    "cot.field.other":             "—",
+    "cot.field_count":             "{n} champs",
+    "cot.verdict.good_order":          "✅ Bon ordre — raisonnement avant réponse",
+    "cot.verdict.anti_pattern":        "❌ Anti-pattern — réponse avant raisonnement",
+    "cot.verdict.missing_reasoning":   "⚠ Champ de raisonnement manquant",
+    "cot.verdict.missing_answer":      "ℹ Aucun champ type réponse détecté",
+    "cot.verdict.no_cot_fields":       "ℹ Aucun champ raisonnement/réponse détecté",
+    "cot.verdict.invalid_json":        "❌ JSON invalide",
+    "cot.verdict.non_object":          "ℹ La valeur de premier niveau n'est pas un objet",
+    "cot.verdict.empty_fields":        "ℹ Aucun champ à analyser",
+    "cot.explain.good_order":          "Le décodage contraint émettra le raisonnement en premier, le modèle peut donc réfléchir avant de s'engager. La Chaîne de Pensée reste honnête.",
+    "cot.explain.anti_pattern":        "Le modèle est forcé d'émettre le champ de réponse en premier ; tout raisonnement qui suit ne fait que justifier ce qui est déjà engagé. Réordonnez pour que les champs raisonnement viennent avant les champs réponse.",
+    "cot.explain.missing_reasoning":   "Un champ de réponse est présent mais aucun champ de raisonnement. Si vous voulez du CoT, ajoutez un champ `reasoning` (ou `chain_of_thought`, `analysis`, …) <em>avant</em> la réponse.",
+    "cot.explain.missing_answer":      "Un champ de raisonnement est présent mais aucun champ de réponse évident. Vérifiez que le schema force réellement le modèle à s'engager sur une valeur finale.",
+    "cot.explain.no_cot_fields":       "L'objet a des champs, mais aucun ne ressemble à du raisonnement ou de la réponse par son nom. Le linter est conservateur — si le schema est intentionnel, ignorez. Sinon ajoutez des champs explicites raisonnement/réponse.",
+    "cot.hint.non_object":             "La valeur de premier niveau doit être un objet JSON (`{ … }`) ou un JSON Schema avec `properties`.",
+    "cot.hint.empty_fields":           "Aucun champ détecté. Collez un JSON Schema, une réponse exemple, ou cliquez un bouton d'exemple sous le textarea.",
+    "cot.suggested_fix.title":         "✓ Correction suggérée",
+    "cot.suggested_fix.desc":          "Propriétés réordonnées — champs raisonnement d'abord, puis tout champ de contexte, puis les champs réponse. `required[]` (s'il existe) est réordonné en correspondance.",
+    "cot.suggested_fix.copy":          "📋 Copier",
+    "cot.suggested_fix.copied":        "✓ Copié",
+    "cot.attribution":              "Réfs :",
+    "inv.v082.cot":                "<strong>📋 JSON CoT</strong> — linte les schemas de structured outputs à la recherche de l'anti-pattern réponse-avant-raisonnement qui casse silencieusement la Chaîne de Pensée.",
+    "help.v082.cot.title":         "📋 Linter JSON conscient de CoT",
+    "help.v082.cot.body":          "Les moteurs de décodage contraint (llguidance, Outlines, grammaires SGLang) émettent les propriétés JSON dans l'ordre que votre schema déclare. Si vous écrivez <code>{ answer, reasoning }</code> le modèle s'engage sur <code>answer</code> en premier et le CoT se réduit à une justification a posteriori. Collez n'importe quel schema (ou réponse exemple) — le linter classe chaque champ comme <em>raisonnement</em>, <em>réponse</em> ou <em>autre</em>, signale l'ordre, et émet une correction réordonnée à copier. <em>Cas d'usage</em> : 'Mon prompt CoT marche en texte brut mais dégrade en mode JSON' → lancez le linter, trouvez l'ordre inversé, corrigez.",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — chaque pain documenté mappé à un mode tafagent ou outil externe curé. Ne réinventez pas — trouvez.",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent comme intégrateur, pas silo. 30+ pains à travers 7 catégories (eval reliability · diagnostics · setup · training · retrieval · multimodal · observability), chacun mappé à (a) le mode tafagent qui le résout, s'il existe, et (b) les outils externes best-of-breed que la communauté utilise déjà (RAGAS, MTEB, HELM, MCP Schema Validator, llm-stats, llguidance, GlitchMiner, etc.). La barre de recherche matche pain, scénario, et nom d'outil. <em>Cas d'usage</em> : 'j'ai le problème X — tafagent le résout-il, et sinon, qui ?'",
     "help.v08.saturation.title":   "📈 Benchmark 饱和度检测器",
     "help.v08.saturation.body":    "MMLU 已饱和（top 88-94%），AIME 2025 上线几个月就饱和，HumanEval 接近饱和。选任何 benchmark，工具返回 top-3 frontier 分数、spread、平均，以及判定 — saturated / near-saturated / discriminative — 加上推荐替代品（例如 MMLU → MMLU-Pro / GPQA / HLE）。可达时从 DemandSphere AI Frontier Tracker（CC BY-NC 4.0）实时 fetch；不可达时使用 2026-05-05 的 baked 快照。<em>用例</em>：在引用\"92% on MMLU\"或设计 eval 之前，检查 benchmark 是否仍能区分任何东西。",
     "inv.v08.saturation":          "<strong>📈 Saturation</strong> — 你的 benchmark 还有用吗，还是所有 frontier 都在顶部并列？",
+    // v0.8.2 — anti-bullshit pack #8: JSON CoT-aware Linter
+    "modes.cot":                   "📋 JSON CoT",
+    "mode_desc.cot":               "对 JSON Schema（或示例响应对象）进行 linting，查找『答案在推理之前』的反模式。约束解码引擎按 schema 顺序输出字段——如果 `answer` 在 `reasoning` 之前，CoT 就被破坏了。",
+    "cot.title":                   "📋 JSON CoT 感知 Linter",
+    "cot.tip":                     "约束解码引擎（llguidance、Outlines、SGLang 语法）按 schema 顺序输出 JSON 属性。如果 schema 把 `answer` 放在 `reasoning` 之前，模型会先承诺最终答案，然后才写理由来证明它——彻底破坏 Chain-of-Thought。粘贴 JSON Schema（或示例对象），linter 会标记顺序问题。",
+    "cot.desc":                    "<strong>推理永远先于答案。</strong> 粘贴 JSON Schema 或示例响应对象——linter 报告推理字段是否在答案字段之前，并提出修复建议。",
+    "cot.input.placeholder":       "{ \"type\": \"object\", \"properties\": { \"answer\": {\"type\": \"string\"}, \"reasoning\": {\"type\": \"string\"} } }",
+    "cot.lint_btn":                "🔍 Lint",
+    "cot.example_good_btn":        "↳ 示例：正确顺序",
+    "cot.example_bad_btn":         "↳ 示例：反模式",
+    "cot.status.done":             "✅ {verdict}",
+    "cot.col.field":               "字段",
+    "cot.col.type":                "角色",
+    "cot.field.reasoning":         "推理",
+    "cot.field.answer":            "答案",
+    "cot.field.other":             "—",
+    "cot.field_count":             "{n} 个字段",
+    "cot.verdict.good_order":          "✅ 顺序正确——推理在答案之前",
+    "cot.verdict.anti_pattern":        "❌ 反模式——答案在推理之前",
+    "cot.verdict.missing_reasoning":   "⚠ 缺少推理字段",
+    "cot.verdict.missing_answer":      "ℹ 未检测到答案类字段",
+    "cot.verdict.no_cot_fields":       "ℹ 未检测到推理/答案字段",
+    "cot.verdict.invalid_json":        "❌ JSON 无效",
+    "cot.verdict.non_object":          "ℹ 顶层值不是对象",
+    "cot.verdict.empty_fields":        "ℹ 没有可分析的字段",
+    "cot.explain.good_order":          "约束解码会先输出推理，所以模型可以在承诺之前思考。Chain-of-Thought 保持诚实。",
+    "cot.explain.anti_pattern":        "模型被迫先输出答案字段；之后的任何推理只能为已承诺的内容辩护。重新排序，使推理类字段在答案类字段之前。",
+    "cot.explain.missing_reasoning":   "存在答案字段但没有推理字段。如果你想要 CoT，在答案<em>之前</em>添加 `reasoning`（或 `chain_of_thought`、`analysis`…）字段。",
+    "cot.explain.missing_answer":      "存在推理字段但没有明显的答案字段。确保 schema 实际上要求模型承诺一个最终值。",
+    "cot.explain.no_cot_fields":       "对象有字段但都不像推理或答案（按名称）。Linter 保守——如果 schema 是有意的，可以忽略。否则添加显式的推理/答案字段。",
+    "cot.hint.non_object":             "顶层值必须是 JSON 对象（`{ … }`）或带 `properties` 的 JSON Schema。",
+    "cot.hint.empty_fields":           "未检测到字段。粘贴 JSON Schema、示例响应，或点击 textarea 下方的示例按钮。",
+    "cot.suggested_fix.title":         "✓ 建议修复",
+    "cot.suggested_fix.desc":          "属性已重新排序——推理字段优先，然后是任何上下文字段，最后是答案字段。`required[]`（如果存在）也镜像同步。",
+    "cot.suggested_fix.copy":          "📋 复制",
+    "cot.suggested_fix.copied":        "✓ 已复制",
+    "cot.attribution":              "参考：",
+    "inv.v082.cot":                "<strong>📋 JSON CoT</strong> — 对 structured outputs schema 进行 linting，查找悄悄破坏 Chain-of-Thought 的『答案在推理之前』反模式。",
+    "help.v082.cot.title":         "📋 JSON CoT 感知 Linter",
+    "help.v082.cot.body":          "约束解码引擎（llguidance、Outlines、SGLang 语法）按 schema 声明的顺序输出 JSON 属性。如果你写 <code>{ answer, reasoning }</code>，模型先承诺 <code>answer</code>，CoT 就退化为事后辩护。粘贴任意 schema（或示例响应）——linter 把每个字段分类为<em>推理</em>、<em>答案</em>或<em>其他</em>，标记顺序，并输出可复制回去的重排修复。<em>用例</em>：『我的 CoT 提示在纯文本中正常但在 JSON 模式下退化』→ 运行 linter，找到颠倒的顺序，修复。",
     "inv.v081.hub":                "<strong>🧭 Solutions Hub</strong> — 每个文档化的问题都映射到一个 tafagent 模式或精选外部工具。别重复发明 — 去找。",
     "help.v081.hub.title":         "🧭 Solutions Hub",
     "help.v081.hub.body":          "tafagent 作为集成者而非孤岛。30+ 问题跨 7 类别（评估可靠性 · 诊断 · 设置 · 训练 · 检索 · 多模态 · 可观测性），每个映射到（a）解决它的 tafagent 模式（若存在），以及（b）社区已信任的最佳外部工具（RAGAS、MTEB、HELM、MCP Schema Validator、llm-stats、llguidance、GlitchMiner 等）。搜索框匹配 pain、场景和工具名称。<em>用例</em>：'我有问题 X — tafagent 解决它吗，如果不，谁解决？'",

js/json_cot_linter.js ADDED Viewed

	@@ -0,0 +1,203 @@

+// JSON CoT-aware Linter (v0.8.2 anti-bullshit pack #8)
+//
+// Pain (Solutions Hub `structured_outputs`): JSON schema engines fail
+// silently and CoT models commit to the answer before reasoning when
+// the schema places `answer` before `reasoning` — constrained decoding
+// emits keys in property order, so the model has to commit a final
+// answer first and only then writes the rationale to justify it,
+// defeating Chain-of-Thought entirely.
+//
+// Source citations:
+//   - https://collinwilkins.com/articles/structured-output (field
+//     ordering anti-pattern explained)
+//   - JSONSchemaBench (10K real schemas) — most are not CoT-aware
+//   - llguidance / Outlines / SGLang grammars — all respect property order
+//
+// Pure logic — no human strings. Returns codes+params; main.js does
+// the i18n lookup.
+// Heuristic field classifiers. Tested against real schemas + examples
+// in the smoke harness; conservative on `other` to avoid mislabeling
+// ambiguous fields (e.g. a `score` could be either reasoning-side or
+// answer-side, but lexically it patterns as answer-side and the
+// false-anti-pattern cost is only "review the schema", which is fine).
+const REASONING_PATTERNS = [
+  /reason/i,
+  /think/i,
+  /thought/i,
+  /\bcot\b/i,
+  /chain.of.thought/i,
+  /analysis/i,
+  /\bexplanation\b/i,
+  /rationale/i,
+  /step.by.step/i,
+  /scratchpad/i,
+  /justif/i,
+  /deliberat/i,
+  /\bplan\b/i,
+  /\bwhy\b/i,
+];
+const ANSWER_PATTERNS = [
+  /^answer$/i,
+  /^result$/i,
+  /^output$/i,
+  /^response$/i,
+  /^final/i,
+  /^verdict$/i,
+  /^decision$/i,
+  /^prediction$/i,
+  /^conclusion$/i,
+  /^value$/i,
+  /^score$/i,
+  /^classif/i,
+  /^label$/i,
+  /^choice$/i,
+  /^selected/i,
+];
+export function classifyFieldName(name) {
+  if (typeof name !== "string" || !name) return "other";
+  for (const pat of REASONING_PATTERNS) {
+    if (pat.test(name)) return "reasoning";
+  }
+  for (const pat of ANSWER_PATTERNS) {
+    if (pat.test(name)) return "answer";
+  }
+  return "other";
+}
+// Decide whether `parsed` is a JSON Schema (has `properties` / `$schema`
+// / `type: object`) or a plain example object. Both have ordered keys
+// in modern JS (ES2015+ insertion-order preservation for non-integer
+// string keys), and constrained decoders honor that order, so the
+// detection works on either form.
+function extractFieldOrder(parsed) {
+  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
+    return { kind: "non_object", fields: [] };
+  }
+  // Schema form
+  if (parsed.properties && typeof parsed.properties === "object") {
+    return { kind: "schema", fields: Object.keys(parsed.properties) };
+  }
+  // Example object form
+  return { kind: "example", fields: Object.keys(parsed) };
+}
+function buildFieldAnnotations(fields) {
+  return fields.map((name, idx) => ({
+    name,
+    idx,
+    type: classifyFieldName(name),
+  }));
+}
+function suggestReorder(annotations) {
+  // Strategy: keep relative order within each type bucket, but emit
+  // reasoning fields first, then `other`, then answer fields. That
+  // way CoT runs first, the model can reference any context fields,
+  // and the answer comes last (constrained decoding commits the
+  // answer after the rationale).
+  const reasoning = annotations.filter(a => a.type === "reasoning").map(a => a.name);
+  const other     = annotations.filter(a => a.type === "other").map(a => a.name);
+  const answer    = annotations.filter(a => a.type === "answer").map(a => a.name);
+  return [...reasoning, ...other, ...answer];
+}
+// Public entry point. `text` is the user-pasted JSON Schema or example.
+// Returns { code, params } where `code` is one of:
+//   - invalid_json
+//   - non_object
+//   - empty_fields
+//   - good_order        (reasoning before answer — CoT honored)
+//   - anti_pattern      (answer before reasoning — model commits early)
+//   - missing_reasoning (answer-like fields present, no reasoning)
+//   - missing_answer    (reasoning fields present, no answer-like field)
+//   - no_cot_fields     (object has fields but none look reasoning/answer)
+export function lintJsonCot(text) {
+  if (typeof text !== "string" || !text.trim()) {
+    return { code: "empty_fields", params: { reason: "empty_input" } };
+  }
+  let parsed;
+  try {
+    parsed = JSON.parse(text);
+  } catch (e) {
+    return {
+      code: "invalid_json",
+      params: { error: String(e && e.message || e).slice(0, 200) },
+    };
+  }
+  const { kind, fields } = extractFieldOrder(parsed);
+  if (kind === "non_object") {
+    return { code: "non_object", params: { kind: Array.isArray(parsed) ? "array" : typeof parsed } };
+  }
+  if (fields.length === 0) {
+    return { code: "empty_fields", params: { kind } };
+  }
+  const annotations = buildFieldAnnotations(fields);
+  const reasoningIdx = annotations.findIndex(a => a.type === "reasoning");
+  const answerIdx    = annotations.findIndex(a => a.type === "answer");
+  const hasReasoning = reasoningIdx !== -1;
+  const hasAnswer    = answerIdx !== -1;
+  const baseParams = {
+    kind,
+    fields: annotations,
+    field_count: annotations.length,
+    reasoning_idx: hasReasoning ? reasoningIdx : null,
+    answer_idx: hasAnswer ? answerIdx : null,
+    suggested_order: suggestReorder(annotations),
+  };
+  if (!hasReasoning && !hasAnswer) {
+    return { code: "no_cot_fields", params: baseParams };
+  }
+  if (hasReasoning && !hasAnswer) {
+    return { code: "missing_answer", params: baseParams };
+  }
+  if (!hasReasoning && hasAnswer) {
+    return { code: "missing_reasoning", params: baseParams };
+  }
+  // Both present — order is decisive.
+  if (reasoningIdx < answerIdx) {
+    return { code: "good_order", params: baseParams };
+  }
+  return { code: "anti_pattern", params: baseParams };
+}
+// Build a properties-reordered JSON string preserving the original
+// shape (schema vs example). Used by the UI to show "suggested fix".
+export function reorderJsonText(text, suggestedOrder) {
+  let parsed;
+  try { parsed = JSON.parse(text); }
+  catch { return null; }
+  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) return null;
+  // Reorder properties within a plain object preserving values.
+  const reorderObj = (obj, order) => {
+    const out = {};
+    // First emit suggested keys that exist on the object.
+    for (const k of order) {
+      if (Object.prototype.hasOwnProperty.call(obj, k)) out[k] = obj[k];
+    }
+    // Then any keys not in the suggested order (defensive: keeps unknowns).
+    for (const k of Object.keys(obj)) {
+      if (!Object.prototype.hasOwnProperty.call(out, k)) out[k] = obj[k];
+    }
+    return out;
+  };
+  if (parsed.properties && typeof parsed.properties === "object") {
+    parsed.properties = reorderObj(parsed.properties, suggestedOrder);
+    // If `required` array exists, mirror suggested order so generators
+    // that emit fields in `required[]` order also benefit. Keep only
+    // the keys originally present in `required`.
+    if (Array.isArray(parsed.required)) {
+      const wasRequired = new Set(parsed.required);
+      parsed.required = suggestedOrder.filter(k => wasRequired.has(k));
+    }
+    return JSON.stringify(parsed, null, 2);
+  }
+  return JSON.stringify(reorderObj(parsed, suggestedOrder), null, 2);
+}

js/main.js CHANGED Viewed

@@ -27,6 +27,7 @@ import {
   loadHub, listCategories, listEntries, searchEntries,
   hubStats, getCategoryMeta,
 } from "./solutions_hub.js";
 // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
 // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
@@ -216,6 +217,7 @@ document.addEventListener("click", (e) => {
       template: "template-section", arena: "arena-section", contam: "contam-section",
       quant: "quant-section", drift: "drift-section", niah: "niah-section",
       saturation: "saturation-section",
       hub: "hub-section",
     }[targetMode];
     if (sectionId) {
@@ -241,7 +243,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
      "diagnose-section", "phase-section", "unmask-section",
      "template-section", "arena-section", "contam-section",
      "quant-section", "drift-section", "niah-section",
-     "saturation-section", "hub-section"].forEach(id => {
       const el = $(id);
       if (el) el.style.display = "none";
     });
@@ -253,6 +255,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
       template: "template-section", arena: "arena-section", contam: "contam-section",
       quant: "quant-section", drift: "drift-section", niah: "niah-section",
       saturation: "saturation-section",
       hub: "hub-section",
     };
     const sectionId = sectionMap[mode];
@@ -260,6 +263,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
     $("mode-desc").textContent = t(`mode_desc.${mode}`) || "";
     if (mode === "phase") initPhaseDiagram();
     if (mode === "saturation") initSaturation();
     if (mode === "hub") initHub();
   });
 });
@@ -3384,6 +3388,173 @@ $("hub-clear-btn")?.addEventListener("click", () => {
   renderHubAll();
 });
 // ════════════════════════════════════════════════════════════════════
 // Bootstrap
 // ════════════════════════════════════════════════════════════════════

   loadHub, listCategories, listEntries, searchEntries,
   hubStats, getCategoryMeta,
 } from "./solutions_hub.js";
+import { lintJsonCot, reorderJsonText, classifyFieldName } from "./json_cot_linter.js";
 // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
 // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
       template: "template-section", arena: "arena-section", contam: "contam-section",
       quant: "quant-section", drift: "drift-section", niah: "niah-section",
       saturation: "saturation-section",
+      cot: "cot-section",
       hub: "hub-section",
     }[targetMode];
     if (sectionId) {
      "diagnose-section", "phase-section", "unmask-section",
      "template-section", "arena-section", "contam-section",
      "quant-section", "drift-section", "niah-section",
+     "saturation-section", "cot-section", "hub-section"].forEach(id => {
       const el = $(id);
       if (el) el.style.display = "none";
     });
       template: "template-section", arena: "arena-section", contam: "contam-section",
       quant: "quant-section", drift: "drift-section", niah: "niah-section",
       saturation: "saturation-section",
+      cot: "cot-section",
       hub: "hub-section",
     };
     const sectionId = sectionMap[mode];
     $("mode-desc").textContent = t(`mode_desc.${mode}`) || "";
     if (mode === "phase") initPhaseDiagram();
     if (mode === "saturation") initSaturation();
+    if (mode === "cot") initCot();
     if (mode === "hub") initHub();
   });
 });
   renderHubAll();
 });
+// ════════════════════════════════════════════════════════════════════
+// 📋 JSON CoT-aware Linter (v0.8.2 anti-bullshit pack #8)
+// ════════════════════════════════════════════════════════════════════
+const COT_FIELD_TYPE_BADGE = {
+  reasoning: "🧠",
+  answer: "🎯",
+  other: "·",
+};
+const COT_VERDICT_BADGE_BG = {
+  good_order: "#3fb950",          // green
+  anti_pattern: "#f85149",        // red
+  missing_reasoning: "#d29922",   // amber
+  missing_answer: "#d29922",      // amber
+  no_cot_fields: "#8b949e",       // gray
+  non_object: "#8b949e",
+  empty_fields: "#8b949e",
+  invalid_json: "#f85149",        // red
+};
+let __cotInited = false;
+function initCot() {
+  if (__cotInited) return;
+  __cotInited = true;
+  // No-op (no async data); placeholder kept for symmetry with other modes.
+}
+function renderCotResult(result, originalText) {
+  const verdict = t(`cot.verdict.${result.code}`) || result.code;
+  const verdictBg = COT_VERDICT_BADGE_BG[result.code] || "#8b949e";
+  const verdictBadge = `<span class="badge" style="background:${verdictBg};">${verdict}</span>`;
+  // Failure cases short-circuit: just show the verdict + reason.
+  if (result.code === "invalid_json") {
+    const reason = result.params?.error || "";
+    return `<div class="arena-result">
+      <p style="font-size:1.1em;">${verdictBadge}</p>
+      <pre style="background:#21262d;padding:0.75em;border-radius:4px;color:#f0883e;">${escapeHtml(reason)}</pre>
+    </div>`;
+  }
+  if (result.code === "empty_fields" || result.code === "non_object") {
+    return `<div class="arena-result">
+      <p style="font-size:1.1em;">${verdictBadge}</p>
+      <p class="recipe-desc">${t(`cot.hint.${result.code}`) || ""}</p>
+    </div>`;
+  }
+  const fields = result.params?.fields || [];
+  const fieldRows = fields.map(f => {
+    const icon = COT_FIELD_TYPE_BADGE[f.type] || "·";
+    const typeLabel = t(`cot.field.${f.type}`) || f.type;
+    const color = f.type === "reasoning" ? "#3fb950"
+                : f.type === "answer"    ? "#f0883e"
+                : "#8b949e";
+    return `<tr>
+      <td style="text-align:right;color:#8b949e;">${f.idx}</td>
+      <td><code>${escapeHtml(f.name)}</code></td>
+      <td><span style="color:${color};">${icon} ${typeLabel}</span></td>
+    </tr>`;
+  }).join("");
+  const fieldTable = `
+    <table class="lean-table" style="margin-top:0.5em;">
+      <thead><tr>
+        <th>#</th>
+        <th data-i18n="cot.col.field">Field</th>
+        <th data-i18n="cot.col.type">Type</th>
+      </tr></thead>
+      <tbody>${fieldRows}</tbody>
+    </table>
+  `;
+  // Suggested-fix block — only when there's a meaningful reorder.
+  let fixBlock = "";
+  if (result.code === "anti_pattern") {
+    const suggested = result.params?.suggested_order || [];
+    const fixed = reorderJsonText(originalText, suggested);
+    if (fixed) {
+      fixBlock = `
+        <details open style="margin-top:1em;">
+          <summary style="cursor:pointer;color:#3fb950;">
+            <strong>${t("cot.suggested_fix.title") || "✓ Suggested fix"}</strong>
+          </summary>
+          <p class="recipe-desc">${t("cot.suggested_fix.desc") || ""}</p>
+          <pre style="background:#0d1117;padding:0.75em;border-radius:4px;overflow-x:auto;"><code>${escapeHtml(fixed)}</code></pre>
+          <button type="button" class="secondary" onclick="navigator.clipboard.writeText(this.previousElementSibling.textContent).then(()=>{this.textContent='${t("cot.suggested_fix.copied") || "✓ Copied"}';setTimeout(()=>{this.textContent='${t("cot.suggested_fix.copy") || "📋 Copy"}';},1500);})">${t("cot.suggested_fix.copy") || "📋 Copy"}</button>
+        </details>
+      `;
+    }
+  }
+  // Verdict explainer
+  const explainer = t(`cot.explain.${result.code}`) || "";
+  const explainerBlock = explainer
+    ? `<p class="recipe-desc">${explainer}</p>`
+    : "";
+  // Source attribution footer
+  const attribution = `
+    <p class="recipe-desc subtle" style="font-size:0.82em;margin-top:1em;">
+      ${t("cot.attribution") || ""}
+      <a href="https://collinwilkins.com/articles/structured-output" target="_blank" rel="noopener noreferrer">collinwilkins.com</a> ·
+      <a href="https://github.com/guidance-ai/jsonschemabench" target="_blank" rel="noopener noreferrer">JSONSchemaBench</a> ·
+      <a href="https://github.com/guidance-ai/llguidance" target="_blank" rel="noopener noreferrer">llguidance</a>
+    </p>
+  `;
+  return `<div class="arena-result">
+    <p style="font-size:1.1em;">${verdictBadge}
+      <span class="subtle" style="font-size:0.9em;">(${tFmt("cot.field_count", { n: result.params.field_count }) || `${result.params.field_count} fields`})</span>
+    </p>
+    ${explainerBlock}
+    ${fieldTable}
+    ${fixBlock}
+    ${attribution}
+  </div>`;
+}
+function runCotLint() {
+  const text = $("cot-input")?.value || "";
+  const result = lintJsonCot(text);
+  $("cot-output").innerHTML = renderCotResult(result, text);
+  $("cot-status").textContent = tFmt("cot.status.done", {
+    verdict: t(`cot.verdict.${result.code}`) || result.code,
+  });
+}
+const COT_EXAMPLE_GOOD = JSON.stringify({
+  type: "object",
+  properties: {
+    reasoning: {
+      type: "string",
+      description: "Step-by-step rationale before committing to an answer.",
+    },
+    answer: {
+      type: "string",
+      description: "Final answer, derived from the reasoning above.",
+    },
+  },
+  required: ["reasoning", "answer"],
+}, null, 2);
+const COT_EXAMPLE_BAD = JSON.stringify({
+  type: "object",
+  properties: {
+    final_answer: {
+      type: "string",
+      description: "The model's final answer.",
+    },
+    chain_of_thought: {
+      type: "string",
+      description: "Justification for the answer above.",
+    },
+  },
+  required: ["final_answer", "chain_of_thought"],
+}, null, 2);
+$("cot-lint-btn")?.addEventListener("click", runCotLint);
+$("cot-example-good-btn")?.addEventListener("click", () => {
+  $("cot-input").value = COT_EXAMPLE_GOOD;
+  runCotLint();
+});
+$("cot-example-bad-btn")?.addEventListener("click", () => {
+  $("cot-input").value = COT_EXAMPLE_BAD;
+  runCotLint();
+});
 // ════════════════════════════════════════════════════════════════════
 // Bootstrap
 // ════════════════════════════════════════════════════════════════════