Spaces: karlexmarin/taf-agent
karlexmarin Claude Opus 4.7 (1M context) committed 18 days ago

Commit 2eb69cb

1 Parent(s): e5ceb83

v0.9.1: GGUF Validity Bridge mode + binary header parser

Reads a .gguf file's metadata header straight from HF Hub via HTTP Range
(no multi-GB download) and answers what the dozen GGUF/VRAM calculators
skip: fits in VRAM AND still works?
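
As a rough illustration of the header read described above, the sketch below parses the fixed 24-byte GGUF preamble (magic, version, tensor count, KV count) from a byte buffer. `parseGgufPreamble` and `fetchHead` are hypothetical names chosen for this example and are not functions from this commit; the Range request is shown but never called here.

```javascript
// Parse the fixed 24-byte GGUF preamble. All fields are little-endian;
// the two counts are uint64 in GGUF v2/v3.
function parseGgufPreamble(bytes) {
  if (bytes.byteLength < 24) throw new Error("need at least 24 bytes");
  const dv = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (dv.getUint32(0, true) !== 0x46554747) throw new Error("not_a_gguf_file"); // "GGUF"
  // Read a uint64 as a Number; exact for counts far below 2^53.
  const u64 = (off) => dv.getUint32(off + 4, true) * 4294967296 + dv.getUint32(off, true);
  return { version: dv.getUint32(4, true), tensorCount: u64(8), kvCount: u64(16) };
}

// Hypothetical helper: ask the server for only the first n bytes of a file.
async function fetchHead(url, n = 24) {
  const resp = await fetch(url, { headers: { Range: `bytes=0-${n - 1}` } });
  if (!resp.ok) throw new Error(`HTTP ${resp.status}`);
  // A server that ignores Range replies 200 with the full body. The slice keeps
  // the result to n bytes but does not avoid that download.
  return new Uint8Array(await resp.arrayBuffer()).subarray(0, n);
}
```

Usage would look like `parseGgufPreamble(await fetchHead(url))`; the key/value block that follows the preamble needs the incremental reader, because its length is not known up front.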

- js/gguf_bridge.js: incremental Range-fetch GGUF v2/v3 parser (magic, KV
block, arrays skipped by byte-length so the tokenizer doesn't blow the
buffer). ggufToConfig maps GGUF metadata → HF-style config; quant scheme
from general.file_type with filename backstop. analyzeGguf cross-runs
γ_Padé / d_horizon (architecture) with the quant-regime γ-shift.
- "Compare all quants": one header parse → scores every quant in the repo
(geometry is shared; only the scheme differs), sorted best→worst as a
table. γ@L after quant is the comparison axis — it degrades monotonically;
d_horizon is NOT recomputed from a quant-shifted γ (that inverts the
formula). Verdict driven by γ@L + quant regime, not a hard d_horizon gate
(which understates reach for high-θ models like Qwen).
- index.html: tab + tile + #gguf-section + help v0.9.1 entry.
- main.js: import, wiring, cached header parse, single + comparison renders.
Context/horizon now formatted binary-K (32768→32K, not 33K); θ decimal M/K.
- i18n.js: full EN/ES/FR/ZH for all gguf.* keys.
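
The binary-K versus decimal formatting mentioned for main.js can be sketched as follows. The formatters in main.js are not shown in this diff, so `fmtBinaryK` and `fmtDecimal` are hypothetical names and the rounding choices are assumptions.

```javascript
// Context lengths are powers of two, so divide by 1024: 32768 -> "32K".
// Dividing by 1000 and rounding would print "33K" instead.
function fmtBinaryK(n) {
  if (n >= 1024 && n % 1024 === 0) return `${n / 1024}K`;
  if (n >= 1024) return `${(n / 1024).toFixed(1)}K`;
  return String(n);
}

// RoPE theta values are round decimal numbers, so use 1e3 / 1e6 steps.
// The unary plus drops trailing zeros ("1.00" -> 1).
function fmtDecimal(x) {
  if (x >= 1e6) return `${+(x / 1e6).toFixed(2)}M`;
  if (x >= 1e3) return `${+(x / 1e3).toFixed(2)}K`;
  return String(x);
}
```

With these definitions, `fmtBinaryK(32768)` gives `"32K"` and `fmtDecimal(1e6)` gives `"1M"`.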

Test (test_gguf.mjs): 25/25 — list/parse real GGUF (Qwen2.5 q4_k_m, 6MB
header), verdict, compare-all table, monotonic γ@L, verdict variety, 4
languages, error paths. 24 modes total, 0 JS errors.
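
A minimal sketch of the compare-all ordering and the monotonicity check, assuming each quant contributes only a γ-shift on top of a shared fp16 γ@L (the `gammaAtL - shift` relation from analyzeGguf). `compareQuants` is a hypothetical name, and the shift values are illustrative numbers, not output of predictQuantShift.

```javascript
// One shared fp16 gamma at the target length; each quant subtracts its own shift.
function compareQuants(gammaAtL, shifts) {
  return Object.entries(shifts)
    .map(([label, shift]) => ({ label, gammaQuant: gammaAtL - shift }))
    .sort((a, b) => b.gammaQuant - a.gammaQuant); // best (highest gamma) first
}

// Illustrative shifts only.
const rows = compareQuants(0.80, { Q4_K_M: 0.06, Q8_0: 0.01, Q2_K: 0.25, Q5_K_M: 0.03 });

// Gamma after quant must never increase while walking down the sorted table.
const monotonic = rows.every((r, i) => i === 0 || r.gammaQuant <= rows[i - 1].gammaQuant);
```

Because the geometry is shared, a single header parse is enough to fill every row; only the per-scheme shift differs.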

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (5)

index.html +43 -0
js/gguf_bridge.js +245 -0
js/i18n.js +152 -0
js/main.js +208 -3
test_gguf.mjs +107 -0

index.html CHANGED

@@ -246,6 +246,9 @@
@@ -408,6 +411,7 @@
@@ -503,6 +507,7 @@
@@ -1290,6 +1295,44 @@

       <p><strong data-i18n="help.v09.yarn.title">🧵 YaRN / RoPE Context-Extension Planner</strong></p>
       <p data-i18n="help.v09.yarn.body">The dozen GGUF/VRAM calculators on HF (NyxKrage, oobabooga, DavidAU, …) all answer the same question: <em>does context length L fit in my GPU?</em> None answer the harder one: <em>does L fit AND still work?</em> Enter a model id (or its θ + trained context) and a target length L. The planner computes the extension factor, emits the exact <code>rope_scaling</code> block for transformers ≥4.43 (<code>yarn</code> / <code>linear</code> / <code>dynamic</code> / <code>llama3</code>, with paper-default β ramps), then runs TAF's γ_Padé / d_horizon math: γ with no extension (the problem), γ after the chosen method (the fix), the effective attention horizon, and a verdict — HEALTHY / USABLE-WITH-CARE / NEEDS-FINETUNE / DEGRADES. It flags the θ_eff≈θ·factor estimate and the >4× fine-tune requirement honestly. <em>Use case</em>: 'I want Mistral-7B (θ=10k, 8k trained) at 32k' → see γ collapse from naive use, YaRN partially recover it, and get the exact config to paste. Or 'Qwen2.5 at 128k' → discover its θ=1e6 already covers it, no aggressive scaling needed.</p>
+      <p><strong data-i18n="help.v091.gguf.title">🧊 GGUF Validity Bridge</strong></p>
+      <p data-i18n="help.v091.gguf.body">The dozen GGUF/VRAM calculators (NyxKrage, oobabooga, …) read a <code>.gguf</code> header to tell you if a quant <em>fits in your GPU</em>. This reads the same header — via HTTP Range, so no multi-GB download — and answers the question they skip: <em>does it fit AND still work?</em> Paste a GGUF repo, pick a quant file; the bridge pulls <code>rope_theta</code>, <code>context_length</code>, the quant scheme (from <code>general.file_type</code> or the filename), and head geometry, then runs TAF's γ_Padé / d_horizon plus the architecture-aware quant-regime γ-shift. Output: effective attention horizon at the trained context, how far the quant erodes γ (and ΔPPL) for <em>this</em> model, and a verdict — HEALTHY / USABLE-WITH-CARE / DEGRADES. <em>Use case</em>: 'unsloth/Qwen3.5-9B-GGUF Q4_K_M fits 8GB — but is it brain-dead past 30K?' → see the horizon and the Q4 γ-penalty before you download 6 GB.</p>
       <h3 data-i18n="help.audit.title">The audit chain</h3>
       <p data-i18n="help.audit.body">Every result shows the full <strong>Computation Chain</strong> — each formula step with its inputs,
       output, and interpretation. Click any step to expand. Cite section numbers (§26.1, §19.1, etc.) refer
             <button data-mode-link="longscore" data-i18n="modes.longscore">🎯 LongScore</button>
             <button data-mode-link="quant" data-i18n="modes.quant">⚖️ Quant</button>
             <button data-mode-link="yarn" data-i18n="modes.yarn">🧵 YaRN Planner</button>
+            <button data-mode-link="gguf" data-i18n="modes.gguf">🧊 GGUF Bridge</button>
             <button data-mode-link="inspector" data-i18n="modes.inspector">🔍 Inspect config</button>
           </div>
         </div>
         <button class="mode-btn" data-mode="longscore" role="tab" aria-selected="false" data-i18n="modes.longscore">🎯 LongScore</button>
         <button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
         <button class="mode-btn" data-mode="yarn" role="tab" aria-selected="false" data-i18n="modes.yarn">🧵 YaRN Planner</button>
+        <button class="mode-btn" data-mode="gguf" role="tab" aria-selected="false" data-i18n="modes.gguf">🧊 GGUF Bridge</button>
       </div>
       <p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
         <strong>Quickest start</strong>: paste any HuggingFace model id (e.g. <code>meta-llama/Meta-Llama-3-8B</code>),
       <div id="yarn-output" style="display:none; margin-top:1em;"></div>
     </section>
+    <!-- GGUF Validity Bridge (mode=gguf) -->
+    <section id="gguf-section" style="display:none;">
+      <h2><span data-i18n="gguf.title">🧊 GGUF Validity Bridge</span>
+        <span class="info"><span class="tooltip" data-i18n="gguf.tip">
+          <strong>Fits in VRAM ≠ works</strong>. The GGUF/VRAM calculators read a model's metadata to
+          tell you if a quant <em>fits in your GPU</em>. This reads the SAME metadata (rope_theta,
+          context_length, quant scheme, head geometry) straight from the <code>.gguf</code> header via
+          HTTP Range — no multi-GB download — and answers the question they don't: does attention
+          quality actually hold, and how much does the quant erode it (γ-shift, ΔPPL)?
+        </span></span>
+      </h2>
+      <p class="recipe-desc" data-i18n="gguf.desc">
+        Paste a GGUF repo (e.g. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), pick a quant file, and get a
+        TAF quality verdict: the model's effective attention horizon, plus how much the chosen
+        quantization shifts γ for <em>this specific architecture</em>. Reads only the file header in your
+        browser.
+      </p>
+      <div class="form-row">
+        <label for="gguf-repo" data-i18n="gguf.repo_label">GGUF repo id:</label>
+        <input type="text" id="gguf-repo" placeholder="Qwen/Qwen2.5-7B-Instruct-GGUF">
+        <button id="gguf-list-btn" class="secondary" data-i18n="gguf.list_btn">📂 List quant files</button>
+      </div>
+      <span id="gguf-status" class="subtle"></span>
+      <div class="form-row">
+        <label for="gguf-file" data-i18n="gguf.file_label">Quant file:</label>
+        <select id="gguf-file" disabled></select>
+      </div>
+      <div class="form-row">
+        <label for="gguf-target" data-i18n="gguf.target_label">Target context L (optional):</label>
+        <input type="number" id="gguf-target" placeholder="(defaults to trained context)" min="256">
+      </div>
+      <button id="gguf-analyze-btn" disabled data-i18n="gguf.analyze_btn">🧊 Analyze GGUF</button>
+      <button id="gguf-all-btn" class="secondary" disabled data-i18n="gguf.all_btn">📊 Compare all quants</button>
+      <div id="gguf-output" style="display:none; margin-top:1em;"></div>
+    </section>
     <!-- Recipe selector (mode=recipe) -->
     <section id="recipe-section" style="display:none;">
       <h2 data-i18n="recipe.title">📋 Recipe</h2>

js/gguf_bridge.js ADDED

	@@ -0,0 +1,245 @@

+// GGUF Validity Bridge (v0.9.1 anti-bullshit pack)
+//
+// The dozen GGUF/VRAM calculators on HF answer "does this quant fit in my GPU?".
+// None answer "does it fit AND still work?". This reads a .gguf file's metadata
+// header directly in the browser (HTTP Range — no full multi-GB download), pulls
+// rope_theta + context_length + quant scheme + head geometry, then runs TAF's
+// γ_Padé / d_horizon + the quant-regime γ-shift to emit a quality verdict:
+// "fits in VRAM but attention collapses past d_horizon, and Q4 worsens γ by …".
+//
+// Parser logic is pure; the network fetch is unavoidable I/O. main.js renders.
+import { gammaPade } from "./gamma_check.js";
+import { dHorizon } from "./yarn_planner.js";
+import { predictQuantShift } from "./quant_regime.js";
+// ── GGUF metadata value types (spec v2/v3) ──
+const GT = { U8:0, I8:1, U16:2, I16:3, U32:4, I32:5, F32:6, BOOL:7, STR:8, ARR:9, U64:10, I64:11, F64:12 };
+const FIXED_SIZE = { 0:1, 1:1, 2:2, 3:2, 4:4, 5:4, 6:4, 7:1, 10:8, 11:8, 12:8 };
+// general.file_type enum (llama_ftype) → human label + the quant_regime scheme id
+// we feed to predictQuantShift. Only the common ones; filename parsing backstops.
+const FTYPE = {
+  0:  ["F32",     null],
+  1:  ["F16",     null],
+  2:  ["Q4_0",    "gguf_q4_km"],
+  3:  ["Q4_1",    "gguf_q4_km"],
+  7:  ["Q8_0",    "gguf_q8_0"],
+  8:  ["Q5_0",    "gguf_q5_km"],
+  9:  ["Q5_1",    "gguf_q5_km"],
+  10: ["Q2_K",    "gguf_q2_k"],
+  11: ["Q3_K_S",  "gguf_q3_km"],
+  12: ["Q3_K_M",  "gguf_q3_km"],
+  13: ["Q3_K_L",  "gguf_q3_km"],
+  14: ["Q4_K_S",  "gguf_q4_km"],
+  15: ["Q4_K_M",  "gguf_q4_km"],
+  16: ["Q5_K_S",  "gguf_q5_km"],
+  17: ["Q5_K_M",  "gguf_q5_km"],
+  18: ["Q6_K",    "gguf_q8_0"],
+};
+// Filename → (label, scheme) backstop when general.file_type is absent/ambiguous.
+export function quantFromFilename(name) {
+  const n = (name || "").toUpperCase();
+  const pairs = [
+    ["Q2_K", "gguf_q2_k"], ["Q3_K", "gguf_q3_km"], ["Q4_K", "gguf_q4_km"],
+    ["Q5_K", "gguf_q5_km"], ["Q6_K", "gguf_q8_0"], ["Q8_0", "gguf_q8_0"],
+    ["Q4_0", "gguf_q4_km"], ["Q4_1", "gguf_q4_km"], ["Q5_0", "gguf_q5_km"],
+    // BF16 must precede F16: a name containing "BF16" also contains "F16".
+    ["Q5_1", "gguf_q5_km"], ["BF16", null], ["F16", null], ["F32", null],
+  ];
+  for (const [tag, scheme] of pairs) {
+    if (n.includes(tag)) return { label: tag.replace(/_$/, ""), scheme };
+  }
+  return { label: "?", scheme: null };
+}
+// List the .gguf files in a HF repo (so the user can pick a quant).
+export async function listGgufFiles(repo) {
+  const resp = await fetch(`https://huggingface.co/api/models/${encodeURIComponent(repo).replace(/%2F/g, "/")}`);
+  if (!resp.ok) throw new Error(`HTTP ${resp.status} — repo not found or private`);
+  const data = await resp.json();
+  const sib = Array.isArray(data.siblings) ? data.siblings : [];
+  return sib.map(s => s.rfilename).filter(f => /\.gguf$/i.test(f)).sort();
+}
+// Incremental Range-fetch reader. GGUF metadata sits at the file head, so a few
+// MB usually suffices; fetchGgufMetadata walks the whole KV block (tokenizer
+// arrays included), and MAX below caps the total fetched.
+class GgufReader {
+  constructor(url) {
+    this.url = url;
+    this.buf = new Uint8Array(0);
+    this.dv = new DataView(this.buf.buffer);
+    this.off = 0;
+    this.fetched = 0;
+    this.CHUNK = 1 << 20;       // 1 MB per range
+    this.MAX = 48 << 20;        // hard cap 48 MB
+    this.eof = false;
+  }
+  async ensure(n) {
+    while (this.off + n > this.buf.length && !this.eof && this.fetched < this.MAX) {
+      const start = this.fetched;
+      const end = Math.min(this.fetched + this.CHUNK, this.MAX) - 1;
+      const resp = await fetch(this.url, { headers: { Range: `bytes=${start}-${end}` } });
+      if (!resp.ok) throw new Error(`HTTP ${resp.status}`); // resp.ok already covers 200 and 206
+      const part = new Uint8Array(await resp.arrayBuffer());
+      if (part.length === 0) { this.eof = true; break; }
+      const merged = new Uint8Array(this.buf.length + part.length);
+      merged.set(this.buf); merged.set(part, this.buf.length);
+      this.buf = merged;
+      this.dv = new DataView(this.buf.buffer);
+      this.fetched += part.length;
+      if (part.length < this.CHUNK) this.eof = true; // server returned the tail
+    }
+    if (this.off + n > this.buf.length) throw new Error("gguf_metadata_too_large");
+  }
+  async u8()  { await this.ensure(1); return this.dv.getUint8(this.off++); }
+  async u16() { await this.ensure(2); const v = this.dv.getUint16(this.off, true); this.off += 2; return v; }
+  async i16() { await this.ensure(2); const v = this.dv.getInt16(this.off, true); this.off += 2; return v; }
+  async u32() { await this.ensure(4); const v = this.dv.getUint32(this.off, true); this.off += 4; return v; }
+  async i32() { await this.ensure(4); const v = this.dv.getInt32(this.off, true); this.off += 4; return v; }
+  async f32() { await this.ensure(4); const v = this.dv.getFloat32(this.off, true); this.off += 4; return v; }
+  async f64() { await this.ensure(8); const v = this.dv.getFloat64(this.off, true); this.off += 8; return v; }
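+  // u64/i64 as Number: exact for magnitudes below 2^53, which covers counts and dims.
+  async u64() { await this.ensure(8); const lo = this.dv.getUint32(this.off, true); const hi = this.dv.getUint32(this.off + 4, true); this.off += 8; return hi * 4294967296 + lo; }
+  // Signed variant: read the high word as Int32 so negative values keep their sign.
+  async i64() { await this.ensure(8); const lo = this.dv.getUint32(this.off, true); const hi = this.dv.getInt32(this.off + 4, true); this.off += 8; return hi * 4294967296 + lo; }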
+  // Advance past n bytes. ensure() fetches them first and throws
+  // "gguf_metadata_too_large" if that would exceed the MAX cap.
+  async skip(n) {
+    await this.ensure(n);
+    this.off += n;
+  }
+  async str() {
+    const len = await this.u64();
+    await this.ensure(len);
+    const bytes = this.buf.subarray(this.off, this.off + len);
+    this.off += len;
+    return new TextDecoder("utf-8").decode(bytes);
+  }
+}
+async function readValue(r, type) {
+  switch (type) {
+    case GT.U8: return r.u8();
+    case GT.I8: { const v = await r.u8(); return v > 127 ? v - 256 : v; }
+    case GT.U16: return r.u16();
+    case GT.I16: return r.i16();
+    case GT.U32: return r.u32();
+    case GT.I32: return r.i32();
+    case GT.F32: return r.f32();
+    case GT.BOOL: return (await r.u8()) !== 0;
+    case GT.STR: return r.str();
+    case GT.U64: return r.u64();
+    case GT.I64: return r.i64();
+    case GT.F64: return r.f64();
+    case GT.ARR: {
+      const et = await r.u32();
+      const len = await r.u64();
+      if (FIXED_SIZE[et]) { await r.skip(len * FIXED_SIZE[et]); return { __array: len, elemType: et }; }
+      if (et === GT.STR) { for (let i = 0; i < len; i++) { const sl = await r.u64(); await r.skip(sl); } return { __array: len, elemType: et }; }
+      throw new Error("gguf_nested_array");
+    }
+    default: throw new Error(`gguf_unknown_type_${type}`);
+  }
+}
+// Parse the metadata KV block. Returns a flat { key: value } map. Arrays are
+// returned as { __array: length, elemType } stubs; their contents are never needed here.
+export async function fetchGgufMetadata(url) {
+  const r = new GgufReader(url);
+  const magic = (await r.u8()) | ((await r.u8()) << 8) | ((await r.u8()) << 16) | ((await r.u8()) << 24);
+  if (magic !== 0x46554747 /* 'GGUF' little-endian */) throw new Error("not_a_gguf_file");
+  const version = await r.u32();
+  const tensorCount = await r.u64();
+  const kvCount = await r.u64();
+  const kv = {};
+  for (let i = 0; i < kvCount; i++) {
+    const key = await r.str();
+    const type = await r.u32();
+    kv[key] = await readValue(r, type);
+  }
+  return { version, tensorCount, kvCount, kv, bytesRead: r.fetched };
+}
+// Map raw GGUF metadata → HF-style config (so quant_regime + TAF math can reuse it).
+export function ggufToConfig(meta) {
+  const kv = meta.kv || {};
+  const arch = kv["general.architecture"];
+  const g = (suffix, fallback = null) => (arch && kv[`${arch}.${suffix}`] !== undefined ? kv[`${arch}.${suffix}`] : fallback);
+  const n_attn = g("attention.head_count");
+  const n_kv = g("attention.head_count_kv", n_attn);
+  const hidden = g("embedding_length");
+  const keyLen = g("attention.key_length");
+  const headDim = (typeof keyLen === "number") ? keyLen
+                : (n_attn && hidden ? hidden / n_attn : null);
+  const ftypeEnum = kv["general.file_type"];
+  const ftype = (typeof ftypeEnum === "number" && FTYPE[ftypeEnum]) ? FTYPE[ftypeEnum] : null;
+  return {
+    architecture: arch || "?",
+    quant_label: ftype ? ftype[0] : null,
+    quant_scheme: ftype ? ftype[1] : null,
+    rope_theta: g("rope.freq_base", null),
+    context_length: g("context_length", null),
+    rope_scaling_type: g("rope.scaling.type", null),
+    rope_scaling_factor: g("rope.scaling.factor", null),
+    rope_orig_ctx: g("rope.scaling.original_context_length", null),
+    // HF-config aliases for predictQuantShift / inferNParams:
+    num_attention_heads: n_attn ?? null,
+    num_key_value_heads: n_kv ?? null,
+    hidden_size: hidden ?? null,
+    head_dim: headDim,
+    num_hidden_layers: g("block_count", null),
+    sliding_window: g("attention.sliding_window", null),
+    vocab_size: g("vocab_size", null),
+  };
+}
+// Bridge verdict: combine GGUF geometry + TAF horizon + quant γ-shift.
+//   cfg       : ggufToConfig output (may be edited by user / filename backstop)
+//   targetCtx : optional desired context L to check (else uses context_length)
+export function analyzeGguf(cfg, targetCtx) {
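+  // A missing rope.freq_base falls back to 10000 (the original RoPE base), so
+  // theta is never null here and only a missing context_length yields "incomplete".
+  const theta = Number(cfg.rope_theta) || 10000;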
+  const nCtx = Number(cfg.context_length) || null;
+  const L = Number(targetCtx) || nCtx;
+  // fp16 attention horizon — architectural, set by θ. SAME across every quant
+  // of the model (quantisation adds noise, it does not change θ). d_horizon is
+  // a function of the *natural* Padé γ, so it must be computed from the fp16 γ —
+  // never from a quant-shifted γ (that inverts the formula and is meaningless).
+  const gammaTrain = nCtx ? gammaPade(theta, nCtx) : null;
+  const dHoriz = gammaTrain != null ? dHorizon(theta, gammaTrain) : null;
+  // Quant γ-shift via the existing quant-regime model (architecture-aware).
+  const quant = cfg.quant_scheme ? predictQuantShift(cfg, cfg.quant_scheme) : null;
+  // γ at the target L: fp16, then after the quant shift. This is the quantity
+  // that degrades monotonically with worse quant — the correct comparison axis.
+  const gammaAtL = (theta && L) ? gammaPade(theta, L) : null;
+  const shift = quant ? quant.gamma_shift : 0;
+  const gammaQuant = (gammaAtL != null) ? gammaAtL - shift : null;
+  // Verdict is driven by γ@L after quant (the direct attention-quality signal
+  // at the target length) plus the quant-regime band. We deliberately do NOT
+  // gate on L ≤ d_horizon: the closed-form d_horizon understates the true reach
+  // for high-θ models (e.g. Qwen θ=1e6 keeps γ healthy far past its d_horizon),
+  // so γ@L is the honest measure. `reaches` is reported for context only.
+  const reaches = dHoriz != null && L != null && L <= dHoriz;
+  const collapsed = !Number.isFinite(gammaQuant) || gammaQuant <= 0.2;
+  const quantCliff = quant && quant.regime === "cliff";
+  let verdict;
+  if (nCtx == null || theta == null) verdict = "incomplete";
+  else if (collapsed || quantCliff) verdict = "degrades";
+  else if (gammaQuant >= 0.6 && (!quant || quant.regime === "safe" || quant.regime === "mild")) verdict = "healthy";
+  else verdict = "usable_with_care";
+  return {
+    theta, nCtx, L,
+    gammaTrain, dHoriz,          // fp16 architectural horizon (shared across quants)
+    gammaAtL, gammaQuant,        // attention at L: fp16 vs after-quant
+    reaches,                     // is L within the fp16 horizon?
+    quant,                       // {gamma_shift, regime, delta_ppl, ...} or null
+    quantLabel: cfg.quant_label,
+    arch: cfg.architecture,
+    verdict,
+  };
+}
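
The verdict rule at the end of analyzeGguf can be read in isolation as the sketch below. `classify` is a hypothetical helper that copies the 0.2 and 0.6 thresholds and the regime names from the diff above; passing `null` for the regime stands for a full-precision file with no quant shift, which is an assumption of this example.

```javascript
// gammaQuant: gamma at L after the quant shift. regime: quant-regime band or null.
function classify(gammaQuant, regime) {
  // A non-finite or very low gamma counts as collapsed attention.
  const collapsed = !Number.isFinite(gammaQuant) || gammaQuant <= 0.2;
  if (collapsed || regime === "cliff") return "degrades";
  // Healthy needs both a good gamma and a benign (or absent) quant regime.
  const benign = regime == null || regime === "safe" || regime === "mild";
  return gammaQuant >= 0.6 && benign ? "healthy" : "usable_with_care";
}
```

Note that d_horizon does not appear in the rule at all, which matches the comment in analyzeGguf that `reaches` is reported for context only.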

js/i18n.js CHANGED

@@ -427,6 +427,44 @@ export const TRANSLATIONS = {
     "mode_desc.hub":               "Map of every documented LLM-eval pain → tafagent mode (if covered) + curated external tools. Find the right solution without rebuilding it. 30+ pains, 7 categories.",
     "modes.yarn":                  "🧵 YaRN Planner",
     "mode_desc.yarn":              "Generate the exact rope_scaling config to extend a model past its trained context — plus a TAF verdict on whether attention quality actually holds at the target length.",
     "yarn.title":                  "🧵 YaRN / RoPE Context-Extension Planner",
     "yarn.tip":                    "<strong>Config + verdict, not just VRAM</strong>. The GGUF/VRAM calculators tell you if a context length <em>fits in GPU</em>. This tells you the exact <code>rope_scaling</code> block to put in <code>config.json</code> AND whether attention quality will actually hold at that length — using TAF's γ_Padé / d_horizon machinery, all in your browser.",
     "yarn.desc":                   "Want to run a model past its trained context? Enter the model (or its θ + trained context) and your target length L. Get the copy-paste <code>rope_scaling</code> snippet for transformers ≥4.43, plus a TAF verdict: does the effective attention horizon reach L, or will the model just hallucinate past d_horizon?",
@@ -1738,6 +1776,44 @@ export const TRANSLATIONS = {
     "mode_desc.hub":               "Mapa de cada problema documentado de LLM-eval → mode tafagent (si cubierto) + herramientas externas curadas. Encuentra la solución sin reinventarla. 30+ pains, 7 categorías.",
     "modes.yarn":                  "🧵 Planificador YaRN",
     "mode_desc.yarn":              "Genera la configuración rope_scaling exacta para extender un modelo más allá de su contexto entrenado — más un veredicto TAF sobre si la calidad de atención aguanta realmente a la longitud objetivo.",
     "yarn.title":                  "🧵 Planificador de extensión de contexto YaRN / RoPE",
     "yarn.tip":                    "<strong>Config + veredicto, no solo VRAM</strong>. Las calculadoras GGUF/VRAM te dicen si una longitud de contexto <em>cabe en la GPU</em>. Esto te da el bloque <code>rope_scaling</code> exacto para <code>config.json</code> Y si la calidad de atención aguantará realmente a esa longitud — con la maquinaria γ_Padé / d_horizon de TAF, todo en tu navegador.",
     "yarn.desc":                   "¿Quieres usar un modelo más allá de su contexto entrenado? Introduce el modelo (o su θ + contexto entrenado) y tu longitud objetivo L. Obtén el fragmento <code>rope_scaling</code> listo para pegar (transformers ≥4.43), más un veredicto TAF: ¿llega el horizonte de atención efectivo a L, o el modelo alucinará pasado d_horizon?",
@@ -2903,6 +2979,44 @@ export const TRANSLATIONS = {
     "mode_desc.hub":               "Carte de chaque problème documenté de LLM-eval → mode tafagent (si couvert) + outils externes curés. Trouvez la solution sans la réinventer. 30+ pains, 7 catégories.",
     "modes.yarn":                  "🧵 Planificateur YaRN",
     "mode_desc.yarn":              "Génère la configuration rope_scaling exacte pour étendre un modèle au-delà de son contexte d'entraînement — plus un verdict TAF sur la tenue réelle de la qualité d'attention à la longueur cible.",
     "yarn.title":                  "🧵 Planificateur d'extension de contexte YaRN / RoPE",
     "yarn.tip":                    "<strong>Config + verdict, pas seulement la VRAM</strong>. Les calculateurs GGUF/VRAM disent si une longueur de contexte <em>tient dans le GPU</em>. Ceci donne le bloc <code>rope_scaling</code> exact pour <code>config.json</code> ET si la qualité d'attention tiendra réellement à cette longueur — avec la machinerie γ_Padé / d_horizon de TAF, entièrement dans votre navigateur.",
     "yarn.desc":                   "Vous voulez utiliser un modèle au-delà de son contexte d'entraînement ? Saisissez le modèle (ou son θ + contexte d'entraînement) et votre longueur cible L. Obtenez le fragment <code>rope_scaling</code> prêt à coller (transformers ≥4.43), plus un verdict TAF : l'horizon d'attention effectif atteint-il L, ou le modèle va-t-il halluciner au-delà de d_horizon ?",
@@ -4068,6 +4182,44 @@ export const TRANSLATIONS = {
     "mode_desc.hub":               "每个 LLM-eval 问题的地图 → tafagent 模式（若覆盖）+ 精选外部工具。找到方案而非重新发明。30+ 问题，7 类别。",
     "modes.yarn":                  "🧵 YaRN 规划器",
     "mode_desc.yarn":              "生成精确的 rope_scaling 配置以将模型扩展到训练上下文之外 —— 外加 TAF 裁决：在目标长度下注意力质量是否真的撑得住。",
     "yarn.title":                  "🧵 YaRN / RoPE 上下文扩展规划器",
     "yarn.tip":                    "<strong>配置 + 裁决，不只是显存</strong>。GGUF/显存计算器告诉你某上下文长度<em>是否塞得进 GPU</em>。本工具给出要放入 <code>config.json</code> 的精确 <code>rope_scaling</code> 块，并判断该长度下注意力质量是否真的撑得住 —— 使用 TAF 的 γ_Padé / d_horizon 机制，全在浏览器内运行。",
     "yarn.desc":                   "想让模型超出其训练上下文运行？输入模型（或其 θ + 训练上下文）和你的目标长度 L。获得可复制粘贴的 <code>rope_scaling</code> 片段（transformers ≥4.43），外加 TAF 裁决：有效注意力视界能否到达 L，还是模型在 d_horizon 之外就开始幻觉？",

     "mode_desc.hub":               "Map of every documented LLM-eval pain → tafagent mode (if covered) + curated external tools. Find the right solution without rebuilding it. 30+ pains, 7 categories.",
     "modes.yarn":                  "🧵 YaRN Planner",
     "mode_desc.yarn":              "Generate the exact rope_scaling config to extend a model past its trained context — plus a TAF verdict on whether attention quality actually holds at the target length.",
+    "modes.gguf":                  "🧊 GGUF Bridge",
+    "mode_desc.gguf":              "Read a GGUF file's metadata header (rope_theta, context_length, quant) in your browser and get a TAF quality verdict — the question the VRAM calculators skip: fits AND works?",
+    "gguf.title":                  "🧊 GGUF Validity Bridge",
+    "gguf.tip":                    "<strong>Fits in VRAM ≠ works</strong>. The GGUF/VRAM calculators read a model's metadata to tell you if a quant <em>fits in your GPU</em>. This reads the SAME metadata (rope_theta, context_length, quant scheme, head geometry) straight from the <code>.gguf</code> header via HTTP Range — no multi-GB download — and answers the question they don't: does attention quality actually hold, and how much does the quant erode it (γ-shift, ΔPPL)?",
+    "gguf.desc":                   "Paste a GGUF repo (e.g. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), pick a quant file, and get a TAF quality verdict: the model's effective attention horizon, plus how much the chosen quantization shifts γ for <em>this specific architecture</em>. Reads only the file header in your browser.",
+    "gguf.repo_label":             "GGUF repo id:",
+    "gguf.list_btn":               "📂 List quant files",
+    "gguf.file_label":             "Quant file:",
+    "gguf.target_label":           "Target context L (optional):",
+    "gguf.analyze_btn":            "🧊 Analyze GGUF",
+    "gguf.all_btn":                "📊 Compare all quants",
+    "gguf.compare_title":          "All quants — quality comparison",
+    "gguf.col.verdict":            "Verdict",
+    "gguf.col.gamma_at_l":         "γ @ L (after quant)",
+    "gguf.need_repo":              "Enter a GGUF repo id like 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
+    "gguf.listing":                "Listing .gguf files from HF Hub…",
+    "gguf.no_files":               "No .gguf files found in that repo.",
+    "gguf.found":                  "quant files found",
+    "gguf.pick_hint":              "pick one and click Analyze.",
+    "gguf.reading":                "Reading GGUF header via HTTP Range…",
+    "gguf.read_ok":                "Header parsed",
+    "gguf.verdict.healthy":        "HEALTHY — effective horizon reaches L with good γ after quant",
+    "gguf.verdict.usable_with_care":"USABLE WITH CARE — reaches L but γ is modest after quant",
+    "gguf.verdict.degrades":       "DEGRADES — attention collapses before L (or quant pushes it there)",
+    "gguf.r.arch":                 "Architecture",
+    "gguf.r.ctx_train":            "Trained context",
+    "gguf.r.horizon_fp16":         "Attention horizon (fp16)",
+    "gguf.r.quant":                "Quant scheme",
+    "gguf.r.gamma_shift":          "γ-shift from quant",
+    "gguf.r.after_quant":          "(after quant)",
+    "gguf.r.eff_horizon":          "Effective horizon (quantised)",
+    "gguf.r.no_quant_shift":       "— full precision, no γ-shift",
+    "gguf.r.note":                 "Horizon from γ_Padé / d_horizon (architecture). Quant γ-shift + ΔPPL from the quant-regime model (calibrated to llama.cpp PPL + AWQ/GPTQ papers). Both are estimates — verify borderline cases with a real eval.",
+    "gguf.err.not_gguf":           "That file isn't a valid GGUF (bad magic).",
+    "gguf.err.too_large":          "Metadata header exceeds the fetch cap — unusually large tokenizer. Try another quant.",
+    "gguf.err.incomplete":         "GGUF metadata is missing rope_theta or context_length — can't compute the horizon.",
+    "help.v091.gguf.title":        "🧊 GGUF Validity Bridge",
+    "help.v091.gguf.body":         "The dozen GGUF/VRAM calculators (NyxKrage, oobabooga, …) read a <code>.gguf</code> header to tell you if a quant <em>fits in your GPU</em>. This reads the same header — via HTTP Range, so no multi-GB download — and answers the question they skip: <em>does it fit AND still work?</em> Paste a GGUF repo, pick a quant file; the bridge pulls <code>rope_theta</code>, <code>context_length</code>, the quant scheme (from <code>general.file_type</code> or the filename), and head geometry, then runs TAF's γ_Padé / d_horizon plus the architecture-aware quant-regime γ-shift. Output: effective attention horizon at the trained context, how far the quant erodes γ (and ΔPPL) for <em>this</em> model, and a verdict. <em>Use case</em>: 'Q4_K_M fits 8GB — but is it brain-dead past 30K?' → see the horizon and the Q4 γ-penalty before you download 6 GB.",
     "yarn.title":                  "🧵 YaRN / RoPE Context-Extension Planner",
     "yarn.tip":                    "<strong>Config + verdict, not just VRAM</strong>. The GGUF/VRAM calculators tell you if a context length <em>fits in GPU</em>. This tells you the exact <code>rope_scaling</code> block to put in <code>config.json</code> AND whether attention quality will actually hold at that length — using TAF's γ_Padé / d_horizon machinery, all in your browser.",
     "yarn.desc":                   "Want to run a model past its trained context? Enter the model (or its θ + trained context) and your target length L. Get the copy-paste <code>rope_scaling</code> snippet for transformers ≥4.43, plus a TAF verdict: does the effective attention horizon reach L, or will the model just hallucinate past d_horizon?",
     "mode_desc.hub":               "Mapa de cada problema documentado de LLM-eval → mode tafagent (si cubierto) + herramientas externas curadas. Encuentra la solución sin reinventarla. 30+ pains, 7 categorías.",
     "modes.yarn":                  "🧵 Planificador YaRN",
     "mode_desc.yarn":              "Genera la configuración rope_scaling exacta para extender un modelo más allá de su contexto entrenado — más un veredicto TAF sobre si la calidad de atención aguanta realmente a la longitud objetivo.",
+    "modes.gguf":                  "🧊 Puente GGUF",
+    "mode_desc.gguf":              "Lee la cabecera de metadata de un archivo GGUF (rope_theta, context_length, quant) en tu navegador y obtén un veredicto de calidad TAF — la pregunta que los calculadores de VRAM ignoran: ¿cabe Y funciona?",
+    "gguf.title":                  "🧊 Puente de validez GGUF",
+    "gguf.tip":                    "<strong>Caber en VRAM ≠ funcionar</strong>. Los calculadores GGUF/VRAM leen la metadata de un modelo para decirte si un quant <em>cabe en tu GPU</em>. Esto lee la MISMA metadata (rope_theta, context_length, esquema de quant, geometría de cabezas) directamente de la cabecera <code>.gguf</code> vía HTTP Range — sin descargar GB — y responde lo que ellos no: ¿aguanta de verdad la calidad de atención, y cuánto la erosiona el quant (γ-shift, ΔPPL)?",
+    "gguf.desc":                   "Pega un repo GGUF (p.ej. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), elige un archivo de quant, y obtén un veredicto de calidad TAF: el horizonte de atención efectivo del modelo, más cuánto desplaza γ la cuantización elegida para <em>esta arquitectura concreta</em>. Solo lee la cabecera del archivo en tu navegador.",
+    "gguf.repo_label":             "ID del repo GGUF:",
+    "gguf.list_btn":               "📂 Listar archivos quant",
+    "gguf.file_label":             "Archivo quant:",
+    "gguf.target_label":           "Contexto objetivo L (opcional):",
+    "gguf.analyze_btn":            "🧊 Analizar GGUF",
+    "gguf.all_btn":                "📊 Comparar todos los quants",
+    "gguf.compare_title":          "Todos los quants — comparación de calidad",
+    "gguf.col.verdict":            "Veredicto",
+    "gguf.col.gamma_at_l":         "γ @ L (tras quant)",
+    "gguf.need_repo":              "Introduce un id de repo GGUF como 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
+    "gguf.listing":                "Listando archivos .gguf de HF Hub…",
+    "gguf.no_files":               "No se encontraron archivos .gguf en ese repo.",
+    "gguf.found":                  "archivos quant encontrados",
+    "gguf.pick_hint":              "elige uno y pulsa Analizar.",
+    "gguf.reading":                "Leyendo cabecera GGUF vía HTTP Range…",
+    "gguf.read_ok":                "Cabecera analizada",
+    "gguf.verdict.healthy":        "SANO — el horizonte efectivo alcanza L con buen γ tras quant",
+    "gguf.verdict.usable_with_care":"USABLE CON CUIDADO — alcanza L pero γ es modesto tras quant",
+    "gguf.verdict.degrades":       "DEGRADA — la atención colapsa antes de L (o el quant la empuja ahí)",
+    "gguf.r.arch":                 "Arquitectura",
+    "gguf.r.ctx_train":            "Contexto entrenado",
+    "gguf.r.horizon_fp16":         "Horizonte de atención (fp16)",
+    "gguf.r.quant":                "Esquema de quant",
+    "gguf.r.gamma_shift":          "γ-shift por quant",
+    "gguf.r.after_quant":          "(tras quant)",
+    "gguf.r.eff_horizon":          "Horizonte efectivo (cuantizado)",
+    "gguf.r.no_quant_shift":       "— precisión completa, sin γ-shift",
+    "gguf.r.note":                 "Horizonte desde γ_Padé / d_horizon (arquitectura). γ-shift de quant + ΔPPL desde el modelo quant-regime (calibrado a PPL de llama.cpp + papers AWQ/GPTQ). Ambos son estimaciones — verifica los casos límite con un eval real.",
+    "gguf.err.not_gguf":           "Ese archivo no es un GGUF válido (magic incorrecto).",
+    "gguf.err.too_large":          "La cabecera de metadata supera el límite de descarga — tokenizer inusualmente grande. Prueba otro quant.",
+    "gguf.err.incomplete":         "A la metadata GGUF le falta rope_theta o context_length — no se puede calcular el horizonte.",
+    "help.v091.gguf.title":        "🧊 Puente de validez GGUF",
+    "help.v091.gguf.body":         "La docena de calculadores GGUF/VRAM (NyxKrage, oobabooga, …) leen una cabecera <code>.gguf</code> para decirte si un quant <em>cabe en tu GPU</em>. Esto lee la misma cabecera — vía HTTP Range, sin descargar GB — y responde lo que ellos saltan: <em>¿cabe Y además funciona?</em> Pega un repo GGUF, elige un archivo de quant; el puente extrae <code>rope_theta</code>, <code>context_length</code>, el esquema de quant (de <code>general.file_type</code> o del nombre del archivo), y la geometría de cabezas, luego corre γ_Padé / d_horizon de TAF más el γ-shift de quant consciente de arquitectura. Salida: horizonte de atención efectivo en el contexto entrenado, cuánto erosiona γ el quant (y ΔPPL) para <em>este</em> modelo, y un veredicto. <em>Caso de uso</em>: 'Q4_K_M cabe en 8GB — ¿pero se vuelve tonto pasado 30K?' → ve el horizonte y la penalización γ de Q4 antes de descargar 6 GB.",
     "yarn.title":                  "🧵 Planificador de extensión de contexto YaRN / RoPE",
     "yarn.tip":                    "<strong>Config + veredicto, no solo VRAM</strong>. Las calculadoras GGUF/VRAM te dicen si una longitud de contexto <em>cabe en la GPU</em>. Esto te da el bloque <code>rope_scaling</code> exacto para <code>config.json</code> Y si la calidad de atención aguantará realmente a esa longitud — con la maquinaria γ_Padé / d_horizon de TAF, todo en tu navegador.",
     "yarn.desc":                   "¿Quieres usar un modelo más allá de su contexto entrenado? Introduce el modelo (o su θ + contexto entrenado) y tu longitud objetivo L. Obtén el fragmento <code>rope_scaling</code> listo para pegar (transformers ≥4.43), más un veredicto TAF: ¿llega el horizonte de atención efectivo a L, o el modelo alucinará pasado d_horizon?",
     "mode_desc.hub":               "Carte de chaque problème documenté de LLM-eval → mode tafagent (si couvert) + outils externes curés. Trouvez la solution sans la réinventer. 30+ pains, 7 catégories.",
     "modes.yarn":                  "🧵 Planificateur YaRN",
     "mode_desc.yarn":              "Génère la configuration rope_scaling exacte pour étendre un modèle au-delà de son contexte d'entraînement — plus un verdict TAF sur la tenue réelle de la qualité d'attention à la longueur cible.",
+    "modes.gguf":                  "🧊 Pont GGUF",
+    "mode_desc.gguf":              "Lit l'en-tête de métadonnées d'un fichier GGUF (rope_theta, context_length, quant) dans votre navigateur et donne un verdict de qualité TAF — la question que les calculateurs de VRAM ignorent : tient ET fonctionne ?",
+    "gguf.title":                  "🧊 Pont de validité GGUF",
+    "gguf.tip":                    "<strong>Tenir dans la VRAM ≠ fonctionner</strong>. Les calculateurs GGUF/VRAM lisent les métadonnées d'un modèle pour dire si un quant <em>tient dans le GPU</em>. Ceci lit les MÊMES métadonnées (rope_theta, context_length, schéma de quant, géométrie des têtes) directement depuis l'en-tête <code>.gguf</code> via HTTP Range — sans télécharger des Go — et répond à ce qu'ils n'abordent pas : la qualité d'attention tient-elle vraiment, et de combien le quant l'érode-t-il (γ-shift, ΔPPL) ?",
+    "gguf.desc":                   "Collez un dépôt GGUF (ex. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), choisissez un fichier de quant, et obtenez un verdict de qualité TAF : l'horizon d'attention effectif du modèle, plus de combien la quantification choisie décale γ pour <em>cette architecture précise</em>. Ne lit que l'en-tête du fichier dans votre navigateur.",
+    "gguf.repo_label":             "ID du dépôt GGUF :",
+    "gguf.list_btn":               "📂 Lister les fichiers quant",
+    "gguf.file_label":             "Fichier quant :",
+    "gguf.target_label":           "Contexte cible L (optionnel) :",
+    "gguf.analyze_btn":            "🧊 Analyser le GGUF",
+    "gguf.all_btn":                "📊 Comparer tous les quants",
+    "gguf.compare_title":          "Tous les quants — comparaison de qualité",
+    "gguf.col.verdict":            "Verdict",
+    "gguf.col.gamma_at_l":         "γ @ L (après quant)",
+    "gguf.need_repo":              "Saisissez un id de dépôt GGUF comme 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
+    "gguf.listing":                "Listage des fichiers .gguf depuis HF Hub…",
+    "gguf.no_files":               "Aucun fichier .gguf trouvé dans ce dépôt.",
+    "gguf.found":                  "fichiers quant trouvés",
+    "gguf.pick_hint":              "choisissez-en un et cliquez Analyser.",
+    "gguf.reading":                "Lecture de l'en-tête GGUF via HTTP Range…",
+    "gguf.read_ok":                "En-tête analysé",
+    "gguf.verdict.healthy":        "SAIN — l'horizon effectif atteint L avec un bon γ après quant",
+    "gguf.verdict.usable_with_care":"UTILISABLE AVEC PRUDENCE — atteint L mais γ est modeste après quant",
+    "gguf.verdict.degrades":       "DÉGRADE — l'attention s'effondre avant L (ou le quant l'y pousse)",
+    "gguf.r.arch":                 "Architecture",
+    "gguf.r.ctx_train":            "Contexte d'entraînement",
+    "gguf.r.horizon_fp16":         "Horizon d'attention (fp16)",
+    "gguf.r.quant":                "Schéma de quant",
+    "gguf.r.gamma_shift":          "γ-shift dû au quant",
+    "gguf.r.after_quant":          "(après quant)",
+    "gguf.r.eff_horizon":          "Horizon effectif (quantifié)",
+    "gguf.r.no_quant_shift":       "— pleine précision, pas de γ-shift",
+    "gguf.r.note":                 "Horizon depuis γ_Padé / d_horizon (architecture). γ-shift de quant + ΔPPL depuis le modèle quant-regime (calibré sur la PPL de llama.cpp + papiers AWQ/GPTQ). Les deux sont des estimations — vérifiez les cas limites avec un éval réel.",
+    "gguf.err.not_gguf":           "Ce fichier n'est pas un GGUF valide (mauvais magic).",
+    "gguf.err.too_large":          "L'en-tête de métadonnées dépasse la limite de téléchargement — tokenizer inhabituellement grand. Essayez un autre quant.",
+    "gguf.err.incomplete":         "Il manque rope_theta ou context_length dans les métadonnées GGUF — impossible de calculer l'horizon.",
+    "help.v091.gguf.title":        "🧊 Pont de validité GGUF",
+    "help.v091.gguf.body":         "La douzaine de calculateurs GGUF/VRAM (NyxKrage, oobabooga, …) lisent un en-tête <code>.gguf</code> pour dire si un quant <em>tient dans le GPU</em>. Ceci lit le même en-tête — via HTTP Range, sans télécharger des Go — et répond à ce qu'ils sautent : <em>tient-il ET fonctionne-t-il encore ?</em> Collez un dépôt GGUF, choisissez un fichier de quant ; le pont extrait <code>rope_theta</code>, <code>context_length</code>, le schéma de quant (depuis <code>general.file_type</code> ou le nom de fichier) et la géométrie des têtes, puis exécute γ_Padé / d_horizon de TAF plus le γ-shift de quant conscient de l'architecture. Sortie : horizon d'attention effectif au contexte d'entraînement, de combien le quant érode γ (et ΔPPL) pour <em>ce</em> modèle, et un verdict. <em>Cas d'usage</em> : 'Q4_K_M tient dans 8 Go — mais est-il abruti au-delà de 30K ?' → voyez l'horizon et la pénalité γ de Q4 avant de télécharger 6 Go.",
     "yarn.title":                  "🧵 Planificateur d'extension de contexte YaRN / RoPE",
     "yarn.tip":                    "<strong>Config + verdict, pas seulement la VRAM</strong>. Les calculateurs GGUF/VRAM disent si une longueur de contexte <em>tient dans le GPU</em>. Ceci donne le bloc <code>rope_scaling</code> exact pour <code>config.json</code> ET si la qualité d'attention tiendra réellement à cette longueur — avec la machinerie γ_Padé / d_horizon de TAF, entièrement dans votre navigateur.",
     "yarn.desc":                   "Vous voulez utiliser un modèle au-delà de son contexte d'entraînement ? Saisissez le modèle (ou son θ + contexte d'entraînement) et votre longueur cible L. Obtenez le fragment <code>rope_scaling</code> prêt à coller (transformers ≥4.43), plus un verdict TAF : l'horizon d'attention effectif atteint-il L, ou le modèle va-t-il halluciner au-delà de d_horizon ?",
     "mode_desc.hub":               "每个 LLM-eval 问题的地图 → tafagent 模式（若覆盖）+ 精选外部工具。找到方案而非重新发明。30+ 问题，7 类别。",
     "modes.yarn":                  "🧵 YaRN 规划器",
     "mode_desc.yarn":              "生成精确的 rope_scaling 配置以将模型扩展到训练上下文之外 —— 外加 TAF 裁决：在目标长度下注意力质量是否真的撑得住。",
+    "modes.gguf":                  "🧊 GGUF 桥",
+    "mode_desc.gguf":              "在浏览器内读取 GGUF 文件的元数据头（rope_theta、context_length、量化），给出 TAF 质量裁决 —— 显存计算器跳过的那个问题：塞得进且还能用吗？",
+    "gguf.title":                  "🧊 GGUF 有效性桥",
+    "gguf.tip":                    "<strong>塞进显存 ≠ 能用</strong>。GGUF/显存计算器读取模型元数据来告诉你某量化<em>是否塞得进 GPU</em>。本工具通过 HTTP Range 直接从 <code>.gguf</code> 头读取同样的元数据（rope_theta、context_length、量化方案、注意力头几何）—— 无需下载数 GB —— 并回答它们不答的：注意力质量是否真的撑得住，量化又侵蚀了多少（γ-shift、ΔPPL）？",
+    "gguf.desc":                   "粘贴一个 GGUF 仓库（如 <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>），选择一个量化文件，获得 TAF 质量裁决：模型的有效注意力视界，以及所选量化对<em>这个具体架构</em>的 γ 位移有多大。只在浏览器内读取文件头。",
+    "gguf.repo_label":             "GGUF 仓库 id：",
+    "gguf.list_btn":               "📂 列出量化文件",
+    "gguf.file_label":             "量化文件：",
+    "gguf.target_label":           "目标上下文 L（可选）：",
+    "gguf.analyze_btn":            "🧊 分析 GGUF",
+    "gguf.all_btn":                "📊 比较所有量化",
+    "gguf.compare_title":          "所有量化 —— 质量对比",
+    "gguf.col.verdict":            "裁决",
+    "gguf.col.gamma_at_l":         "γ @ L（量化后）",
+    "gguf.need_repo":              "输入 GGUF 仓库 id，如 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
+    "gguf.listing":                "正在从 HF Hub 列出 .gguf 文件…",
+    "gguf.no_files":               "该仓库中未找到 .gguf 文件。",
+    "gguf.found":                  "个量化文件已找到",
+    "gguf.pick_hint":              "选一个并点击分析。",
+    "gguf.reading":                "正在通过 HTTP Range 读取 GGUF 头…",
+    "gguf.read_ok":                "头已解析",
+    "gguf.verdict.healthy":        "健康 —— 量化后有效视界以良好的 γ 到达 L",
+    "gguf.verdict.usable_with_care":"可用但需谨慎 —— 到达 L，但量化后 γ 偏低",
+    "gguf.verdict.degrades":       "退化 —— 注意力在 L 之前崩溃（或被量化推到那里）",
+    "gguf.r.arch":                 "架构",
+    "gguf.r.ctx_train":            "训练上下文",
+    "gguf.r.horizon_fp16":         "注意力视界（fp16）",
+    "gguf.r.quant":                "量化方案",
+    "gguf.r.gamma_shift":          "量化导致的 γ 位移",
+    "gguf.r.after_quant":          "（量化后）",
+    "gguf.r.eff_horizon":          "有效视界（量化后）",
+    "gguf.r.no_quant_shift":       "—— 全精度，无 γ 位移",
+    "gguf.r.note":                 "视界来自 γ_Padé / d_horizon（架构）。量化 γ 位移 + ΔPPL 来自 quant-regime 模型（以 llama.cpp PPL + AWQ/GPTQ 论文校准）。两者皆为估计 —— 边界情况请用真实评测核实。",
+    "gguf.err.not_gguf":           "该文件不是有效的 GGUF（magic 错误）。",
+    "gguf.err.too_large":          "元数据头超出获取上限 —— tokenizer 异常大。请换一个量化。",
+    "gguf.err.incomplete":         "GGUF 元数据缺少 rope_theta 或 context_length —— 无法计算视界。",
+    "help.v091.gguf.title":        "🧊 GGUF 有效性桥",
+    "help.v091.gguf.body":         "那一打 GGUF/显存计算器（NyxKrage、oobabooga……）读取 <code>.gguf</code> 头来告诉你某量化<em>是否塞得进 GPU</em>。本工具读取同样的头 —— 通过 HTTP Range，无需下载数 GB —— 并回答它们跳过的：<em>塞得进且还能用吗？</em> 粘贴一个 GGUF 仓库，选择一个量化文件；桥会提取 <code>rope_theta</code>、<code>context_length</code>、量化方案（来自 <code>general.file_type</code> 或文件名）和头几何，然后运行 TAF 的 γ_Padé / d_horizon 加上架构感知的 quant-regime γ 位移。输出：训练上下文处的有效注意力视界、量化对<em>该</em>模型侵蚀 γ（及 ΔPPL）的程度，以及裁决。<em>用例</em>：'Q4_K_M 塞得进 8GB —— 但超过 30K 会变傻吗？' → 在下载 6 GB 之前先看视界和 Q4 的 γ 惩罚。",
     "yarn.title":                  "🧵 YaRN / RoPE 上下文扩展规划器",
     "yarn.tip":                    "<strong>配置 + 裁决，不只是显存</strong>。GGUF/显存计算器告诉你某上下文长度<em>是否塞得进 GPU</em>。本工具给出要放入 <code>config.json</code> 的精确 <code>rope_scaling</code> 块，并判断该长度下注意力质量是否真的撑得住 —— 使用 TAF 的 γ_Padé / d_horizon 机制，全在浏览器内运行。",
     "yarn.desc":                   "想让模型超出其训练上下文运行？输入模型（或其 θ + 训练上下文）和你的目标长度 L。获得可复制粘贴的 <code>rope_scaling</code> 片段（transformers ≥4.43），外加 TAF 裁决：有效注意力视界能否到达 L，还是模型在 d_horizon 之外就开始幻觉？",

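The help text above describes reading a `.gguf` header over HTTP Range instead of downloading the whole file. As a rough illustration of that idea, the sketch below fetches a byte range and parses the fixed 24-byte prefix of a GGUF v2/v3 file (magic, version, tensor count, metadata KV count). The names `parseGgufPrefix` and `fetchRange` are hypothetical and are not the functions in js/gguf_bridge.js, whose parser also walks the KV block. Treat this as a sketch under those assumptions, not the commit's implementation.

```javascript
// Hypothetical helper names; not the ones used in js/gguf_bridge.js.
const GGUF_MAGIC = 0x46554747; // the bytes "GGUF" read as a little-endian uint32

// Parse the fixed 24-byte prefix of a GGUF v2/v3 file:
// magic (u32), version (u32), tensor_count (u64), metadata_kv_count (u64).
function parseGgufPrefix(buffer) {
  if (buffer.byteLength < 24) throw new Error("gguf_prefix_truncated");
  const view = new DataView(buffer);
  if (view.getUint32(0, true) !== GGUF_MAGIC) throw new Error("not_a_gguf_file");
  const version = view.getUint32(4, true);
  // GGUF v1 stored 32-bit counts, so this layout only applies from v2 onwards.
  if (version < 2) throw new Error("unsupported_gguf_version");
  return {
    version,
    tensorCount: view.getBigUint64(8, true),
    kvCount: view.getBigUint64(16, true),
  };
}

// Fetch `length` bytes starting at `start` using an HTTP Range request.
async function fetchRange(url, start, length) {
  const res = await fetch(url, {
    headers: { Range: `bytes=${start}-${start + length - 1}` },
  });
  // 206 = Partial Content; a plain 200 means the server ignored the Range header.
  if (res.status !== 206) throw new Error(`range_not_honoured_${res.status}`);
  return res.arrayBuffer();
}
```

A caller would combine the two as `parseGgufPrefix(await fetchRange(url, 0, 24))`, then request further ranges as needed to read the metadata KV block.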
js/main.js CHANGED

@@ -39,6 +39,7 @@ import {
   loadKB as loadLongscoreKB, lookup as longscoreLookup, rank as longscoreRank,
 } from "./longscore.js";
 import { planExtension, suggestRopeType } from "./yarn_planner.js";
 // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
 // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
@@ -233,6 +234,7 @@ document.addEventListener("click", (e) => {
       longscore: "longscore-section",
       hub: "hub-section",
       yarn: "yarn-section",
     }[targetMode];
     if (sectionId) {
       const sec = document.getElementById(sectionId);
@@ -257,7 +259,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
      "diagnose-section", "phase-section", "unmask-section",
      "template-section", "arena-section", "contam-section",
      "quant-section", "drift-section", "niah-section",
-     "saturation-section", "cot-section", "peft-section", "cache-section", "speculative-section", "tax-section", "longscore-section", "hub-section", "yarn-section"].forEach(id => {
       const el = $(id);
       if (el) el.style.display = "none";
     });
@@ -277,6 +279,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
       longscore: "longscore-section",
       hub: "hub-section",
       yarn: "yarn-section",
     };
     const sectionId = sectionMap[mode];
     if (sectionId) $(sectionId).style.display = "";
@@ -291,6 +294,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
     if (mode === "longscore") initLongscore();
     if (mode === "hub") initHub();
     if (mode === "yarn") initYarn();
   });
 });
@@ -4661,9 +4665,20 @@ function initYarn() {
   });
 }
 function _yarnFmtK(n) {
   if (n == null || !Number.isFinite(n)) return "—";
-  if (n >= 1000) return (n / 1000).toFixed(n >= 10000 ? 0 : 1) + "K";
   return String(Math.round(n));
 }
 function _yarnFmtG(g) {
@@ -4720,7 +4735,7 @@ function renderYarnPlan(p) {
       <tr><td style="${td}">${t("yarn.r.method")}</td><td><code>${p.ropeType}</code></td></tr>
       <tr><td style="${td}">γ ${t("yarn.r.naive")}</td><td>${_yarnFmtG(p.gammaNaive)}${p.gammaNaive <= 0 ? ` 🚨 ${t("yarn.r.collapsed")}` : ""}</td></tr>
       <tr><td style="${td}">γ ${t("yarn.r.eff")}</td><td><strong>${_yarnFmtG(p.gammaEff)}</strong></td></tr>
-      <tr><td style="${td}">θ_eff</td><td>${_yarnFmtK(p.thetaEff)}${p.thetaEff > p.theta ? ` (↑ ${t("yarn.r.from")} ${_yarnFmtK(p.theta)})` : ""}</td></tr>
       <tr><td style="${td}">d_horizon ${t("yarn.r.eff")}</td><td>${_yarnFmtK(p.dHorizonEff)} ${horizonOk ? "✅ ≥ L" : "⚠ &lt; L"}</td></tr>
     </table>
     <h3>${t("yarn.r.snippet")}</h3>
@@ -4736,6 +4751,196 @@ function renderYarnPlan(p) {
   });
 }
 // ════════════════════════════════════════════════════════════════════
 // Bootstrap
 // ════════════════════════════════════════════════════════════════════

   loadKB as loadLongscoreKB, lookup as longscoreLookup, rank as longscoreRank,
 } from "./longscore.js";
 import { planExtension, suggestRopeType } from "./yarn_planner.js";
+import { listGgufFiles, fetchGgufMetadata, ggufToConfig, quantFromFilename, analyzeGguf } from "./gguf_bridge.js";
 // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
 // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
       longscore: "longscore-section",
       hub: "hub-section",
       yarn: "yarn-section",
+      gguf: "gguf-section",
     }[targetMode];
     if (sectionId) {
       const sec = document.getElementById(sectionId);
      "diagnose-section", "phase-section", "unmask-section",
      "template-section", "arena-section", "contam-section",
      "quant-section", "drift-section", "niah-section",
+     "saturation-section", "cot-section", "peft-section", "cache-section", "speculative-section", "tax-section", "longscore-section", "hub-section", "yarn-section", "gguf-section"].forEach(id => {
       const el = $(id);
       if (el) el.style.display = "none";
     });
       longscore: "longscore-section",
       hub: "hub-section",
       yarn: "yarn-section",
+      gguf: "gguf-section",
     };
     const sectionId = sectionMap[mode];
     if (sectionId) $(sectionId).style.display = "";
     if (mode === "longscore") initLongscore();
     if (mode === "hub") initHub();
     if (mode === "yarn") initYarn();
+    if (mode === "gguf") initGguf();
   });
 });
   });
 }
+// Context / horizon lengths: binary-K so 32768→32K, 131072→128K, 8192→8K
+// (the convention everyone uses for context windows), not decimal-K (→33K).
 function _yarnFmtK(n) {
   if (n == null || !Number.isFinite(n)) return "—";
+  if (n >= 1048576) return (n / 1048576).toFixed(1) + "M";
+  if (n >= 1024) return Math.round(n / 1024) + "K";
+  return String(Math.round(n));
+}
+// RoPE θ is an arbitrary base, not a power of two → decimal M/K reads naturally
+// (1000000→1M, 500000→500K, 40000→40K).
+function _thetaFmt(n) {
+  if (n == null || !Number.isFinite(n)) return "—";
+  if (n >= 1e6) return (n / 1e6).toFixed(n % 1e6 === 0 ? 0 : 1) + "M";
+  if (n >= 1000) return (n / 1000).toFixed(n % 1000 === 0 ? 0 : 1) + "K";
   return String(Math.round(n));
 }
 function _yarnFmtG(g) {
       <tr><td style="${td}">${t("yarn.r.method")}</td><td><code>${p.ropeType}</code></td></tr>
       <tr><td style="${td}">γ ${t("yarn.r.naive")}</td><td>${_yarnFmtG(p.gammaNaive)}${p.gammaNaive <= 0 ? ` 🚨 ${t("yarn.r.collapsed")}` : ""}</td></tr>
       <tr><td style="${td}">γ ${t("yarn.r.eff")}</td><td><strong>${_yarnFmtG(p.gammaEff)}</strong></td></tr>
+      <tr><td style="${td}">θ_eff</td><td>${_thetaFmt(p.thetaEff)}${p.thetaEff > p.theta ? ` (↑ ${t("yarn.r.from")} ${_thetaFmt(p.theta)})` : ""}</td></tr>
       <tr><td style="${td}">d_horizon ${t("yarn.r.eff")}</td><td>${_yarnFmtK(p.dHorizonEff)} ${horizonOk ? "✅ ≥ L" : "⚠ &lt; L"}</td></tr>
     </table>
     <h3>${t("yarn.r.snippet")}</h3>
   });
 }
+// ════════════════════════════════════════════════════════════════════
+// 🧊 GGUF Validity Bridge (v0.9.1)
+// ════════════════════════════════════════════════════════════════════
+let _ggufWired = false;
+let _ggufFiles = [];
+let _ggufCfgCache = {}; // "repo|file" → ggufToConfig result (geometry is shared across quants)
+// Parse a .gguf header once and cache. The architecture/θ/context/head geometry
+// is identical across every quant of the same model — only the quant scheme
+// differs — so one parsed file is enough to score the whole repo.
+async function ggufGetCfg(repo, file) {
+  const key = `${repo}|${file}`;
+  if (_ggufCfgCache[key]) return _ggufCfgCache[key];
+  const url = `https://huggingface.co/${repo}/resolve/main/${file}`;
+  const meta = await fetchGgufMetadata(url);
+  const cfg = ggufToConfig(meta);
+  if (!cfg.quant_scheme) {
+    const q = quantFromFilename(file);
+    cfg.quant_label = cfg.quant_label || q.label;
+    cfg.quant_scheme = q.scheme;
+  }
+  cfg.__bytesRead = meta.bytesRead;
+  _ggufCfgCache[key] = cfg;
+  return cfg;
+}
+function initGguf() {
+  if (_ggufWired) return;
+  _ggufWired = true;
+  const listBtn = $("gguf-list-btn");
+  const analyzeBtn = $("gguf-analyze-btn");
+  const allBtn = $("gguf-all-btn");
+  const fileSel = $("gguf-file");
+  listBtn?.addEventListener("click", async () => {
+    const repo = ($("gguf-repo").value || "").trim();
+    if (!repo) { $("gguf-status").textContent = "⚠ " + t("gguf.need_repo"); return; }
+    $("gguf-status").textContent = "⏳ " + t("gguf.listing");
+    listBtn.disabled = true;
+    state.lastModelId = repo;
+    try {
+      const files = await listGgufFiles(repo);
+      if (!files.length) { $("gguf-status").textContent = "⚠ " + t("gguf.no_files"); fileSel.disabled = true; analyzeBtn.disabled = true; if (allBtn) allBtn.disabled = true; _ggufFiles = []; return; }
+      fileSel.innerHTML = files.map(f => `<option value="${escapeHtml(f)}">${escapeHtml(f)}</option>`).join("");
+      // Default-select a Q4_K_M (the community sweet spot) if present.
+      const def = files.find(f => /q4_k_m/i.test(f)) || files[0];
+      fileSel.value = def;
+      fileSel.disabled = false;
+      analyzeBtn.disabled = false;
+      $("gguf-all-btn").disabled = false;
+      _ggufFiles = files;
+      $("gguf-status").innerHTML = `✅ ${files.length} ${t("gguf.found")} — ${t("gguf.pick_hint")}`;
+    } catch (err) {
+      $("gguf-status").textContent = `❌ ${err.message}`;
+    } finally {
+      listBtn.disabled = false;
+    }
+  });
+  analyzeBtn?.addEventListener("click", async () => {
+    const repo = ($("gguf-repo").value || "").trim();
+    const file = fileSel.value;
+    if (!repo || !file) return;
+    $("gguf-status").textContent = "⏳ " + t("gguf.reading");
+    analyzeBtn.disabled = true;
+    try {
+      const cfg = await ggufGetCfg(repo, file);
+      const target = parseFloat($("gguf-target").value) || null;
+      const result = analyzeGguf(cfg, target);
+      $("gguf-status").innerHTML = `✅ ${t("gguf.read_ok")} (${(cfg.__bytesRead / 1024 / 1024).toFixed(1)} MB header)`;
+      renderGgufResult(cfg, result);
+    } catch (err) {
+      $("gguf-status").textContent = `❌ ${ggufErrMsg(err)}`;
+    } finally {
+      analyzeBtn.disabled = false;
+    }
+  });
+  allBtn?.addEventListener("click", async () => {
+    const repo = ($("gguf-repo").value || "").trim();
+    const file = fileSel.value;
+    if (!repo || !file) return;
+    $("gguf-status").textContent = "⏳ " + t("gguf.reading");
+    allBtn.disabled = true; analyzeBtn.disabled = true;
+    try {
+      // One header parse gives the shared geometry; score every quant from it.
+      const cfg = await ggufGetCfg(repo, file);
+      const target = parseFloat($("gguf-target").value) || null;
+      // Dedupe repo files to one row per quant label (drop shard suffixes).
+      const seen = new Set();
+      const rows = [];
+      for (const f of _ggufFiles) {
+        const q = quantFromFilename(f);
+        if (q.label === "?" || seen.has(q.label)) continue;
+        seen.add(q.label);
+        const res = analyzeGguf({ ...cfg, quant_label: q.label, quant_scheme: q.scheme }, target);
+        rows.push({ label: q.label, scheme: q.scheme, res });
+      }
+      // Best precision first: lowest γ-shift (baseline F16 = 0) at the top.
+      rows.sort((a, b) => (a.res.quant?.gamma_shift ?? 0) - (b.res.quant?.gamma_shift ?? 0));
+      $("gguf-status").innerHTML = `✅ ${t("gguf.read_ok")} (${(cfg.__bytesRead / 1024 / 1024).toFixed(1)} MB header)`;
+      renderGgufComparison(cfg, rows);
+    } catch (err) {
+      $("gguf-status").textContent = `❌ ${ggufErrMsg(err)}`;
+    } finally {
+      allBtn.disabled = false; analyzeBtn.disabled = false;
+    }
+  });
+}
+function ggufErrMsg(err) {
+  return ({
+    not_a_gguf_file: t("gguf.err.not_gguf"),
+    gguf_metadata_too_large: t("gguf.err.too_large"),
+  })[err.message] || err.message;
+}
+function renderGgufResult(cfg, r) {
+  const out = $("gguf-output");
+  if (!out) return;
+  out.style.display = "";
+  if (r.verdict === "incomplete") {
+    out.innerHTML = `<div class="gc-validity-warning">⚠ ${t("gguf.err.incomplete")}</div>`;
+    return;
+  }
+  const meta = ({
+    healthy:          { emoji: "✅", cls: "v-yes" },
+    usable_with_care: { emoji: "⚠️", cls: "v-deg" },
+    degrades:         { emoji: "🚨", cls: "v-no"  },
+  })[r.verdict] || { emoji: "❓", cls: "v-deg" };
+  const td = "padding:3px 12px 3px 0;";
+  const gqa = (cfg.num_attention_heads && cfg.num_key_value_heads && cfg.num_key_value_heads < cfg.num_attention_heads)
+    ? `GQA ${cfg.num_attention_heads}:${cfg.num_key_value_heads}` : "MHA";
+  // Quant block (may be null for F16/F32 files).
+  let quantHtml = "";
+  if (r.quant) {
+    const regimeEmoji = ({ safe: "✅", mild: "🟡", significant: "🟠", cliff: "🚨" })[r.quant.regime] || "";
+    const dp = r.quant.delta_ppl;
+    quantHtml = `
+      <tr><td style="${td}">${t("gguf.r.quant")}</td><td><code>${escapeHtml(r.quantLabel || "?")}</code></td></tr>
+      <tr><td style="${td}">${t("gguf.r.gamma_shift")}</td><td>−${_yarnFmtG(r.quant.gamma_shift)} ${regimeEmoji} <span class="subtle">${t("quant.regime." + r.quant.regime) || r.quant.regime}</span></td></tr>
+      <tr><td style="${td}">ΔPPL</td><td>≈ +${dp.mid} <span class="subtle">(${dp.low}–${dp.high})</span></td></tr>`;
+  } else {
+    quantHtml = `<tr><td style="${td}">${t("gguf.r.quant")}</td><td><code>${escapeHtml(r.quantLabel || "F16/F32")}</code> <span class="subtle">${t("gguf.r.no_quant_shift")}</span></td></tr>`;
+  }
+  out.innerHTML = `
+    <p><span class="verdict-badge ${meta.cls}">${meta.emoji} ${t("gguf.verdict." + r.verdict)}</span></p>
+    <table style="border-collapse:collapse;font-size:0.95em;margin:0.5em 0;">
+      <tr><td style="${td}">${t("gguf.r.arch")}</td><td><code>${escapeHtml(r.arch)}</code> · ${gqa} · θ=${_thetaFmt(r.theta)}</td></tr>
+      <tr><td style="${td}">${t("gguf.r.ctx_train")}</td><td>${_yarnFmtK(r.nCtx)}</td></tr>
+      <tr><td style="${td}">${t("gguf.r.horizon_fp16")}</td><td>${_yarnFmtK(r.dHoriz)} <span class="subtle">(γ=${_yarnFmtG(r.gammaTrain)})</span></td></tr>
+      ${quantHtml}
+      <tr><td style="${td}"><strong>γ @ L=${_yarnFmtK(r.L)}</strong> ${t("gguf.r.after_quant")}</td><td><strong>${_yarnFmtG(r.gammaQuant)}</strong> <span class="subtle">(fp16: ${_yarnFmtG(r.gammaAtL)})</span></td></tr>
+    </table>
+    <p class="subtle" style="font-size:0.88em;">${t("gguf.r.note")}</p>`;
+}
+function renderGgufComparison(cfg, rows) {
+  const out = $("gguf-output");
+  if (!out) return;
+  out.style.display = "";
+  const gqa = (cfg.num_attention_heads && cfg.num_key_value_heads && cfg.num_key_value_heads < cfg.num_attention_heads)
+    ? `GQA ${cfg.num_attention_heads}:${cfg.num_key_value_heads}` : "MHA";
+  // Short verdict label = the text before the first dash separator of the full
+  // verdict string, which may be several words, e.g. "USABLE WITH CARE"
+  // (works in every language: "HEALTHY — …", "SANO — …", "健康 —— …").
+  const short = v => (t("gguf.verdict." + v) || v).split(/——|—| - /)[0].trim();
+  const emo = v => ({ healthy: "✅", usable_with_care: "⚠️", degrades: "🚨" })[v] || "❓";
+  const td = "padding:3px 14px 3px 0;";
+  const head = `<tr style="text-align:left;border-bottom:1px solid var(--border);">
+    <th style="${td}">${t("gguf.r.quant")}</th><th style="${td}">${t("gguf.r.gamma_shift")}</th>
+    <th style="${td}">${t("gguf.col.gamma_at_l")}</th><th style="${td}">${t("gguf.col.verdict")}</th></tr>`;
+  const body = rows.map(({ label, res }) => {
+    const shift = res.quant ? "−" + _yarnFmtG(res.quant.gamma_shift) : "—";
+    return `<tr><td style="${td}"><code>${escapeHtml(label)}</code></td><td style="${td}">${shift}</td>
+      <td style="${td}">${_yarnFmtG(res.gammaQuant)}</td>
+      <td style="${td}">${emo(res.verdict)} ${short(res.verdict)}</td></tr>`;
+  }).join("");
+  // d_horizon is θ-set → identical for every quant; show it once in the header line.
+  out.innerHTML = `<h3>${t("gguf.compare_title")}</h3>
+    <p class="subtle">${escapeHtml(cfg.architecture)} · ${gqa} · θ=${_thetaFmt(cfg.rope_theta)} · ctx ${_yarnFmtK(cfg.context_length)} · horizon ${_yarnFmtK(rows[0]?.res.dHoriz)} · L=${_yarnFmtK(rows[0]?.res.L)}</p>
+    <table style="border-collapse:collapse;font-size:0.93em;">${head}${body}</table>
+    <p class="subtle" style="font-size:0.88em;">${t("gguf.r.note")}</p>`;
+}
 // ════════════════════════════════════════════════════════════════════
 // Bootstrap
 // ════════════════════════════════════════════════════════════════════
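
Review note on the `short` helper in `renderGgufComparison`: the comment there says the short verdict label is the text before the dash in the localized verdict string. The sketch below isolates that split so it can be checked without the page. `shortVerdictLabel` and the sample strings are hypothetical; the real strings come from the `gguf.verdict.*` keys in `i18n.js`, which this diff does not show.

```javascript
// Hypothetical standalone version of the `short` arrow function in
// renderGgufComparison. The regex is copied from the diff; the sample
// strings are illustrative and are NOT the real i18n values.
function shortVerdictLabel(full) {
  // Alternation order matters: the CJK double dash is tried before the
  // single em dash, so a double dash is consumed as one separator.
  return String(full).split(/——|—| - /)[0].trim();
}

const samples = [
  "HEALTHY — fits and works",
  "SANO — cabe y funciona",
  "健康 —— 可用",
  "USABLE - with care",
];
for (const s of samples) {
  console.log(shortVerdictLabel(s));
}
```

A string with no separator comes back whole (trimmed), so a translation that omits the dash would put the full sentence in the verdict column rather than fail.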

test_gguf.mjs ADDED

@@ -0,0 +1,107 @@

+import { chromium } from "playwright";
+const BASE = "http://127.0.0.1:8000/index.html";
+const b = await chromium.launch({ headless: true });
+const p = await (await b.newContext()).newPage();
+const errors = [];
+const benign = s => /Failed to load resource.*40\d|status of 40\d/.test(s);
+p.on("console", m => { if (m.type()==="error" && !benign(m.text())) errors.push(`[err] ${m.text()}`); });
+p.on("pageerror", e => errors.push(`[pageerror] ${e.message}`));
+const log = s => process.stdout.write(s+"\n");
+let pass=0, fail=0;
+const check=(n,c,x="")=>{ log(`${c?"  OK  ":"  FAIL"} ${n} ${x}`); c?pass++:fail++; };
+await p.goto(BASE,{waitUntil:"domcontentloaded",timeout:90000});
+await p.waitForTimeout(2500);
+await p.click(`.lang-btn[data-lang="en"]`); await p.waitForTimeout(200);
+check("module loads, 0 errors", errors.length===0, `(errors=${errors.length})`);
+await p.click('[data-mode-link="gguf"]',{timeout:5000}); await p.waitForTimeout(500);
+const secVis = await p.evaluate(()=>{const s=document.querySelector("#gguf-section");return s&&getComputedStyle(s).display!=="none";});
+check("gguf-section visible after tile click", secVis);
+log("\n── List quant files (real repo) ──");
+await p.fill("#gguf-repo","Qwen/Qwen2.5-0.5B-Instruct-GGUF");
+await p.click("#gguf-list-btn");
+await p.waitForTimeout(4000);
+const listed = await p.evaluate(()=>{
+  const sel=document.querySelector("#gguf-file");
+  return { count:sel.options.length, selected:sel.value, disabled:sel.disabled,
+           analyzeEnabled:!document.querySelector("#gguf-analyze-btn").disabled,
+           status:document.querySelector("#gguf-status").innerText.slice(0,60) };
+});
+check("files listed in dropdown", listed.count>0, `(${listed.count} files)`);
+check("Q4_K_M auto-selected", /q4_k_m/i.test(listed.selected), listed.selected);
+check("analyze button enabled", listed.analyzeEnabled);
+log("\n── Analyze GGUF (parse header + verdict) ──");
+await p.click("#gguf-analyze-btn");
+await p.waitForTimeout(8000); // range fetch + parse
+const r = await p.evaluate(()=>{
+  const o=document.querySelector("#gguf-output");
+  return { vis:getComputedStyle(o).display!=="none",
+           verdict:o.querySelector(".verdict-badge")?.innerText?.trim()||"",
+           text:o.innerText,
+           status:document.querySelector("#gguf-status").innerText };
+});
+check("output rendered", r.vis && r.text.length>50);
+check("verdict present", r.verdict.length>3, r.verdict);
+check("shows architecture qwen2", /qwen2/.test(r.text));
+check("shows trained context 32K", /32K|32768/.test(r.text), (r.text.match(/Trained context[^\n]*\n?\s*[\w.]+/)||[""])[0].slice(0,40));
+check("shows quant Q4_K_M", /Q4_K_M/i.test(r.text));
+check("shows γ-shift from quant", /γ-shift|shift/i.test(r.text));
+check("shows ΔPPL", /ΔPPL|PPL/.test(r.text));
+check("header parsed status (MB)", /MB header|parsed|analizada|analysé|已解析/i.test(r.status), r.status.slice(0,50));
+log("\n── Target L override ──");
+await p.fill("#gguf-target","131072");
+await p.click("#gguf-analyze-btn");
+await p.waitForTimeout(7000);
+const r2 = await p.evaluate(()=>document.querySelector("#gguf-output .verdict-badge")?.innerText?.trim());
+check("re-analyze with L=131072", (r2||"").length>3, r2||"");
+log("\n── Compare all quants (one header parse → full table) ──");
+await p.click("#gguf-all-btn");
+await p.waitForTimeout(7000);
+const cmp = await p.evaluate(()=>{
+  const o=document.querySelector("#gguf-output");
+  const rows=[...o.querySelectorAll("table tr")];
+  const dataRows=rows.slice(1); // minus header
+  return { title:o.querySelector("h3")?.innerText,
+           rowCount:dataRows.length,
+           quants:dataRows.map(r=>r.querySelector("code")?.innerText).filter(Boolean),
+           hasShift:/−0\.|—/.test(o.innerText),
+           hasVerdictCol:rows[0]?.innerText?.includes("Verdict") };
+});
+check("comparison table rendered", cmp.rowCount>=3, `(${cmp.rowCount} rows)`);
+check("lists multiple quant labels", cmp.quants.length>=3, cmp.quants.join(", "));
+check("has verdict column", cmp.hasVerdictCol, cmp.title);
+check("rows sorted best→worst (Q8 before Q2)", (()=>{
+  const i8=cmp.quants.findIndex(q=>/Q8/.test(q)), i2=cmp.quants.findIndex(q=>/Q2/.test(q));
+  return i8<0||i2<0||i8<i2;})(), cmp.quants.join(" > "));
+// Verdicts must vary across quants (regression guard: a hard d_horizon gate
+// once forced every row to DEGRADES even when γ@L was healthy).
+const verdicts = await p.evaluate(()=>[...document.querySelectorAll("#gguf-output table tr")].slice(1).map(r=>r.lastElementChild?.innerText?.trim()));
+check("verdicts vary across quants (not all identical)", new Set(verdicts).size>=2, verdicts.join(" | "));
+// γ@L must DECREASE for worse quants: the first (best) row must exceed the last (worst) row.
+const gammas = await p.evaluate(()=>[...document.querySelectorAll("#gguf-output table tr")].slice(1).map(r=>parseFloat(r.children[2]?.innerText)));
+check("γ@L decreases for worse quant", gammas[0] > gammas[gammas.length-1], `${gammas[0]} → ${gammas[gammas.length-1]}`);
+log("\n── 4-language tab label ──");
+for (const lang of ["es","fr","zh","en"]) {
+  await p.click(`.lang-btn[data-lang="${lang}"]`); await p.waitForTimeout(300);
+  const label = await p.evaluate(()=>document.querySelector('.mode-btn[data-mode="gguf"]')?.textContent?.trim());
+  check(`${lang}: tab label localized`, label && label.length>3, label);
+}
+log("\n── Error path: bad repo ──");
+await p.click(`.lang-btn[data-lang="en"]`); await p.waitForTimeout(200);
+await p.fill("#gguf-repo","this/definitely-not-a-real-repo-xyz123");
+await p.click("#gguf-list-btn");
+await p.waitForTimeout(3000);
+const errStatus = await p.evaluate(()=>document.querySelector("#gguf-status").innerText);
+check("bad repo → error message", /❌|not found|HTTP/i.test(errStatus), errStatus.slice(0,50));
+log(`\n=== ${pass} passed, ${fail} failed · JS errors: ${errors.length} ===`);
+errors.slice(0,10).forEach(e=>log(e));
+await b.close();
+process.exit(fail>0?1:0);
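
Review note on the "γ@L decreases for worse quant" check: the commit message describes γ@L as degrading monotonically, but the test only compares the first and last rows. A stricter pairwise check could look like the sketch below. `isNonIncreasing` is a hypothetical helper that is not part of this commit, and it assumes the table rows are already sorted best to worst, as the comparison render intends.

```javascript
// Hypothetical helper, not in the commit: returns true only if every value
// is a finite number and no value is larger than the one before it.
function isNonIncreasing(values) {
  // A row whose cell failed to parse (NaN) should fail the check
  // rather than be skipped silently.
  if (values.length < 2 || !values.every(Number.isFinite)) return false;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[i - 1]) return false;
  }
  return true;
}

// Illustrative values only; these are not measured γ@L numbers.
console.log(isNonIncreasing([0.91, 0.88, 0.88, 0.52]));
console.log(isNonIncreasing([0.91, 0.95, 0.52]));
```

In the test this would replace the `gammas[0] > gammas[gammas.length-1]` condition with `isNonIncreasing(gammas)`. Ties are allowed because two quants can share the same γ-shift.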