karlexmarin Claude Opus 4.7 (1M context) committed on
Commit 2eb69cb · 1 Parent(s): e5ceb83

v0.9.1: GGUF Validity Bridge mode + binary header parser

Reads a .gguf file's metadata header straight from HF Hub via HTTP Range
(no multi-GB download) and answers what the dozen GGUF/VRAM calculators
skip: fits in VRAM AND still works?

- js/gguf_bridge.js: incremental Range-fetch GGUF v2/v3 parser (magic, KV
block, arrays skipped by byte-length so the tokenizer doesn't blow the
buffer). ggufToConfig maps GGUF metadata → HF-style config; quant scheme
from general.file_type with filename backstop. analyzeGguf cross-runs
γ_Padé / d_horizon (architecture) with the quant-regime γ-shift.
- "Compare all quants": one header parse → scores every quant in the repo
(geometry is shared; only the scheme differs), sorted best→worst as a
table. γ@L after quant is the comparison axis — it degrades monotonically;
d_horizon is NOT recomputed from a quant-shifted γ (that inverts the
formula). Verdict driven by γ@L + quant regime, not a hard d_horizon gate
(which understates reach for high-θ models like Qwen).
- index.html: tab + tile + #gguf-section + help v0.9.1 entry.
- main.js: import, wiring, cached header parse, single + comparison renders.
Context/horizon now formatted binary-K (32768→32K, not 33K); θ decimal M/K.
- i18n.js: full EN/ES/FR/ZH for all gguf.* keys.
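
The binary-K versus decimal formatting mentioned for main.js can be sketched as follows. This is a minimal illustration, not the committed code: the function names `formatBinaryK` and `formatDecimal` are hypothetical, and the rounding rules are assumptions.

```javascript
// Hypothetical helpers; the names and rounding choices are illustrative only.

// Context lengths are powers of two, so divide by 1024: 32768 -> "32K".
// Dividing by 1000 and rounding would give the misleading "33K".
function formatBinaryK(n) {
  if (!Number.isFinite(n)) return "?";
  if (n >= 1024 && n % 1024 === 0) return `${n / 1024}K`;
  if (n >= 1024) return `${(n / 1024).toFixed(1)}K`;
  return String(n);
}

// RoPE theta values are round decimal numbers (10000, 1e6), so use decimal units.
function formatDecimal(n) {
  if (!Number.isFinite(n)) return "?";
  if (n >= 1e6) return `${+(n / 1e6).toFixed(2)}M`;
  if (n >= 1e3) return `${+(n / 1e3).toFixed(2)}K`;
  return String(n);
}

console.log(formatBinaryK(32768), formatDecimal(1000000));
```

The unary `+` strips trailing zeros left by `toFixed`, so round values print without a decimal part.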

Test (test_gguf.mjs): 25/25 — list/parse real GGUF (Qwen2.5 q4_k_m, 6MB
header), verdict, compare-all table, monotonic γ@L, verdict variety, 4
languages, error paths. 24 modes total, 0 JS errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (5)
  1. index.html +43 -0
  2. js/gguf_bridge.js +245 -0
  3. js/i18n.js +152 -0
  4. js/main.js +208 -3
  5. test_gguf.mjs +107 -0
index.html CHANGED
@@ -246,6 +246,9 @@
  <p><strong data-i18n="help.v09.yarn.title">🧵 YaRN / RoPE Context-Extension Planner</strong></p>
  <p data-i18n="help.v09.yarn.body">The dozen GGUF/VRAM calculators on HF (NyxKrage, oobabooga, DavidAU, …) all answer the same question: <em>does context length L fit in my GPU?</em> None answer the harder one: <em>does L fit AND still work?</em> Enter a model id (or its θ + trained context) and a target length L. The planner computes the extension factor, emits the exact <code>rope_scaling</code> block for transformers ≥4.43 (<code>yarn</code> / <code>linear</code> / <code>dynamic</code> / <code>llama3</code>, with paper-default β ramps), then runs TAF's γ_Padé / d_horizon math: γ with no extension (the problem), γ after the chosen method (the fix), the effective attention horizon, and a verdict — HEALTHY / USABLE-WITH-CARE / NEEDS-FINETUNE / DEGRADES. It flags the θ_eff≈θ·factor estimate and the >4× fine-tune requirement honestly. <em>Use case</em>: 'I want Mistral-7B (θ=10k, 8k trained) at 32k' → see γ collapse from naive use, YaRN partially recover it, and get the exact config to paste. Or 'Qwen2.5 at 128k' → discover its θ=1e6 already covers it, no aggressive scaling needed.</p>

+ <p><strong data-i18n="help.v091.gguf.title">🧊 GGUF Validity Bridge</strong></p>
+ <p data-i18n="help.v091.gguf.body">The dozen GGUF/VRAM calculators (NyxKrage, oobabooga, …) read a <code>.gguf</code> header to tell you if a quant <em>fits in your GPU</em>. This reads the same header — via HTTP Range, so no multi-GB download — and answers the question they skip: <em>does it fit AND still work?</em> Paste a GGUF repo, pick a quant file; the bridge pulls <code>rope_theta</code>, <code>context_length</code>, the quant scheme (from <code>general.file_type</code> or the filename), and head geometry, then runs TAF's γ_Padé / d_horizon plus the architecture-aware quant-regime γ-shift. Output: effective attention horizon at the trained context, how far the quant erodes γ (and ΔPPL) for <em>this</em> model, and a verdict — HEALTHY / USABLE-WITH-CARE / DEGRADES. <em>Use case</em>: 'unsloth/Qwen3.5-9B-GGUF Q4_K_M fits 8GB — but is it brain-dead past 30K?' → see the horizon and the Q4 γ-penalty before you download 6 GB.</p>
+
  <h3 data-i18n="help.audit.title">The audit chain</h3>
  <p data-i18n="help.audit.body">Every result shows the full <strong>Computation Chain</strong> — each formula step with its inputs,
  output, and interpretation. Click any step to expand. Cite section numbers (§26.1, §19.1, etc.) refer
@@ -408,6 +411,7 @@
  <button data-mode-link="longscore" data-i18n="modes.longscore">🎯 LongScore</button>
  <button data-mode-link="quant" data-i18n="modes.quant">⚖️ Quant</button>
  <button data-mode-link="yarn" data-i18n="modes.yarn">🧵 YaRN Planner</button>
+ <button data-mode-link="gguf" data-i18n="modes.gguf">🧊 GGUF Bridge</button>
  <button data-mode-link="inspector" data-i18n="modes.inspector">🔍 Inspect config</button>
  </div>
  </div>
@@ -503,6 +507,7 @@
  <button class="mode-btn" data-mode="longscore" role="tab" aria-selected="false" data-i18n="modes.longscore">🎯 LongScore</button>
  <button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
  <button class="mode-btn" data-mode="yarn" role="tab" aria-selected="false" data-i18n="modes.yarn">🧵 YaRN Planner</button>
+ <button class="mode-btn" data-mode="gguf" role="tab" aria-selected="false" data-i18n="modes.gguf">🧊 GGUF Bridge</button>
  </div>
  <p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
  <strong>Quickest start</strong>: paste any HuggingFace model id (e.g. <code>meta-llama/Meta-Llama-3-8B</code>),
@@ -1290,6 +1295,44 @@
  <div id="yarn-output" style="display:none; margin-top:1em;"></div>
  </section>

+ <!-- GGUF Validity Bridge (mode=gguf) -->
+ <section id="gguf-section" style="display:none;">
+ <h2><span data-i18n="gguf.title">🧊 GGUF Validity Bridge</span>
+ <span class="info"><span class="tooltip" data-i18n="gguf.tip">
+ <strong>Fits in VRAM ≠ works</strong>. The GGUF/VRAM calculators read a model's metadata to
+ tell you if a quant <em>fits in your GPU</em>. This reads the SAME metadata (rope_theta,
+ context_length, quant scheme, head geometry) straight from the <code>.gguf</code> header via
+ HTTP Range — no multi-GB download — and answers the question they don't: does attention
+ quality actually hold, and how much does the quant erode it (γ-shift, ΔPPL)?
+ </span></span>
+ </h2>
+ <p class="recipe-desc" data-i18n="gguf.desc">
+ Paste a GGUF repo (e.g. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), pick a quant file, and get a
+ TAF quality verdict: the model's effective attention horizon, plus how much the chosen
+ quantization shifts γ for <em>this specific architecture</em>. Reads only the file header in your
+ browser.
+ </p>
+
+ <div class="form-row">
+ <label for="gguf-repo" data-i18n="gguf.repo_label">GGUF repo id:</label>
+ <input type="text" id="gguf-repo" placeholder="Qwen/Qwen2.5-7B-Instruct-GGUF">
+ <button id="gguf-list-btn" class="secondary" data-i18n="gguf.list_btn">📂 List quant files</button>
+ </div>
+ <span id="gguf-status" class="subtle"></span>
+
+ <div class="form-row">
+ <label for="gguf-file" data-i18n="gguf.file_label">Quant file:</label>
+ <select id="gguf-file" disabled></select>
+ </div>
+ <div class="form-row">
+ <label for="gguf-target" data-i18n="gguf.target_label">Target context L (optional):</label>
+ <input type="number" id="gguf-target" placeholder="(defaults to trained context)" min="256">
+ </div>
+ <button id="gguf-analyze-btn" disabled data-i18n="gguf.analyze_btn">🧊 Analyze GGUF</button>
+ <button id="gguf-all-btn" class="secondary" disabled data-i18n="gguf.all_btn">📊 Compare all quants</button>
+ <div id="gguf-output" style="display:none; margin-top:1em;"></div>
+ </section>
+
  <!-- Recipe selector (mode=recipe) -->
  <section id="recipe-section" style="display:none;">
  <h2 data-i18n="recipe.title">📋 Recipe</h2>
js/gguf_bridge.js ADDED
@@ -0,0 +1,245 @@
+ // GGUF Validity Bridge (v0.9.1 anti-bullshit pack)
+ //
+ // The dozen GGUF/VRAM calculators on HF answer "does this quant fit in my GPU?".
+ // None answer "does it fit AND still work?". This reads a .gguf file's metadata
+ // header directly in the browser (HTTP Range — no full multi-GB download), pulls
+ // rope_theta + context_length + quant scheme + head geometry, then runs TAF's
+ // γ_Padé / d_horizon + the quant-regime γ-shift to emit a quality verdict:
+ // "fits in VRAM but attention collapses past d_horizon, and Q4 worsens γ by …".
+ //
+ // Parser logic is pure; the network fetch is unavoidable I/O. main.js renders.
+
+ import { gammaPade } from "./gamma_check.js";
+ import { dHorizon } from "./yarn_planner.js";
+ import { predictQuantShift } from "./quant_regime.js";
+
+ // ── GGUF metadata value types (spec v2/v3) ──
+ const GT = { U8:0, I8:1, U16:2, I16:3, U32:4, I32:5, F32:6, BOOL:7, STR:8, ARR:9, U64:10, I64:11, F64:12 };
+ const FIXED_SIZE = { 0:1, 1:1, 2:2, 3:2, 4:4, 5:4, 6:4, 7:1, 10:8, 11:8, 12:8 };
+
+ // general.file_type enum (llama_ftype) → human label + the quant_regime scheme id
+ // we feed to predictQuantShift. Only the common ones; filename parsing backstops.
+ const FTYPE = {
+   0: ["F32", null],
+   1: ["F16", null],
+   2: ["Q4_0", "gguf_q4_km"],
+   3: ["Q4_1", "gguf_q4_km"],
+   7: ["Q8_0", "gguf_q8_0"],
+   8: ["Q5_0", "gguf_q5_km"],
+   9: ["Q5_1", "gguf_q5_km"],
+   10: ["Q2_K", "gguf_q2_k"],
+   11: ["Q3_K_S", "gguf_q3_km"],
+   12: ["Q3_K_M", "gguf_q3_km"],
+   13: ["Q3_K_L", "gguf_q3_km"],
+   14: ["Q4_K_S", "gguf_q4_km"],
+   15: ["Q4_K_M", "gguf_q4_km"],
+   16: ["Q5_K_S", "gguf_q5_km"],
+   17: ["Q5_K_M", "gguf_q5_km"],
+   18: ["Q6_K", "gguf_q8_0"],
+ };
+
+ // Filename → (label, scheme) backstop when general.file_type is absent/ambiguous.
+ export function quantFromFilename(name) {
+   const n = (name || "").toUpperCase();
+   const pairs = [
+     ["Q2_K", "gguf_q2_k"], ["Q3_K", "gguf_q3_km"], ["Q4_K", "gguf_q4_km"],
+     ["Q5_K", "gguf_q5_km"], ["Q6_K", "gguf_q8_0"], ["Q8_0", "gguf_q8_0"],
+     ["Q4_0", "gguf_q4_km"], ["Q4_1", "gguf_q4_km"], ["Q5_0", "gguf_q5_km"],
+     ["Q5_1", "gguf_q5_km"], ["F16", null], ["BF16", null], ["F32", null],
+   ];
+   for (const [tag, scheme] of pairs) {
+     if (n.includes(tag)) return { label: tag.replace(/_$/, ""), scheme };
+   }
+   return { label: "?", scheme: null };
+ }
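
In the `pairs` list above, `"F16"` is tried before `"BF16"`. Because the match is a substring test, a `BF16` filename appears to be labelled `F16` (the scheme is `null` either way, so only the label differs). One possible variant, sketched here under the assumption that a longest-tag-first order is acceptable, avoids that. The name `quantTagFromFilename` is hypothetical and the tag table simply mirrors the one in the diff.

```javascript
// Hypothetical variant of the filename backstop; not the committed function.
// Tag -> scheme pairs mirror the table in quantFromFilename.
const QUANT_TAGS = [
  ["Q2_K", "gguf_q2_k"], ["Q3_K", "gguf_q3_km"], ["Q4_K", "gguf_q4_km"],
  ["Q5_K", "gguf_q5_km"], ["Q6_K", "gguf_q8_0"], ["Q8_0", "gguf_q8_0"],
  ["Q4_0", "gguf_q4_km"], ["Q4_1", "gguf_q4_km"], ["Q5_0", "gguf_q5_km"],
  ["Q5_1", "gguf_q5_km"], ["F16", null], ["BF16", null], ["F32", null],
];

function quantTagFromFilename(name) {
  const upper = String(name || "").toUpperCase();
  // Longest tags first, so "BF16" is tested before its substring "F16".
  const ordered = [...QUANT_TAGS].sort((a, b) => b[0].length - a[0].length);
  for (const [tag, scheme] of ordered) {
    if (upper.includes(tag)) return { label: tag, scheme };
  }
  return { label: "?", scheme: null };
}

console.log(quantTagFromFilename("model-BF16.gguf").label);
```

Substring matching still has limits: a model name that itself contains a tag would match regardless of ordering, so `general.file_type` remains the better source when it is present.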
+
+ // List the .gguf files in a HF repo (so the user can pick a quant).
+ export async function listGgufFiles(repo) {
+   const resp = await fetch(`https://huggingface.co/api/models/${encodeURIComponent(repo).replace(/%2F/g, "/")}`);
+   if (!resp.ok) throw new Error(`HTTP ${resp.status} — repo not found or private`);
+   const data = await resp.json();
+   const sib = Array.isArray(data.siblings) ? data.siblings : [];
+   return sib.map(s => s.rfilename).filter(f => /\.gguf$/i.test(f)).sort();
+ }
+
+ // Incremental Range-fetch reader. GGUF metadata sits at the file head; arch +
+ // rope fields precede the big tokenizer arrays, so a few MB always suffices.
+ class GgufReader {
+   constructor(url) {
+     this.url = url;
+     this.buf = new Uint8Array(0);
+     this.dv = new DataView(this.buf.buffer);
+     this.off = 0;
+     this.fetched = 0;
+     this.CHUNK = 1 << 20; // 1 MB per range
+     this.MAX = 48 << 20; // hard cap 48 MB
+     this.eof = false;
+   }
+   async ensure(n) {
+     while (this.off + n > this.buf.length && !this.eof && this.fetched < this.MAX) {
+       const start = this.fetched;
+       const end = Math.min(this.fetched + this.CHUNK, this.MAX) - 1;
+       const resp = await fetch(this.url, { headers: { Range: `bytes=${start}-${end}` } });
+       if (!resp.ok && resp.status !== 206 && resp.status !== 200) throw new Error(`HTTP ${resp.status}`);
+       const part = new Uint8Array(await resp.arrayBuffer());
+       if (part.length === 0) { this.eof = true; break; }
+       const merged = new Uint8Array(this.buf.length + part.length);
+       merged.set(this.buf); merged.set(part, this.buf.length);
+       this.buf = merged;
+       this.dv = new DataView(this.buf.buffer);
+       this.fetched += part.length;
+       if (part.length < this.CHUNK) this.eof = true; // server returned the tail
+     }
+     if (this.off + n > this.buf.length) throw new Error("gguf_metadata_too_large");
+   }
+   async u8() { await this.ensure(1); return this.dv.getUint8(this.off++); }
+   async u16() { await this.ensure(2); const v = this.dv.getUint16(this.off, true); this.off += 2; return v; }
+   async i16() { await this.ensure(2); const v = this.dv.getInt16(this.off, true); this.off += 2; return v; }
+   async u32() { await this.ensure(4); const v = this.dv.getUint32(this.off, true); this.off += 4; return v; }
+   async i32() { await this.ensure(4); const v = this.dv.getInt32(this.off, true); this.off += 4; return v; }
+   async f32() { await this.ensure(4); const v = this.dv.getFloat32(this.off, true); this.off += 4; return v; }
+   async f64() { await this.ensure(8); const v = this.dv.getFloat64(this.off, true); this.off += 8; return v; }
+   // u64/i64 as Number — safe for counts/dims well under 2^53.
+   async u64() { await this.ensure(8); const lo = this.dv.getUint32(this.off, true); const hi = this.dv.getUint32(this.off + 4, true); this.off += 8; return hi * 4294967296 + lo; }
+   async i64() { return this.u64(); }
+   async skip(n) { await this.ensure(0); // ensure buffer exists
+     // skip may exceed current buffer; pull enough then advance offset
+     await this.ensure(Math.min(n, this.MAX)); this.off += n;
+     if (this.off > this.buf.length) { this.off = this.buf.length; throw new Error("gguf_metadata_too_large"); }
+   }
+   async str() {
+     const len = await this.u64();
+     await this.ensure(len);
+     const bytes = this.buf.subarray(this.off, this.off + len);
+     this.off += len;
+     return new TextDecoder("utf-8").decode(bytes);
+   }
+ }
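
The chunk arithmetic inside `ensure` can be isolated as a pure function, which makes the inclusive `Range` end and the fetch cap easier to reason about. The sketch below is an illustration only: `nextRange` and `isPartialResponse` are hypothetical names, not part of gguf_bridge.js.

```javascript
// Hypothetical helpers; they restate the Range arithmetic used by ensure().

// Returns the next byte range to request, or null once the cap is reached.
function nextRange(fetched, chunk, cap) {
  if (fetched >= cap) return null;
  const start = fetched;
  const end = Math.min(fetched + chunk, cap) - 1; // HTTP Range ends are inclusive
  return { start, end, header: `bytes=${start}-${end}` };
}

// 206 Partial Content means the server honoured the Range header. A plain 200
// means it ignored the header and is sending the whole file, which a
// header-only reader may prefer to reject rather than buffer.
function isPartialResponse(status) {
  return status === 206;
}

console.log(nextRange(0, 1 << 20, 48 << 20).header);
```

Whether a 200 response should be rejected is a design choice; the committed reader accepts it and relies on the 48 MB cap check in its loop.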
+
+ async function readValue(r, type) {
+   switch (type) {
+     case GT.U8: return r.u8();
+     case GT.I8: { const v = await r.u8(); return v > 127 ? v - 256 : v; }
+     case GT.U16: return r.u16();
+     case GT.I16: return r.i16();
+     case GT.U32: return r.u32();
+     case GT.I32: return r.i32();
+     case GT.F32: return r.f32();
+     case GT.BOOL: return (await r.u8()) !== 0;
+     case GT.STR: return r.str();
+     case GT.U64: return r.u64();
+     case GT.I64: return r.i64();
+     case GT.F64: return r.f64();
+     case GT.ARR: {
+       const et = await r.u32();
+       const len = await r.u64();
+       if (FIXED_SIZE[et]) { await r.skip(len * FIXED_SIZE[et]); return { __array: len, elemType: et }; }
+       if (et === GT.STR) { for (let i = 0; i < len; i++) { const sl = await r.u64(); await r.skip(sl); } return { __array: len, elemType: et }; }
+       throw new Error("gguf_nested_array");
+     }
+     default: throw new Error(`gguf_unknown_type_${type}`);
+   }
+ }
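
In the reader above, `i64()` delegates to `u64()`, so a negative 64-bit value would come back as a large positive number. That is harmless for the counts and dimensions this parser reads, but a signed read is straightforward with the standard `DataView` BigInt accessors. A minimal sketch, with hypothetical function names:

```javascript
// Hypothetical signed readers built on standard DataView methods.

// Signed 8-bit read; equivalent to the manual "v > 127 ? v - 256 : v" fix-up.
function readI8(view, offset) {
  return view.getInt8(offset);
}

// Signed little-endian 64-bit read, converted to Number only when it is exact.
function readI64AsNumber(view, offset) {
  const big = view.getBigInt64(offset, true);
  if (big > BigInt(Number.MAX_SAFE_INTEGER) || big < BigInt(Number.MIN_SAFE_INTEGER)) {
    throw new RangeError("i64 value outside the safe integer range");
  }
  return Number(big);
}

const demoView = new DataView(new ArrayBuffer(9));
demoView.setInt8(0, -3);
demoView.setBigInt64(1, -5n, true);
console.log(readI8(demoView, 0), readI64AsNumber(demoView, 1));
```

Throwing on out-of-range values is one option; returning the BigInt unchanged would be another, at the cost of mixed return types.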
+
+ // Parse the metadata KV block. Returns a flat { key: value } map (arrays are
+ // returned as {__array,len} stubs — we never need their contents here).
+ export async function fetchGgufMetadata(url) {
+   const r = new GgufReader(url);
+   const magic = (await r.u8()) | ((await r.u8()) << 8) | ((await r.u8()) << 16) | ((await r.u8()) << 24);
+   if (magic !== 0x46554747 /* 'GGUF' little-endian */) throw new Error("not_a_gguf_file");
+   const version = await r.u32();
+   const tensorCount = await r.u64();
+   const kvCount = await r.u64();
+   const kv = {};
+   for (let i = 0; i < kvCount; i++) {
+     const key = await r.str();
+     const type = await r.u32();
+     kv[key] = await readValue(r, type);
+   }
+   return { version, tensorCount, kvCount, kv, bytesRead: r.fetched };
+ }
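
The fixed 24-byte preamble that `fetchGgufMetadata` reads (magic, version, tensor count, KV count) can be exercised without any network access by parsing an in-memory buffer. The sketch below is a synchronous illustration under that assumption; `parseGgufPreamble` and `buildPreamble` are hypothetical names, and the counts used in the demo are made up.

```javascript
// Hypothetical, synchronous illustration of the GGUF v2/v3 preamble layout:
//   bytes 0-3   magic "GGUF"
//   bytes 4-7   version (u32, little-endian)
//   bytes 8-15  tensor count (u64, little-endian)
//   bytes 16-23 metadata KV count (u64, little-endian)
function parseGgufPreamble(bytes) {
  if (bytes.byteLength < 24) throw new Error("gguf_truncated");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(0, true) !== 0x46554747) throw new Error("not_a_gguf_file");
  return {
    version: view.getUint32(4, true),
    tensorCount: Number(view.getBigUint64(8, true)),
    kvCount: Number(view.getBigUint64(16, true)),
  };
}

// Builds a preamble for testing; the values passed in are illustrative only.
function buildPreamble(version, tensorCount, kvCount) {
  const bytes = new Uint8Array(24);
  const view = new DataView(bytes.buffer);
  bytes.set([0x47, 0x47, 0x55, 0x46], 0); // ASCII "GGUF"
  view.setUint32(4, version, true);
  view.setBigUint64(8, BigInt(tensorCount), true);
  view.setBigUint64(16, BigInt(kvCount), true);
  return bytes;
}

console.log(parseGgufPreamble(buildPreamble(3, 291, 26)));
```

Reading the magic as a single little-endian `u32` is equivalent to the four shifted byte reads in the diff.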
+
+ // Map raw GGUF metadata → HF-style config (so quant_regime + TAF math can reuse it).
+ export function ggufToConfig(meta) {
+   const kv = meta.kv || {};
+   const arch = kv["general.architecture"];
+   const g = (suffix, fallback = null) => (arch && kv[`${arch}.${suffix}`] !== undefined ? kv[`${arch}.${suffix}`] : fallback);
+
+   const n_attn = g("attention.head_count");
+   const n_kv = g("attention.head_count_kv", n_attn);
+   const hidden = g("embedding_length");
+   const keyLen = g("attention.key_length");
+   const headDim = (typeof keyLen === "number") ? keyLen
+     : (n_attn && hidden ? hidden / n_attn : null);
+   const ftypeEnum = kv["general.file_type"];
+   const ftype = (typeof ftypeEnum === "number" && FTYPE[ftypeEnum]) ? FTYPE[ftypeEnum] : null;
+
+   return {
+     architecture: arch || "?",
+     quant_label: ftype ? ftype[0] : null,
+     quant_scheme: ftype ? ftype[1] : null,
+     rope_theta: g("rope.freq_base", null),
+     context_length: g("context_length", null),
+     rope_scaling_type: g("rope.scaling.type", null),
+     rope_scaling_factor: g("rope.scaling.factor", null),
+     rope_orig_ctx: g("rope.scaling.original_context_length", null),
+     // HF-config aliases for predictQuantShift / inferNParams:
+     num_attention_heads: n_attn ?? null,
+     num_key_value_heads: n_kv ?? null,
+     hidden_size: hidden ?? null,
+     head_dim: headDim,
+     num_hidden_layers: g("block_count", null),
+     sliding_window: g("attention.sliding_window", null),
+     vocab_size: g("vocab_size", null),
+   };
+ }
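
The architecture-prefixed lookup and the `head_dim` fallback in `ggufToConfig` are easiest to see on a small example. The sketch below restates both as standalone functions; `archValue`, `headDimFrom`, and the sample metadata values are illustrative assumptions, not data read from a real file.

```javascript
// Hypothetical restatement of the lookup logic; values in sampleKv are made up
// for illustration and were not parsed from an actual GGUF file.
function archValue(kv, suffix, fallback = null) {
  const arch = kv["general.architecture"];
  const key = `${arch}.${suffix}`;
  return arch && kv[key] !== undefined ? kv[key] : fallback;
}

// Prefer an explicit key_length; otherwise derive it as hidden size / head count.
function headDimFrom(kv) {
  const keyLen = archValue(kv, "attention.key_length");
  if (typeof keyLen === "number") return keyLen;
  const heads = archValue(kv, "attention.head_count");
  const hidden = archValue(kv, "embedding_length");
  return heads && hidden ? hidden / heads : null;
}

const sampleKv = {
  "general.architecture": "qwen2",
  "qwen2.attention.head_count": 28,
  "qwen2.embedding_length": 3584,
  "qwen2.rope.freq_base": 1000000,
};

console.log(archValue(sampleKv, "rope.freq_base"), headDimFrom(sampleKv)); // 1000000 128
```

When `general.architecture` is missing, every prefixed lookup falls back, which is what later produces an incomplete analysis.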
+
+ // Bridge verdict: combine GGUF geometry + TAF horizon + quant γ-shift.
+ //   cfg : ggufToConfig output (may be edited by user / filename backstop)
+ //   targetCtx : optional desired context L to check (else uses context_length)
+ export function analyzeGguf(cfg, targetCtx) {
+   const theta = Number(cfg.rope_theta) || 10000;
+   const nCtx = Number(cfg.context_length) || null;
+   const L = Number(targetCtx) || nCtx;
+
+   // fp16 attention horizon — architectural, set by θ. SAME across every quant
+   // of the model (quantisation adds noise, it does not change θ). d_horizon is
+   // a function of the *natural* Padé γ, so it must be computed from the fp16 γ —
+   // never from a quant-shifted γ (that inverts the formula and is meaningless).
+   const gammaTrain = nCtx ? gammaPade(theta, nCtx) : null;
+   const dHoriz = gammaTrain != null ? dHorizon(theta, gammaTrain) : null;
+
+   // Quant γ-shift via the existing quant-regime model (architecture-aware).
+   const quant = cfg.quant_scheme ? predictQuantShift(cfg, cfg.quant_scheme) : null;
+
+   // γ at the target L: fp16, then after the quant shift. This is the quantity
+   // that degrades monotonically with worse quant — the correct comparison axis.
+   const gammaAtL = (theta && L) ? gammaPade(theta, L) : null;
+   const shift = quant ? quant.gamma_shift : 0;
+   const gammaQuant = (gammaAtL != null) ? gammaAtL - shift : null;
+
+   // Verdict is driven by γ@L after quant (the direct attention-quality signal
+   // at the target length) plus the quant-regime band. We deliberately do NOT
+   // gate on L ≤ d_horizon: the closed-form d_horizon understates the true reach
+   // for high-θ models (e.g. Qwen θ=1e6 keeps γ healthy far past its d_horizon),
+   // so γ@L is the honest measure. `reaches` is reported for context only.
+   const reaches = dHoriz != null && L != null && L <= dHoriz;
+   const collapsed = !Number.isFinite(gammaQuant) || gammaQuant <= 0.2;
+   const quantCliff = quant && quant.regime === "cliff";
+   let verdict;
+   if (nCtx == null || theta == null) verdict = "incomplete";
+   else if (collapsed || quantCliff) verdict = "degrades";
+   else if (gammaQuant >= 0.6 && (!quant || quant.regime === "safe" || quant.regime === "mild")) verdict = "healthy";
+   else verdict = "usable_with_care";
+
+   return {
+     theta, nCtx, L,
+     gammaTrain, dHoriz, // fp16 architectural horizon (shared across quants)
+     gammaAtL, gammaQuant, // attention at L: fp16 vs after-quant
+     reaches, // is L within the fp16 horizon?
+     quant, // {gamma_shift, regime, delta_ppl, ...} or null
+     quantLabel: cfg.quant_label,
+     arch: cfg.architecture,
+     verdict,
+   };
+ }
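
The commit message describes a "Compare all quants" table sorted best to worst by γ@L after quant. The main.js code that does this is not shown here, so the following is only a sketch of one way to order `analyzeGguf`-style results. The function name `sortByGammaAtL` and the sample rows (including their γ values) are hypothetical.

```javascript
// Hypothetical ordering step for a comparison table; not the main.js code.
// Rows are assumed to carry the gammaQuant field returned by analyzeGguf.
function sortByGammaAtL(rows) {
  const score = (row) => (Number.isFinite(row.gammaQuant) ? row.gammaQuant : -Infinity);
  // Copy before sorting so the caller's array is left untouched.
  return [...rows].sort((a, b) => {
    const sa = score(a);
    const sb = score(b);
    if (sa === sb) return 0;
    return sb > sa ? 1 : -1; // higher γ@L first; missing values sink to the end
  });
}

// Made-up numbers, for illustration only.
const sampleRows = [
  { quantLabel: "Q2_K", gammaQuant: 0.31 },
  { quantLabel: "Q8_0", gammaQuant: 0.78 },
  { quantLabel: "?", gammaQuant: null },
  { quantLabel: "Q4_K_M", gammaQuant: 0.66 },
];

console.log(sortByGammaAtL(sampleRows).map((row) => row.quantLabel).join(" > "));
```

Comparing explicitly rather than subtracting avoids a `NaN` comparator result when two rows both lack a score.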
js/i18n.js CHANGED
@@ -427,6 +427,44 @@ export const TRANSLATIONS = {
  "mode_desc.hub": "Map of every documented LLM-eval pain → tafagent mode (if covered) + curated external tools. Find the right solution without rebuilding it. 30+ pains, 7 categories.",
  "modes.yarn": "🧵 YaRN Planner",
  "mode_desc.yarn": "Generate the exact rope_scaling config to extend a model past its trained context — plus a TAF verdict on whether attention quality actually holds at the target length.",
+ "modes.gguf": "🧊 GGUF Bridge",
+ "mode_desc.gguf": "Read a GGUF file's metadata header (rope_theta, context_length, quant) in your browser and get a TAF quality verdict — the question the VRAM calculators skip: fits AND works?",
+ "gguf.title": "🧊 GGUF Validity Bridge",
+ "gguf.tip": "<strong>Fits in VRAM ≠ works</strong>. The GGUF/VRAM calculators read a model's metadata to tell you if a quant <em>fits in your GPU</em>. This reads the SAME metadata (rope_theta, context_length, quant scheme, head geometry) straight from the <code>.gguf</code> header via HTTP Range — no multi-GB download — and answers the question they don't: does attention quality actually hold, and how much does the quant erode it (γ-shift, ΔPPL)?",
+ "gguf.desc": "Paste a GGUF repo (e.g. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), pick a quant file, and get a TAF quality verdict: the model's effective attention horizon, plus how much the chosen quantization shifts γ for <em>this specific architecture</em>. Reads only the file header in your browser.",
+ "gguf.repo_label": "GGUF repo id:",
+ "gguf.list_btn": "📂 List quant files",
+ "gguf.file_label": "Quant file:",
+ "gguf.target_label": "Target context L (optional):",
+ "gguf.analyze_btn": "🧊 Analyze GGUF",
+ "gguf.all_btn": "📊 Compare all quants",
+ "gguf.compare_title": "All quants — quality comparison",
+ "gguf.col.verdict": "Verdict",
+ "gguf.col.gamma_at_l": "γ @ L (after quant)",
+ "gguf.need_repo": "Enter a GGUF repo id like 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
+ "gguf.listing": "Listing .gguf files from HF Hub…",
+ "gguf.no_files": "No .gguf files found in that repo.",
+ "gguf.found": "quant files found",
+ "gguf.pick_hint": "pick one and click Analyze.",
+ "gguf.reading": "Reading GGUF header via HTTP Range…",
+ "gguf.read_ok": "Header parsed",
+ "gguf.verdict.healthy": "HEALTHY — effective horizon reaches L with good γ after quant",
+ "gguf.verdict.usable_with_care":"USABLE WITH CARE — reaches L but γ is modest after quant",
+ "gguf.verdict.degrades": "DEGRADES — attention collapses before L (or quant pushes it there)",
+ "gguf.r.arch": "Architecture",
+ "gguf.r.ctx_train": "Trained context",
+ "gguf.r.horizon_fp16": "Attention horizon (fp16)",
+ "gguf.r.quant": "Quant scheme",
+ "gguf.r.gamma_shift": "γ-shift from quant",
+ "gguf.r.after_quant": "(after quant)",
+ "gguf.r.eff_horizon": "Effective horizon (quantised)",
+ "gguf.r.no_quant_shift": "— full precision, no γ-shift",
+ "gguf.r.note": "Horizon from γ_Padé / d_horizon (architecture). Quant γ-shift + ΔPPL from the quant-regime model (calibrated to llama.cpp PPL + AWQ/GPTQ papers). Both are estimates — verify borderline cases with a real eval.",
+ "gguf.err.not_gguf": "That file isn't a valid GGUF (bad magic).",
+ "gguf.err.too_large": "Metadata header exceeds the fetch cap — unusually large tokenizer. Try another quant.",
+ "gguf.err.incomplete": "GGUF metadata is missing rope_theta or context_length — can't compute the horizon.",
+ "help.v091.gguf.title": "🧊 GGUF Validity Bridge",
+ "help.v091.gguf.body": "The dozen GGUF/VRAM calculators (NyxKrage, oobabooga, …) read a <code>.gguf</code> header to tell you if a quant <em>fits in your GPU</em>. This reads the same header — via HTTP Range, so no multi-GB download — and answers the question they skip: <em>does it fit AND still work?</em> Paste a GGUF repo, pick a quant file; the bridge pulls <code>rope_theta</code>, <code>context_length</code>, the quant scheme (from <code>general.file_type</code> or the filename), and head geometry, then runs TAF's γ_Padé / d_horizon plus the architecture-aware quant-regime γ-shift. Output: effective attention horizon at the trained context, how far the quant erodes γ (and ΔPPL) for <em>this</em> model, and a verdict. <em>Use case</em>: 'Q4_K_M fits 8GB — but is it brain-dead past 30K?' → see the horizon and the Q4 γ-penalty before you download 6 GB.",
  "yarn.title": "🧵 YaRN / RoPE Context-Extension Planner",
  "yarn.tip": "<strong>Config + verdict, not just VRAM</strong>. The GGUF/VRAM calculators tell you if a context length <em>fits in GPU</em>. This tells you the exact <code>rope_scaling</code> block to put in <code>config.json</code> AND whether attention quality will actually hold at that length — using TAF's γ_Padé / d_horizon machinery, all in your browser.",
  "yarn.desc": "Want to run a model past its trained context? Enter the model (or its θ + trained context) and your target length L. Get the copy-paste <code>rope_scaling</code> snippet for transformers ≥4.43, plus a TAF verdict: does the effective attention horizon reach L, or will the model just hallucinate past d_horizon?",
@@ -1738,6 +1776,44 @@ export const TRANSLATIONS = {
  "mode_desc.hub": "Mapa de cada problema documentado de LLM-eval → mode tafagent (si cubierto) + herramientas externas curadas. Encuentra la solución sin reinventarla. 30+ pains, 7 categorías.",
  "modes.yarn": "🧵 Planificador YaRN",
  "mode_desc.yarn": "Genera la configuración rope_scaling exacta para extender un modelo más allá de su contexto entrenado — más un veredicto TAF sobre si la calidad de atención aguanta realmente a la longitud objetivo.",

  "yarn.title": "🧵 Planificador de extensión de contexto YaRN / RoPE",
  "yarn.tip": "<strong>Config + veredicto, no solo VRAM</strong>. Las calculadoras GGUF/VRAM te dicen si una longitud de contexto <em>cabe en la GPU</em>. Esto te da el bloque <code>rope_scaling</code> exacto para <code>config.json</code> Y si la calidad de atención aguantará realmente a esa longitud — con la maquinaria γ_Padé / d_horizon de TAF, todo en tu navegador.",
  "yarn.desc": "¿Quieres usar un modelo más allá de su contexto entrenado? Introduce el modelo (o su θ + contexto entrenado) y tu longitud objetivo L. Obtén el fragmento <code>rope_scaling</code> listo para pegar (transformers ≥4.43), más un veredicto TAF: ¿llega el horizonte de atención efectivo a L, o el modelo alucinará pasado d_horizon?",
@@ -2903,6 +2979,44 @@ export const TRANSLATIONS = {
  "mode_desc.hub": "Carte de chaque problème documenté de LLM-eval → mode tafagent (si couvert) + outils externes curés. Trouvez la solution sans la réinventer. 30+ pains, 7 catégories.",
  "modes.yarn": "🧵 Planificateur YaRN",
  "mode_desc.yarn": "Génère la configuration rope_scaling exacte pour étendre un modèle au-delà de son contexte d'entraînement — plus un verdict TAF sur la tenue réelle de la qualité d'attention à la longueur cible.",

  "yarn.title": "🧵 Planificateur d'extension de contexte YaRN / RoPE",
  "yarn.tip": "<strong>Config + verdict, pas seulement la VRAM</strong>. Les calculateurs GGUF/VRAM disent si une longueur de contexte <em>tient dans le GPU</em>. Ceci donne le bloc <code>rope_scaling</code> exact pour <code>config.json</code> ET si la qualité d'attention tiendra réellement à cette longueur — avec la machinerie γ_Padé / d_horizon de TAF, entièrement dans votre navigateur.",
  "yarn.desc": "Vous voulez utiliser un modèle au-delà de son contexte d'entraînement ? Saisissez le modèle (ou son θ + contexte d'entraînement) et votre longueur cible L. Obtenez le fragment <code>rope_scaling</code> prêt à coller (transformers ≥4.43), plus un verdict TAF : l'horizon d'attention effectif atteint-il L, ou le modèle va-t-il halluciner au-delà de d_horizon ?",
@@ -4068,6 +4182,44 @@ export const TRANSLATIONS = {
  "mode_desc.hub": "每个 LLM-eval 问题的地图 → tafagent 模式(若覆盖)+ 精选外部工具。找到方案而非重新发明。30+ 问题,7 类别。",
  "modes.yarn": "🧵 YaRN 规划器",
  "mode_desc.yarn": "生成精确的 rope_scaling 配置以将模型扩展到训练上下文之外 —— 外加 TAF 裁决:在目标长度下注意力质量是否真的撑得住。",

  "yarn.title": "🧵 YaRN / RoPE 上下文扩展规划器",
  "yarn.tip": "<strong>配置 + 裁决,不只是显存</strong>。GGUF/显存计算器告诉你某上下文长度<em>是否塞得进 GPU</em>。本工具给出要放入 <code>config.json</code> 的精确 <code>rope_scaling</code> 块,并判断该长度下注意力质量是否真的撑得住 —— 使用 TAF 的 γ_Padé / d_horizon 机制,全在浏览器内运行。",
  "yarn.desc": "想让模型超出其训练上下文运行?输入模型(或其 θ + 训练上下文)和你的目标长度 L。获得可复制粘贴的 <code>rope_scaling</code> 片段(transformers ≥4.43),外加 TAF 裁决:有效注意力视界能否到达 L,还是模型在 d_horizon 之外就开始幻觉?",
1776
  "mode_desc.hub": "Mapa de cada problema documentado de LLM-eval → mode tafagent (si cubierto) + herramientas externas curadas. Encuentra la solución sin reinventarla. 30+ pains, 7 categorías.",
1777
  "modes.yarn": "🧵 Planificador YaRN",
1778
  "mode_desc.yarn": "Genera la configuración rope_scaling exacta para extender un modelo más allá de su contexto entrenado — más un veredicto TAF sobre si la calidad de atención aguanta realmente a la longitud objetivo.",
1779
+ "modes.gguf": "🧊 Puente GGUF",
1780
+ "mode_desc.gguf": "Lee la cabecera de metadata de un archivo GGUF (rope_theta, context_length, quant) en tu navegador y obtén un veredicto de calidad TAF — la pregunta que los calculadores de VRAM ignoran: ¿cabe Y funciona?",
1781
+ "gguf.title": "🧊 Puente de validez GGUF",
1782
+ "gguf.tip": "<strong>Caber en VRAM ≠ funcionar</strong>. Los calculadores GGUF/VRAM leen la metadata de un modelo para decirte si un quant <em>cabe en tu GPU</em>. Esto lee la MISMA metadata (rope_theta, context_length, esquema de quant, geometría de cabezas) directamente de la cabecera <code>.gguf</code> vía HTTP Range — sin descargar GB — y responde lo que ellos no: ¿aguanta de verdad la calidad de atención, y cuánto la erosiona el quant (γ-shift, ΔPPL)?",
1783
+ "gguf.desc": "Pega un repo GGUF (p.ej. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), elige un archivo de quant, y obtén un veredicto de calidad TAF: el horizonte de atención efectivo del modelo, más cuánto desplaza γ la cuantización elegida para <em>esta arquitectura concreta</em>. Solo lee la cabecera del archivo en tu navegador.",
1784
+ "gguf.repo_label": "ID del repo GGUF:",
1785
+ "gguf.list_btn": "📂 Listar archivos quant",
1786
+ "gguf.file_label": "Archivo quant:",
1787
+ "gguf.target_label": "Contexto objetivo L (opcional):",
1788
+ "gguf.analyze_btn": "🧊 Analizar GGUF",
1789
+ "gguf.all_btn": "📊 Comparar todos los quants",
1790
+ "gguf.compare_title": "Todos los quants — comparación de calidad",
1791
+ "gguf.col.verdict": "Veredicto",
1792
+ "gguf.col.gamma_at_l": "γ @ L (tras quant)",
1793
+ "gguf.need_repo": "Introduce un id de repo GGUF como 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
1794
+ "gguf.listing": "Listando archivos .gguf de HF Hub…",
1795
+ "gguf.no_files": "No se encontraron archivos .gguf en ese repo.",
1796
+ "gguf.found": "archivos quant encontrados",
1797
+ "gguf.pick_hint": "elige uno y pulsa Analizar.",
1798
+ "gguf.reading": "Leyendo cabecera GGUF vía HTTP Range…",
1799
+ "gguf.read_ok": "Cabecera analizada",
1800
+ "gguf.verdict.healthy": "SANO — el horizonte efectivo alcanza L con buen γ tras quant",
1801
+ "gguf.verdict.usable_with_care":"USABLE CON CUIDADO — alcanza L pero γ es modesto tras quant",
1802
+ "gguf.verdict.degrades": "DEGRADA — la atención colapsa antes de L (o el quant la empuja ahí)",
1803
+ "gguf.r.arch": "Arquitectura",
1804
+ "gguf.r.ctx_train": "Contexto entrenado",
1805
+ "gguf.r.horizon_fp16": "Horizonte de atención (fp16)",
1806
+ "gguf.r.quant": "Esquema de quant",
1807
+ "gguf.r.gamma_shift": "γ-shift por quant",
1808
+ "gguf.r.after_quant": "(tras quant)",
1809
+ "gguf.r.eff_horizon": "Horizonte efectivo (cuantizado)",
1810
+ "gguf.r.no_quant_shift": "— precisión completa, sin γ-shift",
1811
+ "gguf.r.note": "Horizonte desde γ_Padé / d_horizon (arquitectura). γ-shift de quant + ΔPPL desde el modelo quant-regime (calibrado a PPL de llama.cpp + papers AWQ/GPTQ). Ambos son estimaciones — verifica los casos límite con un eval real.",
1812
+ "gguf.err.not_gguf": "Ese archivo no es un GGUF válido (magic incorrecto).",
1813
+ "gguf.err.too_large": "La cabecera de metadata supera el límite de descarga — tokenizer inusualmente grande. Prueba otro quant.",
1814
+ "gguf.err.incomplete": "A la metadata GGUF le falta rope_theta o context_length — no se puede calcular el horizonte.",
1815
+ "help.v091.gguf.title": "🧊 Puente de validez GGUF",
1816
+ "help.v091.gguf.body": "La docena de calculadores GGUF/VRAM (NyxKrage, oobabooga, …) leen una cabecera <code>.gguf</code> para decirte si un quant <em>cabe en tu GPU</em>. Esto lee la misma cabecera — vía HTTP Range, sin descargar GB — y responde lo que ellos saltan: <em>¿cabe Y además funciona?</em> Pega un repo GGUF, elige un archivo de quant; el puente extrae <code>rope_theta</code>, <code>context_length</code>, el esquema de quant (de <code>general.file_type</code> o del nombre del archivo), y la geometría de cabezas, luego corre γ_Padé / d_horizon de TAF más el γ-shift de quant consciente de arquitectura. Salida: horizonte de atención efectivo en el contexto entrenado, cuánto erosiona γ el quant (y ΔPPL) para <em>este</em> modelo, y un veredicto. <em>Caso de uso</em>: 'Q4_K_M cabe en 8GB — ¿pero se vuelve tonto pasado 30K?' → ve el horizonte y la penalización γ de Q4 antes de descargar 6 GB.",
1817
  "yarn.title": "🧵 Planificador de extensión de contexto YaRN / RoPE",
1818
  "yarn.tip": "<strong>Config + veredicto, no solo VRAM</strong>. Las calculadoras GGUF/VRAM te dicen si una longitud de contexto <em>cabe en la GPU</em>. Esto te da el bloque <code>rope_scaling</code> exacto para <code>config.json</code> Y si la calidad de atención aguantará realmente a esa longitud — con la maquinaria γ_Padé / d_horizon de TAF, todo en tu navegador.",
1819
  "yarn.desc": "¿Quieres usar un modelo más allá de su contexto entrenado? Introduce el modelo (o su θ + contexto entrenado) y tu longitud objetivo L. Obtén el fragmento <code>rope_scaling</code> listo para pegar (transformers ≥4.43), más un veredicto TAF: ¿llega el horizonte de atención efectivo a L, o el modelo alucinará pasado d_horizon?",
 
2979
  "mode_desc.hub": "Carte de chaque problème documenté de LLM-eval → mode tafagent (si couvert) + outils externes curés. Trouvez la solution sans la réinventer. 30+ pains, 7 catégories.",
2980
  "modes.yarn": "🧵 Planificateur YaRN",
2981
  "mode_desc.yarn": "Génère la configuration rope_scaling exacte pour étendre un modèle au-delà de son contexte d'entraînement — plus un verdict TAF sur la tenue réelle de la qualité d'attention à la longueur cible.",
2982
+ "modes.gguf": "🧊 Pont GGUF",
2983
+ "mode_desc.gguf": "Lit l'en-tête de métadonnées d'un fichier GGUF (rope_theta, context_length, quant) dans votre navigateur et donne un verdict de qualité TAF — la question que les calculateurs de VRAM ignorent : tient ET fonctionne ?",
2984
+ "gguf.title": "🧊 Pont de validité GGUF",
2985
+ "gguf.tip": "<strong>Tenir dans la VRAM ≠ fonctionner</strong>. Les calculateurs GGUF/VRAM lisent les métadonnées d'un modèle pour dire si un quant <em>tient dans le GPU</em>. Ceci lit les MÊMES métadonnées (rope_theta, context_length, schéma de quant, géométrie des têtes) directement depuis l'en-tête <code>.gguf</code> via HTTP Range — sans télécharger des Go — et répond à ce qu'ils n'abordent pas : la qualité d'attention tient-elle vraiment, et de combien le quant l'érode-t-il (γ-shift, ΔPPL) ?",
2986
+ "gguf.desc": "Collez un dépôt GGUF (ex. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), choisissez un fichier de quant, et obtenez un verdict de qualité TAF : l'horizon d'attention effectif du modèle, plus de combien la quantification choisie décale γ pour <em>cette architecture précise</em>. Ne lit que l'en-tête du fichier dans votre navigateur.",
2987
+ "gguf.repo_label": "ID du dépôt GGUF :",
2988
+ "gguf.list_btn": "📂 Lister les fichiers quant",
2989
+ "gguf.file_label": "Fichier quant :",
2990
+ "gguf.target_label": "Contexte cible L (optionnel) :",
2991
+ "gguf.analyze_btn": "🧊 Analyser le GGUF",
2992
+ "gguf.all_btn": "📊 Comparer tous les quants",
2993
+ "gguf.compare_title": "Tous les quants — comparaison de qualité",
2994
+ "gguf.col.verdict": "Verdict",
2995
+ "gguf.col.gamma_at_l": "γ @ L (après quant)",
2996
+ "gguf.need_repo": "Saisissez un id de dépôt GGUF comme 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
2997
+ "gguf.listing": "Listage des fichiers .gguf depuis HF Hub…",
2998
+ "gguf.no_files": "Aucun fichier .gguf trouvé dans ce dépôt.",
2999
+ "gguf.found": "fichiers quant trouvés",
3000
+ "gguf.pick_hint": "choisissez-en un et cliquez Analyser.",
3001
+ "gguf.reading": "Lecture de l'en-tête GGUF via HTTP Range…",
3002
+ "gguf.read_ok": "En-tête analysé",
3003
+ "gguf.verdict.healthy": "SAIN — l'horizon effectif atteint L avec un bon γ après quant",
3004
+ "gguf.verdict.usable_with_care":"UTILISABLE AVEC PRUDENCE — atteint L mais γ est modeste après quant",
3005
+ "gguf.verdict.degrades": "DÉGRADE — l'attention s'effondre avant L (ou le quant l'y pousse)",
3006
+ "gguf.r.arch": "Architecture",
3007
+ "gguf.r.ctx_train": "Contexte d'entraînement",
3008
+ "gguf.r.horizon_fp16": "Horizon d'attention (fp16)",
3009
+ "gguf.r.quant": "Schéma de quant",
3010
+ "gguf.r.gamma_shift": "γ-shift dû au quant",
3011
+ "gguf.r.after_quant": "(après quant)",
3012
+ "gguf.r.eff_horizon": "Horizon effectif (quantifié)",
3013
+ "gguf.r.no_quant_shift": "— pleine précision, pas de γ-shift",
3014
+ "gguf.r.note": "Horizon depuis γ_Padé / d_horizon (architecture). γ-shift de quant + ΔPPL depuis le modèle quant-regime (calibré sur la PPL de llama.cpp + papiers AWQ/GPTQ). Les deux sont des estimations — vérifiez les cas limites avec un éval réel.",
3015
+ "gguf.err.not_gguf": "Ce fichier n'est pas un GGUF valide (mauvais magic).",
3016
+ "gguf.err.too_large": "L'en-tête de métadonnées dépasse la limite de téléchargement — tokenizer inhabituellement grand. Essayez un autre quant.",
3017
+ "gguf.err.incomplete": "Il manque rope_theta ou context_length dans les métadonnées GGUF — impossible de calculer l'horizon.",
3018
+ "help.v091.gguf.title": "🧊 Pont de validité GGUF",
3019
+ "help.v091.gguf.body": "La douzaine de calculateurs GGUF/VRAM (NyxKrage, oobabooga, …) lisent un en-tête <code>.gguf</code> pour dire si un quant <em>tient dans le GPU</em>. Ceci lit le même en-tête — via HTTP Range, sans télécharger des Go — et répond à ce qu'ils sautent : <em>tient-il ET fonctionne-t-il encore ?</em> Collez un dépôt GGUF, choisissez un fichier de quant ; le pont extrait <code>rope_theta</code>, <code>context_length</code>, le schéma de quant (depuis <code>general.file_type</code> ou le nom de fichier) et la géométrie des têtes, puis exécute γ_Padé / d_horizon de TAF plus le γ-shift de quant conscient de l'architecture. Sortie : horizon d'attention effectif au contexte d'entraînement, de combien le quant érode γ (et ΔPPL) pour <em>ce</em> modèle, et un verdict. <em>Cas d'usage</em> : 'Q4_K_M tient dans 8 Go — mais est-il abruti au-delà de 30K ?' → voyez l'horizon et la pénalité γ de Q4 avant de télécharger 6 Go.",
3020
  "yarn.title": "🧵 Planificateur d'extension de contexte YaRN / RoPE",
3021
  "yarn.tip": "<strong>Config + verdict, pas seulement la VRAM</strong>. Les calculateurs GGUF/VRAM disent si une longueur de contexte <em>tient dans le GPU</em>. Ceci donne le bloc <code>rope_scaling</code> exact pour <code>config.json</code> ET si la qualité d'attention tiendra réellement à cette longueur — avec la machinerie γ_Padé / d_horizon de TAF, entièrement dans votre navigateur.",
3022
  "yarn.desc": "Vous voulez utiliser un modèle au-delà de son contexte d'entraînement ? Saisissez le modèle (ou son θ + contexte d'entraînement) et votre longueur cible L. Obtenez le fragment <code>rope_scaling</code> prêt à coller (transformers ≥4.43), plus un verdict TAF : l'horizon d'attention effectif atteint-il L, ou le modèle va-t-il halluciner au-delà de d_horizon ?",
 
4182
  "mode_desc.hub": "每个 LLM-eval 问题的地图 → tafagent 模式(若覆盖)+ 精选外部工具。找到方案而非重新发明。30+ 问题,7 类别。",
4183
  "modes.yarn": "🧵 YaRN 规划器",
4184
  "mode_desc.yarn": "生成精确的 rope_scaling 配置以将模型扩展到训练上下文之外 —— 外加 TAF 裁决:在目标长度下注意力质量是否真的撑得住。",
4185
+ "modes.gguf": "🧊 GGUF 桥",
4186
+ "mode_desc.gguf": "在浏览器内读取 GGUF 文件的元数据头(rope_theta、context_length、量化),给出 TAF 质量裁决 —— 显存计算器跳过的那个问题:塞得进且还能用吗?",
4187
+ "gguf.title": "🧊 GGUF 有效性桥",
4188
+ "gguf.tip": "<strong>塞进显存 ≠ 能用</strong>。GGUF/显存计算器读取模型元数据来告诉你某量化<em>是否塞得进 GPU</em>。本工具通过 HTTP Range 直接从 <code>.gguf</code> 头读取同样的元数据(rope_theta、context_length、量化方案、注意力头几何)—— 无需下载数 GB —— 并回答它们不答的:注意力质量是否真的撑得住,量化又侵蚀了多少(γ-shift、ΔPPL)?",
4189
+ "gguf.desc": "粘贴一个 GGUF 仓库(如 <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>),选择一个量化文件,获得 TAF 质量裁决:模型的有效注意力视界,以及所选量化对<em>这个具体架构</em>的 γ 位移有多大。只在浏览器内读取文件头。",
4190
+ "gguf.repo_label": "GGUF 仓库 id:",
4191
+ "gguf.list_btn": "📂 列出量化文件",
4192
+ "gguf.file_label": "量化文件:",
4193
+ "gguf.target_label": "目标上下文 L(可选):",
4194
+ "gguf.analyze_btn": "🧊 分析 GGUF",
4195
+ "gguf.all_btn": "📊 比较所有量化",
4196
+ "gguf.compare_title": "所有量化 —— 质量对比",
4197
+ "gguf.col.verdict": "裁决",
4198
+ "gguf.col.gamma_at_l": "γ @ L(量化后)",
4199
+ "gguf.need_repo": "输入 GGUF 仓库 id,如 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
4200
+ "gguf.listing": "正在从 HF Hub 列出 .gguf 文件…",
4201
+ "gguf.no_files": "该仓库中未找到 .gguf 文件。",
4202
+ "gguf.found": "个量化文件已找到",
4203
+ "gguf.pick_hint": "选一个并点击分析。",
4204
+ "gguf.reading": "正在通过 HTTP Range 读取 GGUF 头…",
4205
+ "gguf.read_ok": "头已解析",
4206
+ "gguf.verdict.healthy": "健康 —— 量化后有效视界以良好的 γ 到达 L",
4207
+ "gguf.verdict.usable_with_care":"可用但需谨慎 —— 到达 L,但量化后 γ 偏低",
4208
+ "gguf.verdict.degrades": "退化 —— 注意力在 L 之前崩溃(或被量化推到那里)",
4209
+ "gguf.r.arch": "架构",
4210
+ "gguf.r.ctx_train": "训练上下文",
4211
+ "gguf.r.horizon_fp16": "注意力视界(fp16)",
4212
+ "gguf.r.quant": "量化方案",
4213
+ "gguf.r.gamma_shift": "量化导致的 γ 位移",
4214
+ "gguf.r.after_quant": "(量化后)",
4215
+ "gguf.r.eff_horizon": "有效视界(量化后)",
4216
+ "gguf.r.no_quant_shift": "—— 全精度,无 γ 位移",
4217
+ "gguf.r.note": "视界来自 γ_Padé / d_horizon(架构)。量化 γ 位移 + ΔPPL 来自 quant-regime 模型(以 llama.cpp PPL + AWQ/GPTQ 论文校准)。两者皆为估计 —— 边界情况请用真实评测核实。",
4218
+ "gguf.err.not_gguf": "该文件不是有效的 GGUF(magic 错误)。",
4219
+ "gguf.err.too_large": "元数据头超出获取上限 —— tokenizer 异常大。请换一个量化。",
4220
+ "gguf.err.incomplete": "GGUF 元数据缺少 rope_theta 或 context_length —— 无法计算视界。",
4221
+ "help.v091.gguf.title": "🧊 GGUF 有效性桥",
4222
+ "help.v091.gguf.body": "那一打 GGUF/显存计算器(NyxKrage、oobabooga……)读取 <code>.gguf</code> 头来告诉你某量化<em>是否塞得进 GPU</em>。本工具读取同样的头 —— 通过 HTTP Range,无需下载数 GB —— 并回答它们跳过的:<em>塞得进且还能用吗?</em> 粘贴一个 GGUF 仓库,选择一个量化文件;桥会提取 <code>rope_theta</code>、<code>context_length</code>、量化方案(来自 <code>general.file_type</code> 或文件名)和头几何,然后运行 TAF 的 γ_Padé / d_horizon 加上架构感知的 quant-regime γ 位移。输出:训练上下文处的有效注意力视界、量化对<em>该</em>模型侵蚀 γ(及 ΔPPL)的程度,以及裁决。<em>用例</em>:'Q4_K_M 塞得进 8GB —— 但超过 30K 会变傻吗?' → 在下载 6 GB 之前先看视界和 Q4 的 γ 惩罚。",
4223
  "yarn.title": "🧵 YaRN / RoPE 上下文扩展规划器",
4224
  "yarn.tip": "<strong>配置 + 裁决,不只是显存</strong>。GGUF/显存计算器告诉你某上下文长度<em>是否塞得进 GPU</em>。本工具给出要放入 <code>config.json</code> 的精确 <code>rope_scaling</code> 块,并判断该长度下注意力质量是否真的撑得住 —— 使用 TAF 的 γ_Padé / d_horizon 机制,全在浏览器内运行。",
4225
  "yarn.desc": "想让模型超出其训练上下文运行?输入模型(或其 θ + 训练上下文)和你的目标长度 L。获得可复制粘贴的 <code>rope_scaling</code> 片段(transformers ≥4.43),外加 TAF 裁决:有效注意力视界能否到达 L,还是模型在 d_horizon 之外就开始幻觉?",
js/main.js CHANGED
@@ -39,6 +39,7 @@ import {
39
  loadKB as loadLongscoreKB, lookup as longscoreLookup, rank as longscoreRank,
40
  } from "./longscore.js";
41
  import { planExtension, suggestRopeType } from "./yarn_planner.js";
 
42
 
43
  // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
44
  // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
@@ -233,6 +234,7 @@ document.addEventListener("click", (e) => {
233
  longscore: "longscore-section",
234
  hub: "hub-section",
235
  yarn: "yarn-section",
 
236
  }[targetMode];
237
  if (sectionId) {
238
  const sec = document.getElementById(sectionId);
@@ -257,7 +259,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
257
  "diagnose-section", "phase-section", "unmask-section",
258
  "template-section", "arena-section", "contam-section",
259
  "quant-section", "drift-section", "niah-section",
260
- "saturation-section", "cot-section", "peft-section", "cache-section", "speculative-section", "tax-section", "longscore-section", "hub-section", "yarn-section"].forEach(id => {
261
  const el = $(id);
262
  if (el) el.style.display = "none";
263
  });
@@ -277,6 +279,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
277
  longscore: "longscore-section",
278
  hub: "hub-section",
279
  yarn: "yarn-section",
 
280
  };
281
  const sectionId = sectionMap[mode];
282
  if (sectionId) $(sectionId).style.display = "";
@@ -291,6 +294,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
291
  if (mode === "longscore") initLongscore();
292
  if (mode === "hub") initHub();
293
  if (mode === "yarn") initYarn();
 
294
  });
295
  });
296
 
@@ -4661,9 +4665,20 @@ function initYarn() {
4661
  });
4662
  }
4663
 
 
 
4664
  function _yarnFmtK(n) {
4665
  if (n == null || !Number.isFinite(n)) return "—";
4666
- if (n >= 1000) return (n / 1000).toFixed(n >= 10000 ? 0 : 1) + "K";
4667
  return String(Math.round(n));
4668
  }
4669
  function _yarnFmtG(g) {
@@ -4720,7 +4735,7 @@ function renderYarnPlan(p) {
4720
  <tr><td style="${td}">${t("yarn.r.method")}</td><td><code>${p.ropeType}</code></td></tr>
4721
  <tr><td style="${td}">γ ${t("yarn.r.naive")}</td><td>${_yarnFmtG(p.gammaNaive)}${p.gammaNaive <= 0 ? ` 🚨 ${t("yarn.r.collapsed")}` : ""}</td></tr>
4722
  <tr><td style="${td}">γ ${t("yarn.r.eff")}</td><td><strong>${_yarnFmtG(p.gammaEff)}</strong></td></tr>
4723
- <tr><td style="${td}">θ_eff</td><td>${_yarnFmtK(p.thetaEff)}${p.thetaEff > p.theta ? ` (↑ ${t("yarn.r.from")} ${_yarnFmtK(p.theta)})` : ""}</td></tr>
4724
  <tr><td style="${td}">d_horizon ${t("yarn.r.eff")}</td><td>${_yarnFmtK(p.dHorizonEff)} ${horizonOk ? "✅ ≥ L" : "⚠ &lt; L"}</td></tr>
4725
  </table>
4726
  <h3>${t("yarn.r.snippet")}</h3>
@@ -4736,6 +4751,196 @@ function renderYarnPlan(p) {
4736
  });
4737
  }
4738

4739
  // ════════════════════════════════════════════════════════════════════
4740
  // Bootstrap
4741
  // ════════════════════════════════════════════════════════════════════
 
39
  loadKB as loadLongscoreKB, lookup as longscoreLookup, rank as longscoreRank,
40
  } from "./longscore.js";
41
  import { planExtension, suggestRopeType } from "./yarn_planner.js";
42
+ import { listGgufFiles, fetchGgufMetadata, ggufToConfig, quantFromFilename, analyzeGguf } from "./gguf_bridge.js";
43
 
44
  // Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
45
  // Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
 
234
  longscore: "longscore-section",
235
  hub: "hub-section",
236
  yarn: "yarn-section",
237
+ gguf: "gguf-section",
238
  }[targetMode];
239
  if (sectionId) {
240
  const sec = document.getElementById(sectionId);
 
259
  "diagnose-section", "phase-section", "unmask-section",
260
  "template-section", "arena-section", "contam-section",
261
  "quant-section", "drift-section", "niah-section",
262
+ "saturation-section", "cot-section", "peft-section", "cache-section", "speculative-section", "tax-section", "longscore-section", "hub-section", "yarn-section", "gguf-section"].forEach(id => {
263
  const el = $(id);
264
  if (el) el.style.display = "none";
265
  });
 
279
  longscore: "longscore-section",
280
  hub: "hub-section",
281
  yarn: "yarn-section",
282
+ gguf: "gguf-section",
283
  };
284
  const sectionId = sectionMap[mode];
285
  if (sectionId) $(sectionId).style.display = "";
 
294
  if (mode === "longscore") initLongscore();
295
  if (mode === "hub") initHub();
296
  if (mode === "yarn") initYarn();
297
+ if (mode === "gguf") initGguf();
298
  });
299
  });
300
 
 
4665
  });
4666
  }
4667
 
4668
+ // Context / horizon lengths: binary-K so 32768→32K, 131072→128K, 8192→8K
4669
+ // (the convention everyone uses for context windows), not decimal-K (→33K).
4670
  function _yarnFmtK(n) {
4671
  if (n == null || !Number.isFinite(n)) return "—";
4672
+ if (n >= 1048576) return (n / 1048576).toFixed(1) + "M";
4673
+ if (n >= 1024) return Math.round(n / 1024) + "K";
4674
+ return String(Math.round(n));
4675
+ }
4676
+ // RoPE θ is an arbitrary base, not a power of two → decimal M/K reads naturally
4677
+ // (1000000→1M, 500000→500K, 40000→40K).
4678
+ function _thetaFmt(n) {
4679
+ if (n == null || !Number.isFinite(n)) return "—";
4680
+ if (n >= 1e6) return (n / 1e6).toFixed(n % 1e6 === 0 ? 0 : 1) + "M";
4681
+ if (n >= 1000) return (n / 1000).toFixed(n % 1000 === 0 ? 0 : 1) + "K";
4682
  return String(Math.round(n));
4683
  }
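A small standalone sketch of the two formatting rules above, to show why context lengths and θ get different units. `fmtContextK`, `fmtTheta` and `fmtDecimalK` are hypothetical names used only for illustration; they use the same thresholds as `_yarnFmtK`, `_thetaFmt` and the removed decimal-K line, but are not exports of `main.js`.

```javascript
// Illustrative copies only (hypothetical names, not part of main.js).

// Context / horizon lengths: binary-K, so powers of two print cleanly.
function fmtContextK(n) {
  if (n == null || !Number.isFinite(n)) return "n/a"; // placeholder for missing values
  if (n >= 1048576) return (n / 1048576).toFixed(1) + "M";
  if (n >= 1024) return Math.round(n / 1024) + "K";
  return String(Math.round(n));
}

// RoPE theta: decimal M/K, because theta is an arbitrary base.
function fmtTheta(n) {
  if (n == null || !Number.isFinite(n)) return "n/a";
  if (n >= 1e6) return (n / 1e6).toFixed(n % 1e6 === 0 ? 0 : 1) + "M";
  if (n >= 1000) return (n / 1000).toFixed(n % 1000 === 0 ? 0 : 1) + "K";
  return String(Math.round(n));
}

// The removed rule, kept here only to show the rounding it produced.
function fmtDecimalK(n) {
  if (n >= 1000) return (n / 1000).toFixed(n >= 10000 ? 0 : 1) + "K";
  return String(Math.round(n));
}

console.log(fmtDecimalK(32768)); // "33K" (the misleading label)
console.log(fmtContextK(32768)); // "32K"
console.log(fmtTheta(1000000));  // "1M"
```

Keeping the two formatters separate avoids printing a θ of 1000000 as "976K", which is what the binary rule would do to a decimal base.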
4684
  function _yarnFmtG(g) {
 
4735
  <tr><td style="${td}">${t("yarn.r.method")}</td><td><code>${p.ropeType}</code></td></tr>
4736
  <tr><td style="${td}">γ ${t("yarn.r.naive")}</td><td>${_yarnFmtG(p.gammaNaive)}${p.gammaNaive <= 0 ? ` 🚨 ${t("yarn.r.collapsed")}` : ""}</td></tr>
4737
  <tr><td style="${td}">γ ${t("yarn.r.eff")}</td><td><strong>${_yarnFmtG(p.gammaEff)}</strong></td></tr>
4738
+ <tr><td style="${td}">θ_eff</td><td>${_thetaFmt(p.thetaEff)}${p.thetaEff > p.theta ? ` (↑ ${t("yarn.r.from")} ${_thetaFmt(p.theta)})` : ""}</td></tr>
4739
  <tr><td style="${td}">d_horizon ${t("yarn.r.eff")}</td><td>${_yarnFmtK(p.dHorizonEff)} ${horizonOk ? "✅ ≥ L" : "⚠ &lt; L"}</td></tr>
4740
  </table>
4741
  <h3>${t("yarn.r.snippet")}</h3>
 
4751
  });
4752
  }
4753
 
4754
+ // ════════════════════════════════════════════════════════════════════
4755
+ // 🧊 GGUF Validity Bridge (v0.9.1)
4756
+ // ════════════════════════════════════════════════════════════════════
4757
+ let _ggufWired = false;
4758
+ let _ggufFiles = [];
4759
+ let _ggufCfgCache = {}; // "repo|file" → ggufToConfig result (geometry is shared across quants)
4760
+
4761
+ // Parse a .gguf header once and cache. The architecture/θ/context/head geometry
4762
+ // is identical across every quant of the same model — only the quant scheme
4763
+ // differs — so one parsed file is enough to score the whole repo.
4764
+ async function ggufGetCfg(repo, file) {
4765
+ const key = `${repo}|${file}`;
4766
+ if (_ggufCfgCache[key]) return _ggufCfgCache[key];
4767
+ const url = `https://huggingface.co/${repo}/resolve/main/${file}`;
4768
+ const meta = await fetchGgufMetadata(url);
4769
+ const cfg = ggufToConfig(meta);
4770
+ if (!cfg.quant_scheme) {
4771
+ const q = quantFromFilename(file);
4772
+ cfg.quant_label = cfg.quant_label || q.label;
4773
+ cfg.quant_scheme = q.scheme;
4774
+ }
4775
+ cfg.__bytesRead = meta.bytesRead;
4776
+ _ggufCfgCache[key] = cfg;
4777
+ return cfg;
4778
+ }
4779
+
4780
+ function initGguf() {
4781
+ if (_ggufWired) return;
4782
+ _ggufWired = true;
4783
+
4784
+ const listBtn = $("gguf-list-btn");
4785
+ const analyzeBtn = $("gguf-analyze-btn");
4786
+ const allBtn = $("gguf-all-btn");
4787
+ const fileSel = $("gguf-file");
4788
+
4789
+ listBtn?.addEventListener("click", async () => {
4790
+ const repo = ($("gguf-repo").value || "").trim();
4791
+ if (!repo) { $("gguf-status").textContent = "⚠ " + t("gguf.need_repo"); return; }
4792
+ $("gguf-status").textContent = "⏳ " + t("gguf.listing");
4793
+ listBtn.disabled = true;
4794
+ state.lastModelId = repo;
4795
+ try {
4796
+ const files = await listGgufFiles(repo);
4797
+ if (!files.length) { $("gguf-status").textContent = "⚠ " + t("gguf.no_files"); fileSel.disabled = true; analyzeBtn.disabled = true; if (allBtn) allBtn.disabled = true; return; }
4798
+ fileSel.innerHTML = files.map(f => `<option value="${escapeHtml(f)}">${escapeHtml(f)}</option>`).join("");
4799
+ // Default-select a Q4_K_M (the community sweet spot) if present.
4800
+ const def = files.find(f => /q4_k_m/i.test(f)) || files[0];
4801
+ fileSel.value = def;
4802
+ fileSel.disabled = false;
4803
+ analyzeBtn.disabled = false;
4804
+ $("gguf-all-btn").disabled = false;
4805
+ _ggufFiles = files;
4806
+ $("gguf-status").innerHTML = `✅ ${files.length} ${t("gguf.found")} — ${t("gguf.pick_hint")}`;
4807
+ } catch (err) {
4808
+ $("gguf-status").textContent = `❌ ${err.message}`;
4809
+ } finally {
4810
+ listBtn.disabled = false;
4811
+ }
4812
+ });
4813
+
4814
+ analyzeBtn?.addEventListener("click", async () => {
4815
+ const repo = ($("gguf-repo").value || "").trim();
4816
+ const file = fileSel.value;
4817
+ if (!repo || !file) return;
4818
+ $("gguf-status").textContent = "⏳ " + t("gguf.reading");
4819
+ analyzeBtn.disabled = true;
4820
+ try {
4821
+ const cfg = await ggufGetCfg(repo, file);
4822
+ const target = parseFloat($("gguf-target").value) || null;
4823
+ const result = analyzeGguf(cfg, target);
4824
+ $("gguf-status").innerHTML = `✅ ${t("gguf.read_ok")} (${(cfg.__bytesRead / 1024 / 1024).toFixed(1)} MB header)`;
4825
+ renderGgufResult(cfg, result);
4826
+ } catch (err) {
4827
+ $("gguf-status").textContent = `❌ ${ggufErrMsg(err)}`;
4828
+ } finally {
4829
+ analyzeBtn.disabled = false;
4830
+ }
4831
+ });
4832
+
4833
+ allBtn?.addEventListener("click", async () => {
4834
+ const repo = ($("gguf-repo").value || "").trim();
4835
+ const file = fileSel.value;
4836
+ if (!repo || !file) return;
4837
+ $("gguf-status").textContent = "⏳ " + t("gguf.reading");
4838
+ allBtn.disabled = true; analyzeBtn.disabled = true;
4839
+ try {
4840
+ // One header parse gives the shared geometry; score every quant from it.
4841
+ const cfg = await ggufGetCfg(repo, file);
4842
+ const target = parseFloat($("gguf-target").value) || null;
4843
+ // Dedupe repo files to one row per quant label (drop shard suffixes).
4844
+ const seen = new Set();
4845
+ const rows = [];
4846
+ for (const f of _ggufFiles) {
4847
+ const q = quantFromFilename(f);
4848
+ if (q.label === "?" || seen.has(q.label)) continue;
4849
+ seen.add(q.label);
4850
+ const res = analyzeGguf({ ...cfg, quant_label: q.label, quant_scheme: q.scheme }, target);
4851
+ rows.push({ label: q.label, scheme: q.scheme, res });
4852
+ }
4853
+ // Best precision first: lowest γ-shift (baseline F16 = 0) at the top.
4854
+ rows.sort((a, b) => (a.res.quant?.gamma_shift ?? 0) - (b.res.quant?.gamma_shift ?? 0));
4855
+ $("gguf-status").innerHTML = `✅ ${t("gguf.read_ok")} (${(cfg.__bytesRead / 1024 / 1024).toFixed(1)} MB header)`;
4856
+ renderGgufComparison(cfg, rows);
4857
+ } catch (err) {
4858
+ $("gguf-status").textContent = `❌ ${ggufErrMsg(err)}`;
4859
+ } finally {
4860
+ allBtn.disabled = false; analyzeBtn.disabled = false;
4861
+ }
4862
+ });
4863
+ }
4864
+
4865
+ function ggufErrMsg(err) {
4866
+ return ({
4867
+ not_a_gguf_file: t("gguf.err.not_gguf"),
4868
+ gguf_metadata_too_large: t("gguf.err.too_large"),
4869
+ })[err.message] || err.message;
4870
+ }
4871
+
4872
+ function renderGgufResult(cfg, r) {
4873
+ const out = $("gguf-output");
4874
+ if (!out) return;
4875
+ out.style.display = "";
4876
+
4877
+ if (r.verdict === "incomplete") {
4878
+ out.innerHTML = `<div class="gc-validity-warning">⚠ ${t("gguf.err.incomplete")}</div>`;
4879
+ return;
4880
+ }
4881
+
4882
+ const meta = ({
4883
+ healthy: { emoji: "✅", cls: "v-yes" },
4884
+ usable_with_care: { emoji: "⚠️", cls: "v-deg" },
4885
+ degrades: { emoji: "🚨", cls: "v-no" },
4886
+ })[r.verdict] || { emoji: "❓", cls: "v-deg" };
4887
+
4888
+ const td = "padding:3px 12px 3px 0;";
4889
+ const gqa = (cfg.num_attention_heads && cfg.num_key_value_heads && cfg.num_key_value_heads < cfg.num_attention_heads)
4890
+ ? `GQA ${cfg.num_attention_heads}:${cfg.num_key_value_heads}` : "MHA";
4891
+
4892
+ // Quant block (may be null for F16/F32 files).
4893
+ let quantHtml = "";
4894
+ if (r.quant) {
4895
+ const regimeEmoji = ({ safe: "✅", mild: "🟡", significant: "🟠", cliff: "🚨" })[r.quant.regime] || "";
4896
+ const dp = r.quant.delta_ppl;
4897
+ quantHtml = `
4898
+ <tr><td style="${td}">${t("gguf.r.quant")}</td><td><code>${escapeHtml(r.quantLabel || "?")}</code></td></tr>
4899
+ <tr><td style="${td}">${t("gguf.r.gamma_shift")}</td><td>−${_yarnFmtG(r.quant.gamma_shift)} ${regimeEmoji} <span class="subtle">${t("quant.regime." + r.quant.regime) || r.quant.regime}</span></td></tr>
4900
+ <tr><td style="${td}">ΔPPL</td><td>≈ +${dp.mid} <span class="subtle">(${dp.low}–${dp.high})</span></td></tr>`;
4901
+ } else {
4902
+ quantHtml = `<tr><td style="${td}">${t("gguf.r.quant")}</td><td><code>${escapeHtml(r.quantLabel || "F16/F32")}</code> <span class="subtle">${t("gguf.r.no_quant_shift")}</span></td></tr>`;
4903
+ }
4904
+
4905
+ out.innerHTML = `
4906
+ <p><span class="verdict-badge ${meta.cls}">${meta.emoji} ${t("gguf.verdict." + r.verdict)}</span></p>
4907
+ <table style="border-collapse:collapse;font-size:0.95em;margin:0.5em 0;">
4908
+ <tr><td style="${td}">${t("gguf.r.arch")}</td><td><code>${escapeHtml(r.arch)}</code> · ${gqa} · θ=${_thetaFmt(r.theta)}</td></tr>
4909
+ <tr><td style="${td}">${t("gguf.r.ctx_train")}</td><td>${_yarnFmtK(r.nCtx)}</td></tr>
4910
+ <tr><td style="${td}">${t("gguf.r.horizon_fp16")}</td><td>${_yarnFmtK(r.dHoriz)} <span class="subtle">(γ=${_yarnFmtG(r.gammaTrain)})</span></td></tr>
4911
+ ${quantHtml}
4912
+ <tr><td style="${td}"><strong>γ @ L=${_yarnFmtK(r.L)}</strong> ${t("gguf.r.after_quant")}</td><td><strong>${_yarnFmtG(r.gammaQuant)}</strong> <span class="subtle">(fp16: ${_yarnFmtG(r.gammaAtL)})</span></td></tr>
4913
+ </table>
4914
+ <p class="subtle" style="font-size:0.88em;">${t("gguf.r.note")}</p>`;
4915
+ }
4916
+
4917
+ function renderGgufComparison(cfg, rows) {
4918
+ const out = $("gguf-output");
4919
+ if (!out) return;
4920
+ out.style.display = "";
4921
+ const gqa = (cfg.num_attention_heads && cfg.num_key_value_heads && cfg.num_key_value_heads < cfg.num_attention_heads)
4922
+ ? `GQA ${cfg.num_attention_heads}:${cfg.num_key_value_heads}` : "MHA";
4923
+ // Short verdict label = the word before the em-dash of the full verdict string
4924
+ // (works in every language: "HEALTHY — …", "SANO — …", "健康 —— …").
4925
+ const short = v => (t("gguf.verdict." + v) || v).split(/——|—| - /)[0].trim();
4926
+ const emo = v => ({ healthy: "✅", usable_with_care: "⚠️", degrades: "🚨" })[v] || "❓";
4927
+ const td = "padding:3px 14px 3px 0;";
4928
+ const head = `<tr style="text-align:left;border-bottom:1px solid var(--border);">
4929
+ <th style="${td}">${t("gguf.r.quant")}</th><th style="${td}">${t("gguf.r.gamma_shift")}</th>
4930
+ <th style="${td}">${t("gguf.col.gamma_at_l")}</th><th style="${td}">${t("gguf.col.verdict")}</th></tr>`;
4931
+ const body = rows.map(({ label, res }) => {
4932
+ const shift = res.quant ? "−" + _yarnFmtG(res.quant.gamma_shift) : "—";
4933
+ return `<tr><td style="${td}"><code>${escapeHtml(label)}</code></td><td style="${td}">${shift}</td>
4934
+ <td style="${td}">${_yarnFmtG(res.gammaQuant)}</td>
4935
+ <td style="${td}">${emo(res.verdict)} ${short(res.verdict)}</td></tr>`;
4936
+ }).join("");
4937
+ // d_horizon is θ-set → identical for every quant; show it once in the header line.
4938
+ out.innerHTML = `<h3>${t("gguf.compare_title")}</h3>
4939
+ <p class="subtle">${escapeHtml(cfg.architecture)} · ${gqa} · θ=${_thetaFmt(cfg.rope_theta)} · ctx ${_yarnFmtK(cfg.context_length)} · horizon ${_yarnFmtK(rows[0]?.res.dHoriz)} · L=${_yarnFmtK(rows[0]?.res.L)}</p>
4940
+ <table style="border-collapse:collapse;font-size:0.93em;">${head}${body}</table>
4941
+ <p class="subtle" style="font-size:0.88em;">${t("gguf.r.note")}</p>`;
4942
+ }
4943
+
4944
  // ════════════════════════════════════════════════════════════════════
4945
  // Bootstrap
4946
  // ════════════════════════════════════════════════════════════════════
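
Note on the `short` helper in `renderGgufComparison` above: it relies on every localized verdict string starting with a label followed by a dash separator. A minimal standalone sketch of that split follows, assuming the same regex as the diff. The name `shortLabel` and the sample strings are hypothetical and are not the real `gguf.verdict.*` translations.

```javascript
// Sketch only: mirrors the split used by `short` in the diff.
// `shortLabel` and the sample strings are illustrative assumptions.
function shortLabel(full) {
  // "——" is listed first so the doubled Chinese dash is consumed whole;
  // " - " requires surrounding spaces, so hyphenated labels such as
  // "USABLE-WITH-CARE" stay intact.
  return String(full).split(/——|—| - /)[0].trim();
}

const samples = [
  "HEALTHY — sample tail text",
  "SANO — texto de ejemplo",
  "健康 —— 示例文本",
  "USABLE-WITH-CARE - sample tail text",
];
for (const s of samples) {
  console.log(shortLabel(s));
}
```

A string with no separator comes back unchanged (trimmed), so a translation that omits the dash would place the full sentence in the verdict column rather than fail.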
test_gguf.mjs ADDED
@@ -0,0 +1,107 @@
+ import { chromium } from "playwright";
+ const BASE = "http://127.0.0.1:8000/index.html";
+ const b = await chromium.launch({ headless: true });
+ const p = await (await b.newContext()).newPage();
+ const errors = [];
+ const benign = s => /Failed to load resource.*40\d|status of 40\d/.test(s);
+ p.on("console", m => { if (m.type()==="error" && !benign(m.text())) errors.push(`[err] ${m.text()}`); });
+ p.on("pageerror", e => errors.push(`[pageerror] ${e.message}`));
+ const log = s => process.stdout.write(s+"\n");
+ let pass=0, fail=0;
+ const check=(n,c,x="")=>{ log(`${c?" OK ":" FAIL"} ${n} ${x}`); c?pass++:fail++; };
+
+ await p.goto(BASE,{waitUntil:"domcontentloaded",timeout:90000});
+ await p.waitForTimeout(2500);
+ await p.click(`.lang-btn[data-lang="en"]`); await p.waitForTimeout(200);
+ check("module loads, 0 errors", errors.length===0, `(errors=${errors.length})`);
+
+ await p.click('[data-mode-link="gguf"]',{timeout:5000}); await p.waitForTimeout(500);
+ const secVis = await p.evaluate(()=>{const s=document.querySelector("#gguf-section");return s&&getComputedStyle(s).display!=="none";});
+ check("gguf-section visible after tile click", secVis);
+
+ log("\n── List quant files (real repo) ──");
+ await p.fill("#gguf-repo","Qwen/Qwen2.5-0.5B-Instruct-GGUF");
+ await p.click("#gguf-list-btn");
+ await p.waitForTimeout(4000);
+ const listed = await p.evaluate(()=>{
+ const sel=document.querySelector("#gguf-file");
+ return { count:sel.options.length, selected:sel.value, disabled:sel.disabled,
+ analyzeEnabled:!document.querySelector("#gguf-analyze-btn").disabled,
+ status:document.querySelector("#gguf-status").innerText.slice(0,60) };
+ });
+ check("files listed in dropdown", listed.count>0, `(${listed.count} files)`);
+ check("Q4_K_M auto-selected", /q4_k_m/i.test(listed.selected), listed.selected);
+ check("analyze button enabled", listed.analyzeEnabled);
+
+ log("\n── Analyze GGUF (parse header + verdict) ──");
+ await p.click("#gguf-analyze-btn");
+ await p.waitForTimeout(8000); // range fetch + parse
+ const r = await p.evaluate(()=>{
+ const o=document.querySelector("#gguf-output");
+ return { vis:getComputedStyle(o).display!=="none",
+ verdict:o.querySelector(".verdict-badge")?.innerText?.trim()||"",
+ text:o.innerText,
+ status:document.querySelector("#gguf-status").innerText };
+ });
+ check("output rendered", r.vis && r.text.length>50);
+ check("verdict present", r.verdict.length>3, r.verdict);
+ check("shows architecture qwen2", /qwen2/.test(r.text));
+ check("shows trained context 32K", /32K|32768/.test(r.text), (r.text.match(/Trained context[^\n]*\n?\s*[\w.]+/)||[""])[0].slice(0,40));
+ check("shows quant Q4_K_M", /Q4_K_M/i.test(r.text));
+ check("shows γ-shift from quant", /γ-shift|shift/i.test(r.text));
+ check("shows ΔPPL", /ΔPPL|PPL/.test(r.text));
+ check("header parsed status (MB)", /MB header|parsed|analizada|analysé|已解析/i.test(r.status), r.status.slice(0,50));
+
+ log("\n── Target L override ──");
+ await p.fill("#gguf-target","131072");
+ await p.click("#gguf-analyze-btn");
+ await p.waitForTimeout(7000);
+ const r2 = await p.evaluate(()=>document.querySelector("#gguf-output .verdict-badge")?.innerText?.trim());
+ check("re-analyze with L=131072", r2.length>3, r2);
+
+ log("\n── Compare all quants (one header parse → full table) ──");
+ await p.click("#gguf-all-btn");
+ await p.waitForTimeout(7000);
+ const cmp = await p.evaluate(()=>{
+ const o=document.querySelector("#gguf-output");
+ const rows=[...o.querySelectorAll("table tr")];
+ const dataRows=rows.slice(1); // minus header
+ return { title:o.querySelector("h3")?.innerText,
+ rowCount:dataRows.length,
+ quants:dataRows.map(r=>r.querySelector("code")?.innerText).filter(Boolean),
+ hasShift:/−0\.|—/.test(o.innerText),
+ hasVerdictCol:rows[0]?.innerText?.includes("Verdict") };
+ });
+ check("comparison table rendered", cmp.rowCount>=3, `(${cmp.rowCount} rows)`);
+ check("lists multiple quant labels", cmp.quants.length>=3, cmp.quants.join(", "));
+ check("has verdict column", cmp.hasVerdictCol, cmp.title);
+ check("rows sorted best→worst (Q8 before Q2)", (()=>{
+ const i8=cmp.quants.findIndex(q=>/Q8/.test(q)), i2=cmp.quants.findIndex(q=>/Q2/.test(q));
+ return i8<0||i2<0||i8<i2;})(), cmp.quants.join(" > "));
+ // Verdicts must vary across quants (regression guard: a hard d_horizon gate
+ // once forced every row to DEGRADES even when γ@L was healthy).
+ const verdicts = await p.evaluate(()=>[...document.querySelectorAll("#gguf-output table tr")].slice(1).map(r=>r.lastElementChild?.innerText?.trim()));
+ check("verdicts vary across quants (not all identical)", new Set(verdicts).size>=2, verdicts.join(" | "));
+ // γ@L must DECREASE for worse quants (Q8 γ@L > Q2 γ@L).
+ const gammas = await p.evaluate(()=>[...document.querySelectorAll("#gguf-output table tr")].slice(1).map(r=>parseFloat(r.children[2]?.innerText)));
+ check("γ@L decreases for worse quant", gammas[0] > gammas[gammas.length-1], `${gammas[0]} → ${gammas[gammas.length-1]}`);
+
+ log("\n── 4-language verdict ──");
+ for (const lang of ["es","fr","zh","en"]) {
+ await p.click(`.lang-btn[data-lang="${lang}"]`); await p.waitForTimeout(300);
+ const label = await p.evaluate(()=>document.querySelector('.mode-btn[data-mode="gguf"]')?.textContent?.trim());
+ check(`${lang}: tab label localized`, label && label.length>3, label);
+ }
+
+ log("\n── Error path: bad repo ──");
+ await p.click(`.lang-btn[data-lang="en"]`); await p.waitForTimeout(200);
+ await p.fill("#gguf-repo","this/definitely-not-a-real-repo-xyz123");
+ await p.click("#gguf-list-btn");
+ await p.waitForTimeout(3000);
+ const errStatus = await p.evaluate(()=>document.querySelector("#gguf-status").innerText);
+ check("bad repo → error message", /❌|not found|HTTP/i.test(errStatus), errStatus.slice(0,50));
+
+ log(`\n=== ${pass} passed, ${fail} failed · JS errors: ${errors.length} ===`);
+ errors.slice(0,10).forEach(e=>log(e));
+ await b.close();
+ process.exit(fail>0?1:0);
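
Note on the "γ@L decreases for worse quant" check: the test compares only the first and last rows, so a mis-sorted middle row would pass. A stricter variant could verify the whole column, as sketched below. The helper name `isNonIncreasing` is hypothetical and not part of this commit. Ties are allowed on the assumption that two unquantized rows (for example F16 and F32) may share the same γ@L.

```javascript
// Sketch only: a whole-column check for a best→worst table.
// `isNonIncreasing` is an illustrative name, not a function from the commit.
function isNonIncreasing(values) {
  // A NaN (e.g. from parseFloat on an empty cell) counts as a failure
  // instead of being skipped silently.
  if (values.length < 2 || !values.every(Number.isFinite)) return false;
  // Each value must be less than or equal to the one before it.
  return values.every((v, i) => i === 0 || v <= values[i - 1]);
}

// Illustrative values, not measured output.
console.log(isNonIncreasing([0.98, 0.95, 0.95, 0.80]));
console.log(isNonIncreasing([0.90, 0.95, 0.80]));
```

In the test this could stand in for the endpoint comparison, e.g. `check("γ@L non-increasing", isNonIncreasing(gammas))`, while keeping the existing first-versus-last check as a guard against a table whose rows are all equal.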