v0.9.1: GGUF Validity Bridge mode + binary header parser
Reads a .gguf file's metadata header straight from HF Hub via HTTP Range
(no multi-GB download) and answers what the dozen GGUF/VRAM calculators
skip: fits in VRAM AND still works?
- js/gguf_bridge.js: incremental Range-fetch GGUF v2/v3 parser (magic, KV
block, arrays skipped by byte-length so the tokenizer doesn't blow the
buffer). ggufToConfig maps GGUF metadata → HF-style config; quant scheme
from general.file_type with filename backstop. analyzeGguf cross-runs
γ_Padé / d_horizon (architecture) with the quant-regime γ-shift.
- "Compare all quants": one header parse → scores every quant in the repo
(geometry is shared; only the scheme differs), sorted best→worst as a
table. γ@L after quant is the comparison axis — it degrades monotonically;
d_horizon is NOT recomputed from a quant-shifted γ (that inverts the
formula). Verdict driven by γ@L + quant regime, not a hard d_horizon gate
(which understates reach for high-θ models like Qwen).
- index.html: tab + tile + #gguf-section + help v0.9.1 entry.
- main.js: import, wiring, cached header parse, single + comparison renders.
Context/horizon now formatted binary-K (32768→32K, not 33K); θ decimal M/K.
- i18n.js: full EN/ES/FR/ZH for all gguf.* keys.
Test (test_gguf.mjs): 25/25 — list/parse real GGUF (Qwen2.5 q4_k_m, 6MB
header), verdict, compare-all table, monotonic γ@L, verdict variety, 4
languages, error paths. 24 modes total, 0 JS errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
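
As a rough illustration of the "magic, KV block" step described in the message above, the fixed 24-byte preamble of a GGUF v2/v3 file can be decoded from bytes that are already in memory. This is a minimal sketch, not the code in js/gguf_bridge.js: the names `parseGgufPreamble` and `makePreamble` are hypothetical, the little-endian v2/v3 layout is assumed, and the counts are read with BigInt accessors rather than the two-halves approach the commit uses.

```javascript
// Sketch only: parse the fixed-size GGUF preamble (magic, version, tensor
// count, KV count) from an in-memory Uint8Array. Assumes GGUF v2/v3,
// little-endian. `parseGgufPreamble` is a hypothetical name.
function parseGgufPreamble(bytes) {
  if (bytes.length < 24) throw new Error("need at least 24 bytes");
  const dv = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // The bytes 'G','G','U','F' read as a little-endian u32 give 0x46554747.
  if (dv.getUint32(0, true) !== 0x46554747) throw new Error("not_a_gguf_file");
  const version = dv.getUint32(4, true);
  // Counts are u64; read them as BigInt and range-check before converting.
  const toNum = (v) => {
    if (v > BigInt(Number.MAX_SAFE_INTEGER)) throw new Error("count too large");
    return Number(v);
  };
  const tensorCount = toNum(dv.getBigUint64(8, true));
  const kvCount = toNum(dv.getBigUint64(16, true));
  // The metadata KV block starts right after these 24 bytes.
  return { version, tensorCount, kvCount, kvOffset: 24 };
}

// Hypothetical helper: build a synthetic preamble to exercise the parser.
function makePreamble(version, tensors, kvs) {
  const buf = new Uint8Array(24);
  const dv = new DataView(buf.buffer);
  dv.setUint32(0, 0x46554747, true);
  dv.setUint32(4, version, true);
  dv.setBigUint64(8, BigInt(tensors), true);
  dv.setBigUint64(16, BigInt(kvs), true);
  return buf;
}
```

Calling `parseGgufPreamble(makePreamble(3, 291, 26))` returns the version and both counts as plain numbers, plus the offset at which KV parsing would begin; a buffer without the magic is rejected with `not_a_gguf_file`.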
- index.html +43 -0
- js/gguf_bridge.js +245 -0
- js/i18n.js +152 -0
- js/main.js +208 -3
- test_gguf.mjs +107 -0
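
The binary-K formatting mentioned for main.js (32768→32K, not 33K) is not shown in this part of the diff. A minimal sketch of the idea, assuming 1K = 1024 for context lengths and decimal units for θ, could look like the following; `formatBinaryK` and `formatDecimal` are hypothetical names and the rounding choices are assumptions, not the author's.

```javascript
// Sketch only: format a token count with binary K (1K = 1024), so that
// 32768 prints as "32K" rather than "33K". Hypothetical helper name.
function formatBinaryK(n) {
  if (!Number.isFinite(n) || n < 0) return "?";
  if (n < 1024) return String(n);
  const k = n / 1024;
  // Whole multiples of 1024 print without decimals; others keep one decimal.
  return Number.isInteger(k) ? `${k}K` : `${k.toFixed(1)}K`;
}

// Sketch only: decimal units for the RoPE base θ (1e6 -> "1M", 10000 -> "10K").
function formatDecimal(n) {
  if (!Number.isFinite(n)) return "?";
  // The unary plus drops trailing zeros left by toFixed ("1.00" -> 1).
  if (n >= 1e6) return `${+(n / 1e6).toFixed(2)}M`;
  if (n >= 1e3) return `${+(n / 1e3).toFixed(2)}K`;
  return String(n);
}
```

With these definitions, `formatBinaryK(32768)` gives `"32K"` and `formatDecimal(1e6)` gives `"1M"`. Keeping the two helpers separate avoids mixing the 1024-based and 1000-based conventions in one code path.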
|
@@ -246,6 +246,9 @@
|
|
| 246 |
<p><strong data-i18n="help.v09.yarn.title">🧵 YaRN / RoPE Context-Extension Planner</strong></p>
|
| 247 |
<p data-i18n="help.v09.yarn.body">The dozen GGUF/VRAM calculators on HF (NyxKrage, oobabooga, DavidAU, …) all answer the same question: <em>does context length L fit in my GPU?</em> None answer the harder one: <em>does L fit AND still work?</em> Enter a model id (or its θ + trained context) and a target length L. The planner computes the extension factor, emits the exact <code>rope_scaling</code> block for transformers ≥4.43 (<code>yarn</code> / <code>linear</code> / <code>dynamic</code> / <code>llama3</code>, with paper-default β ramps), then runs TAF's γ_Padé / d_horizon math: γ with no extension (the problem), γ after the chosen method (the fix), the effective attention horizon, and a verdict — HEALTHY / USABLE-WITH-CARE / NEEDS-FINETUNE / DEGRADES. It flags the θ_eff≈θ·factor estimate and the >4× fine-tune requirement honestly. <em>Use case</em>: 'I want Mistral-7B (θ=10k, 8k trained) at 32k' → see γ collapse from naive use, YaRN partially recover it, and get the exact config to paste. Or 'Qwen2.5 at 128k' → discover its θ=1e6 already covers it, no aggressive scaling needed.</p>
|
| 248 |
|
| 249 |
<h3 data-i18n="help.audit.title">The audit chain</h3>
|
| 250 |
<p data-i18n="help.audit.body">Every result shows the full <strong>Computation Chain</strong> — each formula step with its inputs,
|
| 251 |
output, and interpretation. Click any step to expand. Cited section numbers (§26.1, §19.1, etc.) refer
|
|
@@ -408,6 +411,7 @@
|
|
| 408 |
<button data-mode-link="longscore" data-i18n="modes.longscore">🎯 LongScore</button>
|
| 409 |
<button data-mode-link="quant" data-i18n="modes.quant">⚖️ Quant</button>
|
| 410 |
<button data-mode-link="yarn" data-i18n="modes.yarn">🧵 YaRN Planner</button>
|
| 411 |
<button data-mode-link="inspector" data-i18n="modes.inspector">🔍 Inspect config</button>
|
| 412 |
</div>
|
| 413 |
</div>
|
|
@@ -503,6 +507,7 @@
|
|
| 503 |
<button class="mode-btn" data-mode="longscore" role="tab" aria-selected="false" data-i18n="modes.longscore">🎯 LongScore</button>
|
| 504 |
<button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
|
| 505 |
<button class="mode-btn" data-mode="yarn" role="tab" aria-selected="false" data-i18n="modes.yarn">🧵 YaRN Planner</button>
|
| 506 |
</div>
|
| 507 |
<p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
|
| 508 |
<strong>Quickest start</strong>: paste any HuggingFace model id (e.g. <code>meta-llama/Meta-Llama-3-8B</code>),
|
|
@@ -1290,6 +1295,44 @@
|
|
| 1290 |
<div id="yarn-output" style="display:none; margin-top:1em;"></div>
|
| 1291 |
</section>
|
| 1292 |
|
| 1293 |
<!-- Recipe selector (mode=recipe) -->
|
| 1294 |
<section id="recipe-section" style="display:none;">
|
| 1295 |
<h2 data-i18n="recipe.title">📋 Recipe</h2>
|
|
|
|
| 246 |
<p><strong data-i18n="help.v09.yarn.title">🧵 YaRN / RoPE Context-Extension Planner</strong></p>
|
| 247 |
<p data-i18n="help.v09.yarn.body">The dozen GGUF/VRAM calculators on HF (NyxKrage, oobabooga, DavidAU, …) all answer the same question: <em>does context length L fit in my GPU?</em> None answer the harder one: <em>does L fit AND still work?</em> Enter a model id (or its θ + trained context) and a target length L. The planner computes the extension factor, emits the exact <code>rope_scaling</code> block for transformers ≥4.43 (<code>yarn</code> / <code>linear</code> / <code>dynamic</code> / <code>llama3</code>, with paper-default β ramps), then runs TAF's γ_Padé / d_horizon math: γ with no extension (the problem), γ after the chosen method (the fix), the effective attention horizon, and a verdict — HEALTHY / USABLE-WITH-CARE / NEEDS-FINETUNE / DEGRADES. It flags the θ_eff≈θ·factor estimate and the >4× fine-tune requirement honestly. <em>Use case</em>: 'I want Mistral-7B (θ=10k, 8k trained) at 32k' → see γ collapse from naive use, YaRN partially recover it, and get the exact config to paste. Or 'Qwen2.5 at 128k' → discover its θ=1e6 already covers it, no aggressive scaling needed.</p>
|
| 248 |
|
| 249 |
+
<p><strong data-i18n="help.v091.gguf.title">🧊 GGUF Validity Bridge</strong></p>
|
| 250 |
+
<p data-i18n="help.v091.gguf.body">The dozen GGUF/VRAM calculators (NyxKrage, oobabooga, …) read a <code>.gguf</code> header to tell you if a quant <em>fits in your GPU</em>. This reads the same header — via HTTP Range, so no multi-GB download — and answers the question they skip: <em>does it fit AND still work?</em> Paste a GGUF repo, pick a quant file; the bridge pulls <code>rope_theta</code>, <code>context_length</code>, the quant scheme (from <code>general.file_type</code> or the filename), and head geometry, then runs TAF's γ_Padé / d_horizon plus the architecture-aware quant-regime γ-shift. Output: effective attention horizon at the trained context, how far the quant erodes γ (and ΔPPL) for <em>this</em> model, and a verdict — HEALTHY / USABLE-WITH-CARE / DEGRADES. <em>Use case</em>: 'unsloth/Qwen3.5-9B-GGUF Q4_K_M fits 8GB — but is it brain-dead past 30K?' → see the horizon and the Q4 γ-penalty before you download 6 GB.</p>
|
| 251 |
+
|
| 252 |
<h3 data-i18n="help.audit.title">The audit chain</h3>
|
| 253 |
<p data-i18n="help.audit.body">Every result shows the full <strong>Computation Chain</strong> — each formula step with its inputs,
|
| 254 |
output, and interpretation. Click any step to expand. Cited section numbers (§26.1, §19.1, etc.) refer
|
|
|
|
| 411 |
<button data-mode-link="longscore" data-i18n="modes.longscore">🎯 LongScore</button>
|
| 412 |
<button data-mode-link="quant" data-i18n="modes.quant">⚖️ Quant</button>
|
| 413 |
<button data-mode-link="yarn" data-i18n="modes.yarn">🧵 YaRN Planner</button>
|
| 414 |
+
<button data-mode-link="gguf" data-i18n="modes.gguf">🧊 GGUF Bridge</button>
|
| 415 |
<button data-mode-link="inspector" data-i18n="modes.inspector">🔍 Inspect config</button>
|
| 416 |
</div>
|
| 417 |
</div>
|
|
|
|
| 507 |
<button class="mode-btn" data-mode="longscore" role="tab" aria-selected="false" data-i18n="modes.longscore">🎯 LongScore</button>
|
| 508 |
<button class="mode-btn" data-mode="hub" role="tab" aria-selected="false" data-i18n="modes.hub">🧭 Solutions</button>
|
| 509 |
<button class="mode-btn" data-mode="yarn" role="tab" aria-selected="false" data-i18n="modes.yarn">🧵 YaRN Planner</button>
|
| 510 |
+
<button class="mode-btn" data-mode="gguf" role="tab" aria-selected="false" data-i18n="modes.gguf">🧊 GGUF Bridge</button>
|
| 511 |
</div>
|
| 512 |
<p id="mode-desc" class="recipe-desc" data-i18n="modes.desc">
|
| 513 |
<strong>Quickest start</strong>: paste any HuggingFace model id (e.g. <code>meta-llama/Meta-Llama-3-8B</code>),
|
|
|
|
| 1295 |
<div id="yarn-output" style="display:none; margin-top:1em;"></div>
|
| 1296 |
</section>
|
| 1297 |
|
| 1298 |
+
<!-- GGUF Validity Bridge (mode=gguf) -->
|
| 1299 |
+
<section id="gguf-section" style="display:none;">
|
| 1300 |
+
<h2><span data-i18n="gguf.title">🧊 GGUF Validity Bridge</span>
|
| 1301 |
+
<span class="info"><span class="tooltip" data-i18n="gguf.tip">
|
| 1302 |
+
<strong>Fits in VRAM ≠ works</strong>. The GGUF/VRAM calculators read a model's metadata to
|
| 1303 |
+
tell you if a quant <em>fits in your GPU</em>. This reads the SAME metadata (rope_theta,
|
| 1304 |
+
context_length, quant scheme, head geometry) straight from the <code>.gguf</code> header via
|
| 1305 |
+
HTTP Range — no multi-GB download — and answers the question they don't: does attention
|
| 1306 |
+
quality actually hold, and how much does the quant erode it (γ-shift, ΔPPL)?
|
| 1307 |
+
</span></span>
|
| 1308 |
+
</h2>
|
| 1309 |
+
<p class="recipe-desc" data-i18n="gguf.desc">
|
| 1310 |
+
Paste a GGUF repo (e.g. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), pick a quant file, and get a
|
| 1311 |
+
TAF quality verdict: the model's effective attention horizon, plus how much the chosen
|
| 1312 |
+
quantization shifts γ for <em>this specific architecture</em>. Reads only the file header in your
|
| 1313 |
+
browser.
|
| 1314 |
+
</p>
|
| 1315 |
+
|
| 1316 |
+
<div class="form-row">
|
| 1317 |
+
<label for="gguf-repo" data-i18n="gguf.repo_label">GGUF repo id:</label>
|
| 1318 |
+
<input type="text" id="gguf-repo" placeholder="Qwen/Qwen2.5-7B-Instruct-GGUF">
|
| 1319 |
+
<button id="gguf-list-btn" class="secondary" data-i18n="gguf.list_btn">📂 List quant files</button>
|
| 1320 |
+
</div>
|
| 1321 |
+
<span id="gguf-status" class="subtle"></span>
|
| 1322 |
+
|
| 1323 |
+
<div class="form-row">
|
| 1324 |
+
<label for="gguf-file" data-i18n="gguf.file_label">Quant file:</label>
|
| 1325 |
+
<select id="gguf-file" disabled></select>
|
| 1326 |
+
</div>
|
| 1327 |
+
<div class="form-row">
|
| 1328 |
+
<label for="gguf-target" data-i18n="gguf.target_label">Target context L (optional):</label>
|
| 1329 |
+
<input type="number" id="gguf-target" placeholder="(defaults to trained context)" min="256">
|
| 1330 |
+
</div>
|
| 1331 |
+
<button id="gguf-analyze-btn" disabled data-i18n="gguf.analyze_btn">🧊 Analyze GGUF</button>
|
| 1332 |
+
<button id="gguf-all-btn" class="secondary" disabled data-i18n="gguf.all_btn">📊 Compare all quants</button>
|
| 1333 |
+
<div id="gguf-output" style="display:none; margin-top:1em;"></div>
|
| 1334 |
+
</section>
|
| 1335 |
+
|
| 1336 |
<!-- Recipe selector (mode=recipe) -->
|
| 1337 |
<section id="recipe-section" style="display:none;">
|
| 1338 |
<h2 data-i18n="recipe.title">📋 Recipe</h2>
|
|
@@ -0,0 +1,245 @@
|
| 1 |
+
// GGUF Validity Bridge (v0.9.1 anti-bullshit pack)
|
| 2 |
+
//
|
| 3 |
+
// The dozen GGUF/VRAM calculators on HF answer "does this quant fit in my GPU?".
|
| 4 |
+
// None answer "does it fit AND still work?". This reads a .gguf file's metadata
|
| 5 |
+
// header directly in the browser (HTTP Range — no full multi-GB download), pulls
|
| 6 |
+
// rope_theta + context_length + quant scheme + head geometry, then runs TAF's
|
| 7 |
+
// γ_Padé / d_horizon + the quant-regime γ-shift to emit a quality verdict:
|
| 8 |
+
// "fits in VRAM but attention collapses past d_horizon, and Q4 worsens γ by …".
|
| 9 |
+
//
|
| 10 |
+
// Parser logic is pure; the network fetch is unavoidable I/O. main.js renders.
|
| 11 |
+
|
| 12 |
+
import { gammaPade } from "./gamma_check.js";
|
| 13 |
+
import { dHorizon } from "./yarn_planner.js";
|
| 14 |
+
import { predictQuantShift } from "./quant_regime.js";
|
| 15 |
+
|
| 16 |
+
// ── GGUF metadata value types (spec v2/v3) ──
|
| 17 |
+
const GT = { U8:0, I8:1, U16:2, I16:3, U32:4, I32:5, F32:6, BOOL:7, STR:8, ARR:9, U64:10, I64:11, F64:12 };
|
| 18 |
+
const FIXED_SIZE = { 0:1, 1:1, 2:2, 3:2, 4:4, 5:4, 6:4, 7:1, 10:8, 11:8, 12:8 };
|
| 19 |
+
|
| 20 |
+
// general.file_type enum (llama_ftype) → human label + the quant_regime scheme id
|
| 21 |
+
// we feed to predictQuantShift. Only the common ones; filename parsing backstops.
|
| 22 |
+
const FTYPE = {
|
| 23 |
+
0: ["F32", null],
|
| 24 |
+
1: ["F16", null],
|
| 25 |
+
2: ["Q4_0", "gguf_q4_km"],
|
| 26 |
+
3: ["Q4_1", "gguf_q4_km"],
|
| 27 |
+
7: ["Q8_0", "gguf_q8_0"],
|
| 28 |
+
8: ["Q5_0", "gguf_q5_km"],
|
| 29 |
+
9: ["Q5_1", "gguf_q5_km"],
|
| 30 |
+
10: ["Q2_K", "gguf_q2_k"],
|
| 31 |
+
11: ["Q3_K_S", "gguf_q3_km"],
|
| 32 |
+
12: ["Q3_K_M", "gguf_q3_km"],
|
| 33 |
+
13: ["Q3_K_L", "gguf_q3_km"],
|
| 34 |
+
14: ["Q4_K_S", "gguf_q4_km"],
|
| 35 |
+
15: ["Q4_K_M", "gguf_q4_km"],
|
| 36 |
+
16: ["Q5_K_S", "gguf_q5_km"],
|
| 37 |
+
17: ["Q5_K_M", "gguf_q5_km"],
|
| 38 |
+
18: ["Q6_K", "gguf_q8_0"],
|
| 39 |
+
};
|
| 40 |
+
|
| 41 |
+
// Filename → (label, scheme) backstop when general.file_type is absent/ambiguous.
|
| 42 |
+
export function quantFromFilename(name) {
|
| 43 |
+
const n = (name || "").toUpperCase();
|
| 44 |
+
const pairs = [
|
| 45 |
+
["Q2_K", "gguf_q2_k"], ["Q3_K", "gguf_q3_km"], ["Q4_K", "gguf_q4_km"],
|
| 46 |
+
["Q5_K", "gguf_q5_km"], ["Q6_K", "gguf_q8_0"], ["Q8_0", "gguf_q8_0"],
|
| 47 |
+
["Q4_0", "gguf_q4_km"], ["Q4_1", "gguf_q4_km"], ["Q5_0", "gguf_q5_km"],
|
| 48 |
+
// "BF16" contains "F16", so it is listed first; the first matching tag wins.
["Q5_1", "gguf_q5_km"], ["BF16", null], ["F16", null], ["F32", null],
|
| 49 |
+
];
|
| 50 |
+
for (const [tag, scheme] of pairs) {
|
| 51 |
+
if (n.includes(tag)) return { label: tag, scheme };
|
| 52 |
+
}
|
| 53 |
+
return { label: "?", scheme: null };
|
| 54 |
+
}
|
| 55 |
+
|
| 56 |
+
// List the .gguf files in a HF repo (so the user can pick a quant).
|
| 57 |
+
export async function listGgufFiles(repo) {
|
| 58 |
+
const resp = await fetch(`https://huggingface.co/api/models/${encodeURIComponent(repo).replace(/%2F/g, "/")}`);
|
| 59 |
+
if (!resp.ok) throw new Error(`HTTP ${resp.status} — repo not found or private`);
|
| 60 |
+
const data = await resp.json();
|
| 61 |
+
const sib = Array.isArray(data.siblings) ? data.siblings : [];
|
| 62 |
+
return sib.map(s => s.rfilename).filter(f => /\.gguf$/i.test(f)).sort();
|
| 63 |
+
}
|
| 64 |
+
|
| 65 |
+
// Incremental Range-fetch reader. GGUF metadata sits at the file head; arch +
|
| 66 |
+
// rope fields precede the big tokenizer arrays, so a few MB usually suffices
// (ensure() gives up with gguf_metadata_too_large once MAX bytes are fetched).
|
| 67 |
+
class GgufReader {
|
| 68 |
+
constructor(url) {
|
| 69 |
+
this.url = url;
|
| 70 |
+
this.buf = new Uint8Array(0);
|
| 71 |
+
this.dv = new DataView(this.buf.buffer);
|
| 72 |
+
this.off = 0;
|
| 73 |
+
this.fetched = 0;
|
| 74 |
+
this.CHUNK = 1 << 20; // 1 MB per range
|
| 75 |
+
this.MAX = 48 << 20; // hard cap 48 MB
|
| 76 |
+
this.eof = false;
|
| 77 |
+
}
|
| 78 |
+
async ensure(n) {
|
| 79 |
+
while (this.off + n > this.buf.length && !this.eof && this.fetched < this.MAX) {
|
| 80 |
+
const start = this.fetched;
|
| 81 |
+
const end = Math.min(this.fetched + this.CHUNK, this.MAX) - 1;
|
| 82 |
+
const resp = await fetch(this.url, { headers: { Range: `bytes=${start}-${end}` } });
|
| 83 |
+
if (!resp.ok) throw new Error(`HTTP ${resp.status}`); // resp.ok already covers 200 and 206
|
| 84 |
+
const part = new Uint8Array(await resp.arrayBuffer());
|
| 85 |
+
if (part.length === 0) { this.eof = true; break; }
|
| 86 |
+
const merged = new Uint8Array(this.buf.length + part.length);
|
| 87 |
+
merged.set(this.buf); merged.set(part, this.buf.length);
|
| 88 |
+
this.buf = merged;
|
| 89 |
+
this.dv = new DataView(this.buf.buffer);
|
| 90 |
+
this.fetched += part.length;
|
| 91 |
+
if (part.length < this.CHUNK) this.eof = true; // server returned the tail
|
| 92 |
+
}
|
| 93 |
+
if (this.off + n > this.buf.length) throw new Error("gguf_metadata_too_large");
|
| 94 |
+
}
|
| 95 |
+
async u8() { await this.ensure(1); return this.dv.getUint8(this.off++); }
|
| 96 |
+
async u16() { await this.ensure(2); const v = this.dv.getUint16(this.off, true); this.off += 2; return v; }
|
| 97 |
+
async i16() { await this.ensure(2); const v = this.dv.getInt16(this.off, true); this.off += 2; return v; }
|
| 98 |
+
async u32() { await this.ensure(4); const v = this.dv.getUint32(this.off, true); this.off += 4; return v; }
|
| 99 |
+
async i32() { await this.ensure(4); const v = this.dv.getInt32(this.off, true); this.off += 4; return v; }
|
| 100 |
+
async f32() { await this.ensure(4); const v = this.dv.getFloat32(this.off, true); this.off += 4; return v; }
|
| 101 |
+
async f64() { await this.ensure(8); const v = this.dv.getFloat64(this.off, true); this.off += 8; return v; }
|
| 102 |
+
// u64/i64 as Number — safe for counts/dims well under 2^53.
|
| 103 |
+
async u64() { await this.ensure(8); const lo = this.dv.getUint32(this.off, true); const hi = this.dv.getUint32(this.off + 4, true); this.off += 8; return hi * 4294967296 + lo; }
|
| 104 |
+
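// i64: read the high word as signed so negative values keep their sign.
async i64() { await this.ensure(8); const lo = this.dv.getUint32(this.off, true); const hi = this.dv.getInt32(this.off + 4, true); this.off += 8; return hi * 4294967296 + lo; }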
|
| 105 |
+
async skip(n) { await this.ensure(0); // ensure buffer exists
|
| 106 |
+
// skip may exceed current buffer; pull enough then advance offset
|
| 107 |
+
await this.ensure(Math.min(n, this.MAX)); this.off += n;
|
| 108 |
+
if (this.off > this.buf.length) { this.off = this.buf.length; throw new Error("gguf_metadata_too_large"); }
|
| 109 |
+
}
|
| 110 |
+
async str() {
|
| 111 |
+
const len = await this.u64();
|
| 112 |
+
await this.ensure(len);
|
| 113 |
+
const bytes = this.buf.subarray(this.off, this.off + len);
|
| 114 |
+
this.off += len;
|
| 115 |
+
return new TextDecoder("utf-8").decode(bytes);
|
| 116 |
+
}
|
| 117 |
+
}
|
| 118 |
+
|
| 119 |
+
async function readValue(r, type) {
|
| 120 |
+
switch (type) {
|
| 121 |
+
case GT.U8: return r.u8();
|
| 122 |
+
case GT.I8: { const v = await r.u8(); return v > 127 ? v - 256 : v; }
|
| 123 |
+
case GT.U16: return r.u16();
|
| 124 |
+
case GT.I16: return r.i16();
|
| 125 |
+
case GT.U32: return r.u32();
|
| 126 |
+
case GT.I32: return r.i32();
|
| 127 |
+
case GT.F32: return r.f32();
|
| 128 |
+
case GT.BOOL: return (await r.u8()) !== 0;
|
| 129 |
+
case GT.STR: return r.str();
|
| 130 |
+
case GT.U64: return r.u64();
|
| 131 |
+
case GT.I64: return r.i64();
|
| 132 |
+
case GT.F64: return r.f64();
|
| 133 |
+
case GT.ARR: {
|
| 134 |
+
const et = await r.u32();
|
| 135 |
+
const len = await r.u64();
|
| 136 |
+
if (FIXED_SIZE[et]) { await r.skip(len * FIXED_SIZE[et]); return { __array: len, elemType: et }; }
|
| 137 |
+
if (et === GT.STR) { for (let i = 0; i < len; i++) { const sl = await r.u64(); await r.skip(sl); } return { __array: len, elemType: et }; }
|
| 138 |
+
throw new Error("gguf_nested_array");
|
| 139 |
+
}
|
| 140 |
+
default: throw new Error(`gguf_unknown_type_${type}`);
|
| 141 |
+
}
|
| 142 |
+
}
|
| 143 |
+
|
| 144 |
+
// Parse the metadata KV block. Returns a flat { key: value } map (arrays are
|
| 145 |
+
// returned as { __array: len, elemType } stubs; we never need their contents here).
|
| 146 |
+
export async function fetchGgufMetadata(url) {
|
| 147 |
+
const r = new GgufReader(url);
|
| 148 |
+
const magic = (await r.u8()) | ((await r.u8()) << 8) | ((await r.u8()) << 16) | ((await r.u8()) << 24);
|
| 149 |
+
if (magic !== 0x46554747 /* 'GGUF' little-endian */) throw new Error("not_a_gguf_file");
|
| 150 |
+
const version = await r.u32();
|
| 151 |
+
const tensorCount = await r.u64();
|
| 152 |
+
const kvCount = await r.u64();
|
| 153 |
+
const kv = {};
|
| 154 |
+
for (let i = 0; i < kvCount; i++) {
|
| 155 |
+
const key = await r.str();
|
| 156 |
+
const type = await r.u32();
|
| 157 |
+
kv[key] = await readValue(r, type);
|
| 158 |
+
}
|
| 159 |
+
return { version, tensorCount, kvCount, kv, bytesRead: r.fetched };
|
| 160 |
+
}
|
| 161 |
+
|
| 162 |
+
// Map raw GGUF metadata → HF-style config (so quant_regime + TAF math can reuse it).
|
| 163 |
+
export function ggufToConfig(meta) {
|
| 164 |
+
const kv = meta.kv || {};
|
| 165 |
+
const arch = kv["general.architecture"];
|
| 166 |
+
const g = (suffix, fallback = null) => (arch && kv[`${arch}.${suffix}`] !== undefined ? kv[`${arch}.${suffix}`] : fallback);
|
| 167 |
+
|
| 168 |
+
const n_attn = g("attention.head_count");
|
| 169 |
+
const n_kv = g("attention.head_count_kv", n_attn);
|
| 170 |
+
const hidden = g("embedding_length");
|
| 171 |
+
const keyLen = g("attention.key_length");
|
| 172 |
+
const headDim = (typeof keyLen === "number") ? keyLen
|
| 173 |
+
: (n_attn && hidden ? hidden / n_attn : null);
|
| 174 |
+
const ftypeEnum = kv["general.file_type"];
|
| 175 |
+
const ftype = (typeof ftypeEnum === "number" && FTYPE[ftypeEnum]) ? FTYPE[ftypeEnum] : null;
|
| 176 |
+
|
| 177 |
+
return {
|
| 178 |
+
architecture: arch || "?",
|
| 179 |
+
quant_label: ftype ? ftype[0] : null,
|
| 180 |
+
quant_scheme: ftype ? ftype[1] : null,
|
| 181 |
+
rope_theta: g("rope.freq_base", null),
|
| 182 |
+
context_length: g("context_length", null),
|
| 183 |
+
rope_scaling_type: g("rope.scaling.type", null),
|
| 184 |
+
rope_scaling_factor: g("rope.scaling.factor", null),
|
| 185 |
+
rope_orig_ctx: g("rope.scaling.original_context_length", null),
|
| 186 |
+
// HF-config aliases for predictQuantShift / inferNParams:
|
| 187 |
+
num_attention_heads: n_attn ?? null,
|
| 188 |
+
num_key_value_heads: n_kv ?? null,
|
| 189 |
+
hidden_size: hidden ?? null,
|
| 190 |
+
head_dim: headDim,
|
| 191 |
+
num_hidden_layers: g("block_count", null),
|
| 192 |
+
sliding_window: g("attention.sliding_window", null),
|
| 193 |
+
vocab_size: g("vocab_size", null),
|
| 194 |
+
};
|
| 195 |
+
}
|
| 196 |
+
|
| 197 |
+
// Bridge verdict: combine GGUF geometry + TAF horizon + quant γ-shift.
|
| 198 |
+
// cfg : ggufToConfig output (may be edited by user / filename backstop)
|
| 199 |
+
// targetCtx : optional desired context L to check (else uses context_length)
|
| 200 |
+
export function analyzeGguf(cfg, targetCtx) {
|
| 201 |
+
const theta = Number(cfg.rope_theta) || 10000;
|
| 202 |
+
const nCtx = Number(cfg.context_length) || null;
|
| 203 |
+
const L = Number(targetCtx) || nCtx;
|
| 204 |
+
|
| 205 |
+
// fp16 attention horizon — architectural, set by θ. SAME across every quant
|
| 206 |
+
// of the model (quantisation adds noise, it does not change θ). d_horizon is
|
| 207 |
+
// a function of the *natural* Padé γ, so it must be computed from the fp16 γ —
|
| 208 |
+
// never from a quant-shifted γ (that inverts the formula and is meaningless).
|
| 209 |
+
const gammaTrain = nCtx ? gammaPade(theta, nCtx) : null;
|
| 210 |
+
const dHoriz = gammaTrain != null ? dHorizon(theta, gammaTrain) : null;
|
| 211 |
+
|
| 212 |
+
// Quant γ-shift via the existing quant-regime model (architecture-aware).
|
| 213 |
+
const quant = cfg.quant_scheme ? predictQuantShift(cfg, cfg.quant_scheme) : null;
|
| 214 |
+
|
| 215 |
+
// γ at the target L: fp16, then after the quant shift. This is the quantity
|
| 216 |
+
// that degrades monotonically with worse quant — the correct comparison axis.
|
| 217 |
+
const gammaAtL = (theta && L) ? gammaPade(theta, L) : null;
|
| 218 |
+
const shift = quant ? quant.gamma_shift : 0;
|
| 219 |
+
const gammaQuant = (gammaAtL != null) ? gammaAtL - shift : null;
|
| 220 |
+
|
| 221 |
+
// Verdict is driven by γ@L after quant (the direct attention-quality signal
|
| 222 |
+
// at the target length) plus the quant-regime band. We deliberately do NOT
|
| 223 |
+
// gate on L ≤ d_horizon: the closed-form d_horizon understates the true reach
|
| 224 |
+
// for high-θ models (e.g. Qwen θ=1e6 keeps γ healthy far past its d_horizon),
|
| 225 |
+
// so γ@L is the honest measure. `reaches` is reported for context only.
|
| 226 |
+
const reaches = dHoriz != null && L != null && L <= dHoriz;
|
| 227 |
+
const collapsed = !Number.isFinite(gammaQuant) || gammaQuant <= 0.2;
|
| 228 |
+
const quantCliff = quant && quant.regime === "cliff";
|
| 229 |
+
let verdict;
|
| 230 |
+
if (nCtx == null || theta == null) verdict = "incomplete";
|
| 231 |
+
else if (collapsed || quantCliff) verdict = "degrades";
|
| 232 |
+
else if (gammaQuant >= 0.6 && (!quant || quant.regime === "safe" || quant.regime === "mild")) verdict = "healthy";
|
| 233 |
+
else verdict = "usable_with_care";
|
| 234 |
+
|
| 235 |
+
return {
|
| 236 |
+
theta, nCtx, L,
|
| 237 |
+
gammaTrain, dHoriz, // fp16 architectural horizon (shared across quants)
|
| 238 |
+
gammaAtL, gammaQuant, // attention at L: fp16 vs after-quant
|
| 239 |
+
reaches, // is L within the fp16 horizon?
|
| 240 |
+
quant, // {gamma_shift, regime, delta_ppl, ...} or null
|
| 241 |
+
quantLabel: cfg.quant_label,
|
| 242 |
+
arch: cfg.architecture,
|
| 243 |
+
verdict,
|
| 244 |
+
};
|
| 245 |
+
}
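
The "arrays skipped by byte-length" idea in `readValue` can be shown without any network I/O. The sketch below works on a `DataView` over bytes already in memory and returns the offset just past an array value; `skipArray` and `SIZES` are hypothetical names, the layout (u32 element type, u64 length, then elements) is assumed from the GGUF v2/v3 format, and lengths are assumed to fit in a JavaScript Number.

```javascript
// Sketch only: byte sizes of fixed-width GGUF value types, keyed by type id
// (mirrors FIXED_SIZE in the listing; 8 = string and 9 = array are absent).
const SIZES = { 0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8 };

// Return the offset just past the array value that starts at `off`.
// Assumed layout: u32 element type, u64 length, then the elements.
function skipArray(dv, off) {
  const elemType = dv.getUint32(off, true);
  const len = Number(dv.getBigUint64(off + 4, true));
  let p = off + 12;
  // Fixed-width elements: a single jump of len * size bytes.
  if (SIZES[elemType]) return p + len * SIZES[elemType];
  if (elemType === 8) {
    // Strings are length-prefixed (u64), so each one is hopped individually.
    for (let i = 0; i < len; i++) {
      const sl = Number(dv.getBigUint64(p, true));
      p += 8 + sl;
    }
    return p;
  }
  // Arrays of arrays are not handled, matching the listing's behaviour.
  throw new Error("gguf_nested_array");
}
```

For a tokenizer vocabulary this means the parser never decodes the strings themselves; it only reads each 8-byte length prefix and advances. The bytes still have to be present in the buffer, which is why the reader above keeps fetching Range chunks while it skips.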
|
|
@@ -427,6 +427,44 @@ export const TRANSLATIONS = {
|
|
| 427 |
"mode_desc.hub": "Map of every documented LLM-eval pain → tafagent mode (if covered) + curated external tools. Find the right solution without rebuilding it. 30+ pains, 7 categories.",
|
| 428 |
"modes.yarn": "🧵 YaRN Planner",
|
| 429 |
"mode_desc.yarn": "Generate the exact rope_scaling config to extend a model past its trained context — plus a TAF verdict on whether attention quality actually holds at the target length.",
|
| 430 |
"yarn.title": "🧵 YaRN / RoPE Context-Extension Planner",
|
| 431 |
"yarn.tip": "<strong>Config + verdict, not just VRAM</strong>. The GGUF/VRAM calculators tell you if a context length <em>fits in GPU</em>. This tells you the exact <code>rope_scaling</code> block to put in <code>config.json</code> AND whether attention quality will actually hold at that length — using TAF's γ_Padé / d_horizon machinery, all in your browser.",
|
| 432 |
"yarn.desc": "Want to run a model past its trained context? Enter the model (or its θ + trained context) and your target length L. Get the copy-paste <code>rope_scaling</code> snippet for transformers ≥4.43, plus a TAF verdict: does the effective attention horizon reach L, or will the model just hallucinate past d_horizon?",
|
|
@@ -1738,6 +1776,44 @@ export const TRANSLATIONS = {
|
|
| 1738 |
"mode_desc.hub": "Mapa de cada problema documentado de LLM-eval → mode tafagent (si cubierto) + herramientas externas curadas. Encuentra la solución sin reinventarla. 30+ pains, 7 categorías.",
|
| 1739 |
"modes.yarn": "🧵 Planificador YaRN",
|
| 1740 |
"mode_desc.yarn": "Genera la configuración rope_scaling exacta para extender un modelo más allá de su contexto entrenado — más un veredicto TAF sobre si la calidad de atención aguanta realmente a la longitud objetivo.",
|
| 1741 |
"yarn.title": "🧵 Planificador de extensión de contexto YaRN / RoPE",
|
| 1742 |
"yarn.tip": "<strong>Config + veredicto, no solo VRAM</strong>. Las calculadoras GGUF/VRAM te dicen si una longitud de contexto <em>cabe en la GPU</em>. Esto te da el bloque <code>rope_scaling</code> exacto para <code>config.json</code> Y si la calidad de atención aguantará realmente a esa longitud — con la maquinaria γ_Padé / d_horizon de TAF, todo en tu navegador.",
|
| 1743 |
"yarn.desc": "¿Quieres usar un modelo más allá de su contexto entrenado? Introduce el modelo (o su θ + contexto entrenado) y tu longitud objetivo L. Obtén el fragmento <code>rope_scaling</code> listo para pegar (transformers ≥4.43), más un veredicto TAF: ¿llega el horizonte de atención efectivo a L, o el modelo alucinará pasado d_horizon?",
|
|
@@ -2903,6 +2979,44 @@ export const TRANSLATIONS = {
|
|
| 2903 |
"mode_desc.hub": "Carte de chaque problème documenté de LLM-eval → mode tafagent (si couvert) + outils externes curés. Trouvez la solution sans la réinventer. 30+ pains, 7 catégories.",
|
| 2904 |
"modes.yarn": "🧵 Planificateur YaRN",
|
| 2905 |
"mode_desc.yarn": "Génère la configuration rope_scaling exacte pour étendre un modèle au-delà de son contexte d'entraînement — plus un verdict TAF sur la tenue réelle de la qualité d'attention à la longueur cible.",
|
| 2906 |
"yarn.title": "🧵 Planificateur d'extension de contexte YaRN / RoPE",
|
| 2907 |
"yarn.tip": "<strong>Config + verdict, pas seulement la VRAM</strong>. Les calculateurs GGUF/VRAM disent si une longueur de contexte <em>tient dans le GPU</em>. Ceci donne le bloc <code>rope_scaling</code> exact pour <code>config.json</code> ET si la qualité d'attention tiendra réellement à cette longueur — avec la machinerie γ_Padé / d_horizon de TAF, entièrement dans votre navigateur.",
|
| 2908 |
"yarn.desc": "Vous voulez utiliser un modèle au-delà de son contexte d'entraînement ? Saisissez le modèle (ou son θ + contexte d'entraînement) et votre longueur cible L. Obtenez le fragment <code>rope_scaling</code> prêt à coller (transformers ≥4.43), plus un verdict TAF : l'horizon d'attention effectif atteint-il L, ou le modèle va-t-il halluciner au-delà de d_horizon ?",
|
|
@@ -4068,6 +4182,44 @@ export const TRANSLATIONS = {
|
|
| 4068 |
"mode_desc.hub": "每个 LLM-eval 问题的地图 → tafagent 模式(若覆盖)+ 精选外部工具。找到方案而非重新发明。30+ 问题,7 类别。",
|
| 4069 |
"modes.yarn": "🧵 YaRN 规划器",
|
| 4070 |
"mode_desc.yarn": "生成精确的 rope_scaling 配置以将模型扩展到训练上下文之外 —— 外加 TAF 裁决:在目标长度下注意力质量是否真的撑得住。",
|
| 4071 |
"yarn.title": "🧵 YaRN / RoPE 上下文扩展规划器",
|
| 4072 |
"yarn.tip": "<strong>配置 + 裁决,不只是显存</strong>。GGUF/显存计算器告诉你某上下文长度<em>是否塞得进 GPU</em>。本工具给出要放入 <code>config.json</code> 的精确 <code>rope_scaling</code> 块,并判断该长度下注意力质量是否真的撑得住 —— 使用 TAF 的 γ_Padé / d_horizon 机制,全在浏览器内运行。",
|
| 4073 |
"yarn.desc": "想让模型超出其训练上下文运行?输入模型(或其 θ + 训练上下文)和你的目标长度 L。获得可复制粘贴的 <code>rope_scaling</code> 片段(transformers ≥4.43),外加 TAF 裁决:有效注意力视界能否到达 L,还是模型在 d_horizon 之外就开始幻觉?",
|
|
|
|
| 427 |
"mode_desc.hub": "Map of every documented LLM-eval pain → tafagent mode (if covered) + curated external tools. Find the right solution without rebuilding it. 30+ pains, 7 categories.",
|
| 428 |
"modes.yarn": "🧵 YaRN Planner",
|
| 429 |
"mode_desc.yarn": "Generate the exact rope_scaling config to extend a model past its trained context — plus a TAF verdict on whether attention quality actually holds at the target length.",
|
| 430 |
+
"modes.gguf": "🧊 GGUF Bridge",
|
| 431 |
+
"mode_desc.gguf": "Read a GGUF file's metadata header (rope_theta, context_length, quant) in your browser and get a TAF quality verdict — the question the VRAM calculators skip: fits AND works?",
|
| 432 |
+
"gguf.title": "🧊 GGUF Validity Bridge",
|
| 433 |
+
"gguf.tip": "<strong>Fits in VRAM ≠ works</strong>. The GGUF/VRAM calculators read a model's metadata to tell you if a quant <em>fits in your GPU</em>. This reads the SAME metadata (rope_theta, context_length, quant scheme, head geometry) straight from the <code>.gguf</code> header via HTTP Range — no multi-GB download — and answers the question they don't: does attention quality actually hold, and how much does the quant erode it (γ-shift, ΔPPL)?",
|
| 434 |
+
"gguf.desc": "Paste a GGUF repo (e.g. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), pick a quant file, and get a TAF quality verdict: the model's effective attention horizon, plus how much the chosen quantization shifts γ for <em>this specific architecture</em>. Reads only the file header in your browser.",
|
| 435 |
+
"gguf.repo_label": "GGUF repo id:",
|
| 436 |
+
"gguf.list_btn": "📂 List quant files",
|
| 437 |
+
"gguf.file_label": "Quant file:",
|
| 438 |
+
"gguf.target_label": "Target context L (optional):",
|
| 439 |
+
"gguf.analyze_btn": "🧊 Analyze GGUF",
|
| 440 |
+
"gguf.all_btn": "📊 Compare all quants",
|
| 441 |
+
"gguf.compare_title": "All quants — quality comparison",
|
| 442 |
+
"gguf.col.verdict": "Verdict",
|
| 443 |
+
"gguf.col.gamma_at_l": "γ @ L (after quant)",
|
| 444 |
+
"gguf.need_repo": "Enter a GGUF repo id like 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
|
| 445 |
+
"gguf.listing": "Listing .gguf files from HF Hub…",
|
| 446 |
+
"gguf.no_files": "No .gguf files found in that repo.",
|
| 447 |
+
"gguf.found": "quant files found",
|
| 448 |
+
"gguf.pick_hint": "pick one and click Analyze.",
|
| 449 |
+
"gguf.reading": "Reading GGUF header via HTTP Range…",
|
| 450 |
+
"gguf.read_ok": "Header parsed",
|
| 451 |
+
"gguf.verdict.healthy": "HEALTHY — effective horizon reaches L with good γ after quant",
|
| 452 |
+
"gguf.verdict.usable_with_care":"USABLE WITH CARE — reaches L but γ is modest after quant",
|
| 453 |
+
"gguf.verdict.degrades": "DEGRADES — attention collapses before L (or quant pushes it there)",
|
| 454 |
+
"gguf.r.arch": "Architecture",
|
| 455 |
+
"gguf.r.ctx_train": "Trained context",
|
| 456 |
+
"gguf.r.horizon_fp16": "Attention horizon (fp16)",
|
| 457 |
+
"gguf.r.quant": "Quant scheme",
|
| 458 |
+
"gguf.r.gamma_shift": "γ-shift from quant",
|
| 459 |
+
"gguf.r.after_quant": "(after quant)",
|
| 460 |
+
"gguf.r.eff_horizon": "Effective horizon (quantised)",
|
| 461 |
+
"gguf.r.no_quant_shift": "— full precision, no γ-shift",
|
| 462 |
+
"gguf.r.note": "Horizon from γ_Padé / d_horizon (architecture). Quant γ-shift + ΔPPL from the quant-regime model (calibrated to llama.cpp PPL + AWQ/GPTQ papers). Both are estimates — verify borderline cases with a real eval.",
|
| 463 |
+
"gguf.err.not_gguf": "That file isn't a valid GGUF (bad magic).",
|
| 464 |
+
"gguf.err.too_large": "Metadata header exceeds the fetch cap — unusually large tokenizer. Try another quant.",
|
| 465 |
+
"gguf.err.incomplete": "GGUF metadata is missing rope_theta or context_length — can't compute the horizon.",
|
| 466 |
+
"help.v091.gguf.title": "🧊 GGUF Validity Bridge",
|
| 467 |
+
"help.v091.gguf.body": "The dozen GGUF/VRAM calculators (NyxKrage, oobabooga, …) read a <code>.gguf</code> header to tell you if a quant <em>fits in your GPU</em>. This reads the same header — via HTTP Range, so no multi-GB download — and answers the question they skip: <em>does it fit AND still work?</em> Paste a GGUF repo, pick a quant file; the bridge pulls <code>rope_theta</code>, <code>context_length</code>, the quant scheme (from <code>general.file_type</code> or the filename), and head geometry, then runs TAF's γ_Padé / d_horizon plus the architecture-aware quant-regime γ-shift. Output: effective attention horizon at the trained context, how far the quant erodes γ (and ΔPPL) for <em>this</em> model, and a verdict. <em>Use case</em>: 'Q4_K_M fits 8GB — but is it brain-dead past 30K?' → see the horizon and the Q4 γ-penalty before you download 6 GB.",
|
| 468 |
"yarn.title": "🧵 YaRN / RoPE Context-Extension Planner",
|
| 469 |
"yarn.tip": "<strong>Config + verdict, not just VRAM</strong>. The GGUF/VRAM calculators tell you if a context length <em>fits in GPU</em>. This tells you the exact <code>rope_scaling</code> block to put in <code>config.json</code> AND whether attention quality will actually hold at that length — using TAF's γ_Padé / d_horizon machinery, all in your browser.",
|
| 470 |
"yarn.desc": "Want to run a model past its trained context? Enter the model (or its θ + trained context) and your target length L. Get the copy-paste <code>rope_scaling</code> snippet for transformers ≥4.43, plus a TAF verdict: does the effective attention horizon reach L, or will the model just hallucinate past d_horizon?",
|
|
|
|
| 1776 |
"mode_desc.hub": "Mapa de cada problema documentado de LLM-eval → mode tafagent (si cubierto) + herramientas externas curadas. Encuentra la solución sin reinventarla. 30+ pains, 7 categorías.",
|
| 1777 |
"modes.yarn": "🧵 Planificador YaRN",
|
| 1778 |
"mode_desc.yarn": "Genera la configuración rope_scaling exacta para extender un modelo más allá de su contexto entrenado — más un veredicto TAF sobre si la calidad de atención aguanta realmente a la longitud objetivo.",
|
| 1779 |
+
"modes.gguf": "🧊 Puente GGUF",
|
| 1780 |
+
"mode_desc.gguf": "Lee la cabecera de metadata de un archivo GGUF (rope_theta, context_length, quant) en tu navegador y obtén un veredicto de calidad TAF — la pregunta que los calculadores de VRAM ignoran: ¿cabe Y funciona?",
|
| 1781 |
+
"gguf.title": "🧊 Puente de validez GGUF",
|
| 1782 |
+
"gguf.tip": "<strong>Caber en VRAM ≠ funcionar</strong>. Los calculadores GGUF/VRAM leen la metadata de un modelo para decirte si un quant <em>cabe en tu GPU</em>. Esto lee la MISMA metadata (rope_theta, context_length, esquema de quant, geometría de cabezas) directamente de la cabecera <code>.gguf</code> vía HTTP Range — sin descargar GB — y responde lo que ellos no: ¿aguanta de verdad la calidad de atención, y cuánto la erosiona el quant (γ-shift, ΔPPL)?",
|
| 1783 |
+
"gguf.desc": "Pega un repo GGUF (p.ej. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), elige un archivo de quant, y obtén un veredicto de calidad TAF: el horizonte de atención efectivo del modelo, más cuánto desplaza γ la cuantización elegida para <em>esta arquitectura concreta</em>. Solo lee la cabecera del archivo en tu navegador.",
|
| 1784 |
+
"gguf.repo_label": "ID del repo GGUF:",
|
| 1785 |
+
"gguf.list_btn": "📂 Listar archivos quant",
|
| 1786 |
+
"gguf.file_label": "Archivo quant:",
|
| 1787 |
+
"gguf.target_label": "Contexto objetivo L (opcional):",
|
| 1788 |
+
"gguf.analyze_btn": "🧊 Analizar GGUF",
|
| 1789 |
+
"gguf.all_btn": "📊 Comparar todos los quants",
|
| 1790 |
+
"gguf.compare_title": "Todos los quants — comparación de calidad",
|
| 1791 |
+
"gguf.col.verdict": "Veredicto",
|
| 1792 |
+
"gguf.col.gamma_at_l": "γ @ L (tras quant)",
|
| 1793 |
+
"gguf.need_repo": "Introduce un id de repo GGUF como 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
|
| 1794 |
+
"gguf.listing": "Listando archivos .gguf de HF Hub…",
|
| 1795 |
+
"gguf.no_files": "No se encontraron archivos .gguf en ese repo.",
|
| 1796 |
+
"gguf.found": "archivos quant encontrados",
|
| 1797 |
+
"gguf.pick_hint": "elige uno y pulsa Analizar.",
|
| 1798 |
+
"gguf.reading": "Leyendo cabecera GGUF vía HTTP Range…",
|
| 1799 |
+
"gguf.read_ok": "Cabecera analizada",
|
| 1800 |
+
"gguf.verdict.healthy": "SANO — el horizonte efectivo alcanza L con buen γ tras quant",
|
| 1801 |
+
"gguf.verdict.usable_with_care":"USABLE CON CUIDADO — alcanza L pero γ es modesto tras quant",
|
| 1802 |
+
"gguf.verdict.degrades": "DEGRADA — la atención colapsa antes de L (o el quant la empuja ahí)",
|
| 1803 |
+
"gguf.r.arch": "Arquitectura",
|
| 1804 |
+
"gguf.r.ctx_train": "Contexto entrenado",
|
| 1805 |
+
"gguf.r.horizon_fp16": "Horizonte de atención (fp16)",
|
| 1806 |
+
"gguf.r.quant": "Esquema de quant",
|
| 1807 |
+
"gguf.r.gamma_shift": "γ-shift por quant",
|
| 1808 |
+
"gguf.r.after_quant": "(tras quant)",
|
| 1809 |
+
"gguf.r.eff_horizon": "Horizonte efectivo (cuantizado)",
|
| 1810 |
+
"gguf.r.no_quant_shift": "— precisión completa, sin γ-shift",
|
| 1811 |
+
"gguf.r.note": "Horizonte desde γ_Padé / d_horizon (arquitectura). γ-shift de quant + ΔPPL desde el modelo quant-regime (calibrado a PPL de llama.cpp + papers AWQ/GPTQ). Ambos son estimaciones — verifica los casos límite con un eval real.",
|
| 1812 |
+
"gguf.err.not_gguf": "Ese archivo no es un GGUF válido (magic incorrecto).",
|
| 1813 |
+
"gguf.err.too_large": "La cabecera de metadata supera el límite de descarga — tokenizer inusualmente grande. Prueba otro quant.",
|
| 1814 |
+
"gguf.err.incomplete": "A la metadata GGUF le falta rope_theta o context_length — no se puede calcular el horizonte.",
|
| 1815 |
+
"help.v091.gguf.title": "🧊 Puente de validez GGUF",
|
| 1816 |
+
"help.v091.gguf.body": "La docena de calculadores GGUF/VRAM (NyxKrage, oobabooga, …) leen una cabecera <code>.gguf</code> para decirte si un quant <em>cabe en tu GPU</em>. Esto lee la misma cabecera — vía HTTP Range, sin descargar GB — y responde lo que ellos saltan: <em>¿cabe Y además funciona?</em> Pega un repo GGUF, elige un archivo de quant; el puente extrae <code>rope_theta</code>, <code>context_length</code>, el esquema de quant (de <code>general.file_type</code> o del nombre del archivo), y la geometría de cabezas, luego corre γ_Padé / d_horizon de TAF más el γ-shift de quant consciente de arquitectura. Salida: horizonte de atención efectivo en el contexto entrenado, cuánto erosiona γ el quant (y ΔPPL) para <em>este</em> modelo, y un veredicto. <em>Caso de uso</em>: 'Q4_K_M cabe en 8GB — ¿pero se vuelve tonto pasado 30K?' → ve el horizonte y la penalización γ de Q4 antes de descargar 6 GB.",
|
| 1817 |
"yarn.title": "🧵 Planificador de extensión de contexto YaRN / RoPE",
|
| 1818 |
"yarn.tip": "<strong>Config + veredicto, no solo VRAM</strong>. Las calculadoras GGUF/VRAM te dicen si una longitud de contexto <em>cabe en la GPU</em>. Esto te da el bloque <code>rope_scaling</code> exacto para <code>config.json</code> Y si la calidad de atención aguantará realmente a esa longitud — con la maquinaria γ_Padé / d_horizon de TAF, todo en tu navegador.",
|
| 1819 |
"yarn.desc": "¿Quieres usar un modelo más allá de su contexto entrenado? Introduce el modelo (o su θ + contexto entrenado) y tu longitud objetivo L. Obtén el fragmento <code>rope_scaling</code> listo para pegar (transformers ≥4.43), más un veredicto TAF: ¿llega el horizonte de atención efectivo a L, o el modelo alucinará pasado d_horizon?",
|
|
|
|
| 2979 |
"mode_desc.hub": "Carte de chaque problème documenté de LLM-eval → mode tafagent (si couvert) + outils externes curés. Trouvez la solution sans la réinventer. 30+ pains, 7 catégories.",
|
| 2980 |
"modes.yarn": "🧵 Planificateur YaRN",
|
| 2981 |
"mode_desc.yarn": "Génère la configuration rope_scaling exacte pour étendre un modèle au-delà de son contexte d'entraînement — plus un verdict TAF sur la tenue réelle de la qualité d'attention à la longueur cible.",
|
| 2982 |
+
"modes.gguf": "🧊 Pont GGUF",
|
| 2983 |
+
"mode_desc.gguf": "Lit l'en-tête de métadonnées d'un fichier GGUF (rope_theta, context_length, quant) dans votre navigateur et donne un verdict de qualité TAF — la question que les calculateurs de VRAM ignorent : tient ET fonctionne ?",
|
| 2984 |
+
"gguf.title": "🧊 Pont de validité GGUF",
|
| 2985 |
+
"gguf.tip": "<strong>Tenir dans la VRAM ≠ fonctionner</strong>. Les calculateurs GGUF/VRAM lisent les métadonnées d'un modèle pour dire si un quant <em>tient dans le GPU</em>. Ceci lit les MÊMES métadonnées (rope_theta, context_length, schéma de quant, géométrie des têtes) directement depuis l'en-tête <code>.gguf</code> via HTTP Range — sans télécharger des Go — et répond à ce qu'ils n'abordent pas : la qualité d'attention tient-elle vraiment, et de combien le quant l'érode-t-il (γ-shift, ΔPPL) ?",
|
| 2986 |
+
"gguf.desc": "Collez un dépôt GGUF (ex. <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>), choisissez un fichier de quant, et obtenez un verdict de qualité TAF : l'horizon d'attention effectif du modèle, plus de combien la quantification choisie décale γ pour <em>cette architecture précise</em>. Ne lit que l'en-tête du fichier dans votre navigateur.",
|
| 2987 |
+
"gguf.repo_label": "ID du dépôt GGUF :",
|
| 2988 |
+
"gguf.list_btn": "📂 Lister les fichiers quant",
|
| 2989 |
+
"gguf.file_label": "Fichier quant :",
|
| 2990 |
+
"gguf.target_label": "Contexte cible L (optionnel) :",
|
| 2991 |
+
"gguf.analyze_btn": "🧊 Analyser le GGUF",
|
| 2992 |
+
"gguf.all_btn": "📊 Comparer tous les quants",
|
| 2993 |
+
"gguf.compare_title": "Tous les quants — comparaison de qualité",
|
| 2994 |
+
"gguf.col.verdict": "Verdict",
|
| 2995 |
+
"gguf.col.gamma_at_l": "γ @ L (après quant)",
|
| 2996 |
+
"gguf.need_repo": "Saisissez un id de dépôt GGUF comme 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
|
| 2997 |
+
"gguf.listing": "Listage des fichiers .gguf depuis HF Hub…",
|
| 2998 |
+
"gguf.no_files": "Aucun fichier .gguf trouvé dans ce dépôt.",
|
| 2999 |
+
"gguf.found": "fichiers quant trouvés",
|
| 3000 |
+
"gguf.pick_hint": "choisissez-en un et cliquez Analyser.",
|
| 3001 |
+
"gguf.reading": "Lecture de l'en-tête GGUF via HTTP Range…",
|
| 3002 |
+
"gguf.read_ok": "En-tête analysé",
|
| 3003 |
+
"gguf.verdict.healthy": "SAIN — l'horizon effectif atteint L avec un bon γ après quant",
|
| 3004 |
+
"gguf.verdict.usable_with_care":"UTILISABLE AVEC PRUDENCE — atteint L mais γ est modeste après quant",
|
| 3005 |
+
"gguf.verdict.degrades": "DÉGRADE — l'attention s'effondre avant L (ou le quant l'y pousse)",
|
| 3006 |
+
"gguf.r.arch": "Architecture",
|
| 3007 |
+
"gguf.r.ctx_train": "Contexte d'entraînement",
|
| 3008 |
+
"gguf.r.horizon_fp16": "Horizon d'attention (fp16)",
|
| 3009 |
+
"gguf.r.quant": "Schéma de quant",
|
| 3010 |
+
"gguf.r.gamma_shift": "γ-shift dû au quant",
|
| 3011 |
+
"gguf.r.after_quant": "(après quant)",
|
| 3012 |
+
"gguf.r.eff_horizon": "Horizon effectif (quantifié)",
|
| 3013 |
+
"gguf.r.no_quant_shift": "— pleine précision, pas de γ-shift",
|
| 3014 |
+
"gguf.r.note": "Horizon depuis γ_Padé / d_horizon (architecture). γ-shift de quant + ΔPPL depuis le modèle quant-regime (calibré sur la PPL de llama.cpp + papiers AWQ/GPTQ). Les deux sont des estimations — vérifiez les cas limites avec un éval réel.",
|
| 3015 |
+
"gguf.err.not_gguf": "Ce fichier n'est pas un GGUF valide (mauvais magic).",
|
| 3016 |
+
"gguf.err.too_large": "L'en-tête de métadonnées dépasse la limite de téléchargement — tokenizer inhabituellement grand. Essayez un autre quant.",
|
| 3017 |
+
"gguf.err.incomplete": "Il manque rope_theta ou context_length dans les métadonnées GGUF — impossible de calculer l'horizon.",
|
| 3018 |
+
"help.v091.gguf.title": "🧊 Pont de validité GGUF",
|
| 3019 |
+
"help.v091.gguf.body": "La douzaine de calculateurs GGUF/VRAM (NyxKrage, oobabooga, …) lisent un en-tête <code>.gguf</code> pour dire si un quant <em>tient dans le GPU</em>. Ceci lit le même en-tête — via HTTP Range, sans télécharger des Go — et répond à ce qu'ils sautent : <em>tient-il ET fonctionne-t-il encore ?</em> Collez un dépôt GGUF, choisissez un fichier de quant ; le pont extrait <code>rope_theta</code>, <code>context_length</code>, le schéma de quant (depuis <code>general.file_type</code> ou le nom de fichier) et la géométrie des têtes, puis exécute γ_Padé / d_horizon de TAF plus le γ-shift de quant conscient de l'architecture. Sortie : horizon d'attention effectif au contexte d'entraînement, de combien le quant érode γ (et ΔPPL) pour <em>ce</em> modèle, et un verdict. <em>Cas d'usage</em> : 'Q4_K_M tient dans 8 Go — mais est-il abruti au-delà de 30K ?' → voyez l'horizon et la pénalité γ de Q4 avant de télécharger 6 Go.",
|
| 3020 |
"yarn.title": "🧵 Planificateur d'extension de contexte YaRN / RoPE",
|
| 3021 |
"yarn.tip": "<strong>Config + verdict, pas seulement la VRAM</strong>. Les calculateurs GGUF/VRAM disent si une longueur de contexte <em>tient dans le GPU</em>. Ceci donne le bloc <code>rope_scaling</code> exact pour <code>config.json</code> ET si la qualité d'attention tiendra réellement à cette longueur — avec la machinerie γ_Padé / d_horizon de TAF, entièrement dans votre navigateur.",
|
| 3022 |
"yarn.desc": "Vous voulez utiliser un modèle au-delà de son contexte d'entraînement ? Saisissez le modèle (ou son θ + contexte d'entraînement) et votre longueur cible L. Obtenez le fragment <code>rope_scaling</code> prêt à coller (transformers ≥4.43), plus un verdict TAF : l'horizon d'attention effectif atteint-il L, ou le modèle va-t-il halluciner au-delà de d_horizon ?",
|
|
|
|
| 4182 |
"mode_desc.hub": "每个 LLM-eval 问题的地图 → tafagent 模式(若覆盖)+ 精选外部工具。找到方案而非重新发明。30+ 问题,7 类别。",
|
| 4183 |
"modes.yarn": "🧵 YaRN 规划器",
|
| 4184 |
"mode_desc.yarn": "生成精确的 rope_scaling 配置以将模型扩展到训练上下文之外 —— 外加 TAF 裁决:在目标长度下注意力质量是否真的撑得住。",
|
| 4185 |
+
"modes.gguf": "🧊 GGUF 桥",
|
| 4186 |
+
"mode_desc.gguf": "在浏览器内读取 GGUF 文件的元数据头(rope_theta、context_length、量化),给出 TAF 质量裁决 —— 显存计算器跳过的那个问题:塞得进且还能用吗?",
|
| 4187 |
+
"gguf.title": "🧊 GGUF 有效性桥",
|
| 4188 |
+
"gguf.tip": "<strong>塞进显存 ≠ 能用</strong>。GGUF/显存计算器读取模型元数据来告诉你某量化<em>是否塞得进 GPU</em>。本工具通过 HTTP Range 直接从 <code>.gguf</code> 头读取同样的元数据(rope_theta、context_length、量化方案、注意力头几何)—— 无需下载数 GB —— 并回答它们不答的:注意力质量是否真的撑得住,量化又侵蚀了多少(γ-shift、ΔPPL)?",
|
| 4189 |
+
"gguf.desc": "粘贴一个 GGUF 仓库(如 <code>Qwen/Qwen2.5-7B-Instruct-GGUF</code>),选择一个量化文件,获得 TAF 质量裁决:模型的有效注意力视界,以及所选量化对<em>这个具体架构</em>的 γ 位移有多大。只在浏览器内读取文件头。",
|
| 4190 |
+
"gguf.repo_label": "GGUF 仓库 id:",
|
| 4191 |
+
"gguf.list_btn": "📂 列出量化文件",
|
| 4192 |
+
"gguf.file_label": "量化文件:",
|
| 4193 |
+
"gguf.target_label": "目标上下文 L(可选):",
|
| 4194 |
+
"gguf.analyze_btn": "🧊 分析 GGUF",
|
| 4195 |
+
"gguf.all_btn": "📊 比较所有量化",
|
| 4196 |
+
"gguf.compare_title": "所有量化 —— 质量对比",
|
| 4197 |
+
"gguf.col.verdict": "裁决",
|
| 4198 |
+
"gguf.col.gamma_at_l": "γ @ L(量化后)",
|
| 4199 |
+
"gguf.need_repo": "输入 GGUF 仓库 id,如 'Qwen/Qwen2.5-7B-Instruct-GGUF'",
|
| 4200 |
+
"gguf.listing": "正在从 HF Hub 列出 .gguf 文件…",
|
| 4201 |
+
"gguf.no_files": "该仓库中未找到 .gguf 文件。",
|
| 4202 |
+
"gguf.found": "个量化文件已找到",
|
| 4203 |
+
"gguf.pick_hint": "选一个并点击分析。",
|
| 4204 |
+
"gguf.reading": "正在通过 HTTP Range 读取 GGUF 头…",
|
| 4205 |
+
"gguf.read_ok": "头已解析",
|
| 4206 |
+
"gguf.verdict.healthy": "健康 —— 量化后有效视界以良好的 γ 到达 L",
|
| 4207 |
+
"gguf.verdict.usable_with_care":"可用但需谨慎 —— 到达 L,但量化后 γ 偏低",
|
| 4208 |
+
"gguf.verdict.degrades": "退化 —— 注意力在 L 之前崩溃(或被量化推到那里)",
|
| 4209 |
+
"gguf.r.arch": "架构",
|
| 4210 |
+
"gguf.r.ctx_train": "训练上下文",
|
| 4211 |
+
"gguf.r.horizon_fp16": "注意力视界(fp16)",
|
| 4212 |
+
"gguf.r.quant": "量化方案",
|
| 4213 |
+
"gguf.r.gamma_shift": "量化导致的 γ 位移",
|
| 4214 |
+
"gguf.r.after_quant": "(量化后)",
|
| 4215 |
+
"gguf.r.eff_horizon": "有效视界(量化后)",
|
| 4216 |
+
"gguf.r.no_quant_shift": "—— 全精度,无 γ 位移",
|
| 4217 |
+
"gguf.r.note": "视界来自 γ_Padé / d_horizon(架构)。量化 γ 位移 + ΔPPL 来自 quant-regime 模型(以 llama.cpp PPL + AWQ/GPTQ 论文校准)。两者皆为估计 —— 边界情况请用真实评测核实。",
|
| 4218 |
+
"gguf.err.not_gguf": "该文件不是有效的 GGUF(magic 错误)。",
|
| 4219 |
+
"gguf.err.too_large": "元数据头超出获取上限 —— tokenizer 异常大。请换一个量化。",
|
| 4220 |
+
"gguf.err.incomplete": "GGUF 元数据缺少 rope_theta 或 context_length —— 无法计算视界。",
|
| 4221 |
+
"help.v091.gguf.title": "🧊 GGUF 有效性桥",
|
| 4222 |
+
"help.v091.gguf.body": "那一打 GGUF/显存计算器(NyxKrage、oobabooga……)读取 <code>.gguf</code> 头来告诉你某量化<em>是否塞得进 GPU</em>。本工具读取同样的头 —— 通过 HTTP Range,无需下载数 GB —— 并回答它们跳过的:<em>塞得进且还能用吗?</em> 粘贴一个 GGUF 仓库,选择一个量化文件;桥会提取 <code>rope_theta</code>、<code>context_length</code>、量化方案(来自 <code>general.file_type</code> 或文件名)和头几何,然后运行 TAF 的 γ_Padé / d_horizon 加上架构感知的 quant-regime γ 位移。输出:训练上下文处的有效注意力视界、量化对<em>该</em>模型侵蚀 γ(及 ΔPPL)的程度,以及裁决。<em>用例</em>:'Q4_K_M 塞得进 8GB —— 但超过 30K 会变傻吗?' → 在下载 6 GB 之前先看视界和 Q4 的 γ 惩罚。",
|
| 4223 |
"yarn.title": "🧵 YaRN / RoPE 上下文扩展规划器",
|
| 4224 |
"yarn.tip": "<strong>配置 + 裁决,不只是显存</strong>。GGUF/显存计算器告诉你某上下文长度<em>是否塞得进 GPU</em>。本工具给出要放入 <code>config.json</code> 的精确 <code>rope_scaling</code> 块,并判断该长度下注意力质量是否真的撑得住 —— 使用 TAF 的 γ_Padé / d_horizon 机制,全在浏览器内运行。",
|
| 4225 |
"yarn.desc": "想让模型超出其训练上下文运行?输入模型(或其 θ + 训练上下文)和你的目标长度 L。获得可复制粘贴的 <code>rope_scaling</code> 片段(transformers ≥4.43),外加 TAF 裁决:有效注意力视界能否到达 L,还是模型在 d_horizon 之外就开始幻觉?",
|
|
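The strings above describe reading only the `.gguf` header over HTTP Range. A minimal sketch of that first step follows, assuming the GGUF v2/v3 little-endian layout (4-byte magic, `uint32` version, `uint64` tensor count, `uint64` metadata KV count). The names `fetchRange` and `parseGgufPreamble` are hypothetical and are not taken from `js/gguf_bridge.js`, whose body is not shown in this diff.

```javascript
// Sketch only: function names are illustrative, not the module's real exports.

// Fetch bytes [start, end] (inclusive) of a remote file with an HTTP Range request.
async function fetchRange(url, start, end) {
  const res = await fetch(url, { headers: { Range: `bytes=${start}-${end}` } });
  // 206 = Partial Content; a 200 would mean the server ignored the Range header.
  if (res.status !== 206) throw new Error(`expected 206, got ${res.status}`);
  return res.arrayBuffer();
}

// Parse the fixed 24-byte GGUF preamble (assumes v2/v3, little-endian).
function parseGgufPreamble(buf) {
  if (buf.byteLength < 24) throw new Error("need at least 24 bytes");
  const dv = new DataView(buf);
  // The magic is the four ASCII bytes "GGUF".
  const magic = String.fromCharCode(
    dv.getUint8(0), dv.getUint8(1), dv.getUint8(2), dv.getUint8(3));
  if (magic !== "GGUF") throw new Error("bad magic");
  const version = dv.getUint32(4, true);       // uint32
  const tensorCount = dv.getBigUint64(8, true); // uint64
  const kvCount = dv.getBigUint64(16, true);    // uint64
  // Counts are far below 2^53 in practice, so Number() is safe here.
  return { version, tensorCount: Number(tensorCount), kvCount: Number(kvCount) };
}

// Possible usage (not executed here; needs network access):
// const head = parseGgufPreamble(await fetchRange(url, 0, 23));
```

After the preamble, the metadata KV block has variable length, which is presumably why the commit describes the parser as incremental: it would need to request further ranges until all KV pairs are read.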
@@ -39,6 +39,7 @@ import {
|
|
| 39 |
loadKB as loadLongscoreKB, lookup as longscoreLookup, rank as longscoreRank,
|
| 40 |
} from "./longscore.js";
|
| 41 |
import { planExtension, suggestRopeType } from "./yarn_planner.js";
|
|
|
|
| 42 |
|
| 43 |
// Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
|
| 44 |
// Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
|
|
@@ -233,6 +234,7 @@ document.addEventListener("click", (e) => {
|
|
| 233 |
longscore: "longscore-section",
|
| 234 |
hub: "hub-section",
|
| 235 |
yarn: "yarn-section",
|
|
|
|
| 236 |
}[targetMode];
|
| 237 |
if (sectionId) {
|
| 238 |
const sec = document.getElementById(sectionId);
|
|
@@ -257,7 +259,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
|
|
| 257 |
"diagnose-section", "phase-section", "unmask-section",
|
| 258 |
"template-section", "arena-section", "contam-section",
|
| 259 |
"quant-section", "drift-section", "niah-section",
|
| 260 |
-
"saturation-section", "cot-section", "peft-section", "cache-section", "speculative-section", "tax-section", "longscore-section", "hub-section", "yarn-section"].forEach(id => {
|
| 261 |
const el = $(id);
|
| 262 |
if (el) el.style.display = "none";
|
| 263 |
});
|
|
@@ -277,6 +279,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
|
|
| 277 |
longscore: "longscore-section",
|
| 278 |
hub: "hub-section",
|
| 279 |
yarn: "yarn-section",
|
|
|
|
| 280 |
};
|
| 281 |
const sectionId = sectionMap[mode];
|
| 282 |
if (sectionId) $(sectionId).style.display = "";
|
|
@@ -291,6 +294,7 @@ document.querySelectorAll(".mode-btn").forEach(btn => {
|
|
| 291 |
if (mode === "longscore") initLongscore();
|
| 292 |
if (mode === "hub") initHub();
|
| 293 |
if (mode === "yarn") initYarn();
|
|
|
|
| 294 |
});
|
| 295 |
});
|
| 296 |
|
|
@@ -4661,9 +4665,20 @@ function initYarn() {
|
|
| 4661 |
});
|
| 4662 |
}
|
| 4663 |
|
|
|
|
|
|
|
| 4664 |
function _yarnFmtK(n) {
|
| 4665 |
if (n == null || !Number.isFinite(n)) return "—";
|
| 4666 |
-
if (n >=
|
| 4667 |
return String(Math.round(n));
|
| 4668 |
}
|
| 4669 |
function _yarnFmtG(g) {
|
|
@@ -4720,7 +4735,7 @@ function renderYarnPlan(p) {
|
|
| 4720 |
<tr><td style="${td}">${t("yarn.r.method")}</td><td><code>${p.ropeType}</code></td></tr>
|
| 4721 |
<tr><td style="${td}">γ ${t("yarn.r.naive")}</td><td>${_yarnFmtG(p.gammaNaive)}${p.gammaNaive <= 0 ? ` 🚨 ${t("yarn.r.collapsed")}` : ""}</td></tr>
|
| 4722 |
<tr><td style="${td}">γ ${t("yarn.r.eff")}</td><td><strong>${_yarnFmtG(p.gammaEff)}</strong></td></tr>
|
| 4723 |
-
<tr><td style="${td}">θ_eff</td><td>${
|
| 4724 |
<tr><td style="${td}">d_horizon ${t("yarn.r.eff")}</td><td>${_yarnFmtK(p.dHorizonEff)} ${horizonOk ? "✅ ≥ L" : "⚠ < L"}</td></tr>
|
| 4725 |
</table>
|
| 4726 |
<h3>${t("yarn.r.snippet")}</h3>
|
|
@@ -4736,6 +4751,196 @@ function renderYarnPlan(p) {
|
|
| 4736 |
});
|
| 4737 |
}
|
| 4738 |
|
| 4739 |
// ════════════════════════════════════════════════════════════════════
|
| 4740 |
// Bootstrap
|
| 4741 |
// ════════════════════════════════════════════════════════════════════
|
|
|
|
| 39 |
loadKB as loadLongscoreKB, lookup as longscoreLookup, rank as longscoreRank,
|
| 40 |
} from "./longscore.js";
|
| 41 |
import { planExtension, suggestRopeType } from "./yarn_planner.js";
|
| 42 |
+
import { listGgufFiles, fetchGgufMetadata, ggufToConfig, quantFromFilename, analyzeGguf } from "./gguf_bridge.js";
|
| 43 |
|
| 44 |
// Attach HF Hub search-as-you-type to all 5 model id inputs (Profile, Recipe,
|
| 45 |
// Unmask, Template, Quant). Hits public huggingface.co/api/models. Idempotent.
|
|
|
|
| 234 |
longscore: "longscore-section",
|
| 235 |
hub: "hub-section",
|
| 236 |
yarn: "yarn-section",
|
| 237 |
+
gguf: "gguf-section",
|
| 238 |
}[targetMode];
|
| 239 |
if (sectionId) {
|
| 240 |
const sec = document.getElementById(sectionId);
|
|
|
|
| 259 |
"diagnose-section", "phase-section", "unmask-section",
|
| 260 |
"template-section", "arena-section", "contam-section",
|
| 261 |
"quant-section", "drift-section", "niah-section",
|
| 262 |
+
"saturation-section", "cot-section", "peft-section", "cache-section", "speculative-section", "tax-section", "longscore-section", "hub-section", "yarn-section", "gguf-section"].forEach(id => {
|
| 263 |
const el = $(id);
|
| 264 |
if (el) el.style.display = "none";
|
| 265 |
});
|
|
|
|
| 279 |
longscore: "longscore-section",
|
| 280 |
hub: "hub-section",
|
| 281 |
yarn: "yarn-section",
|
| 282 |
+
gguf: "gguf-section",
|
| 283 |
};
|
| 284 |
const sectionId = sectionMap[mode];
|
| 285 |
if (sectionId) $(sectionId).style.display = "";
|
|
|
|
| 294 |
if (mode === "longscore") initLongscore();
|
| 295 |
if (mode === "hub") initHub();
|
| 296 |
if (mode === "yarn") initYarn();
|
| 297 |
+
if (mode === "gguf") initGguf();
|
| 298 |
});
|
| 299 |
});
|
| 300 |
|
|
|
|
| 4665 |
});
|
| 4666 |
}
|
| 4667 |
|
| 4668 |
+
// Context / horizon lengths: binary-K so 32768→32K, 131072→128K, 8192→8K
|
| 4669 |
+
// (the convention everyone uses for context windows), not decimal-K (→33K).
|
| 4670 |
function _yarnFmtK(n) {
|
| 4671 |
if (n == null || !Number.isFinite(n)) return "—";
|
| 4672 |
+
if (n >= 1048576) return (n / 1048576).toFixed(1) + "M";
|
| 4673 |
+
if (n >= 1024) return Math.round(n / 1024) + "K";
|
| 4674 |
+
return String(Math.round(n));
|
| 4675 |
+
}
|
| 4676 |
+
// RoPE θ is an arbitrary base, not a power of two → decimal M/K reads naturally
|
| 4677 |
+
// (1000000→1M, 500000→500K, 40000→40K).
|
| 4678 |
+
function _thetaFmt(n) {
|
| 4679 |
+
if (n == null || !Number.isFinite(n)) return "—";
|
| 4680 |
+
if (n >= 1e6) return (n / 1e6).toFixed(n % 1e6 === 0 ? 0 : 1) + "M";
|
| 4681 |
+
if (n >= 1000) return (n / 1000).toFixed(n % 1000 === 0 ? 0 : 1) + "K";
|
| 4682 |
return String(Math.round(n));
|
| 4683 |
}
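To illustrate the two conventions the comments above describe, here is a self-contained sketch that mirrors the logic of `_yarnFmtK` and `_thetaFmt` under the hypothetical names `fmtBinaryK` and `fmtDecimal`. It is a restatement for illustration, not additional code from the commit.

```javascript
// Sketch: standalone copies of the two formatters under illustrative names.

// Context lengths are powers of two, so divide by 1024 (32768 -> "32K").
function fmtBinaryK(n) {
  if (n == null || !Number.isFinite(n)) return "—";
  if (n >= 1048576) return (n / 1048576).toFixed(1) + "M";
  if (n >= 1024) return Math.round(n / 1024) + "K";
  return String(Math.round(n));
}

// RoPE theta is an arbitrary base, so divide by 1000 (500000 -> "500K").
function fmtDecimal(n) {
  if (n == null || !Number.isFinite(n)) return "—";
  if (n >= 1e6) return (n / 1e6).toFixed(n % 1e6 === 0 ? 0 : 1) + "M";
  if (n >= 1000) return (n / 1000).toFixed(n % 1000 === 0 ? 0 : 1) + "K";
  return String(Math.round(n));
}

console.log(fmtBinaryK(32768));   // "32K" (decimal-K would round 32.768 up to 33K)
console.log(fmtBinaryK(131072));  // "128K"
console.log(fmtDecimal(1000000)); // "1M"
console.log(fmtDecimal(40000));   // "40K"
```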
|
| 4684 |
function _yarnFmtG(g) {
|
|
|
|
| 4735 |
<tr><td style="${td}">${t("yarn.r.method")}</td><td><code>${p.ropeType}</code></td></tr>
|
| 4736 |
<tr><td style="${td}">γ ${t("yarn.r.naive")}</td><td>${_yarnFmtG(p.gammaNaive)}${p.gammaNaive <= 0 ? ` 🚨 ${t("yarn.r.collapsed")}` : ""}</td></tr>
|
| 4737 |
<tr><td style="${td}">γ ${t("yarn.r.eff")}</td><td><strong>${_yarnFmtG(p.gammaEff)}</strong></td></tr>
|
| 4738 |
+
<tr><td style="${td}">θ_eff</td><td>${_thetaFmt(p.thetaEff)}${p.thetaEff > p.theta ? ` (↑ ${t("yarn.r.from")} ${_thetaFmt(p.theta)})` : ""}</td></tr>
|
| 4739 |
<tr><td style="${td}">d_horizon ${t("yarn.r.eff")}</td><td>${_yarnFmtK(p.dHorizonEff)} ${horizonOk ? "✅ ≥ L" : "⚠ < L"}</td></tr>
|
| 4740 |
</table>
|
| 4741 |
<h3>${t("yarn.r.snippet")}</h3>
|
|
|
|
| 4751 |
});
|
| 4752 |
}
|
| 4753 |
|
| 4754 |
+
// ════════════════════════════════════════════════════════════════════
|
| 4755 |
+
// 🧊 GGUF Validity Bridge (v0.9.1)
|
| 4756 |
+
// ════════════════════════════════════════════════════════════════════
|
| 4757 |
+
let _ggufWired = false;
|
| 4758 |
+
let _ggufFiles = [];
|
| 4759 |
+
let _ggufCfgCache = {}; // "repo|file" → ggufToConfig result (geometry is shared across quants)
|
| 4760 |
+
|
| 4761 |
+
// Parse a .gguf header once and cache. The architecture/θ/context/head geometry
|
| 4762 |
+
// is identical across every quant of the same model — only the quant scheme
|
| 4763 |
+
// differs — so one parsed file is enough to score the whole repo.
|
| 4764 |
+
async function ggufGetCfg(repo, file) {
|
| 4765 |
+
const key = `${repo}|${file}`;
|
| 4766 |
+
if (_ggufCfgCache[key]) return _ggufCfgCache[key];
|
| 4767 |
+
const url = `https://huggingface.co/${repo}/resolve/main/${file}`;
|
| 4768 |
+
const meta = await fetchGgufMetadata(url);
|
| 4769 |
+
const cfg = ggufToConfig(meta);
|
| 4770 |
+
if (!cfg.quant_scheme) {
|
| 4771 |
+
const q = quantFromFilename(file);
|
| 4772 |
+
cfg.quant_label = cfg.quant_label || q.label;
|
| 4773 |
+
cfg.quant_scheme = q.scheme;
|
| 4774 |
+
}
|
| 4775 |
+
cfg.__bytesRead = meta.bytesRead;
|
| 4776 |
+
_ggufCfgCache[key] = cfg;
|
| 4777 |
+
return cfg;
|
| 4778 |
+
}
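`ggufGetCfg` falls back to `quantFromFilename(file)` when the header carries no quant scheme, and reads `.label` and `.scheme` from the result. The implementation lives in `js/gguf_bridge.js` and is not shown in this diff, so the following is only a guess at how such a filename backstop could work. The function name `quantFromFilenameSketch`, the regex, and the choice of a lowercased label as the `scheme` value are all assumptions.

```javascript
// Hypothetical filename backstop; NOT the real quantFromFilename.
// Looks for a llama.cpp-style quant tag (Q4_K_M, IQ4_XS, Q8_0, F16, ...)
// delimited by '-', '.', '_' or the string boundaries.
const QUANT_TAG =
  /(?:^|[-._])((?:IQ|Q)\d(?:_[A-Z0-9]+)*|F16|BF16|F32)(?=[-._]|$)/i;

function quantFromFilenameSketch(file) {
  const m = QUANT_TAG.exec(file);
  if (!m) return { label: null, scheme: null };
  const label = m[1].toUpperCase();
  // Assumption: the scheme key is just the lowercased label.
  return { label, scheme: label.toLowerCase() };
}

console.log(quantFromFilenameSketch("qwen2.5-7b-instruct-q4_k_m.gguf"));
console.log(quantFromFilenameSketch("Model-Q8_0-00001-of-00002.gguf"));
```

A shard suffix joined with an underscore (for example `..._q4_k_m_00001`) would be absorbed into the tag by this regex, so a real implementation would likely need stricter matching.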
|
| 4779 |
+
|
| 4780 |
+
function initGguf() {
|
| 4781 |
+
if (_ggufWired) return;
|
| 4782 |
+
_ggufWired = true;
|
| 4783 |
+
|
| 4784 |
+
const listBtn = $("gguf-list-btn");
|
| 4785 |
+
const analyzeBtn = $("gguf-analyze-btn");
|
| 4786 |
+
const allBtn = $("gguf-all-btn");
|
| 4787 |
+
const fileSel = $("gguf-file");
|
| 4788 |
+
|
| 4789 |
+
listBtn?.addEventListener("click", async () => {
|
| 4790 |
+
const repo = ($("gguf-repo").value || "").trim();
|
| 4791 |
+
if (!repo) { $("gguf-status").textContent = "⚠ " + t("gguf.need_repo"); return; }
|
| 4792 |
+
$("gguf-status").textContent = "⏳ " + t("gguf.listing");
|
| 4793 |
+
listBtn.disabled = true;
|
| 4794 |
+
state.lastModelId = repo;
|
| 4795 |
+
try {
|
| 4796 |
+
const files = await listGgufFiles(repo);
|
| 4797 |
+
if (!files.length) { $("gguf-status").textContent = "⚠ " + t("gguf.no_files"); fileSel.disabled = true; analyzeBtn.disabled = true; return; }
|
| 4798 |
+
fileSel.innerHTML = files.map(f => `<option value="${escapeHtml(f)}">${escapeHtml(f)}</option>`).join("");
|
| 4799 |
+
// Default-select a Q4_K_M (the community sweet spot) if present.
|
| 4800 |
+
const def = files.find(f => /q4_k_m/i.test(f)) || files[0];
|
| 4801 |
+
fileSel.value = def;
|
| 4802 |
+
fileSel.disabled = false;
|
| 4803 |
+
analyzeBtn.disabled = false;
|
| 4804 |
+
$("gguf-all-btn").disabled = false;
|
| 4805 |
+
_ggufFiles = files;
|
| 4806 |
+
$("gguf-status").innerHTML = `✅ ${files.length} ${t("gguf.found")} — ${t("gguf.pick_hint")}`;
|
| 4807 |
+
} catch (err) {
|
| 4808 |
+
$("gguf-status").textContent = `❌ ${err.message}`;
|
| 4809 |
+
} finally {
|
| 4810 |
+
listBtn.disabled = false;
|
| 4811 |
+
}
|
| 4812 |
+
});
|
| 4813 |
+
|
| 4814 |
+
analyzeBtn?.addEventListener("click", async () => {
|
| 4815 |
+
const repo = ($("gguf-repo").value || "").trim();
|
| 4816 |
+
const file = fileSel.value;
|
| 4817 |
+
if (!repo || !file) return;
|
| 4818 |
+
$("gguf-status").textContent = "⏳ " + t("gguf.reading");
|
| 4819 |
+
analyzeBtn.disabled = true;
|
| 4820 |
+
try {
|
| 4821 |
+
const cfg = await ggufGetCfg(repo, file);
|
| 4822 |
+
const target = parseFloat($("gguf-target").value) || null;
|
| 4823 |
+
const result = analyzeGguf(cfg, target);
|
| 4824 |
+
$("gguf-status").innerHTML = `✅ ${t("gguf.read_ok")} (${(cfg.__bytesRead / 1024 / 1024).toFixed(1)} MB header)`;
|
| 4825 |
+
renderGgufResult(cfg, result);
|
| 4826 |
+
} catch (err) {
|
| 4827 |
+
$("gguf-status").textContent = `❌ ${ggufErrMsg(err)}`;
|
| 4828 |
+
} finally {
|
| 4829 |
+
analyzeBtn.disabled = false;
|
| 4830 |
+
}
|
| 4831 |
+
});
|
| 4832 |
+
|
| 4833 |
+
allBtn?.addEventListener("click", async () => {
|
| 4834 |
+
const repo = ($("gguf-repo").value || "").trim();
|
| 4835 |
+
const file = fileSel.value;
|
| 4836 |
+
if (!repo || !file) return;
|
| 4837 |
+
$("gguf-status").textContent = "⏳ " + t("gguf.reading");
|
| 4838 |
+
allBtn.disabled = true; analyzeBtn.disabled = true;
|
| 4839 |
+
try {
|
| 4840 |
+
// One header parse gives the shared geometry; score every quant from it.
|
| 4841 |
+
const cfg = await ggufGetCfg(repo, file);
|
| 4842 |
+
const target = parseFloat($("gguf-target").value) || null;
|
| 4843 |
+
// Dedupe repo files to one row per quant label (drop shard suffixes).
|
| 4844 |
+
const seen = new Set();
|
| 4845 |
+
const rows = [];
|
| 4846 |
+
for (const f of _ggufFiles) {
|
| 4847 |
+
const q = quantFromFilename(f);
|
| 4848 |
+
if (q.label === "?" || seen.has(q.label)) continue;
|
| 4849 |
+
seen.add(q.label);
|
| 4850 |
+
const res = analyzeGguf({ ...cfg, quant_label: q.label, quant_scheme: q.scheme }, target);
|
| 4851 |
+
rows.push({ label: q.label, scheme: q.scheme, res });
|
| 4852 |
+
}
|
| 4853 |
+
// Best precision first: lowest γ-shift (baseline F16 = 0) at the top.
|
| 4854 |
+
rows.sort((a, b) => (a.res.quant?.gamma_shift ?? 0) - (b.res.quant?.gamma_shift ?? 0));
|
| 4855 |
+
$("gguf-status").innerHTML = `✅ ${t("gguf.read_ok")} (${(cfg.__bytesRead / 1024 / 1024).toFixed(1)} MB header)`;
|
| 4856 |
+
renderGgufComparison(cfg, rows);
|
| 4857 |
+
} catch (err) {
|
| 4858 |
+
$("gguf-status").textContent = `❌ ${ggufErrMsg(err)}`;
|
| 4859 |
+
} finally {
|
| 4860 |
+
allBtn.disabled = false; analyzeBtn.disabled = false;
|
| 4861 |
+
}
|
| 4862 |
+
});
|
| 4863 |
+
}
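The dedupe-and-sort step of the compare-all handler can be sketched in isolation. This is a minimal sketch under stated assumptions, not the code in `js/gguf_bridge.js`: `labelFromFilename` is a hypothetical stand-in for `quantFromFilename`, and the bit width parsed from the label is only a proxy for the `gamma_shift` that `analyzeGguf` returns (more bits is assumed to mean a smaller shift).

```javascript
// Hypothetical stand-in for quantFromFilename: extract a quant tag such as
// Q4_K_M or F16 from a file name, or "?" when none is found.
function labelFromFilename(name) {
  const m = /[-._](i?q\d\w*?|f16|f32|bf16)(?=[-.])/i.exec(name);
  return m ? m[1].toUpperCase() : "?";
}

// One row per quant label: shards of the same quant collapse to one entry,
// and files without a recognizable tag are skipped.
function dedupeAndRank(files) {
  const seen = new Set();
  const rows = [];
  for (const f of files) {
    const label = labelFromFilename(f);
    if (label === "?" || seen.has(label)) continue;
    seen.add(label);
    // Assumption: bit width stands in for the real γ-shift ordering.
    rows.push({ label, bits: parseInt(label.replace(/^\D+/, ""), 10) });
  }
  // Highest precision first, mirroring "lowest γ-shift at the top".
  return rows.sort((a, b) => b.bits - a.bits);
}

const demo = dedupeAndRank([
  "model-q2_k.gguf",
  "model-q8_0-00001-of-00002.gguf",
  "model-q8_0-00002-of-00002.gguf",
  "model-q4_k_m.gguf",
  "README.md",
]);
console.log(demo.map(r => r.label).join(" > "));
```

Keying the `Set` on the label rather than the file name is what drops shard suffixes: both `q8_0` shards map to the same label, so only the first produces a row.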
|
| 4864 |
+
|
| 4865 |
+
function ggufErrMsg(err) {
|
| 4866 |
+
return ({
|
| 4867 |
+
not_a_gguf_file: t("gguf.err.not_gguf"),
|
| 4868 |
+
gguf_metadata_too_large: t("gguf.err.too_large"),
|
| 4869 |
+
})[err.message] || err.message;
|
| 4870 |
+
}
|
| 4871 |
+
|
| 4872 |
+
function renderGgufResult(cfg, r) {
|
| 4873 |
+
const out = $("gguf-output");
|
| 4874 |
+
if (!out) return;
|
| 4875 |
+
out.style.display = "";
|
| 4876 |
+
|
| 4877 |
+
if (r.verdict === "incomplete") {
|
| 4878 |
+
out.innerHTML = `<div class="gc-validity-warning">⚠ ${t("gguf.err.incomplete")}</div>`;
|
| 4879 |
+
return;
|
| 4880 |
+
}
|
| 4881 |
+
|
| 4882 |
+
const meta = ({
|
| 4883 |
+
healthy: { emoji: "✅", cls: "v-yes" },
|
| 4884 |
+
usable_with_care: { emoji: "⚠️", cls: "v-deg" },
|
| 4885 |
+
degrades: { emoji: "🚨", cls: "v-no" },
|
| 4886 |
+
})[r.verdict] || { emoji: "❓", cls: "v-deg" };
|
| 4887 |
+
|
| 4888 |
+
const td = "padding:3px 12px 3px 0;";
|
| 4889 |
+
const gqa = (cfg.num_attention_heads && cfg.num_key_value_heads && cfg.num_key_value_heads < cfg.num_attention_heads)
|
| 4890 |
+
? `GQA ${cfg.num_attention_heads}:${cfg.num_key_value_heads}` : "MHA";
|
| 4891 |
+
|
| 4892 |
+
// Quant block (r.quant may be null for F16/F32 files).
|
| 4893 |
+
let quantHtml = "";
|
| 4894 |
+
if (r.quant) {
|
| 4895 |
+
const regimeEmoji = ({ safe: "✅", mild: "🟡", significant: "🟠", cliff: "🚨" })[r.quant.regime] || "";
|
| 4896 |
+
const dp = r.quant.delta_ppl;
|
| 4897 |
+
quantHtml = `
|
| 4898 |
+
<tr><td style="${td}">${t("gguf.r.quant")}</td><td><code>${escapeHtml(r.quantLabel || "?")}</code></td></tr>
|
| 4899 |
+
<tr><td style="${td}">${t("gguf.r.gamma_shift")}</td><td>−${_yarnFmtG(r.quant.gamma_shift)} ${regimeEmoji} <span class="subtle">${t("quant.regime." + r.quant.regime) || r.quant.regime}</span></td></tr>
|
| 4900 |
+
<tr><td style="${td}">ΔPPL</td><td>≈ +${dp.mid} <span class="subtle">(${dp.low}–${dp.high})</span></td></tr>`;
|
| 4901 |
+
} else {
|
| 4902 |
+
quantHtml = `<tr><td style="${td}">${t("gguf.r.quant")}</td><td><code>${escapeHtml(r.quantLabel || "F16/F32")}</code> <span class="subtle">${t("gguf.r.no_quant_shift")}</span></td></tr>`;
|
| 4903 |
+
}
|
| 4904 |
+
|
| 4905 |
+
out.innerHTML = `
|
| 4906 |
+
<p><span class="verdict-badge ${meta.cls}">${meta.emoji} ${t("gguf.verdict." + r.verdict)}</span></p>
|
| 4907 |
+
<table style="border-collapse:collapse;font-size:0.95em;margin:0.5em 0;">
|
| 4908 |
+
<tr><td style="${td}">${t("gguf.r.arch")}</td><td><code>${escapeHtml(r.arch)}</code> · ${gqa} · θ=${_thetaFmt(r.theta)}</td></tr>
|
| 4909 |
+
<tr><td style="${td}">${t("gguf.r.ctx_train")}</td><td>${_yarnFmtK(r.nCtx)}</td></tr>
|
| 4910 |
+
<tr><td style="${td}">${t("gguf.r.horizon_fp16")}</td><td>${_yarnFmtK(r.dHoriz)} <span class="subtle">(γ=${_yarnFmtG(r.gammaTrain)})</span></td></tr>
|
| 4911 |
+
${quantHtml}
|
| 4912 |
+
<tr><td style="${td}"><strong>γ @ L=${_yarnFmtK(r.L)}</strong> ${t("gguf.r.after_quant")}</td><td><strong>${_yarnFmtG(r.gammaQuant)}</strong> <span class="subtle">(fp16: ${_yarnFmtG(r.gammaAtL)})</span></td></tr>
|
| 4913 |
+
</table>
|
| 4914 |
+
<p class="subtle" style="font-size:0.88em;">${t("gguf.r.note")}</p>`;
|
| 4915 |
+
}
|
| 4916 |
+
|
| 4917 |
+
function renderGgufComparison(cfg, rows) {
|
| 4918 |
+
const out = $("gguf-output");
|
| 4919 |
+
if (!out) return;
|
| 4920 |
+
out.style.display = "";
|
| 4921 |
+
const gqa = (cfg.num_attention_heads && cfg.num_key_value_heads && cfg.num_key_value_heads < cfg.num_attention_heads)
|
| 4922 |
+
? `GQA ${cfg.num_attention_heads}:${cfg.num_key_value_heads}` : "MHA";
|
| 4923 |
+
// Short verdict label = the word before the em-dash of the full verdict string
|
| 4924 |
+
// (works in every language: "HEALTHY — …", "SANO — …", "健康 —— …").
|
| 4925 |
+
const short = v => (t("gguf.verdict." + v) || v).split(/——|—| - /)[0].trim();
|
| 4926 |
+
const emo = v => ({ healthy: "✅", usable_with_care: "⚠️", degrades: "🚨" })[v] || "❓";
|
| 4927 |
+
const td = "padding:3px 14px 3px 0;";
|
| 4928 |
+
const head = `<tr style="text-align:left;border-bottom:1px solid var(--border);">
|
| 4929 |
+
<th style="${td}">${t("gguf.r.quant")}</th><th style="${td}">${t("gguf.r.gamma_shift")}</th>
|
| 4930 |
+
<th style="${td}">${t("gguf.col.gamma_at_l")}</th><th style="${td}">${t("gguf.col.verdict")}</th></tr>`;
|
| 4931 |
+
const body = rows.map(({ label, res }) => {
|
| 4932 |
+
const shift = res.quant ? "−" + _yarnFmtG(res.quant.gamma_shift) : "—";
|
| 4933 |
+
return `<tr><td style="${td}"><code>${escapeHtml(label)}</code></td><td style="${td}">${shift}</td>
|
| 4934 |
+
<td style="${td}">${_yarnFmtG(res.gammaQuant)}</td>
|
| 4935 |
+
<td style="${td}">${emo(res.verdict)} ${short(res.verdict)}</td></tr>`;
|
| 4936 |
+
}).join("");
|
| 4937 |
+
// d_horizon is θ-set → identical for every quant; show it once in the header line.
|
| 4938 |
+
out.innerHTML = `<h3>${t("gguf.compare_title")}</h3>
|
| 4939 |
+
<p class="subtle">${escapeHtml(cfg.architecture)} · ${gqa} · θ=${_thetaFmt(cfg.rope_theta)} · ctx ${_yarnFmtK(cfg.context_length)} · horizon ${_yarnFmtK(rows[0]?.res.dHoriz)} · L=${_yarnFmtK(rows[0]?.res.L)}</p>
|
| 4940 |
+
<table style="border-collapse:collapse;font-size:0.93em;">${head}${body}</table>
|
| 4941 |
+
<p class="subtle" style="font-size:0.88em;">${t("gguf.r.note")}</p>`;
|
| 4942 |
+
}
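The short-label trick in `renderGgufComparison` relies on every localized verdict string starting with a keyword followed by a dash. A minimal sketch of that split, without the `t()` lookup: the prefixes come from the comment in the source, while the text after each dash is a hypothetical placeholder, not the real i18n string.

```javascript
// Same split expression as `short` in renderGgufComparison, minus the t() lookup.
const shortLabel = full => full.split(/——|—| - /)[0].trim();

// Prefixes are taken from the source comment; the trailing text is hypothetical.
const samples = [
  "HEALTHY — placeholder detail",
  "SANO — detalle de ejemplo",
  "健康 —— 示例说明",
  "USABLE-WITH-CARE - placeholder detail",
];
const shorts = samples.map(shortLabel);
console.log(shorts);
```

The spaced-hyphen alternative (` - `) only matches when the hyphen has a space on both sides, so a hyphenated keyword is kept whole.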
|
| 4943 |
+
|
| 4944 |
// ════════════════════════════════════════════════════════════════════
|
| 4945 |
// Bootstrap
|
| 4946 |
// ════════════════════════════════════════════════════════════════════
|
|
@@ -0,0 +1,107 @@
|
| 1 |
+
import { chromium } from "playwright";
|
| 2 |
+
const BASE = "http://127.0.0.1:8000/index.html";
|
| 3 |
+
const b = await chromium.launch({ headless: true });
|
| 4 |
+
const p = await (await b.newContext()).newPage();
|
| 5 |
+
const errors = [];
|
| 6 |
+
const benign = s => /Failed to load resource.*40\d|status of 40\d/.test(s);
|
| 7 |
+
p.on("console", m => { if (m.type()==="error" && !benign(m.text())) errors.push(`[err] ${m.text()}`); });
|
| 8 |
+
p.on("pageerror", e => errors.push(`[pageerror] ${e.message}`));
|
| 9 |
+
const log = s => process.stdout.write(s+"\n");
|
| 10 |
+
let pass=0, fail=0;
|
| 11 |
+
const check=(n,c,x="")=>{ log(`${c?" OK ":" FAIL"} ${n} ${x}`); c?pass++:fail++; };
|
| 12 |
+
|
| 13 |
+
await p.goto(BASE,{waitUntil:"domcontentloaded",timeout:90000});
|
| 14 |
+
await p.waitForTimeout(2500);
|
| 15 |
+
await p.click(`.lang-btn[data-lang="en"]`); await p.waitForTimeout(200);
|
| 16 |
+
check("module loads, 0 errors", errors.length===0, `(errors=${errors.length})`);
|
| 17 |
+
|
| 18 |
+
await p.click('[data-mode-link="gguf"]',{timeout:5000}); await p.waitForTimeout(500);
|
| 19 |
+
const secVis = await p.evaluate(()=>{const s=document.querySelector("#gguf-section");return s&&getComputedStyle(s).display!=="none";});
|
| 20 |
+
check("gguf-section visible after tile click", secVis);
|
| 21 |
+
|
| 22 |
+
log("\n── List quant files (real repo) ──");
|
| 23 |
+
await p.fill("#gguf-repo","Qwen/Qwen2.5-0.5B-Instruct-GGUF");
|
| 24 |
+
await p.click("#gguf-list-btn");
|
| 25 |
+
await p.waitForTimeout(4000);
|
| 26 |
+
const listed = await p.evaluate(()=>{
|
| 27 |
+
const sel=document.querySelector("#gguf-file");
|
| 28 |
+
return { count:sel.options.length, selected:sel.value, disabled:sel.disabled,
|
| 29 |
+
analyzeEnabled:!document.querySelector("#gguf-analyze-btn").disabled,
|
| 30 |
+
status:document.querySelector("#gguf-status").innerText.slice(0,60) };
|
| 31 |
+
});
|
| 32 |
+
check("files listed in dropdown", listed.count>0, `(${listed.count} files)`);
|
| 33 |
+
check("Q4_K_M auto-selected", /q4_k_m/i.test(listed.selected), listed.selected);
|
| 34 |
+
check("analyze button enabled", listed.analyzeEnabled);
|
| 35 |
+
|
| 36 |
+
log("\n── Analyze GGUF (parse header + verdict) ──");
|
| 37 |
+
await p.click("#gguf-analyze-btn");
|
| 38 |
+
await p.waitForTimeout(8000); // range fetch + parse
|
| 39 |
+
const r = await p.evaluate(()=>{
|
| 40 |
+
const o=document.querySelector("#gguf-output");
|
| 41 |
+
return { vis:getComputedStyle(o).display!=="none",
|
| 42 |
+
verdict:o.querySelector(".verdict-badge")?.innerText?.trim()||"",
|
| 43 |
+
text:o.innerText,
|
| 44 |
+
status:document.querySelector("#gguf-status").innerText };
|
| 45 |
+
});
|
| 46 |
+
check("output rendered", r.vis && r.text.length>50);
|
| 47 |
+
check("verdict present", r.verdict.length>3, r.verdict);
|
| 48 |
+
check("shows architecture qwen2", /qwen2/.test(r.text));
|
| 49 |
+
check("shows trained context 32K", /32K|32768/.test(r.text), (r.text.match(/Trained context[^\n]*\n?\s*[\w.]+/)||[""])[0].slice(0,40));
|
| 50 |
+
check("shows quant Q4_K_M", /Q4_K_M/i.test(r.text));
|
| 51 |
+
check("shows γ-shift from quant", /γ-shift|shift/i.test(r.text));
|
| 52 |
+
check("shows ΔPPL", /ΔPPL|PPL/.test(r.text));
|
| 53 |
+
check("header parsed status (MB)", /MB header|parsed|analizada|analysé|已解析/i.test(r.status), r.status.slice(0,50));
|
| 54 |
+
|
| 55 |
+
log("\n── Target L override ──");
|
| 56 |
+
await p.fill("#gguf-target","131072");
|
| 57 |
+
await p.click("#gguf-analyze-btn");
|
| 58 |
+
await p.waitForTimeout(7000);
|
| 59 |
+
const r2 = await p.evaluate(()=>document.querySelector("#gguf-output .verdict-badge")?.innerText?.trim());
|
| 60 |
+
check("re-analyze with L=131072", (r2||"").length>3, r2||"(no verdict badge)");
|
| 61 |
+
|
| 62 |
+
log("\n── Compare all quants (one header parse → full table) ──");
|
| 63 |
+
await p.click("#gguf-all-btn");
|
| 64 |
+
await p.waitForTimeout(7000);
|
| 65 |
+
const cmp = await p.evaluate(()=>{
|
| 66 |
+
const o=document.querySelector("#gguf-output");
|
| 67 |
+
const rows=[...o.querySelectorAll("table tr")];
|
| 68 |
+
const dataRows=rows.slice(1); // minus header
|
| 69 |
+
return { title:o.querySelector("h3")?.innerText,
|
| 70 |
+
rowCount:dataRows.length,
|
| 71 |
+
quants:dataRows.map(r=>r.querySelector("code")?.innerText).filter(Boolean),
|
| 72 |
+
hasShift:/−0\.|—/.test(o.innerText),
|
| 73 |
+
hasVerdictCol:rows[0]?.innerText?.includes("Verdict") };
|
| 74 |
+
});
|
| 75 |
+
check("comparison table rendered", cmp.rowCount>=3, `(${cmp.rowCount} rows)`);
|
| 76 |
+
check("lists multiple quant labels", cmp.quants.length>=3, cmp.quants.join(", "));
|
| 77 |
+
check("has verdict column", cmp.hasVerdictCol, cmp.title);
|
| 78 |
+
check("rows sorted best→worst (Q8 before Q2)", (()=>{
|
| 79 |
+
const i8=cmp.quants.findIndex(q=>/Q8/.test(q)), i2=cmp.quants.findIndex(q=>/Q2/.test(q));
|
| 80 |
+
return i8<0||i2<0||i8<i2;})(), cmp.quants.join(" > "));
|
| 81 |
+
// Verdicts must vary across quants (regression guard: a hard d_horizon gate
|
| 82 |
+
// once forced every row to DEGRADES even when γ@L was healthy).
|
| 83 |
+
const verdicts = await p.evaluate(()=>[...document.querySelectorAll("#gguf-output table tr")].slice(1).map(r=>r.lastElementChild?.innerText?.trim()));
|
| 84 |
+
check("verdicts vary across quants (not all identical)", new Set(verdicts).size>=2, verdicts.join(" | "));
|
| 85 |
+
// γ@L must DECREASE for worse quants (Q8 γ@L > Q2 γ@L).
|
| 86 |
+
const gammas = await p.evaluate(()=>[...document.querySelectorAll("#gguf-output table tr")].slice(1).map(r=>parseFloat(r.children[2]?.innerText)));
|
| 87 |
+
check("γ@L decreases for worse quant", gammas[0] > gammas[gammas.length-1], `${gammas[0]} → ${gammas[gammas.length-1]}`);
|
| 88 |
+
|
| 89 |
+
log("\n── 4-language tab label ──");
|
| 90 |
+
for (const lang of ["es","fr","zh","en"]) {
|
| 91 |
+
await p.click(`.lang-btn[data-lang="${lang}"]`); await p.waitForTimeout(300);
|
| 92 |
+
const label = await p.evaluate(()=>document.querySelector('.mode-btn[data-mode="gguf"]')?.textContent?.trim());
|
| 93 |
+
check(`${lang}: tab label localized`, label && label.length>3, label);
|
| 94 |
+
}
|
| 95 |
+
|
| 96 |
+
log("\n── Error path: bad repo ──");
|
| 97 |
+
await p.click(`.lang-btn[data-lang="en"]`); await p.waitForTimeout(200);
|
| 98 |
+
await p.fill("#gguf-repo","this/definitely-not-a-real-repo-xyz123");
|
| 99 |
+
await p.click("#gguf-list-btn");
|
| 100 |
+
await p.waitForTimeout(3000);
|
| 101 |
+
const errStatus = await p.evaluate(()=>document.querySelector("#gguf-status").innerText);
|
| 102 |
+
check("bad repo → error message", /❌|not found|HTTP/i.test(errStatus), errStatus.slice(0,50));
|
| 103 |
+
|
| 104 |
+
log(`\n=== ${pass} passed, ${fail} failed · JS errors: ${errors.length} ===`);
|
| 105 |
+
errors.slice(0,10).forEach(e=>log(e));
|
| 106 |
+
await b.close();
|
| 107 |
+
process.exit(fail>0?1:0);
|