---
license: mit
tags:
- attention
- positional-encoding
- efficient-attention
- number-theory
- calabi-yau
- negative-result
datasets:
- callensxavier/cy-sieve-attention-benchmark
---

# CY-Sieve Attention Kernel: a falsifiable experiment (NEGATIVE result)

A positional-attention bias derived from the weight-5 Apéry-like binomial sum

$$S_{20}(n) = \sum_{k=0}^{n} \binom{n}{k}^4 \binom{n+k}{k}$$

whose holonomic structure is a Calabi–Yau **3-fold** period (an order-4 block with maximally unipotent monodromy, MUM; the geometry fixes the long-range decay slope $\log\lambda = 3.762$ and curvature $\beta = 2$). The engineering bet: generate the positional bias *on the fly* from the order-4 recurrence, costing **O(L)** HBM instead of an **O(L²)** table.

> **Code:** [GitHub: Mirror-Map-Sieve](https://github.com/xaviercallens/Mirror-Map-Sieve)
> · **Math paper:** [Zenodo 10.5281/zenodo.20747943](https://doi.org/10.5281/zenodo.20747943)
> · **Benchmark data:** [callensxavier/cy-sieve-attention-benchmark](https://huggingface.co/datasets/callensxavier/cy-sieve-attention-benchmark)

## ⚠️ Honest result (2026-06-22, NVIDIA L4): the quality gate KILLED it

This card **corrects** an earlier version that claimed "zero overhead" and implied a win. Those numbers came from an invalid method (monkey-patching the bias into a *frozen* pretrained model, which collapses every alternative scheme equally). The correct test, **training small GPTs from scratch** (one per positional scheme) on real WikiText-2, gives:

| scheme | ppl @512 (train length) | ppl @1024 (2×) | ppl @2048 (4×) |
|---|---|---|---|
| **learned-absolute** | **4.22** | 12.10 | 20.82 |
| ALiBi | 10.74 | 11.73 | 11.35 |
| **sliding-window** | **4.99** | **5.07** | **5.03** |
| CY-Sieve τ-ladder | 11.33 | 12.31 | 12.05 |
| CY-Sieve τ=128 | 6.80 | 7.12 | 7.00 |
| CY-Sieve τ=512 | 4.65 | 6.08 | 10.62 |

**Verdict: KILL (+10.15%).** At the 512-token training length, the best CY-Sieve (4.65) vs the best baseline (4.22) exceeds the pre-committed >5% kill threshold; a plain **sliding window won outright** on length extrapolation, staying near 5.0 at every evaluated length.
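
The gate arithmetic behind the verdict can be sketched as follows. This is a minimal illustration, not the project's actual gating script: the function name `kill_gate` and the `"PASS"` label are hypothetical, and the inputs are the rounded perplexities from the table. With rounded inputs the gap comes out near +10.2%; the card's +10.15% was presumably computed from unrounded values.

```python
# Perplexities at the 512-token training length, copied from the table above.
PPL_AT_TRAIN = {
    "learned-absolute": 4.22,
    "ALiBi": 10.74,
    "sliding-window": 4.99,
    "CY-Sieve tau-ladder": 11.33,
    "CY-Sieve tau=128": 6.80,
    "CY-Sieve tau=512": 4.65,
}
KILL_THRESHOLD = 0.05  # pre-committed: kill if the relative gap exceeds 5%


def kill_gate(ppl, threshold=KILL_THRESHOLD):
    """Return (verdict, relative_gap) for best CY-Sieve vs best baseline.

    Hypothetical helper: "PASS" is an illustrative label, not the project's.
    """
    cy = min(v for k, v in ppl.items() if k.startswith("CY-Sieve"))
    base = min(v for k, v in ppl.items() if not k.startswith("CY-Sieve"))
    gap = (cy - base) / base  # lower perplexity is better, so gap > 0 is a loss
    return ("KILL" if gap > threshold else "PASS"), gap


verdict, gap = kill_gate(PPL_AT_TRAIN)
print(f"{verdict}: best CY-Sieve is {gap:+.2%} vs best baseline")
```

Comparing only at the training length is the strictest reading of the gate; at 2× and 4× the same comparison would be made against sliding-window rather than learned-absolute.
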
The geometry-fixed slope is too steep for a drop-in positional scheme.

## What *did* hold up

- **§4 kernel correctness: PASS.** The Triton kernel matches the NumPy reference within FP16 tolerance (4/4 tests).
- **§6 memory claim: confirmed.** The on-the-fly bias reads **O(L)** bytes of HBM vs **O(L²)** for a materialized table (**8192× less at L=16384**). *But* the current unfused kernel is **~4–6× slower** than fused dense SDPA: a memory-traffic win, **not** a latency win.

Per the project's reporting rule, with the quality gate failing, these numbers are **not** a contribution.

## Why publish a negative result?

Because a fast, correct kernel that degrades model quality is a *failed* kernel, and saying so is the point. The Calabi–Yau geometry is a sound **prior** for the bias shape, not the right **value**; pinning the attention slope to the sequence's growth rate was the mistake. Redesign directions (learnable geometry-initialized slope; exact-local-window + gentle-tail hybrid; β=2 ablation) are in the [findings writeup](https://github.com/xaviercallens/Mirror-Map-Sieve/blob/main/docs/PHASE3_CYSIEVE_GPU_FINDINGS.md).

## Reproduce

```bash
git clone https://github.com/xaviercallens/Mirror-Map-Sieve.git
cd Mirror-Map-Sieve
pip install -r 4_ai_hardware_attention/requirements-gpu.txt
python 4_ai_hardware_attention/run_gpu_phase.py   # §4 parity + §5 quality + §6 perf
```

## Citation

```bibtex
@misc{callens2026cysieve,
  author = {Callens, Xavier},
  title  = {CY-Sieve Attention: a Calabi--Yau positional bias and its negative quality result},
  year   = {2026},
  url    = {https://huggingface.co/callensxavier/s20-attention-kernel},
  doi    = {10.5281/zenodo.20747943}
}
```
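
## Appendix: sanity-checking the quoted slope

The slope $\log\lambda = 3.762$ quoted at the top of this card can be checked independently of the kernel, assuming it denotes the exponential growth rate of $S_{20}(n)$ (the card ties the slope to "the sequence's growth rate"). The sketch below is not taken from the repository: the helper names are hypothetical, and it uses only the binomial-sum definition plus a standard largest-term (saddle-point) estimate rather than the order-4 recurrence.

```python
import math


def s20(n):
    """Exact S_20(n) = sum_k C(n,k)^4 * C(n+k,k), using Python big integers."""
    return sum(math.comb(n, k) ** 4 * math.comb(n + k, k) for k in range(n + 1))


def saddle_log_lambda(iters=200):
    """Growth rate from the largest term k = x*n (Stirling approximation).

    The per-n exponent is g(x) = 4*H(x) + (1+x)ln(1+x) - x ln x, with
    H(x) = -x ln x - (1-x) ln(1-x). Its stationary point g'(x) = 0 is
    found by bisection on (0, 1), where g' is strictly decreasing.
    """
    def dg(x):
        return 4 * math.log((1 - x) / x) + math.log((1 + x) / x)

    lo, hi = 1e-9, 1 - 1e-9  # dg > 0 near 0 and dg < 0 near 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dg(mid) > 0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    h = -x * math.log(x) - (1 - x) * math.log(1 - x)
    return 4 * h + (1 + x) * math.log(1 + x) - x * math.log(x)


def empirical_slope(n):
    """log(S_20(n+1) / S_20(n)); math.log accepts arbitrarily large ints."""
    return math.log(s20(n + 1)) - math.log(s20(n))


print([s20(n) for n in range(4)])
print(f"saddle-point log(lambda) = {saddle_log_lambda():.4f}")
print(f"empirical slope at n=300 = {empirical_slope(300):.4f}")
```

The empirical slope approaches the saddle-point value from below as `n` grows, because the exact terms carry a polynomial prefactor that the leading exponential estimate ignores.
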