---
license: mit
tags:
- attention
- positional-encoding
- efficient-attention
- number-theory
- calabi-yau
- negative-result
datasets:
- callensxavier/cy-sieve-attention-benchmark
---

# CY-Sieve Attention Kernel — a falsifiable experiment (NEGATIVE result)

A positional-attention bias derived from the weight-5 Apéry-like binomial sum

$$S_{20}(n) = \sum_{k=0}^{n} \binom{n}{k}^4 \binom{n+k}{k}$$

whose holonomic structure is a Calabi–Yau **3-fold** period (an order-4 MUM block;
the geometry fixes the long-range decay slope $\log\lambda = 3.762$ and curvature
$\beta = 2$). The engineering bet: generate the positional bias *on the fly* from
the order-4 recurrence, costing **O(L)** HBM instead of an **O(L²)** table.

> **Code:** [GitHub — Mirror-Map-Sieve](https://github.com/xaviercallens/Mirror-Map-Sieve)
> · **Math paper:** [Zenodo 10.5281/zenodo.20747943](https://doi.org/10.5281/zenodo.20747943)
> · **Benchmark data:** [callensxavier/cy-sieve-attention-benchmark](https://huggingface.co/datasets/callensxavier/cy-sieve-attention-benchmark)

## ⚠️ Honest result (2026-06-22, NVIDIA L4): the quality gate KILLED it

This card **corrects** an earlier version that claimed "zero overhead" and implied
a win. Those numbers came from an invalid method (monkey-patching the bias into a
*frozen* pretrained model, which collapses every alternative scheme equally). The
correct test — **training small GPTs from scratch**, one per positional scheme, on
real WikiText-2 — gives:

| scheme | ppl @512 (train) | @1024 (2×) | @2048 (4×) |
|---|---|---|---|
| **learned-absolute** | **4.22** | 12.10 | 20.82 |
| ALiBi | 10.74 | 11.73 | 11.35 |
| **sliding-window** | **4.99** | **5.07** | **5.03** |
| CY-Sieve τ-ladder | 11.33 | 12.31 | 12.05 |
| CY-Sieve τ=128 | 6.80 | 7.12 | 7.00 |
| CY-Sieve τ=512 | 4.65 | 6.08 | 10.62 |

**Verdict: KILL (+10.15%).** Best CY-Sieve (4.65) vs best baseline (4.22) exceeds
the pre-committed >5% kill threshold; a plain **sliding window won outright**. The
geometry-fixed slope is too steep for a drop-in positional scheme.

## What *did* hold up

- **§4 kernel correctness — PASS.** The Triton kernel matches the NumPy reference
  within FP16 tolerance (4/4 tests).
- **§6 memory claim — confirmed.** The on-the-fly bias reads **O(L)** bytes of HBM
  vs **O(L²)** for a materialized table (**8192× less at L=16384**). *But* the
  current unfused kernel is **~4–6× slower** than fused dense SDPA — a
  memory-traffic win, **not** a latency win. Per the project's reporting rule,
  with the quality gate failing, these numbers are **not** a contribution.

## Why publish a negative result?

Because a fast, correct kernel that degrades model quality is a *failed* kernel,
and saying so is the point. The Calabi–Yau geometry is a sound **prior** for the
bias shape, not the right **value** — pinning the attention slope to the
sequence's growth rate was the mistake. Redesign directions (learnable
geometry-initialized slope; exact-local-window + gentle-tail hybrid; β=2 ablation)
are in the [findings writeup](https://github.com/xaviercallens/Mirror-Map-Sieve/blob/main/docs/PHASE3_CYSIEVE_GPU_FINDINGS.md).

## Reproduce

```bash
git clone https://github.com/xaviercallens/Mirror-Map-Sieve.git
cd Mirror-Map-Sieve
pip install -r 4_ai_hardware_attention/requirements-gpu.txt
python 4_ai_hardware_attention/run_gpu_phase.py   # §4 parity + §5 quality + §6 perf
```

## Citation

```bibtex
@misc{callens2026cysieve,
  author = {Callens, Xavier},
  title  = {CY-Sieve Attention: a Calabi--Yau positional bias and its negative quality result},
  year   = {2026},
  url    = {https://huggingface.co/callensxavier/s20-attention-kernel},
  doi    = {10.5281/zenodo.20747943}
}
```