Football2Vec: L2 Harvest (Research Artifact)
Repository collecting seed-program evaluation outputs from the OpenEvolve Level-2 architecture-evolution cycles on the Football2Vec v2 target. Each harvest run uploads a metrics.json capturing a candidate architecture's evaluated fitness against the production Football2Vec v2 baseline.
Part of the (Right! Luxury!) Lakehouse soccer analytics platform.
Status
- Status: Research artifact (L2 evolve harvest outputs), not a trained deployable model
- Canonical model: luxury-lakehouse/football2vec-v2
- Retained for: Audit trail of evolve-engine seed evaluations; reproduction of promotion decisions
What L2 Harvesting Is
The evolve engine's Level-2 mode proposes new architectural vocabularies (custom attention kernels, position encodings, adversary schedules, etc.) as Python programs. Each proposal is evaluated on a GPU compute backend (for example HF Jobs L40S), and the resulting metrics are uploaded to this repo for audit. Seeds that beat the baseline fitness by a pre-registered threshold are candidates for promotion to a full training run; the rest are archived here for transparency.
Repo Layout
- `<variant-name>/metrics.json`: per-variant evaluation metrics (val MLM loss, adversary accuracy, debias score, MLM score, fitness, parameter count, wall-clock time). Variant names match the seed program filename (e.g. `cross_attention_adversary/metrics.json` from `cross_attention_adversary.py`).
- `results.json`: combined harvest result with shared_config provenance, dataset SHA, stage-1 SHA, fitness formula, baseline L_0, and the full sorted variant list. Overwritten by each orchestrator run; the per-variant files are the authoritative per-variant records.
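One way to consume this layout, as a minimal sketch: walk a local copy of the repo, read every `<variant-name>/metrics.json`, and rank the variants by fitness, ignoring the overwritten `results.json`. The helper names are hypothetical, and the JSON key `"fitness"` is an assumption based on the metric names above; check a real metrics.json before relying on it.

```python
import json
from pathlib import Path


def load_variant_metrics(repo_dir):
    """Return {variant_name: metrics_dict} for every <variant>/metrics.json under repo_dir.

    The root-level results.json is deliberately not read: it is overwritten by each
    orchestrator run, whereas the per-variant files are the authoritative records.
    """
    root = Path(repo_dir)
    metrics = {}
    # "*/metrics.json" matches only files one directory below the repo root.
    for path in sorted(root.glob("*/metrics.json")):
        with path.open(encoding="utf-8") as fh:
            # The parent directory name is the variant (seed program filename minus .py).
            metrics[path.parent.name] = json.load(fh)
    return metrics


def rank_by_fitness(metrics, key="fitness"):
    """Sort variant names by descending fitness; `key` is an assumed JSON field name."""
    return sorted(metrics, key=lambda name: metrics[name][key], reverse=True)
```

A local copy can be fetched with huggingface_hub's `snapshot_download(repo_id="luxury-lakehouse/football2vec-l2-harvest")`, which returns the path of the cached snapshot; the repo id is taken from the citation URL.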
Publishing Provenance
Harvest uploads are written by scripts/evaluate_football2vec_l2_adversary_seeds.py (the multi-backend orchestrator) after each seed's evaluation completes on the configured compute backend (AI-PC local CUDA, Media-PC or DGX Spark via SSH, or HF Jobs L40S). The README itself (this card) is pushed separately via scripts/publish_hf_cards.py --name football2vec-l2-harvest.md --kind model per ADR-014, as an orphan card with no training-script publisher. See the EV2 cycle documentation at docs/evolve/ev2-football2vec-l2-adversarial/SUMMARY.md and docs/engineering/orchestration.md for the debugging history and the rules that govern future cycles.
EV2 Phase 1: Adversary Architecture Harvest (2026-04-23)
First L2 cycle for Football2Vec v2, targeting the adversary-architecture axis: six architecturally distinct adversary seeds plus the linear baseline, all on stage-1 SHA bf102a57c9575cbfddf7661ba7a3ebe29de3c124 and dataset SHA 5eb1bfc3be549c56fc1256936aa53fd7f2393d8f. Fitness formula: 0.4 × mlm_score + 0.6 × debias_score. Baseline L_0 (the linear baseline's val_mlm) = 0.7413. WIN threshold: fitness ≥ baseline fitness + 0.02 = 0.940 + 0.02 = 0.960.
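The fitness formula and WIN threshold can be restated as a few small functions. This is a sketch, not the orchestrator's code: the function and constant names are hypothetical, and `mlm_score = min(1, L_0 / val_mlm)` is an inference (every mlm_score in the table below matches it when rounded to three decimals), not a documented definition. The debias_score computation is not described in this card, so it is taken as an input.

```python
L_0 = 0.7413             # val_mlm of the linear baseline
BASELINE_FITNESS = 0.940  # fitness of the linear baseline
WIN_MARGIN = 0.02         # pre-registered margin over baseline fitness


def mlm_score(val_mlm, l0=L_0):
    """ASSUMPTION (inferred from the table): baseline loss over variant loss, capped at 1.0."""
    return min(1.0, l0 / val_mlm)


def fitness(mlm, debias):
    """Stated formula: 0.4 * mlm_score + 0.6 * debias_score."""
    return 0.4 * mlm + 0.6 * debias


def disposition(fit, baseline=BASELINE_FITNESS, margin=WIN_MARGIN):
    """WIN if fitness reaches baseline + margin (0.960), otherwise ARCHIVE."""
    return "WIN" if fit >= baseline + margin else "ARCHIVE"
```

For example, cross_attention_adversary gives `fitness(mlm_score(0.8817), 0.946)` ≈ 0.904, which `disposition()` classifies as ARCHIVE.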
| Variant | val_mlm | val_adv_acc | debias | mlm_score | fitness | Disposition |
|---|---|---|---|---|---|---|
| linear (baseline) | 0.7413 | 0.1411 | 0.900 | 1.000 | 0.940 | baseline |
| cross_attention_adversary | 0.8817 | 0.0973 | 0.946 | 0.841 | 0.904 | ARCHIVE |
| residual_mlp | 0.7491 | 0.2116 | 0.826 | 0.990 | 0.891 | ARCHIVE |
| deep_mlp_3layer | 0.7732 | 0.2116 | 0.826 | 0.959 | 0.879 | ARCHIVE |
| dual_head_ensemble | 0.7802 | 0.2116 | 0.826 | 0.950 | 0.876 | ARCHIVE |
| deep_mlp_2layer | 0.7839 | 0.2116 | 0.826 | 0.946 | 0.874 | ARCHIVE |
| attention_pool_head | 0.9479 (ep6/30) | 0.1796 (ep6/30) | 0.859 (ep6/30) | 0.782 (ep6/30) | 0.828 (ep6/30) | ARCHIVE (interrupted) |
Outcome: no promotions. No variant reaches the 0.960 WIN threshold, and the linear adversary baseline (0.940) outscores every tested architecture on combined fitness. cross_attention_adversary is the only variant that improves on the baseline in both debias (0.946 vs. 0.900) and val_adv_acc (0.0973 vs. 0.1411), but its MLM cost (val_mlm 0.8817 vs. 0.7413) pulls net fitness down to 0.904. It is flagged for a possible Phase 2 mechanism probe.
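Because the fitness formula is linear, the gap between cross_attention_adversary and the baseline splits cleanly into an MLM term and a debias term. Writing F for fitness and using the table values:

$$
\Delta F = 0.4\,\Delta\text{mlm\_score} + 0.6\,\Delta\text{debias}
= 0.4\,(0.841 - 1.000) + 0.6\,(0.946 - 0.900)
= -0.0636 + 0.0276 = -0.036
$$

This matches the table (0.904 − 0.940 = −0.036): the MLM penalty is more than twice the debias gain.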
attention_pool_head disposition. Phase 1f on AI-PC (LocalCudaBackend, no wall-clock timeout) was interrupted by an unapproved Windows auto-restart at Epoch 6/30 (2026-04-24 11:50; PID 71172 was killed mid-training, and no checkpoint/resume was implemented). The Epoch 1-6 trajectory shows no convergence toward the WIN threshold: val_mlm oscillates between 0.92 and 1.07 with no downward trend toward the 0.7413 baseline, and val_adv_acc oscillates between 0.01 and 0.20 without settling at a low-leakage equilibrium. The interim fitness of 0.828 (Epoch 6 snapshot) is below every completed non-baseline variant. Disposition: ARCHIVE by trajectory, with no re-run, since a full run would cost ~16 h of additional compute for a result the trajectory already places in ARCHIVE. The full per-epoch log is preserved in-repo at docs/evolve/ev2-football2vec-l2-adversarial/phase1f.log.
The full narrative, including the Phase 1a-1e orchestration debugging that consumed ~4 h of active investigation across 5 sequential re-fires, is in SUMMARY.md.
Use Cases
- Evolve-engine audit: trace which seed programs were evaluated, their scores, and when the promotion threshold was met
- Architecture research: mine the seed programs for architectural ideas that scored well but weren't promoted
EU AI Act: Intended Use and Non-Use
This repository contains research evaluations of proposed model architectures, not a trained deployable model. It is not intended for, validated for, or supplied to any Annex III §4 use. Any deployer who wishes to fine-tune or train a promoted seed into a production model must perform their own conformity assessment.
License
- Metrics + seed programs: CC-BY-NC 4.0 (inherited from training-data licensing)
- Seed program Python source: research-use licensed, not redistributable as a standalone library
Citation
```bibtex
@software{football2vec_l2_harvest_2026,
  title={Football2Vec L2 Evolve Harvest},
  author={Nielsen, Karsten Skyt},
  year={2026},
  url={https://huggingface.co/luxury-lakehouse/football2vec-l2-harvest}
}
```