# Final Decision: Franken-B Ships As Flagship

Date: `2026-05-22 06:00 ET`

After a multi-iteration corrector loop (v2 → v3 → v4 → v5 → v6 → v58R), the user called off the run following the v58R fusion regression. **Franken-B-base remains the production flagship.**

## TL;DR

The corrector loop confirmed Franken-B's edge is **composition-locked**. Every corrector adapter with a measured vision return came in below FB-base's +106.76%. The "boundary fix" tradeoff is not worth the vision/profit loss.

## Corrector Loop Final Tally

| Model | Strict 323-row | Adversarial | Vision_only | Fused_agree | Verdict |
|---|---|---|---|---|---|
| **FB-base (production)** | 72.2% | 83.3% | **+106.76%** | **+103.32%** | **SHIP** |
| v2 (yesterday) | 69.6% | 76.7% | n/a | n/a | rejected |
| v3 | 73.6% | n/a | +10.52% | +37.27% | least-damaging corrector |
| v4 | 73.3% | n/a | n/a | n/a | iterated repair, mixed |
| v5 (crystals_only_long) | 81.7% | 76.7% | −21.90% | +30.91% | best strict, vision broken |
| v6 (vision anchors) | n/a | 86.7% | −55.92% | −5.04% | adversarial fixed, vision worse |
| v58R (Codex's recipe) | n/a | 73.3% | +2.55% | +40.66% | vision stayed positive, adversarial regressed below v5 |

## Why Each Corrector Failed

1. **v3**: tied base on corpus. Vision return fell roughly 90% (from +106.76% to +10.52%) but stayed positive. It was the "least-bad" option.
2. **v5**: best on the strict suite (+9.5 pp over base), but vision collapsed to −22% and 180d strict PnL went from +14% (V37) to −24%.
3. **v6**: Codex's first recipe, with adversarial defense + cross-shape options. It closed the adversarial regression (+10 pp over v5), but **vision dropped further to −56%** despite vision anchors.
4. **v58R**: Codex's hardened recipe at `lm_last3` scope with fusion preservation anchors. Vision stayed positive, but **adversarial regressed below v5** (73.3% vs v5's 76.7%), and the +13 SELL→BUY shift on vision cut the return from +107% to +2.55%.
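The per-corrector comparisons above can be recomputed from the tally table. The following is a minimal sketch: the numbers are copied from the table, while the dictionary layout and the `deltas_vs_base` helper are illustrative names, not part of the eval tooling.

```python
# Scores copied from the Corrector Loop Final Tally table; None marks "n/a".
TALLY = {
    "FB-base": {"strict": 72.2, "adversarial": 83.3, "vision_only": 106.76, "fused_agree": 103.32},
    "v2": {"strict": 69.6, "adversarial": 76.7, "vision_only": None, "fused_agree": None},
    "v3": {"strict": 73.6, "adversarial": None, "vision_only": 10.52, "fused_agree": 37.27},
    "v4": {"strict": 73.3, "adversarial": None, "vision_only": None, "fused_agree": None},
    "v5": {"strict": 81.7, "adversarial": 76.7, "vision_only": -21.90, "fused_agree": 30.91},
    "v6": {"strict": None, "adversarial": 86.7, "vision_only": -55.92, "fused_agree": -5.04},
    "v58R": {"strict": None, "adversarial": 73.3, "vision_only": 2.55, "fused_agree": 40.66},
}


def deltas_vs_base(tally, base="FB-base"):
    """Return {model: {metric: model - base}} in percentage points, skipping n/a cells."""
    ref = tally[base]
    out = {}
    for model, row in tally.items():
        if model == base:
            continue
        out[model] = {
            metric: round(value - ref[metric], 2)
            for metric, value in row.items()
            if value is not None and ref[metric] is not None
        }
    return out


if __name__ == "__main__":
    for model, row in deltas_vs_base(TALLY).items():
        print(model, row)
```

Run this way, every corrector with a measured `vision_only` value shows a negative delta against FB-base, while the strict and adversarial deltas are mixed.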

## The Mechanical Story

Codex's audit nailed it: **v58 is not directly a vision adapter** (only touches `model.language_model.layers.29/30/31`). Its fusion edge comes from **last-layer language-side arbitration over chart/image-derived tokens.** When any new adapter retrains those same layers — or even adjacent ones — the precise interaction breaks.

This is a **composition-lock**: V37s + V50 + V58 is more than the sum of its parts. You can't easily distill the +107% vision return into a stand-alone correction adapter, because the edge depends on how the multiple LoRA passes were layered on top of one another.

## What FB-base Delivers (production numbers, locked)

From `E:\vfai-x_3.5_9b\evals\vfai_dev_docs\FRANKEN_B_FLAGSHIP_REPORT_20260520.md`:

| Stream | n | Take% | dir_acc | Return | $PnL | PF |
|---|---|---|---|---|---|---|
| 2yr | 5,061 | 100% | 40.5% | **+8,245%** | **$5.77M** | 4.51 |
| penny | 5,149 | 99.2% | 35.2% | **+35,155%** | **$24.6M** | 31.77 |
| 180d | 2,523 | 100% | 36.7% | **+4,711%** | **$3.30M** | 4.95 |

**Total $33.66M overlay $PnL** across the 3 streams. **First and only model to PASS bootstrap vs V5.0** on penny (p=0.023).

## Known FB-base Weaknesses (acceptable tradeoffs)

- Adversarial 83.3% vs V6's 100% (−16.7pp on 30 rows = 5 rows)
- Today_holdout 96% BUY (bull bias mitigated by stream balance)
- Corpus packs avg −8pp vs V6
- Brutal 90.9% (V37 is 100%; 3 rows lost)

Together these amount to **8-10 rows of total corpus weakness**. The corrector loop showed that "fixing" them costs **$30-50M in profitability**. Not a worthwhile trade.

## Recommended Production Stack

```
prompt → Franken-B primary decision
       ↘ V6 (optional) → confidence gate / veto layer
fusion → Franken-B vision + V37s text → composite
```
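The prompt path of the diagram can be sketched as a small routing function. This is only an illustration under stated assumptions: `route_prompt`, the callable signatures, and the `NO_TRADE` fallback label are hypothetical, and the fusion composite is left out because its combination rule is not specified here.

```python
from typing import Callable, Optional

# Decisions are plain strings; the "NO_TRADE" fallback label is an assumption.
Decision = str


def route_prompt(
    prompt: str,
    primary: Callable[[str], Decision],
    veto: Optional[Callable[[str, Decision], bool]] = None,
    fallback: Decision = "NO_TRADE",
) -> Decision:
    """Return the primary decision unless the optional gate rejects it."""
    decision = primary(prompt)  # Franken-B primary decision
    if veto is not None and veto(prompt, decision):  # optional V6 gate / veto layer
        return fallback
    return decision
```

With `veto=None` the function reduces to the Franken-B decision alone, which matches V6 being marked optional in the diagram.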

Models:
- **Franken-B** = `E:\vfai-x_3.5_9b\models\franken_v37s_v50_v58_20260520\vfai-qwen35-9b-v37s-v50-v58-merged\` (primary)
- **V6** = `E:\vfai-x_3.5_9b\models\v6_merged_20260520\vfai-qwen35-9b-v6-merged\` (quality filter, adversarial defense)
- **V37s** = `E:\vfai-x_3.5_9b\models\v37s_merged_20260520\vfai-qwen35-9b-v37s-merged\` (fusion text sidecar — better than FB at text-only)

Universal overlay (locked from FB-flagship report):
```
min_conv: 70
min_rel_vol: 1.0-2.0
max_positions_per_day: 8
position_size_pct: 20
leverage: 15
hard_stop_pct: -5%
theta_drag_pct: 0.5
edge_aware_flip: ON
trust_bearish_pattern: ON
```

## Lessons Captured For Future Work

1. **Composition-locked behavior cannot be flattened into a single adapter.** Don't try.
2. **Fusion vision return degrades through LM-side updates** even when training targets `lm_last3` only. The vision tower itself is untouched; the regression is carried by the last language layers' arbitration over chart/image-derived tokens.
3. **Adversarial defense and direction-action capability are antagonistic in a single adapter.** They need to be in separate adapters, applied in sequence (matching what V50 and V58 already did). Conflating them in one corrector caused every iteration to regress on one or the other.
4. **Corpus regressions of 3-7 pp** on small-N suites (n=30-60) are **statistical noise** relative to large-N profitability data (n=5,000+). Don't optimize for the small-N at the cost of the large-N.
5. **Codex's `lm_last3` insight was right** — it's the lowest-impact scope that still affects language-side arbitration. But even at this scope, ANY directional pressure in the training corpus shifts the BUY/SELL balance enough to hurt streams.
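Lesson 4 can be made concrete with the standard error of a proportion. The sketch below assumes rows are independent pass/fail trials, which is a simplification, and the helper names are illustrative. It uses FB-base's 83.3% adversarial score on 30 rows and the 2yr stream's 40.5% `dir_acc` on 5,061 rows as inputs.

```python
import math


def proportion_std_error(accuracy: float, n: int) -> float:
    """Standard error of an observed pass rate, assuming independent rows."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


def pp_per_row(n: int) -> float:
    """Percentage points that a single row flip moves the score on an n-row suite."""
    return 100.0 / n


if __name__ == "__main__":
    small = proportion_std_error(0.833, 30)    # adversarial suite, 30 rows
    large = proportion_std_error(0.405, 5061)  # 2yr stream dir_acc, 5,061 rows
    print(f"n=30:   one row = {pp_per_row(30):.2f} pp, std error = {100 * small:.1f} pp")
    print(f"n=5061: one row = {pp_per_row(5061):.3f} pp, std error = {100 * large:.2f} pp")
```

Under these assumptions the 30-row suite has a standard error of about 6.8 pp, so a 3-7 pp move sits within roughly one standard error, while the 5,061-row stream is resolved to under 1 pp.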

## What's Saved For Future Reference

- `D:\vfai-x-model-backups\frankenB_corrector_loop_20260521\iter_v3\merged\model\` (least-damaging corrector)
- `D:\vfai-x-model-backups\frankenB_corrector_loop_20260521\iter_v5\merged\model\` (strict-suite leader at 81.7%, vision broken)
- `D:\vfai-x-model-backups\frankenB_corrector_loop_20260521\iter_v6\merged\model\` (adversarial fixed at 86.7%)
- `D:\vfai-x-model-backups\frankenB_corrector_loop_20260521\iter_v58R\merged\model\` (Codex's recipe attempt)
- `D:\vfai-x-model-backups\frankenB_corrector_loop_20260521\iter_v58R\v58R_train.jsonl` (rebuilt clean training corpus, 581 rows)
- `D:\vfai-x-model-backups\frankenB_corrector_loop_20260521\CODEX_V58_REGRESSION_AUDIT_20260522.md` (Codex's mechanistic analysis)
- `D:\vfai-x-model-backups\frankenB_corrector_loop_20260521\V7_DECISION_TRACKER.md` (live delta tracker)

If a future attempt revives the corrector idea, **start by building separate** `v59_action_capability` and `v59_adversarial_defense` adapters, stack them in sequence on FB-base (don't merge), and CPU-tournament-test the stack order before committing to GPU.
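A stack-order tournament of this kind could be organised as below. This is a hedged sketch rather than existing tooling: `rank_stack_orders` is a hypothetical helper, and `toy_score` is an arbitrary placeholder that exists only so the example runs. A real scorer would load the adapters in the given order on FB-base and run the eval suites.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple


def rank_stack_orders(
    adapters: Sequence[str],
    score_stack: Callable[[Tuple[str, ...]], float],
) -> List[Tuple[Tuple[str, ...], float]]:
    """Score every ordering of the adapters and return (order, score) pairs, best first."""
    scored = [(order, score_stack(order)) for order in permutations(adapters)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(order: Tuple[str, ...]) -> float:
    """Placeholder scorer with no empirical basis; replace with a real eval run."""
    return float(order.index("v59_action_capability"))


if __name__ == "__main__":
    ranked = rank_stack_orders(
        ["v59_action_capability", "v59_adversarial_defense"], toy_score
    )
    for order, score in ranked:
        print(" -> ".join(order), score)
```

With two adapters there are only two orders to compare, so the tournament is cheap. The count grows factorially if more adapters are added to the stack.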

## Status

- All v58R training/eval processes stopped
- GPU idle (780 MB used)
- No vLLM containers running
- All artifacts preserved
- Final flagship: **Franken-B base (`franken_v37s_v50_v58_20260520`)**

This file marks the formal close of the 2026-05-21 corrector loop. FB-base ships.