# Round 5 — 3B Anchor-Count Scaling Sweep

**Repo:** https://huggingface.co/CK0607/cross-model-lora-prediction-3b
**Models:** X=`Qwen/Qwen2.5-3B-Instruct` → Y=`meta-llama/Llama-3.2-3B-Instruct`

## Recipe + anchors

LoRA recipe: r=16, alpha=32, targets=['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'], epochs=3.0, train=1500, bs=8, lr=0.0002, max_seq_len=512, bf16.
New Round 5 anchors trained under `round5/X` and `round5/Y`: `['aqua_rat_numeric', 'math_counting_easy', 'mawps', 'mbpp_sanitized', 'humaneval', 'conala_curated', 'medmcqa_easy', 'pubmedqa_pqal']`.
Drop list: `[]`.
R4 adapters were reused from the Modal volume; no R4 adapter was retrained.
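
The recipe above can be written down as a small config object, which makes the derived quantities explicit. This is a minimal sketch, not the training script from the repo: the class name `LoraRecipe` and its field names are hypothetical, `train=1500` is read as 1500 training examples, and the step count assumes a single device, no gradient accumulation, and a kept final partial batch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraRecipe:
    """Hypothetical container mirroring the recipe line above."""
    r: int = 16
    alpha: int = 32
    targets: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )
    epochs: float = 3.0
    train_examples: int = 1500  # assumption: `train=1500` counts examples
    batch_size: int = 8
    lr: float = 2e-4
    max_seq_len: int = 512
    dtype: str = "bf16"

    @property
    def scale(self) -> float:
        # Standard LoRA scaling factor applied to the low-rank update: alpha / r.
        return self.alpha / self.r

    @property
    def optimizer_steps(self) -> int:
        # Assumption: one device, no gradient accumulation, last partial batch kept.
        steps_per_epoch = math.ceil(self.train_examples / self.batch_size)
        return math.ceil(self.epochs * steps_per_epoch)


recipe = LoraRecipe()
print(recipe.scale, recipe.optimizer_steps)
```

Under these assumptions the LoRA scale is 2.0 and each adapter trains for 564 optimizer steps; a different batching setup would change the step count but not the scale.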

## Scaling table — gap_recovered mean ± std over 5 stratified seeds

| N | mean (baseline) | global_ridge | topk8_global_ridge |
|---:|---:|---:|---:|
| 4 | -0.058 ± 0.176 | 0.188 ± 0.012 | 0.188 ± 0.012 |
| 8 | -0.140 ± 0.161 | 0.201 ± 0.012 | 0.201 ± 0.012 |
| 12 | -0.062 ± 0.120 | 0.208 ± 0.008 | 0.208 ± 0.008 |
| 16 | -0.010 ± 0.088 | 0.212 ± 0.006 | 0.212 ± 0.006 |
| 24 | 0.024 ± 0.000 | 0.218 ± 0.000 | 0.216 ± 0.000 |
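
Each cell in the table is a mean and standard deviation of `gap_recovered` across the 5 seeds. The aggregation can be sketched as follows; the helper name `summarize` is hypothetical, the inputs are toy numbers rather than results from this sweep, and the use of the population standard deviation is an assumption (the original pipeline may use the sample standard deviation instead).

```python
import statistics


def summarize(per_seed_values):
    """Format per-seed gap_recovered values as 'mean ± std' with 3 decimals."""
    mean = statistics.mean(per_seed_values)
    # Assumption: population std (ddof=0). Use statistics.stdev for the sample std.
    std = statistics.pstdev(per_seed_values)
    return f"{mean:.3f} ± {std:.3f}"


# Toy inputs for illustration only, not values from this run.
toy_varied = [0.1, 0.2, 0.3]
toy_identical = [0.25] * 5

print(summarize(toy_varied))
print(summarize(toy_identical))
```

Identical per-seed values give a spread of exactly 0.000. That is one possible reading of the N=24 row, for example if every seed ends up selecting the same anchors, but the report does not state this.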

## Figure

![3B anchor-count scaling](figures/exp_scaling_3b.png)

## Interpretation

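Top-K (`topk8_global_ridge`) does not beat `global_ridge` at any N: the two are identical to three decimals from N=4 through N=16 (0.188 vs 0.188 at N=4), and Top-K trails slightly at N=24 (0.216 vs 0.218). Both learned curves keep rising with N, though the gains shrink after N=8 (+0.013 from N=4 to N=8, then +0.007 and +0.004 for `global_ridge`).
Across N, the mean baseline ranges from -0.140 (N=8) to 0.024 (N=24), while the best learned curve, `global_ridge`, reaches 0.218 gap recovered at N=24.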
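
The comparisons in this section can be checked directly against the scaling table. The sketch below transcribes only the table means; the dictionary layout and the helper names `topk_minus_global` and `best_cell` are hypothetical conveniences, not part of the repo.

```python
# Means transcribed from the scaling table (std omitted).
TABLE = {
    4:  {"mean_baseline": -0.058, "global_ridge": 0.188, "topk8_global_ridge": 0.188},
    8:  {"mean_baseline": -0.140, "global_ridge": 0.201, "topk8_global_ridge": 0.201},
    12: {"mean_baseline": -0.062, "global_ridge": 0.208, "topk8_global_ridge": 0.208},
    16: {"mean_baseline": -0.010, "global_ridge": 0.212, "topk8_global_ridge": 0.212},
    24: {"mean_baseline": 0.024,  "global_ridge": 0.218, "topk8_global_ridge": 0.216},
}


def topk_minus_global(table):
    """Per-N difference topk8_global_ridge - global_ridge, rounded to 3 decimals."""
    return {
        n: round(row["topk8_global_ridge"] - row["global_ridge"], 3)
        for n, row in table.items()
    }


def best_cell(table, methods):
    """Return (value, method, N) for the highest mean among the given methods."""
    return max((row[m], m, n) for n, row in table.items() for m in methods)


diffs = topk_minus_global(TABLE)
best = best_cell(TABLE, ["global_ridge", "topk8_global_ridge"])
baseline = [row["mean_baseline"] for row in TABLE.values()]

print(diffs)
print(best)
print(min(baseline), max(baseline))
```

On the table means, the Top-K minus global difference is zero for N=4 through N=16 and -0.002 at N=24, the best learned cell is `global_ridge` at N=24 with 0.218, and the baseline spans -0.140 to 0.024.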