Round 5 — 3B Anchor-Count Scaling Sweep
Repo: https://huggingface.co/CK0607/cross-model-lora-prediction-3b
Models: X=Qwen/Qwen2.5-3B-Instruct → Y=meta-llama/Llama-3.2-3B-Instruct
Recipe + anchors
LoRA recipe: r=16, alpha=32, targets=['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'], epochs=3.0, train=1500, bs=8, lr=0.0002, max_seq_len=512, bf16.
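The recipe line can be captured as a small config object. This is a minimal sketch, assuming "train=1500" counts training examples and that training runs on one device with no gradient accumulation. The class name `LoraRecipe` and its helpers are hypothetical. The keyword names in `to_peft_kwargs` follow `peft.LoraConfig`, and the `task_type` value is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraRecipe:
    # Hypothetical container; values transcribed from the Round 5 recipe line.
    r: int = 16
    alpha: int = 32
    targets: tuple[str, ...] = ("q_proj", "k_proj", "v_proj", "o_proj",
                                "gate_proj", "up_proj", "down_proj")
    epochs: float = 3.0
    train_examples: int = 1500  # ASSUMPTION: "train=1500" = number of examples
    batch_size: int = 8
    lr: float = 2e-4
    max_seq_len: int = 512
    bf16: bool = True

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.alpha / self.r

    def steps_per_epoch(self) -> int:
        # Ceiling division: a final partial batch still counts as a step.
        # ASSUMPTION: single device, no gradient accumulation.
        return -(-self.train_examples // self.batch_size)

    def to_peft_kwargs(self) -> dict:
        # Keyword names match peft.LoraConfig; task_type is an assumption.
        return {"r": self.r, "lora_alpha": self.alpha,
                "target_modules": list(self.targets), "task_type": "CAUSAL_LM"}


recipe = LoraRecipe()
print(recipe.scaling, recipe.steps_per_epoch())
```

Under these assumptions the update scaling is alpha / r = 2.0, and each epoch takes ceil(1500 / 8) = 188 optimizer steps.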
New Round 5 anchors trained under round5/X and round5/Y: ['aqua_rat_numeric', 'math_counting_easy', 'mawps', 'mbpp_sanitized', 'humaneval', 'conala_curated', 'medmcqa_easy', 'pubmedqa_pqal'].
Drop list: [] (nothing was dropped).
R4 adapters were reused from the Modal volume; no R4 adapter was retrained.
Scaling table — gap_recovered mean ± std over 5 stratified seeds
| N (anchors) | mean baseline | global_ridge | topk8_global_ridge |
|---|---|---|---|
| 4 | -0.058 ± 0.176 | 0.188 ± 0.012 | 0.188 ± 0.012 |
| 8 | -0.140 ± 0.161 | 0.201 ± 0.012 | 0.201 ± 0.012 |
| 12 | -0.062 ± 0.120 | 0.208 ± 0.008 | 0.208 ± 0.008 |
| 16 | -0.010 ± 0.088 | 0.212 ± 0.006 | 0.212 ± 0.006 |
| 24 | 0.024 ± 0.000 | 0.218 ± 0.000 | 0.216 ± 0.000 |
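The report does not define gap_recovered or state how the std is computed. The sketch below makes both explicit as labeled assumptions. `gap_recovered` uses a common "fraction of the gap closed" definition that this report does not confirm. `summarize` formats per-seed scores into a table cell using the sample std. Both function names are hypothetical.

```python
import statistics


def gap_recovered(pred: float, base: float, oracle: float) -> float:
    # HYPOTHETICAL definition (not stated in this report): the fraction of the
    # base-to-oracle gap closed by a predicted adapter. 1.0 matches the
    # directly trained adapter; negative values are worse than the base model.
    return (pred - base) / (oracle - base)


def summarize(per_seed: list[float]) -> str:
    # Format per-seed scores as a "mean ± std" table cell. ASSUMPTION: sample
    # std (n - 1 denominator); the report does not say which std was used.
    mean = statistics.mean(per_seed)
    std = statistics.stdev(per_seed) if len(per_seed) > 1 else 0.0
    return f"{mean:.3f} ± {std:.3f}"
```

When all five per-seed scores are identical, the formatted std is 0.000. This is one way the zero spread in the N=24 row could arise.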
Interpretation
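topk8_global_ridge does not beat global_ridge at small N. The two are tied at every N from 4 to 16 (e.g., 0.188 vs 0.188 at N=4), and global_ridge is marginally ahead at N=24 (0.218 vs 0.216). Neither learned curve flattens by N=8. Both rise monotonically with N, although the gain per added anchor shrinks: +0.013 over the 4 anchors added from N=4 to N=8, and +0.006 over the 8 anchors added from N=16 to N=24. Seed-to-seed spread also falls with N (global_ridge std 0.012 at N=4 vs 0.006 at N=16). Across N, the mean baseline ranges from -0.140 to 0.024, while the best learned curve (global_ridge at N=24) reaches 0.218 gap recovered.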
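The comparisons in the interpretation can be re-derived from the table means with a few lines of Python. The numbers below are copied from the scaling table, and nothing is recomputed from raw runs. The helper `per_anchor_gain` is illustrative.

```python
# Means copied from the scaling table (gap_recovered, 5 stratified seeds).
GLOBAL_RIDGE = {4: 0.188, 8: 0.201, 12: 0.208, 16: 0.212, 24: 0.218}
TOPK8 = {4: 0.188, 8: 0.201, 12: 0.208, 16: 0.212, 24: 0.216}
MEAN_BASELINE = {4: -0.058, 8: -0.140, 12: -0.062, 16: -0.010, 24: 0.024}


def per_anchor_gain(curve: dict[int, float]) -> list[tuple[int, int, float]]:
    # Average change in the mean per added anchor between consecutive N values.
    ns = sorted(curve)
    return [(a, b, (curve[b] - curve[a]) / (b - a)) for a, b in zip(ns, ns[1:])]


# N values where the two learned methods report the same mean.
ties = [n for n in GLOBAL_RIDGE if GLOBAL_RIDGE[n] == TOPK8[n]]
# Best global_ridge mean and the N at which it occurs.
best_n, best = max(GLOBAL_RIDGE.items(), key=lambda kv: kv[1])

for a, b, g in per_anchor_gain(GLOBAL_RIDGE):
    print(f"N={a}->{b}: {g:+.5f} per added anchor")
print("tied N:", ties, "| best:", best, "at N =", best_n)
```

The per-anchor gain drops from about 0.0033 (N=4 to 8) to about 0.0008 (N=16 to 24). The two methods tie at N = 4, 8, 12, and 16.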
