Round 5 — 3B Anchor-Count Scaling Sweep

Repo: https://huggingface.co/CK0607/cross-model-lora-prediction-3b

Models: X=Qwen/Qwen2.5-3B-Instruct → Y=meta-llama/Llama-3.2-3B-Instruct

Recipe + anchors

LoRA recipe: r=16, alpha=32, targets=['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'], epochs=3.0, train=1500, bs=8, lr=0.0002, max_seq_len=512, bf16.

New Round 5 anchors trained under round5/X and round5/Y: ['aqua_rat_numeric', 'math_counting_easy', 'mawps', 'mbpp_sanitized', 'humaneval', 'conala_curated', 'medmcqa_easy', 'pubmedqa_pqal']. Drop list: [].

R4 adapters were reused from the Modal volume; no R4 adapter was retrained.
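For reference, the recipe values can be collected in one small config object. This is only a sketch: the class `LoraRecipe`, its field names, and the `optimizer_steps` helper are hypothetical and not taken from the repo. The step count assumes that bs=8 is the effective batch size (no gradient accumulation) and that the last partial batch is kept; neither is stated in the recipe. The `scaling` property uses the standard LoRA convention alpha / r.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraRecipe:
    """Hypothetical container for the recipe values listed in this section."""
    r: int = 16
    alpha: int = 32
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )
    epochs: float = 3.0
    train_examples: int = 1500
    batch_size: int = 8
    lr: float = 2e-4          # written as 0.0002 in the recipe
    max_seq_len: int = 512
    dtype: str = "bf16"

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.alpha / self.r

    def optimizer_steps(self) -> int:
        # Assumption: no gradient accumulation, last partial batch kept.
        steps_per_epoch = math.ceil(self.train_examples / self.batch_size)
        return math.ceil(steps_per_epoch * self.epochs)


recipe = LoraRecipe()
print(recipe.scaling, recipe.optimizer_steps())
```

If the adapters are trained with PEFT, `r`, `alpha`, and `target_modules` would typically map to the `r`, `lora_alpha`, and `target_modules` arguments of `LoraConfig`; whether the repo does exactly that is not confirmed here.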

Scaling table — gap_recovered mean ± std over 5 stratified seeds

N	mean	global_ridge	topk8_global_ridge
4	-0.058 ± 0.176	0.188 ± 0.012	0.188 ± 0.012
8	-0.140 ± 0.161	0.201 ± 0.012	0.201 ± 0.012
12	-0.062 ± 0.120	0.208 ± 0.008	0.208 ± 0.008
16	-0.010 ± 0.088	0.212 ± 0.006	0.212 ± 0.006
24	0.024 ± 0.000	0.218 ± 0.000	0.216 ± 0.000
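Each cell above has the form "mean ± std" over the per-seed gap_recovered values. A minimal sketch of that aggregation follows; the function name `summarize` is hypothetical, and the use of the population standard deviation is an assumption, since the table does not say whether population or sample std was reported.

```python
from statistics import mean, pstdev


def summarize(per_seed_scores):
    """Format per-seed scores as 'mean ± std' with three decimals.

    Assumption: population std (pstdev). Swap in statistics.stdev
    if the sample std is what was actually reported.
    """
    return f"{mean(per_seed_scores):.3f} ± {pstdev(per_seed_scores):.3f}"
```

When every seed yields the same score, this produces a std of 0.000, which is the pattern seen in the N=24 row.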

Figure

Interpretation

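Top-K does not beat global_ridge at any N: the two are tied at N=4 (0.188 vs 0.188) and remain tied through N=16, and Top-K falls slightly behind at N=24 (0.216 vs 0.218). Across N, the mean baseline ranges from -0.140 to 0.024, while the best learned curve, global_ridge at N=24, reaches 0.218 gap recovered.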
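The comparison between the two learned curves can be re-checked directly from the means in the scaling table. The sketch below copies those means by hand; the names `TABLE`, `topk_minus_global`, and `best_learned` are illustrative and not part of the repo.

```python
# Means copied from the scaling table:
# N -> (mean baseline, global_ridge, topk8_global_ridge)
TABLE = {
    4: (-0.058, 0.188, 0.188),
    8: (-0.140, 0.201, 0.201),
    12: (-0.062, 0.208, 0.208),
    16: (-0.010, 0.212, 0.212),
    24: (0.024, 0.218, 0.216),
}


def topk_minus_global(table):
    """Per-N difference topk8_global_ridge - global_ridge, rounded to 3 decimals."""
    return {n: round(topk - ridge, 3) for n, (_, ridge, topk) in table.items()}


def best_learned(table):
    """Return (method, N, score) for the highest learned-curve mean."""
    candidates = []
    for n, (_, ridge, topk) in table.items():
        candidates.append((ridge, "global_ridge", n))
        candidates.append((topk, "topk8_global_ridge", n))
    score, name, n = max(candidates)
    return name, n, score


diffs = topk_minus_global(TABLE)
baseline_means = [row[0] for row in TABLE.values()]
baseline_range = (min(baseline_means), max(baseline_means))

print(diffs)
print(best_learned(TABLE))
print(baseline_range)
```

This compares means only; the reported standard deviations are not used, so it says nothing about whether the N=24 difference is significant.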