Round 5 — 3B Anchor-Count Scaling Sweep

Repo: https://huggingface.co/CK0607/cross-model-lora-prediction-3b

Models: X=Qwen/Qwen2.5-3B-Instruct → Y=meta-llama/Llama-3.2-3B-Instruct

Recipe + anchors

LoRA recipe: r=16, alpha=32, targets=['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'], epochs=3.0, train=1500, bs=8, lr=0.0002, max_seq_len=512, bf16.

New Round 5 anchors trained under round5/X and round5/Y: ['aqua_rat_numeric', 'math_counting_easy', 'mawps', 'mbpp_sanitized', 'humaneval', 'conala_curated', 'medmcqa_easy', 'pubmedqa_pqal'].

Drop list: []. R4 adapters were reused from the Modal volume; no R4 adapter was retrained.
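The recipe above can be captured as a small config object, which makes two derived quantities explicit. This is a minimal sketch rather than the training code used for the report: the `LoraRecipe` class and its field names are hypothetical, the `alpha / r` scaling is the standard LoRA convention, and the step count assumes a single device, no gradient accumulation, and a kept final partial batch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraRecipe:
    """Hypothetical container for the hyperparameters listed in the report."""
    r: int = 16
    alpha: int = 32
    targets: tuple = ("q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj")
    epochs: float = 3.0
    train_examples: int = 1500
    batch_size: int = 8
    lr: float = 2e-4
    max_seq_len: int = 512
    dtype: str = "bf16"

    @property
    def scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / r.
        return self.alpha / self.r

    def optimizer_steps(self, grad_accum: int = 1) -> int:
        # Assumption: one device and the last partial batch is kept,
        # so each epoch takes ceil(train_examples / effective_batch) steps.
        per_epoch = math.ceil(self.train_examples / (self.batch_size * grad_accum))
        return int(per_epoch * self.epochs)


recipe = LoraRecipe()
print(recipe.scaling)            # 2.0
print(recipe.optimizer_steps())  # 564
```

Under these assumptions each adapter sees 188 optimizer steps per epoch, so the update count is small and identical across anchors.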

Scaling table — gap_recovered mean ± std over 5 stratified seeds

| N | mean | global_ridge | topk8_global_ridge |
|---|---|---|---|
| 4 | -0.058 ± 0.176 | 0.188 ± 0.012 | 0.188 ± 0.012 |
| 8 | -0.140 ± 0.161 | 0.201 ± 0.012 | 0.201 ± 0.012 |
| 12 | -0.062 ± 0.120 | 0.208 ± 0.008 | 0.208 ± 0.008 |
| 16 | -0.010 ± 0.088 | 0.212 ± 0.006 | 0.212 ± 0.006 |
| 24 | 0.024 ± 0.000 | 0.218 ± 0.000 | 0.216 ± 0.000 |
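Each cell in the table is a mean and standard deviation over the five seeds. A minimal sketch of that aggregation follows. The `summarize` helper is hypothetical, the per-seed values are placeholders and not results from this report, and the population standard deviation is an assumption because the report does not say which form it uses.

```python
from statistics import mean, pstdev


def summarize(scores):
    """Format per-seed scores as 'mean ± std' with three decimals."""
    # Assumption: population standard deviation (pstdev); the report may
    # instead use the sample form (statistics.stdev).
    return f"{mean(scores):.3f} ± {pstdev(scores):.3f}"


# Placeholder per-seed values for illustration only, NOT taken from the report.
varied = [0.10, 0.20, 0.30, 0.40, 0.50]
identical = [0.25] * 5

print(summarize(varied))     # 0.300 ± 0.141
print(summarize(identical))  # 0.250 ± 0.000
```

The second call shows that a spread of 0.000, as in the N=24 row, is what this formatting produces when all five seeds yield the same score.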

Figure

3B anchor-count scaling

Interpretation

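Top-K (topk8_global_ridge) does not beat global_ridge at any N: the two match to three decimals from N=4 through N=16 (0.188 vs 0.188 at N=4), and Top-K is slightly lower at N=24 (0.216 vs 0.218). Both learned curves rise monotonically with N, from 0.188 at N=4 to 0.218 (global_ridge) at N=24. Across N, the mean baseline ranges from -0.140 to 0.024, while the best learned curve, global_ridge at N=24, reaches 0.218 gap recovered.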
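The comparison can be checked directly against the table means. The sketch below copies the reported means into a dictionary and computes the Top-K minus global_ridge difference at each N, the range of the mean baseline, and the best learned value. The `TABLE` layout and the helper name are illustrative choices, not part of the repo.

```python
# Means copied from the scaling table: N -> (mean, global_ridge, topk8_global_ridge)
TABLE = {
    4: (-0.058, 0.188, 0.188),
    8: (-0.140, 0.201, 0.201),
    12: (-0.062, 0.208, 0.208),
    16: (-0.010, 0.212, 0.212),
    24: (0.024, 0.218, 0.216),
}


def topk_minus_ridge(table):
    """Difference topk8_global_ridge - global_ridge per N, rounded to 3 decimals."""
    return {n: round(topk - ridge, 3) for n, (_, ridge, topk) in table.items()}


deltas = topk_minus_ridge(TABLE)
baseline = [row[0] for row in TABLE.values()]
best = max(max(ridge, topk) for _, ridge, topk in TABLE.values())

print(deltas)                        # {4: 0.0, 8: 0.0, 12: 0.0, 16: 0.0, 24: -0.002}
print(min(baseline), max(baseline))  # -0.14 0.024
print(best)                          # 0.218
```

No difference is positive, so the reported means give no N at which Top-K is ahead of global_ridge.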