Cross-Model LoRA Adapter Translation — Round 4

Repo: https://huggingface.co/CK0607/cross-model-lora-prediction-3b
Models: X=Qwen/Qwen2.5-3B-Instruct → Y=meta-llama/Llama-3.2-3B-Instruct

Diff vs Round 3

Kept the Round 3 3B model pair and mapping algorithms unchanged.
Replaced broken held-outs: math_algebra_medium → gsm8k_test_500, humaneval_plus → mbpp_test_held, mmlu_college_chemistry → openbookqa_test.
Retrained only the bounded Round 4 pool: 16 matched X/Y anchors plus 6 X held-out conditioning adapters and 6 Y oracle adapters.
Stronger recipe: LoRA r=16, alpha=32, targets=['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'], epochs=3.0, train_per_task=1500, lr=0.0002, bf16, max_len=512.
Recomputed Top-K cosine selection from the new r=16/full-target X adapter space.
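The recipe and the Top-K step above can be sketched together: each X adapter becomes one vector, and Top-K keeps the anchors with the highest cosine similarity to the held-out adapter. This is a minimal sketch under stated assumptions, not the repo's code. In Hugging Face PEFT the recipe would correspond to `LoraConfig(r=16, lora_alpha=32, target_modules=[...])`, whose default scaling is `lora_alpha / r`. The assumptions are that the "adapter space" is the concatenation of the dense per-target deltas `(alpha / r) * B @ A`. The helper names `adapter_vector`, `topk_by_cosine` and `random_adapter` are hypothetical, and the toy shapes are illustrative only.

```python
import numpy as np

# Round 4 recipe values; PEFT's default LoRA scaling is lora_alpha / r.
RANK = 16
ALPHA = 32
SCALING = ALPHA / RANK  # = 2.0


def adapter_vector(factors):
    """Flatten one adapter into a single vector for similarity search.

    ASSUMPTION: `factors` maps each target module (q_proj, ..., down_proj)
    to its LoRA pair (A, B), with A: (r, d_in) and B: (d_out, r), and the
    adapter space is the concatenation of dense deltas (alpha / r) * B @ A.
    """
    parts = []
    for name in sorted(factors):  # fixed module order so vectors line up
        a, b = factors[name]
        parts.append((SCALING * (b @ a)).ravel())
    return np.concatenate(parts)


def topk_by_cosine(query, anchors, k=8):
    """Names of the k anchors whose vectors have the highest cosine with `query`."""
    q = query / np.linalg.norm(query)
    scores = {name: float(v @ q) / float(np.linalg.norm(v)) for name, v in anchors.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy usage with tiny random factors (hypothetical shapes, not the real models').
rng = np.random.default_rng(0)


def random_adapter(d_in=8, d_out=8, targets=("q_proj", "v_proj")):
    return {t: (rng.normal(size=(RANK, d_in)), rng.normal(size=(d_out, RANK))) for t in targets}


held_out = random_adapter()
anchors = {f"anchor_{i}": adapter_vector(random_adapter()) for i in range(5)}
anchors["near_copy"] = adapter_vector(held_out)
print(topk_by_cosine(adapter_vector(held_out), anchors, k=3))  # 'near_copy' comes first
```

Cosine rather than a raw dot product makes the ranking insensitive to each adapter's overall magnitude.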

Experiment 1 — Main table

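Scores are fraction correct (0 to 1); oracle_minus_base_pp is in percentage points. Rows with oracle - base_Y < 3 pp are marked False in the usable column and excluded from averages.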

Domain	Task	base_Y	mean	global_ridge	pertensor_ridge	topk8_global_ridge	topk8_pertensor_ridge	pertensor_mlp	oracle	oracle_minus_base_pp	usable	gap_recovered
math	gsm_hard	0.063	0.057	0.060	0.067	0.067	0.063	0.073	0.150	8.667	True	0.115
math	gsm8k_test_500	0.080	0.093	0.100	0.100	0.093	0.097	0.100	0.293	21.333	True	0.094
code	mbpp_test_held	0.230	0.240	0.250	0.250	0.250	0.250	0.240	0.320	9.000	True	0.222
code	mbpp_plus	0.217	0.213	0.280	0.270	0.270	0.267	0.210	0.450	23.333	True	0.271
science	arc_challenge	0.716	0.732	0.736	0.729	0.736	0.729	0.739	0.722	0.669	False	5.000
science	openbookqa_test	0.710	0.760	0.747	0.743	0.713	0.717	0.753	0.983	27.333	True	0.183
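The derived columns can be recomputed from the rounded scores. The sketch below is our reading of them, reconstructed from the table, not a documented definition. It assumes gap_recovered = (best non-oracle score - base_Y) / (oracle - base_Y), with the mean baseline included in the max. That is what reproduces openbookqa_test, where mean is the top non-oracle score. With these rounded inputs the five usable rows match to within 0.001 (mbpp_plus gives 0.270 rather than 0.271). The reading does not reproduce the arc_challenge entry, whose denominator is under 1 pp. `ROWS` and `summarize` are hypothetical names.

```python
# Rounded values from the Experiment 1 table (fraction correct).
# task: (base_Y, [mean, global_ridge, pertensor_ridge, topk8_global_ridge,
#                 topk8_pertensor_ridge, pertensor_mlp], oracle)
ROWS = {
    "gsm_hard": (0.063, [0.057, 0.060, 0.067, 0.067, 0.063, 0.073], 0.150),
    "gsm8k_test_500": (0.080, [0.093, 0.100, 0.100, 0.093, 0.097, 0.100], 0.293),
    "mbpp_test_held": (0.230, [0.240, 0.250, 0.250, 0.250, 0.250, 0.240], 0.320),
    "mbpp_plus": (0.217, [0.213, 0.280, 0.270, 0.270, 0.267, 0.210], 0.450),
    "arc_challenge": (0.716, [0.732, 0.736, 0.729, 0.736, 0.729, 0.739], 0.722),
    "openbookqa_test": (0.710, [0.760, 0.747, 0.743, 0.713, 0.717, 0.753], 0.983),
}
THRESHOLD_PP = 3.0  # rows with oracle - base_Y below this are not usable


def summarize(base, scores, oracle):
    """Return (oracle_minus_base_pp, usable, gap_recovered) for one held-out.

    ASSUMPTION: gap_recovered = (best non-oracle score - base_Y) / (oracle - base_Y),
    taking the max over every non-oracle column, mean baseline included.
    """
    gain_pp = 100.0 * (oracle - base)
    usable = gain_pp >= THRESHOLD_PP
    gap = (max(scores) - base) / (oracle - base)
    return gain_pp, usable, gap


for task, (base, scores, oracle) in ROWS.items():
    gain_pp, usable, gap = summarize(base, scores, oracle)
    print(f"{task:16s} {gain_pp:7.3f} {usable!s:5s} {gap:.3f}")
```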

Headline

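Best learned method per held-out (chosen separately for each row; the mean baseline is not counted as a learned method) minus the mean baseline, averaged over usable held-outs: 0.0187 (about 1.9 pp)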
Usable held-outs: ['gsm_hard', 'gsm8k_test_500', 'mbpp_test_held', 'mbpp_plus', 'openbookqa_test']
Excluded held-outs: ['arc_challenge']
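The headline number can be checked directly from the usable rows. Because the best method is picked separately on each held-out after seeing its test score, the average is an optimistic summary. The sketch also computes the same average for one fixed method (global_ridge) for comparison. Inputs are the rounded table values, so the result differs from 0.0187 in the last digit. `USABLE` and `headline` are hypothetical names.

```python
# Usable held-outs only: (mean baseline, [global_ridge, pertensor_ridge,
# topk8_global_ridge, topk8_pertensor_ridge, pertensor_mlp]), rounded as in the table.
USABLE = {
    "gsm_hard": (0.057, [0.060, 0.067, 0.067, 0.063, 0.073]),
    "gsm8k_test_500": (0.093, [0.100, 0.100, 0.093, 0.097, 0.100]),
    "mbpp_test_held": (0.240, [0.250, 0.250, 0.250, 0.250, 0.240]),
    "mbpp_plus": (0.213, [0.280, 0.270, 0.270, 0.267, 0.210]),
    "openbookqa_test": (0.760, [0.747, 0.743, 0.713, 0.717, 0.753]),
}


def headline(rows, pick=max):
    """Average over held-outs of pick(learned scores) minus the mean baseline."""
    deltas = [pick(learned) - mean for mean, learned in rows.values()]
    return sum(deltas) / len(deltas)


per_row_best = headline(USABLE)  # best learned method chosen per held-out
fixed_global_ridge = headline(USABLE, pick=lambda learned: learned[0])
print(round(per_row_best, 4), round(fixed_global_ridge, 4))
```

On these inputs the per-row best gives about 0.0186, while always using global_ridge gives about 0.0148.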

Top-K selection log

The topk8_global_ridge and topk8_pertensor_ridge selections are identical for every held-out, so each is listed once. Every row selects the same eight anchors; only the order changes.

Held-out	Top-8 selection (topk8_global_ridge and topk8_pertensor_ridge)
gsm_hard	`['math_counting_easy', 'mbpp_sanitized', 'mmlu_high_school_physics', 'humaneval', 'multiarith', 'math_algebra_easy', 'mmlu_elementary_math', 'mmlu_high_school_biology']`
gsm8k_test_500	`['math_counting_easy', 'mbpp_sanitized', 'mmlu_high_school_physics', 'humaneval', 'multiarith', 'math_algebra_easy', 'mmlu_elementary_math', 'mmlu_high_school_biology']`
mbpp_test_held	`['mbpp_sanitized', 'math_counting_easy', 'humaneval', 'mmlu_high_school_physics', 'multiarith', 'mmlu_high_school_biology', 'mmlu_elementary_math', 'math_algebra_easy']`
mbpp_plus	`['mbpp_sanitized', 'humaneval', 'math_counting_easy', 'mmlu_high_school_physics', 'multiarith', 'mmlu_high_school_biology', 'mmlu_elementary_math', 'math_algebra_easy']`
arc_challenge	`['mmlu_high_school_physics', 'mmlu_high_school_biology', 'mmlu_elementary_math', 'math_counting_easy', 'mbpp_sanitized', 'humaneval', 'multiarith', 'math_algebra_easy']`
openbookqa_test	`['mmlu_high_school_physics', 'mmlu_high_school_biology', 'mbpp_sanitized', 'math_counting_easy', 'mmlu_elementary_math', 'humaneval', 'multiarith', 'math_algebra_easy']`

Experiment 2 — Anchor-count + Top-K scaling

Experiment 3 — Cross-domain transfer

Held-out domain	Held-out	Best anchor pool	Top-K selections (top 3)
math	gsm_hard	science-only	`['mmlu_high_school_physics', 'mmlu_elementary_math', 'mmlu_high_school_biology']`
math	gsm8k_test_500	science-only	`['mmlu_high_school_physics', 'mmlu_elementary_math', 'mmlu_high_school_biology']`
code	mbpp_test_held	code-only	`['mbpp_sanitized', 'humaneval', 'mbpp']`
code	mbpp_plus	code-only	`['mbpp_sanitized', 'humaneval', 'mbpp']`
science	arc_challenge	science-only	`['mmlu_high_school_physics', 'mmlu_high_school_biology', 'mmlu_elementary_math']`
science	openbookqa_test	science-only	`['mmlu_high_school_physics', 'mmlu_high_school_biology', 'mmlu_elementary_math']`
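A single-domain anchor pool amounts to running the same cosine Top-K over a filtered subset of anchors. The sketch below shows that filtering step under stated assumptions. The domain labels are hypothetical, inferred from the anchor names in this report (mmlu_elementary_math is placed in the science pool because the science-only selections above include it), and the full 16-anchor labelling is not given here. Random vectors stand in for real adapter vectors. `ANCHOR_DOMAIN` and `topk_in_pool` are hypothetical names.

```python
import numpy as np

# HYPOTHETICAL domain labels for the anchors named in this report.
ANCHOR_DOMAIN = {
    "math_counting_easy": "math",
    "math_algebra_easy": "math",
    "multiarith": "math",
    "mbpp": "code",
    "mbpp_sanitized": "code",
    "humaneval": "code",
    "mmlu_high_school_physics": "science",
    "mmlu_high_school_biology": "science",
    "mmlu_elementary_math": "science",
}


def topk_in_pool(query, anchors, domain, k=3):
    """Cosine Top-K restricted to anchors whose domain label equals `domain`."""
    pool = {n: v for n, v in anchors.items() if ANCHOR_DOMAIN.get(n) == domain}
    q = query / np.linalg.norm(query)
    scores = {n: float(v @ q) / float(np.linalg.norm(v)) for n, v in pool.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy usage: random stand-ins for flattened X adapters.
rng = np.random.default_rng(1)
anchors = {name: rng.normal(size=64) for name in ANCHOR_DOMAIN}
query = anchors["humaneval"] + 0.1 * rng.normal(size=64)  # a code-like held-out
print(topk_in_pool(query, anchors, "code"))
print(topk_in_pool(query, anchors, "science"))
```

The "best anchor pool" column would then come from comparing held-out scores of the mapping fitted on each restricted pool; that scoring step is not shown here.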

Honest failure modes

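Excluded from averages: arc_challenge, whose oracle beats base_Y by only 0.67 pp (below the 3 pp threshold). Every non-oracle method, including the mean baseline, scores above the oracle on this task, so its gap_recovered of 5.000 is not meaningful.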
Code-task evaluation remains cheap answer-string/span matching, not sandboxed unit tests; code numbers are adapter-locality proxies, not pass@1.
Math scoring extracts a number from the generation and checks numeric equality; generations with unexpected formatting or no numeric answer are counted as wrong.
The Top-K and ridge methods are the same mapping family as in prior rounds; no new mapping method was added.
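To illustrate why formatting failures count as wrong under numeric extraction/equality, here is a minimal sketch. It assumes the scorer takes the last number in the generation. The report does not state the repo's actual extraction rule, so the regex, the last-number choice, the tolerance, and the names `extract_last_number` and `math_correct` are hypothetical.

```python
import re

# ASSUMPTION: an answer is the last number in the text, optional sign,
# thousands separators allowed, optional decimal part.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_last_number(text):
    """Return the last number in `text` as a float, or None if there is none."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def math_correct(generation, reference, tol=1e-6):
    """Numeric-equality scoring: a generation with no extractable number is wrong."""
    pred = extract_last_number(generation)
    return pred is not None and abs(pred - float(reference)) <= tol


print(math_correct("So the total is 18.", "18"))  # True
print(math_correct("The total is eighteen.", "18"))  # False: no digits to extract
```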