Cross-Model LoRA Adapter Translation — Round 3 (3B Domain Expansion)

Repo: https://huggingface.co/Samarth0710/cross-model-lora-prediction-3b
Models: X=Qwen/Qwen2.5-3B-Instruct → Y=meta-llama/Llama-3.2-3B-Instruct
LoRA: r=8, alpha=16, dropout=0, target=q_proj,v_proj, 1 epoch SFT, bs=8, lr=2e-4, bf16, max_seq_len=512.
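
For readers who want to reproduce the adapter setup, the LoRA settings above could be written down roughly as follows. This is a sketch, not the repo's training script: the dict names, the `build_lora_config` helper, and the `task_type` value are assumptions added for illustration, and the training keys are illustrative labels rather than the arguments of any particular trainer.

```python
# Hyperparameters copied from the header; key names follow peft.LoraConfig.
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": ["q_proj", "v_proj"],
    "task_type": "CAUSAL_LM",  # assumption: causal-LM SFT, not stated in the source
}

# Training settings from the header; these key names are illustrative only.
TRAIN_HPARAMS = {
    "num_train_epochs": 1,
    "batch_size": 8,
    "learning_rate": 2e-4,
    "bf16": True,
    "max_seq_len": 512,
}


def build_lora_config():
    """Return a peft.LoraConfig if peft is installed, else the plain kwargs dict."""
    try:
        from peft import LoraConfig
    except ImportError:
        return dict(LORA_KWARGS)
    return LoraConfig(**LORA_KWARGS)
```

With alpha=16 and r=8 the LoRA scaling factor alpha/r is 2.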

Experiment 1 — Main table

Domain	Task	base_Y	mean	global_ridge	pertensor_ridge	topk8_global_ridge	topk8_pertensor_ridge	pertensor_mlp	oracle	gap_recovered
math	gsm_hard	0.057	0.063	0.053	0.057	0.050	0.047	0.067	0.073	0.600
math	math_algebra_medium	0.093	0.100	0.093	0.100	0.103	0.103	0.093	0.097	3.000
code	humaneval_plus	0.079	0.085	0.067	0.067	0.067	0.067	0.073	0.067	-0.500
code	mbpp_plus	0.217	0.207	0.217	0.210	0.213	0.203	0.200	0.220	0.000
science	arc_challenge	0.706	0.732	0.706	0.706	0.706	0.706	0.726	0.726	1.333
science	mmlu_college_chemistry	0.375	0.375	0.375	0.375	0.375	0.375	0.250	0.375	NA

Success criteria

Best learned method minus mean baseline (average over held-out): -0.000
Domain average gap_recovered: math 1.800, code -0.250, science 1.333 (science averages arc_challenge only, since mmlu_college_chemistry is NA)
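
The source does not spell out how gap_recovered is computed. A minimal sketch, assuming the usual definition (predicted minus base, divided by oracle minus base) and assuming NA rows are skipped when averaging per domain, is shown below. The function names are hypothetical. Under this assumption, taking the highest-scoring non-oracle column as `predicted` matches the humaneval_plus row (0.085 against base 0.079 and oracle 0.067 gives -0.5), but which column the author actually used is not confirmed.

```python
import math


def gap_recovered(base, predicted, oracle, eps=1e-9):
    """Fraction of the oracle's gain over base that the predicted adapter recovers."""
    gap = oracle - base
    if abs(gap) < eps:
        return math.nan  # undefined when oracle equals base (reported as NA)
    return (predicted - base) / gap


def domain_average(rows):
    """Average gap_recovered per domain, skipping undefined (NaN) entries."""
    buckets = {}
    for domain, value in rows:
        if not math.isnan(value):
            buckets.setdefault(domain, []).append(value)
    return {d: sum(v) / len(v) for d, v in buckets.items()}


# gap_recovered column from the main table, with NA encoded as NaN.
rows = [
    ("math", 0.600), ("math", 3.000),
    ("code", -0.500), ("code", 0.000),
    ("science", 1.333), ("science", math.nan),
]
averages = domain_average(rows)
```

Note that when the oracle scores below base, the denominator is negative, so a prediction that beats base yields a negative gap_recovered. This is one reason such rows are better read as diagnostics.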

Experiment 2 — Anchor-count + Top-K scaling

Experiment 3 — Cross-domain transfer

Held-out domain	Best anchor pool	Top-K actual selections (top-3)
math	code-only	`{'gsm_hard': ['humaneval', 'mbpp_sanitized', 'mbpp'], 'math_algebra_medium': ['humaneval', 'mbpp_sanitized', 'mbpp']}`
code	math-only	`{'humaneval_plus': ['math_counting_easy', 'multiarith', 'math_algebra_easy'], 'mbpp_plus': ['math_counting_easy', 'multiarith', 'math_algebra_easy']}`
science	code-only	`{'arc_challenge': ['humaneval', 'mbpp_sanitized', 'mbpp'], 'mmlu_college_chemistry': ['mbpp', 'mbpp_sanitized', 'humaneval']}`

Honest failure modes / notes

Dataset-loading failures, if any, are listed in dataset_audit_round3.json. Anchors that failed to load were dropped, keeping at least six anchors per domain where available.
Code-task evaluation is string/span matching against reference code, not sandboxed unit-test execution; numbers should be interpreted as a cheap adapter-locality proxy rather than pass@1.
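If an oracle adapter does not improve over base Y, the corresponding gap_recovered is unstable or meaningless and should be treated as diagnostic rather than evidence of mapping quality. In the main table this applies to humaneval_plus (oracle 0.067 < base_Y 0.079, hence the negative value) and mmlu_college_chemistry (oracle equals base_Y, hence NA); the 3.000 on math_algebra_medium likewise rests on an oracle gain of only 0.004.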
Math exact-match uses numeric extraction from generated text; formatting failures are counted as wrong.
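
To make the math scoring note concrete, here is a minimal sketch of numeric-extraction exact match. It assumes the last number in the generation is taken as the answer and that thousands separators are stripped; both are common conventions rather than confirmed details of this repo, and the function names are hypothetical.

```python
import re

# Optional sign, digits with optional thousands commas, optional decimal part.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_last_number(text):
    """Return the last number found in text as a float, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    try:
        return float(matches[-1].replace(",", ""))
    except ValueError:
        return None


def math_exact_match(generated, reference, tol=1e-6):
    """True if the extracted number equals the reference; no number counts as wrong."""
    pred = extract_last_number(generated)
    return pred is not None and abs(pred - float(reference)) <= tol
```

A generation with no parsable number returns False, which mirrors the rule that formatting failures are counted as wrong.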