# Cross-Model LoRA Adapter Translation — Round 3 (3B Domain Expansion)

**Repo:** https://huggingface.co/Samarth0710/cross-model-lora-prediction-3b

**Models:** X=`Qwen/Qwen2.5-3B-Instruct` → Y=`meta-llama/Llama-3.2-3B-Instruct`

**LoRA:** r=8, alpha=16, dropout=0, target=`q_proj,v_proj`, 1 epoch SFT, bs=8, lr=2e-4, bf16, max_seq_len=512.

## Experiment 1 — Main table

| Domain | Task | base_Y | mean | global_ridge | pertensor_ridge | topk8_global_ridge | topk8_pertensor_ridge | pertensor_mlp | oracle | gap_recovered |
|---|---|---|---|---|---|---|---|---|---|---|
| math | gsm_hard | 0.057 | 0.063 | 0.053 | 0.057 | 0.050 | 0.047 | 0.067 | 0.073 | 0.600 |
| math | math_algebra_medium | 0.093 | 0.100 | 0.093 | 0.100 | 0.103 | 0.103 | 0.093 | 0.097 | 3.000 |
| code | humaneval_plus | 0.079 | 0.085 | 0.067 | 0.067 | 0.067 | 0.067 | 0.073 | 0.067 | -0.500 |
| code | mbpp_plus | 0.217 | 0.207 | 0.217 | 0.210 | 0.213 | 0.203 | 0.200 | 0.220 | 0.000 |
| science | arc_challenge | 0.706 | 0.732 | 0.706 | 0.706 | 0.706 | 0.706 | 0.726 | 0.726 | 1.333 |
| science | mmlu_college_chemistry | 0.375 | 0.375 | 0.375 | 0.375 | 0.375 | 0.375 | 0.250 | 0.375 | NA |

## Success criteria

- Best learned method minus mean baseline (average over held-out): `-0.000`
- Domain average gap_recovered: `{'math': 1.800, 'code': -0.250, 'science': 1.333}`

## Experiment 2 — Anchor-count + Top-K scaling

![Anchor scaling](figures/exp2_anchor_scaling.png)

## Experiment 3 — Cross-domain transfer

![Transfer heatmap](figures/exp3_transfer_heatmap.png)

| Held-out domain | Best anchor pool | Top-K actual selections (top-3) |
|---|---|---|
| math | code-only | `{'gsm_hard': ['humaneval', 'mbpp_sanitized', 'mbpp'], 'math_algebra_medium': ['humaneval', 'mbpp_sanitized', 'mbpp']}` |
| code | math-only | `{'humaneval_plus': ['math_counting_easy', 'multiarith', 'math_algebra_easy'], 'mbpp_plus': ['math_counting_easy', 'multiarith', 'math_algebra_easy']}` |
| science | code-only | `{'arc_challenge': ['humaneval', 'mbpp_sanitized', 'mbpp'], 'mmlu_college_chemistry': ['mbpp', 'mbpp_sanitized', 'humaneval']}` |

## Honest failure modes / notes

- Dataset-loading failures, if any, are listed in `dataset_audit_round3.json`; failed anchors were dropped as instructed, while preserving at least six anchors per domain when available.
- Code-task evaluation is string/span matching against reference code, not sandboxed unit-test execution; numbers should be interpreted as a cheap adapter-locality proxy rather than pass@1.
- If an oracle adapter does not improve over base Y, the corresponding gap_recovered is unstable/meaningless and should be treated as diagnostic rather than evidence of mapping quality.
- Math exact-match uses numeric extraction from generated text; formatting failures are counted as wrong.
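
To make the math scoring note concrete, here is a minimal sketch of numeric-extraction exact match. The helper names, the regular expression, and the "take the last number in the text" rule are all assumptions for illustration; the evaluation code in the repo may extract and compare answers differently.

```python
import re

# Hypothetical pattern (an assumption, not the repo's): an optional sign,
# digits with optional thousands separators, and an optional decimal part.
_NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_last_number(text):
    """Return the last number found in `text` as a float, or None if absent."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def numeric_exact_match(generated, reference, tol=1e-6):
    """Score True only if both texts yield a number and the numbers agree.

    A generation with no extractable number (a formatting failure) is
    scored as wrong, mirroring the note above.
    """
    pred = extract_last_number(generated)
    gold = extract_last_number(reference)
    if pred is None or gold is None:
        return False
    return abs(pred - gold) <= tol
```

Under this sketch a correct answer written in words ("eighteen") would be marked wrong, which is one way formatting failures can depress the math scores.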
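
The gap_recovered column in Experiment 1 and the note on unstable values can be illustrated with a short sketch. It assumes gap_recovered = (best predicted score - base_Y) / (oracle - base_Y), where the best predicted score is the highest value among the columns from `mean` through `pertensor_mlp`. That definition is inferred from the table rows and is not stated in this report; the function names are hypothetical.

```python
def gap_recovered(base, best_predicted, oracle, eps=1e-9):
    """Fraction of the base-to-oracle gap closed by a predicted adapter.

    Returns None when oracle equals base, where the ratio is undefined
    (the table reports NA for that case).
    """
    gap = oracle - base
    if abs(gap) < eps:
        return None
    return (best_predicted - base) / gap


def domain_average(values):
    """Average the defined gap_recovered values, skipping None (NA) entries."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else None


# Rounded scores copied from the Experiment 1 table.
humaneval_plus = gap_recovered(base=0.079, best_predicted=0.085, oracle=0.067)
mbpp_plus = gap_recovered(base=0.217, best_predicted=0.217, oracle=0.220)
chemistry = gap_recovered(base=0.375, best_predicted=0.375, oracle=0.375)

print(round(humaneval_plus, 3))  # -0.5: the oracle scores below base_Y
print(mbpp_plus)                 # 0.0
print(chemistry)                 # None, reported as NA
print(round(domain_average([humaneval_plus, mbpp_plus]), 3))  # -0.25
```

Because the denominators are only a few thousandths, the three-decimal scores in the table do not always reproduce the reported ratio: for math_algebra_medium the rounded inputs give (0.103 - 0.093) / (0.097 - 0.093) = 2.5 rather than 3.000. This sensitivity is a reason to read large or negative values as diagnostics.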