# Cross-Model LoRA Adapter Translation — Round 3 (3B Domain Expansion)

**Repo:** https://huggingface.co/Samarth0710/cross-model-lora-prediction-3b
**Models:** X=`Qwen/Qwen2.5-3B-Instruct` → Y=`meta-llama/Llama-3.2-3B-Instruct`
**LoRA:** r=8, alpha=16, dropout=0, target=`q_proj,v_proj`, 1 epoch SFT, bs=8, lr=2e-4, bf16, max_seq_len=512.

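The hyperparameters above can be captured in a small config object. This is a minimal sketch, not the repository's training code: the class name `AdapterTrainingConfig` and its field names are hypothetical (the LoRA fields mirror the argument names of `peft.LoraConfig`), and the `scaling` property assumes the standard LoRA convention in which the low-rank update is multiplied by `alpha / r`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AdapterTrainingConfig:
    """Hypothetical container for the LoRA/SFT settings listed in this report."""

    r: int = 8                      # LoRA rank
    lora_alpha: int = 16            # LoRA alpha
    lora_dropout: float = 0.0       # dropout=0
    target_modules: Tuple[str, ...] = ("q_proj", "v_proj")
    num_epochs: int = 1             # 1 epoch SFT
    batch_size: int = 8             # bs=8
    learning_rate: float = 2e-4     # lr=2e-4
    bf16: bool = True
    max_seq_len: int = 512

    @property
    def scaling(self) -> float:
        # Assumption: standard LoRA scaling (alpha / r), not a rank-stabilized variant.
        return self.lora_alpha / self.r


cfg = AdapterTrainingConfig()
print(cfg.target_modules, cfg.scaling)
```

Under that assumption, r=8 and alpha=16 scale every adapter update by a factor of 2.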
## Experiment 1 — Main table

| Domain | Task | base_Y | mean | global_ridge | pertensor_ridge | topk8_global_ridge | topk8_pertensor_ridge | pertensor_mlp | oracle | gap_recovered |
|---|---|---|---|---|---|---|---|---|---|---|
| math | gsm_hard | 0.057 | 0.063 | 0.053 | 0.057 | 0.050 | 0.047 | 0.067 | 0.073 | 0.600 |
| math | math_algebra_medium | 0.093 | 0.100 | 0.093 | 0.100 | 0.103 | 0.103 | 0.093 | 0.097 | 3.000 |
| code | humaneval_plus | 0.079 | 0.085 | 0.067 | 0.067 | 0.067 | 0.067 | 0.073 | 0.067 | -0.500 |
| code | mbpp_plus | 0.217 | 0.207 | 0.217 | 0.210 | 0.213 | 0.203 | 0.200 | 0.220 | 0.000 |
| science | arc_challenge | 0.706 | 0.732 | 0.706 | 0.706 | 0.706 | 0.706 | 0.726 | 0.726 | 1.333 |
| science | mmlu_college_chemistry | 0.375 | 0.375 | 0.375 | 0.375 | 0.375 | 0.375 | 0.250 | 0.375 | NA |

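The report does not define `gap_recovered`. The reported values appear consistent with the ratio (best predicted adapter minus base_Y) / (oracle minus base_Y), where "best" is the maximum over the six non-oracle adapter columns, and with `NA` where the oracle equals the base. The sketch below applies that assumed definition to the rounded table values; the function name and data layout are illustrative only.

```python
from typing import Dict, List, Optional, Tuple


def gap_recovered(base: float, best: float, oracle: float) -> Optional[float]:
    """Fraction of the oracle's gain over base_Y that the best predicted adapter recovers.

    Returns None (reported as NA) when the oracle does not move the score at all.
    """
    gap = oracle - base
    if gap == 0:
        return None
    return (best - base) / gap


# (base_Y, [mean, global_ridge, pertensor_ridge, topk8_global_ridge,
#           topk8_pertensor_ridge, pertensor_mlp], oracle), copied from the table.
rows: Dict[str, Tuple[float, List[float], float]] = {
    "gsm_hard": (0.057, [0.063, 0.053, 0.057, 0.050, 0.047, 0.067], 0.073),
    "math_algebra_medium": (0.093, [0.100, 0.093, 0.100, 0.103, 0.103, 0.093], 0.097),
    "humaneval_plus": (0.079, [0.085, 0.067, 0.067, 0.067, 0.067, 0.073], 0.067),
    "mbpp_plus": (0.217, [0.207, 0.217, 0.210, 0.213, 0.203, 0.200], 0.220),
    "arc_challenge": (0.706, [0.732, 0.706, 0.706, 0.706, 0.706, 0.726], 0.726),
    "mmlu_college_chemistry": (0.375, [0.375, 0.375, 0.375, 0.375, 0.375, 0.250], 0.375),
}

# Assumption: "best" is the maximum over all non-oracle adapter columns.
results = {
    task: gap_recovered(base, max(methods), oracle)
    for task, (base, methods, oracle) in rows.items()
}

for task, value in results.items():
    print(task, "NA" if value is None else f"{value:.3f}")
```

Because the table is rounded to three decimals, this only approximates the reported column: it gives roughly 0.625 for `gsm_hard` and 2.5 for `math_algebra_medium`, against the reported 0.600 and 3.000. That suggests the report computes the ratio from unrounded accuracies.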

## Success criteria

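- Best learned method minus mean baseline (average over held-out tasks): `-0.000`, i.e. a small negative difference that rounds to zero at three decimals, so the learned mappings do not beat the simple mean baseline on average.
- Domain average gap_recovered: math `1.800`, code `-0.250`, science `1.333`. The science value matches `arc_challenge` alone, since `mmlu_college_chemistry` is `NA`.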

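The domain averages can be reproduced from the `gap_recovered` column of the main table. The sketch below assumes that `NA` entries are skipped rather than counted as zero, which is consistent with the science average equalling the `arc_challenge` value; the helper name `domain_average` is illustrative.

```python
from statistics import mean
from typing import Dict, List, Optional

# Per-task gap_recovered values from the main table; None stands for NA.
gap_by_domain: Dict[str, List[Optional[float]]] = {
    "math": [0.600, 3.000],      # gsm_hard, math_algebra_medium
    "code": [-0.500, 0.000],     # humaneval_plus, mbpp_plus
    "science": [1.333, None],    # arc_challenge, mmlu_college_chemistry
}


def domain_average(values: List[Optional[float]]) -> Optional[float]:
    """Average the defined values; assumption: NA entries are ignored."""
    defined = [v for v in values if v is not None]
    return mean(defined) if defined else None


averages = {domain: domain_average(vals) for domain, vals in gap_by_domain.items()}

for domain, avg in averages.items():
    print(domain, "NA" if avg is None else f"{avg:.3f}")
```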
## Experiment 2 — Anchor-count + Top-K scaling

![Anchor scaling](figures/exp2_anchor_scaling.png)

## Experiment 3 — Cross-domain transfer

![Transfer heatmap](figures/exp3_transfer_heatmap.png)

| Held-out domain | Best anchor pool | Top-K actual selections (top-3 per task) |
|---|---|---|
| math | code-only | `gsm_hard`: `humaneval`, `mbpp_sanitized`, `mbpp`; `math_algebra_medium`: `humaneval`, `mbpp_sanitized`, `mbpp` |
| code | math-only | `humaneval_plus`: `math_counting_easy`, `multiarith`, `math_algebra_easy`; `mbpp_plus`: `math_counting_easy`, `multiarith`, `math_algebra_easy` |
| science | code-only | `arc_challenge`: `humaneval`, `mbpp_sanitized`, `mbpp`; `mmlu_college_chemistry`: `mbpp`, `mbpp_sanitized`, `humaneval` |

## Honest failure modes / notes

- Dataset-loading failures, if any, are listed in `dataset_audit_round3.json`. Anchors that failed to load were dropped, keeping at least six anchors per domain where available.
- Code-task evaluation is string/span matching against reference code, not sandboxed unit-test execution; numbers should be interpreted as a cheap adapter-locality proxy rather than pass@1.
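- If an oracle adapter does not improve over base Y, the corresponding gap_recovered is unstable or meaningless and should be treated as diagnostic rather than as evidence of mapping quality. In the main table this applies to `humaneval_plus` (oracle 0.067 vs. base 0.079) and `mmlu_college_chemistry` (oracle equal to base, hence `NA`). Very small oracle gains inflate the ratio in the same way: `math_algebra_medium` reports 3.000 from an oracle gain of only 0.004.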
- Math exact-match uses numeric extraction from generated text; formatting failures are counted as wrong.
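
The two scoring rules described in the notes above can be sketched as follows. This is an illustrative approximation, not the evaluation code behind the table: taking the last number in the output, the comparison tolerance, and whitespace-normalized substring matching are all assumptions, and every function name here is hypothetical.

```python
import re
from typing import Optional

# Optional sign, digits with optional thousands separators, optional decimal part.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_last_number(text: str) -> Optional[float]:
    """Return the last number found in `text`, or None if there is none."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def math_exact_match(generated: str, reference: float, tol: float = 1e-6) -> bool:
    """Numeric exact match; outputs with no extractable number count as wrong."""
    value = extract_last_number(generated)
    return value is not None and abs(value - reference) <= tol


def _normalize(code: str) -> str:
    # Collapse all whitespace runs so indentation and line breaks do not matter.
    return " ".join(code.split())


def code_span_match(generated: str, reference_code: str) -> bool:
    """Cheap proxy: does the normalized reference appear inside the output?

    No code is executed, so this is not a pass@1 estimate.
    """
    ref = _normalize(reference_code)
    return bool(ref) and ref in _normalize(generated)


print(math_exact_match("So the total is 1,234.5 dollars.", 1234.5))
print(math_exact_match("I am not sure.", 42.0))
print(code_span_match("def add(a, b):\n    return a + b", "def add(a, b):\n  return a + b"))
```

A span match of this kind rejects functionally correct code that differs textually from the reference, which is one reason the code-task numbers are best read as a proxy rather than as pass@1.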