Qwen2.5-7B-Instruct LoRA distill on LIMO oracle traces (8 epochs)

LoRA adapter that distills qwen3-235b-a22b oracle reasoning traces (clean inference, no attack) on LIMO questions into Qwen2.5-7B-Instruct.

Recipe

| field | value |
|---|---|
| student | `Qwen/Qwen2.5-7B-Instruct` |
| teacher | `qwen3-235b-a22b` (clean inference via OpenRouter, no V3 attack, no ICL) |
| dataset | `Chia-Mu-Lab/limo-qwen3-235b-oracle-traces` (817 rows) |
| finetune | LoRA (r=8, alpha=16, target_modules=all-linear) |
| cutoff_len | 16384 |
| lr | 1e-5, cosine, warmup_ratio=0.1 |
| epochs | 8 |
| eff. batch | 12 (1 × grad_accum 3 × 4 × H200) |
| save_steps | every 17 (1 ckpt / 0.36 epoch) |
| steps/epoch | 47.0 |
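The epoch column in the per-checkpoint table follows directly from this recipe. The minimal Python sketch below reproduces that arithmetic. The variable names are hypothetical. Steps/epoch = 47 is taken as reported in the table rather than derived from the 817-row dataset size. The sketch also assumes the last optimizer step (376) is saved as an extra checkpoint, because 376 is not a multiple of 17.

```python
# Hypothetical names; values copied from the Recipe table.
PER_DEVICE_BATCH = 1   # the "1 ×" in the eff. batch row
GRAD_ACCUM = 3
NUM_GPUS = 4           # 4 × H200
STEPS_PER_EPOCH = 47   # reported steps/epoch, not derived from the 817 rows
EPOCHS = 8
SAVE_STEPS = 17

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
total_steps = STEPS_PER_EPOCH * EPOCHS

# Checkpoints every SAVE_STEPS; the last optimizer step is appended
# when it is not itself a multiple of SAVE_STEPS (assumption).
ckpt_steps = list(range(SAVE_STEPS, total_steps + 1, SAVE_STEPS))
if ckpt_steps[-1] != total_steps:
    ckpt_steps.append(total_steps)


def step_to_epoch(step: int) -> float:
    """Convert an optimizer step to a fractional epoch."""
    return step / STEPS_PER_EPOCH


print(effective_batch)                                   # 12
print(total_steps)                                       # 376
print(len(ckpt_steps), ckpt_steps[:3], ckpt_steps[-2:])  # 23 [17, 34, 51] [374, 376]
print(round(step_to_epoch(SAVE_STEPS), 2))               # 0.36
print(round(step_to_epoch(68), 2))                       # 1.45
```

Each row's epoch is simply step / 47. For example, checkpoint-68 lands at epoch ≈ 1.45.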

Per-checkpoint MATH500

Evaluated via SGLang with max_new_tokens=12288.

| checkpoint | epoch | MATH500 |
|---|---|---|
| 0 | 0.00 | 73.20% (base Qwen2.5-7B-Instruct) |
| 17 | 0.36 | 73.00% |
| 34 | 0.72 | 73.40% |
| 51 | 1.09 | 72.80% |
| 68 | 1.45 | 74.00% PEAK (+0.80pp vs. base) |
| 85 | 1.81 | 72.20% |
| 102 | 2.17 | 72.60% |
| 119 | 2.53 | 73.20% |
| 136 | 2.89 | 72.80% |
| 153 | 3.26 | 70.60% |
| 170 | 3.62 | 72.40% |
| 187 | 3.98 | 72.20% |
| 204 | 4.34 | 73.00% |
| 221 | 4.70 | 73.60% |
| 238 | 5.06 | 72.00% |
| 255 | 5.43 | 72.20% |
| 272 | 5.79 | 72.40% |
| 289 | 6.15 | 72.00% |
| 306 | 6.51 | 72.80% |
| 323 | 6.87 | 72.40% |
| 340 | 7.23 | 72.60% |
| 357 | 7.60 | 71.60% |
| 374 | 7.96 | 72.20% |
| 376 | 8.00 | 71.00% (final) |

Overfitting note

MATH500 accuracy peaks at checkpoint-68 (epoch ≈ 1.45, 74.00%). The final 8-epoch checkpoint (71.00%) underperforms that peak by 3.0pp and also falls 2.2pp below the base model. Downstream users should use checkpoint-68, not the final checkpoint.
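The gaps in this note can be rechecked against the per-checkpoint table. The sketch below copies those accuracies into a dict and locates the peak. It then converts the percentage-point gaps into problem counts, assuming each score is accuracy over MATH500's 500 problems. The names `MATH500`, `N_PROBLEMS`, and `pp_to_problems` are illustrative only.

```python
# MATH500 accuracy (%) per checkpoint step, copied from the table above.
MATH500 = {
    0: 73.20, 17: 73.00, 34: 73.40, 51: 72.80, 68: 74.00, 85: 72.20,
    102: 72.60, 119: 73.20, 136: 72.80, 153: 70.60, 170: 72.40, 187: 72.20,
    204: 73.00, 221: 73.60, 238: 72.00, 255: 72.20, 272: 72.40, 289: 72.00,
    306: 72.80, 323: 72.40, 340: 72.60, 357: 71.60, 374: 72.20, 376: 71.00,
}
N_PROBLEMS = 500  # size of MATH500 (assumes one scored answer per problem)


def pp_to_problems(pp: float) -> int:
    """Convert a percentage-point gap into a count of MATH500 problems."""
    return round(pp * N_PROBLEMS / 100)


base = MATH500[0]
final_step = max(MATH500)                  # last saved step (376)
peak_step = max(MATH500, key=MATH500.get)  # step with the highest accuracy
peak = MATH500[peak_step]
gap_to_final = peak - MATH500[final_step]

print(peak_step, f"{peak - base:+.2f}pp")                         # 68 +0.80pp
print(f"{gap_to_final:.2f}pp")                                    # 3.00pp
print(f"{base - MATH500[final_step]:.2f}pp")                      # 2.20pp
print(pp_to_problems(peak - base), pp_to_problems(gap_to_final))  # 4 15
```

Under this conversion, the +0.80pp peak gain over the base model corresponds to 4 additional problems solved. The drop from the peak to the final checkpoint corresponds to 15 problems.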

Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer the adapter was trained on
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto", device_map="auto"
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Peak checkpoint (recommended): adapter weights live in the checkpoint-68 subfolder
model = PeftModel.from_pretrained(
    base,
    "Chia-Mu-Lab/qwen2.5-7b-limo-qwen3-235b-oracle-lora-8ep",
    subfolder="checkpoint-68",
)
```

Note: `checkpoint-0` not included

The repo does not include checkpoint-0, because that checkpoint is just the bare Qwen2.5-7B-Instruct base model with no adapter weights. To reproduce the 73.20% baseline, use the base model directly.

Citation

LIMO question pool from the LIMO paper:

```bibtex
@article{ye2025limo,
  title   = {LIMO: Less is More for Reasoning},
  author  = {Ye, Yixin and Huang, Zhen and Xiao, Yang and others},
  journal = {arXiv:2502.03387},
  year    = {2025}
}
```

The reasoning traces used for this distillation were newly generated by qwen3-235b-a22b. They are not the traces published with LIMO.

