---
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- lora
- peft
- reasoning
- math
- distillation
- qwen2.5
- limo
library_name: peft
datasets:
- Chia-Mu-Lab/limo-qwen3-235b-oracle-traces
---

# Qwen2.5-7B-Instruct LoRA distill on LIMO oracle traces (8 epochs)

LoRA adapter that distills **qwen3-235b-a22b** oracle reasoning traces (clean inference, no attack) on **LIMO** questions into **Qwen2.5-7B-Instruct**.

## Recipe

| field | value |
|---|---|
| student | `Qwen/Qwen2.5-7B-Instruct` |
| teacher | `qwen3-235b-a22b` (clean inference via OpenRouter, no V3 attack, no ICL) |
| dataset | [`Chia-Mu-Lab/limo-qwen3-235b-oracle-traces`](https://huggingface.co/datasets/Chia-Mu-Lab/limo-qwen3-235b-oracle-traces) (817 rows) |
| finetune | LoRA (r=8, alpha=16, target_modules=all-linear) |
| cutoff_len | 16384 |
| lr | 1e-5, cosine, warmup_ratio=0.1 |
| epochs | 8 |
| eff. batch | 12 (1 × grad_accum 3 × 4 × H200) |
| save_steps | every 17 (1 ckpt / 0.36 epoch) |
| steps/epoch | 47.0 |

## Per-checkpoint MATH500

Evaluated via SGLang with `max_new_tokens=12288`.

| checkpoint | epoch | MATH500 |
|---:|---:|---:|
| 0 | 0.00 | 73.20% (base Qwen2.5-7B-Instruct) |
| 17 | 0.36 | 73.00% |
| 34 | 0.72 | 73.40% |
| 51 | 1.09 | 72.80% |
| **68** | **1.45** | **74.00%** PEAK (+0.80pp) |
| 85 | 1.81 | 72.20% |
| 102 | 2.17 | 72.60% |
| 119 | 2.53 | 73.20% |
| 136 | 2.89 | 72.80% |
| 153 | 3.26 | 70.60% |
| 170 | 3.62 | 72.40% |
| 187 | 3.98 | 72.20% |
| 204 | 4.34 | 73.00% |
| 221 | 4.70 | 73.60% |
| 238 | 5.06 | 72.00% |
| 255 | 5.43 | 72.20% |
| 272 | 5.79 | 72.40% |
| 289 | 6.15 | 72.00% |
| 306 | 6.51 | 72.80% |
| 323 | 6.87 | 72.40% |
| 340 | 7.23 | 72.60% |
| 357 | 7.60 | 71.60% |
| 374 | 7.96 | 72.20% |
| 376 | 8.00 | 71.00% (final) |

## Overfitting note

**Peak accuracy is at `checkpoint-68` (epoch ≈ 1.45), and the final 8-epoch checkpoint UNDERPERFORMS the peak by 3.0pp.** Downstream users should grab `checkpoint-68`, not the final checkpoint.

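
The epoch column and the peak/final gap can be reproduced from the numbers in the table above. The following is a minimal sketch, not part of the training or evaluation code: the names `STEPS_PER_EPOCH`, `MATH500`, `step_to_epoch`, and `peak_checkpoint` are illustrative assumptions, and the scores are copied by hand from the table.

```python
# Steps per epoch as reported in the recipe table.
STEPS_PER_EPOCH = 47.0

# MATH500 accuracy (%) keyed by checkpoint step, copied from the table above.
MATH500 = {
    0: 73.20, 17: 73.00, 34: 73.40, 51: 72.80, 68: 74.00, 85: 72.20,
    102: 72.60, 119: 73.20, 136: 72.80, 153: 70.60, 170: 72.40, 187: 72.20,
    204: 73.00, 221: 73.60, 238: 72.00, 255: 72.20, 272: 72.40, 289: 72.00,
    306: 72.80, 323: 72.40, 340: 72.60, 357: 71.60, 374: 72.20, 376: 71.00,
}


def step_to_epoch(step: int) -> float:
    """Convert an optimizer step to a fractional epoch, rounded as in the table."""
    return round(step / STEPS_PER_EPOCH, 2)


def peak_checkpoint(scores: dict[int, float]) -> int:
    """Return the step with the highest accuracy (the earliest step wins ties)."""
    return max(scores, key=scores.get)


peak = peak_checkpoint(MATH500)
final = max(MATH500)  # the largest step number is the final checkpoint

print(f"peak: checkpoint-{peak} (epoch {step_to_epoch(peak)}), {MATH500[peak]:.2f}%")
print(f"gain over base: {MATH500[peak] - MATH500[0]:+.2f}pp")
print(f"final vs. peak: {MATH500[final] - MATH500[peak]:+.2f}pp")
```

Selecting by a single MATH500 run is the same criterion the table uses; checkpoints within a fraction of a point of each other are probably not distinguishable on a 500-problem benchmark.
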
## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and its tokenizer
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto", device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Peak checkpoint (recommended)
model = PeftModel.from_pretrained(base, "Chia-Mu-Lab/qwen2.5-7b-limo-qwen3-235b-oracle-lora-8ep", subfolder="checkpoint-68")
```

## Note: `checkpoint-0` not included

The repo does **not** include `checkpoint-0`: it is the bare `Qwen2.5-7B-Instruct` base model with no adapter weights. Use the base model directly if you want the 73.20% baseline.

## Citation

LIMO question pool from the **LIMO** paper:

```bibtex
@article{ye2025limo,
  title   = {LIMO: Less is More for Reasoning},
  author  = {Ye, Yixin and Huang, Zhen and Xiao, Yang and others},
  journal = {arXiv:2502.03387},
  year    = {2025}
}
```

The reasoning traces in this distill are **fresh** from `qwen3-235b-a22b` (not the LIMO published traces).