# Qwen2.5-7B-Instruct LoRA distill on s1K oracle traces (8 epochs)
LoRA adapter that distills qwen3-235b-a22b oracle reasoning traces (clean inference, no attack) on s1K-1.1 questions into Qwen2.5-7B-Instruct.
## Recipe
| field | value |
|---|---|
| student | Qwen/Qwen2.5-7B-Instruct |
| teacher | qwen3-235b-a22b (clean inference via OpenRouter, no V3 attack, no ICL) |
| dataset | Chia-Mu-Lab/s1k-qwen3-235b-oracle-traces (963 rows) |
| finetune | LoRA (r=8, alpha=16, target_modules=all-linear) |
| cutoff_len | 16384 tokens |
| lr | 1e-5, cosine, warmup_ratio=0.1 |
| epochs | 8 |
| eff. batch | 12 (per-device batch 1 × grad_accum 3 × 4 H200 GPUs) |
| save_steps | every 21 steps (≈ one checkpoint per 0.33 epoch) |
| steps/epoch | 63.9 (1000 Qs packed to 759 rows / batch 12) |
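The batch-size and checkpoint-interval figures in the recipe are related by simple arithmetic. The sketch below reproduces them, assuming the leading `1` in the batch row is the per-device batch size and taking the reported 63.9 steps/epoch at face value. The helper names `effective_batch` and `checkpoint_epoch` are illustrative and are not taken from the actual training code.

```python
# Values copied from the recipe table; the per-device batch of 1 is an assumption.
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 3
NUM_GPUS = 4
STEPS_PER_EPOCH = 63.9  # as reported in the card, not recomputed here
SAVE_STEPS = 21


def effective_batch(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Packed rows consumed per optimizer step across all GPUs."""
    return per_device * grad_accum * num_gpus


def checkpoint_epoch(step: int, steps_per_epoch: float = STEPS_PER_EPOCH) -> float:
    """Fractional epoch at which an optimizer step was saved, rounded to 2 decimals."""
    return round(step / steps_per_epoch, 2)


if __name__ == "__main__":
    print("effective batch:", effective_batch(PER_DEVICE_BATCH, GRAD_ACCUM, NUM_GPUS))
    print("epochs between checkpoints:", checkpoint_epoch(SAVE_STEPS))
    # A few rows of the per-checkpoint table, recomputed from the step number
    for step in (21, 63, 504):
        print(step, checkpoint_epoch(step))
```

Using 63.9 steps/epoch reproduces the epoch column of the MATH500 table below for every 21-step checkpoint.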
## Per-checkpoint MATH500
Evaluated via SGLang with max_new_tokens=12288.
| checkpoint (step) | epoch | MATH500 accuracy | note |
|---|---|---|---|
| 0 | 0.00 | 73.20% | base Qwen2.5-7B-Instruct |
| 21 | 0.33 | 73.40% | |
| 42 | 0.66 | 72.40% | |
| 63 | 0.99 | **75.00%** | peak (+1.80pp over base) |
| 84 | 1.31 | 73.80% | |
| 105 | 1.64 | 72.40% | |
| 126 | 1.97 | 71.00% | |
| 147 | 2.30 | 72.00% | |
| 168 | 2.63 | 72.20% | |
| 189 | 2.96 | 72.40% | |
| 210 | 3.29 | 71.80% | |
| 231 | 3.62 | 73.00% | |
| 252 | 3.94 | 71.20% | |
| 273 | 4.27 | 72.40% | |
| 294 | 4.60 | 71.20% | |
| 315 | 4.93 | 72.40% | |
| 336 | 5.26 | 71.20% | |
| 357 | 5.59 | 70.60% | |
| 378 | 5.92 | 71.60% | |
| 399 | 6.24 | 72.80% | |
| 420 | 6.57 | 71.40% | |
| 441 | 6.90 | 71.00% | |
| 462 | 7.23 | 72.80% | |
| 483 | 7.56 | 73.00% | |
| 504 | 7.89 | 70.00% | |
| 512 | 8.00 | 70.40% | final |
## Overfitting note
Peak accuracy is reached at checkpoint-63 (epoch ≈ 1.0). The final 8-epoch checkpoint scores 70.40%, which is 4.6pp below the peak and 2.8pp below the base model. Downstream users should load checkpoint-63 rather than the final checkpoint. This is consistent with the s1 paper's observation that one epoch on s1K is sufficient.
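The figures in this note can be re-derived from the per-checkpoint table. The following minimal sketch selects the checkpoint with the highest MATH500 score and reports its gains and gaps in percentage points. The dictionary simply transcribes the table, and `summarize` is a hypothetical helper written for illustration.

```python
# MATH500 accuracy (%) per checkpoint step, transcribed from the table above.
MATH500 = {
    0: 73.20, 21: 73.40, 42: 72.40, 63: 75.00, 84: 73.80, 105: 72.40,
    126: 71.00, 147: 72.00, 168: 72.20, 189: 72.40, 210: 71.80, 231: 73.00,
    252: 71.20, 273: 72.40, 294: 71.20, 315: 72.40, 336: 71.20, 357: 70.60,
    378: 71.60, 399: 72.80, 420: 71.40, 441: 71.00, 462: 72.80, 483: 73.00,
    504: 70.00, 512: 70.40,
}
BASE_STEP, FINAL_STEP = 0, 512


def summarize(scores: dict) -> dict:
    """Pick the highest-scoring step and report differences in percentage points."""
    peak_step = max(scores, key=scores.get)  # argmax over steps (the peak is unique here)
    peak = scores[peak_step]
    return {
        "peak_step": peak_step,
        "gain_over_base_pp": round(peak - scores[BASE_STEP], 2),
        "final_gap_to_peak_pp": round(peak - scores[FINAL_STEP], 2),
        "final_vs_base_pp": round(scores[FINAL_STEP] - scores[BASE_STEP], 2),
    }


if __name__ == "__main__":
    print(summarize(MATH500))
```

The rounding to two decimals removes floating-point noise from the subtractions (for example, `75.00 - 73.20` is not exactly `1.8` in binary floating point).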
## Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and its tokenizer
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto", device_map="auto")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Attach the peak LoRA checkpoint (recommended), stored in the checkpoint-63 subfolder
model = PeftModel.from_pretrained(base, "Chia-Mu-Lab/qwen2.5-7b-s1k-qwen3-235b-oracle-lora-8ep", subfolder="checkpoint-63")
model.eval()  # inference mode
```
## Note: checkpoint-0 not included
The repository does not include checkpoint-0, because it is simply the base Qwen2.5-7B-Instruct model with no adapter weights. To reproduce the 73.20% baseline, use the base model directly.
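When sweeping over checkpoints, step 0 has to be special-cased because it has no adapter. The following minimal sketch maps a step from the MATH500 table to a repository subfolder and returns `None` for step 0, meaning "use the base model as-is." The helper `adapter_subfolder` is hypothetical. The assumption that every saved step, including the final step 512, lives in a `checkpoint-<step>` subfolder is inferred from the `checkpoint-63` example and the 21-step save interval, and has not been checked against the repository.

```python
from typing import Optional

REPO_ID = "Chia-Mu-Lab/qwen2.5-7b-s1k-qwen3-235b-oracle-lora-8ep"
SAVE_STEPS = 21  # save interval from the recipe
FINAL_STEP = 512  # last step listed in the MATH500 table


def adapter_subfolder(step: int) -> Optional[str]:
    """Return the assumed repo subfolder for a checkpoint step, or None for step 0.

    Assumption: each saved step is stored as `checkpoint-<step>`.
    """
    if step == 0:
        return None  # checkpoint-0 is the bare base model; there is no adapter to load
    if step == FINAL_STEP or (0 < step < FINAL_STEP and step % SAVE_STEPS == 0):
        return f"checkpoint-{step}"
    raise ValueError(f"no checkpoint saved at step {step}")


# Hypothetical usage with `base` and PeftModel from the Usage snippet:
# sub = adapter_subfolder(63)
# model = base if sub is None else PeftModel.from_pretrained(base, REPO_ID, subfolder=sub)
```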
## Citation
s1K-1.1 question pool from the s1 paper:
```bibtex
@article{muennighoff2025s1,
  title   = {s1: Simple test-time scaling},
  author  = {Muennighoff, Niklas and Yang, Zitong and Shi, Weijia and others},
  journal = {arXiv:2501.19393},
  year    = {2025}
}
```
The reasoning traces used for this distillation were freshly generated with qwen3-235b-a22b; they are not the published s1K-1.1 traces.