---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- qlora
- lora
- fine-tuning
- reasoning
- qwen2.5
- openthoughts
- 4-bit
- nf4
datasets:
- open-thoughts/OpenThoughts-114k
language:
- en
pipeline_tag: text-generation
model-index:
- name: qwen2.5-iq-Finetuning-qlora
  results: []
---

# Qwen2.5-1.5B-Instruct: QLoRA Fine-Tuned on OpenThoughts-114k

A QLoRA adapter for [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), fine-tuned on curated reasoning traces from [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) to produce clean, structured, step-by-step solutions.

## Key Details

| Property | Value |
|---|---|
| **Base Model** | Qwen/Qwen2.5-1.5B-Instruct |
| **Method** | QLoRA (4-bit NF4 + LoRA) |
| **Dataset** | 30K samples from OpenThoughts-114k |
| **Hardware** | Single NVIDIA T4 (16 GB VRAM, free Colab) |
| **Adapter Size** | ~50 MB |
| **Trainable Params** | ~1.5% of total model parameters |

## What This Adapter Does

The base Qwen2.5-1.5B-Instruct model produces reasonable answers but tends to be verbose and sometimes loses structure in multi-step reasoning.
This adapter improves:

- **Response conciseness**: ~12% shorter outputs on average (275 vs. 314 tokens on the evaluation prompts), cutting fluff while retaining substance
- **Step-by-step structure**: cleaner formatting with numbered steps and proper LaTeX math notation
- **Reasoning accuracy**: correct answers on trick questions and logic puzzles that trip up the base model (e.g., the 5 machines / 5 widgets prompt)

## Training Details

### Quantization

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
```

### LoRA Configuration

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```

### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 1 |
| Batch size | 1 (× 4 gradient accumulation) |
| Learning rate | 2e-4 |
| Scheduler | Cosine with 50-step warmup |
| Optimizer | Paged AdamW 8-bit |
| Max sequence length | 2048 |
| NEFTune noise alpha | 5 |
| Precision | fp16 |

### Data Preprocessing: The Critical Step

The OpenThoughts-114k dataset contains DeepSeek-R1 reasoning traces with two sections:

- `<|begin_of_thought|>` … `<|end_of_thought|>`: thousands of tokens of raw internal reasoning
- `<|begin_of_solution|>` … `<|end_of_solution|>`: the clean, structured final answer

**We train only on the extracted solution block.** Training on the full traces causes the model to produce rambling, unfocused output. Extracting only the solution with a simple regex produced dramatically better results: same model, same hyperparameters, completely different output quality.
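As a quick illustration of the extraction step on a toy trace, here is a minimal sketch. The helper name `extract_solution` and the sample string are hypothetical and exist only for this example; the regex is the same one used in the training formatting function.

```python
import re

# Same pattern as in the training formatting function
SOLUTION_RE = re.compile(
    r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL
)

def extract_solution(text: str) -> str:
    """Return the solution block if present, otherwise the text unchanged.

    Hypothetical helper, written for illustration only.
    """
    match = SOLUTION_RE.search(text)
    return match.group(1).strip() if match else text

# Toy trace (made up for this example, not taken from the dataset)
trace = (
    "<|begin_of_thought|>\nLet me try 2 + 2... maybe 5? No, 4.\n<|end_of_thought|>\n"
    "<|begin_of_solution|>\n2 + 2 = 4\n<|end_of_solution|>"
)

print(extract_solution(trace))  # 2 + 2 = 4
```

Text without a solution block is returned unchanged, which mirrors the `if match:` guard in the full formatting function used for training.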
```python
import re

# Assumes `tokenizer` (the base model's tokenizer) has already been loaded.

def formatting_func(example):
    # Map ShareGPT-style role names to chat-template roles
    role_map = {"human": "user", "gpt": "assistant"}
    messages = []

    if example.get("system"):
        messages.append({"role": "system", "content": example["system"]})

    for turn in example["conversations"]:
        role = role_map.get(turn["from"], turn["from"])
        content = turn["value"]

        # Extract only the final solution
        if role == "assistant":
            match = re.search(
                r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>",
                content,
                re.DOTALL,
            )
            if match:
                content = match.group(1).strip()

        messages.append({"role": role, "content": content})

    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
```

### Response Masking

Labels are set to `-100` on all non-assistant tokens and padded with `DataCollatorForSeq2Seq`, so the cross-entropy loss is computed only on the tokens the model needs to generate at inference time. This improves sample efficiency: every gradient update is focused on useful generation.

## Usage

### Load with PEFT

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model in 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "rahmasaber/qwen2.5-iq-Finetuning-qlora")
tokenizer = AutoTokenizer.from_pretrained("rahmasaber/qwen2.5-iq-Finetuning-qlora")
model.eval()
```

### Generate

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant that thinks step-by-step."},
    {"role": "user", "content": "If 5 machines produce 5 widgets in 5 minutes, how many minutes for 100 machines to produce 100 widgets?"},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

### Compare Base vs Fine-Tuned

```python
# Small helper that wraps the generation call from the previous section
def generate(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=512, temperature=0.7, top_p=0.9, do_sample=True)
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Disable adapter → base model behavior
model.disable_adapter_layers()
base_response = generate(prompt)

# Enable adapter → fine-tuned behavior
model.enable_adapter_layers()
ft_response = generate(prompt)
```

## Evaluation

Tested on 10 handcrafted reasoning prompts across 5 categories:

| Category | # Prompts | What it tests |
|---|---|---|
| Logic Puzzles | 2 | Trick questions, careful reading |
| Math | 3 | Word problems, sequential operations |
| Reasoning | 2 | Formal logic, deductive puzzles |
| Code | 1 | Algorithm complexity analysis |
| Science | 2 | Physics principles, Archimedes |

### Results vs Base Model

| Metric | Base | Fine-Tuned |
|---|---|---|
| Avg response length (tokens) | 314 | 275 (-12%) |
| Correct on "all but 9 sheep" | ✅ | ✅ |
| Correct on average speed (harmonic mean) | ✅ | ✅ |
| Correct on discount stacking (32%) | ✅ | ✅ |
| Correct on 5 machines/5 widgets | ❌ | ✅ |
| Structured step-by-step format | Sometimes | Consistently |

### Held-Out Test Set

200 examples were held out from the training sample to detect overfitting. The train/test loss gap stayed below 0.5, which suggests the model generalizes rather than memorizes.
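The hold-out split and the two numeric checks in this section can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code: the function names, the seed, the toy dataset size, and the definition of the gap as eval loss minus train loss are all assumptions. Only the 200-example hold-out, the 0.5 threshold, and the 314 / 275 token averages come from this card.

```python
import random

def split_holdout(examples, n_holdout=200, seed=42):
    """Shuffle a copy of `examples` and split off `n_holdout` items.

    Returns (train, held_out). The seed is an arbitrary illustrative choice.
    """
    if n_holdout >= len(examples):
        raise ValueError("n_holdout must be smaller than the number of examples")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def relative_change(base, new):
    """Relative change from `base` to `new` (negative means a reduction)."""
    return (new - base) / base

def loss_gap_ok(train_loss, eval_loss, max_gap=0.5):
    """True if eval loss exceeds train loss by less than `max_gap`.

    Assumes the gap is defined as eval_loss - train_loss.
    """
    return (eval_loss - train_loss) < max_gap

# Toy usage with 1,000 stand-in example IDs
train, held_out = split_holdout(range(1_000))
print(len(train), len(held_out))             # 800 200
print(f"{relative_change(314, 275):+.1%}")   # -12.4%
```

A fixed seed keeps the held-out set stable across runs, so loss curves from different training runs stay comparable.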
## Limitations

- **Small base model**: 1.5B parameters limits complex multi-hop reasoning
- **1 epoch on 1.2K-3K samples**: more data and epochs would improve accuracy
- **Self-evaluation bias**: LLM-as-judge uses the same model family; use a stronger external model (GPT-4, Claude) for rigorous evaluation
- **Science questions**: the fine-tuned model occasionally gets physics wrong (e.g., feather vs bowling ball on Moon)
- **No benchmark scores**: not evaluated on GSM8K, MATH, or HumanEval yet

## Files

```
.
├── adapter_config.json          # LoRA configuration
├── adapter_model.safetensors    # LoRA weights (~50MB)
├── tokenizer_config.json        # Tokenizer settings
├── tokenizer.json               # Tokenizer vocabulary
├── special_tokens_map.json      # Special token mappings
└── README.md                    # This file
```

## Citation

```bibtex
@misc{saber2026qwen25qlora,
  title={QLoRA Fine-Tuning Qwen2.5-1.5B-Instruct on OpenThoughts-114k},
  author={Rahma Saber},
  year={2026},
  url={https://huggingface.co/rahmasaber/qwen2.5-iq-Finetuning-qlora}
}
```

## Acknowledgments

- [Qwen Team](https://huggingface.co/Qwen) for the base model
- [OpenThoughts](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) for the reasoning dataset
- [Hugging Face](https://huggingface.co/) for PEFT, TRL, and the Hub
- [Google Colab](https://colab.research.google.com/) for free GPU access