---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- qlora
- lora
- fine-tuning
- reasoning
- qwen2.5
- openthoughts
- 4-bit
- nf4
datasets:
- open-thoughts/OpenThoughts-114k
language:
- en
pipeline_tag: text-generation
model-index:
- name: qwen2.5-iq-Finetuning-qlora
  results: []
---

# Qwen2.5-1.5B-Instruct: QLoRA Fine-Tuned on OpenThoughts-114k

A QLoRA adapter for [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), fine-tuned on curated reasoning traces from [OpenThoughts-114k](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) to produce clean, structured, step-by-step solutions.

## Key Details

| Property | Value |
|---|---|
| **Base Model** | Qwen/Qwen2.5-1.5B-Instruct |
| **Method** | QLoRA (4-bit NF4 + LoRA) |
| **Dataset** | 30K samples from OpenThoughts-114k |
| **Hardware** | Single NVIDIA T4 (16GB VRAM, free Colab) |
| **Adapter Size** | ~50MB |
| **Trainable Params** | ~1.5% of total model parameters |

## What This Adapter Does

The base Qwen2.5-1.5B-Instruct model produces reasonable answers but tends to be verbose and sometimes loses structure in multi-step reasoning. This adapter improves:

- **Response conciseness**: ~12% shorter outputs on average, cutting filler while retaining substance
- **Step-by-step structure**: cleaner formatting with numbered steps and proper LaTeX math notation
- **Reasoning accuracy**: correct answers on trick questions and logic puzzles where the base model fumbles

## Training Details

### Quantization

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 weights with double quantization; compute runs in fp16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
```
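
As a rough illustration of what these settings buy, the sketch below estimates the storage cost per quantized weight. It assumes the block sizes described in the QLoRA paper (64 weights per first-level block, 256 scales per second-level block when double quantization is on). Those constants and the function name are assumptions for illustration, not values read from this adapter.

```python
def nf4_bits_per_param(double_quant: bool = True,
                       block_size: int = 64,
                       second_block_size: int = 256) -> float:
    """Approximate bits stored per quantized weight, including scale overhead."""
    weight_bits = 4.0
    if double_quant:
        # One 8-bit scale per block, plus one fp32 scale per group of scales
        overhead = 8 / block_size + 32 / (block_size * second_block_size)
    else:
        # One fp32 scale per block of weights
        overhead = 32 / block_size
    return weight_bits + overhead


if __name__ == "__main__":
    for dq in (False, True):
        print(f"double_quant={dq}: {nf4_bits_per_param(dq):.3f} bits/param")
```

Layers that are not quantized (for example embeddings) stay in higher precision, so the real memory footprint of the loaded model is somewhat larger than this per-weight figure suggests.
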
### LoRA Configuration

```python
from peft import LoraConfig

# Rank-32 adapters on every attention and MLP projection
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```
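
Each targeted linear layer with input size `d_in` and output size `d_out` gains two low-rank matrices, A (`r × d_in`) and B (`d_out × r`), so it adds `r * (d_in + d_out)` trainable parameters. A minimal sketch of that count follows. The layer shapes in the usage example are placeholders chosen for illustration, not the actual Qwen2.5 dimensions, and the function name is hypothetical.

```python
def lora_param_count(shapes, r):
    """Trainable parameters added by LoRA for an iterable of (d_in, d_out) layers."""
    return sum(r * (d_in + d_out) for d_in, d_out in shapes)


if __name__ == "__main__":
    # Hypothetical shapes for one transformer block (illustrative only)
    toy_block = {
        "q_proj": (1024, 1024),
        "k_proj": (1024, 256),
        "v_proj": (1024, 256),
        "o_proj": (1024, 1024),
        "gate_proj": (1024, 4096),
        "up_proj": (1024, 4096),
        "down_proj": (4096, 1024),
    }
    per_block = lora_param_count(toy_block.values(), r=32)
    print(f"LoRA parameters per toy block at r=32: {per_block:,}")
```

On a loaded PEFT model, `model.print_trainable_parameters()` reports the real figure.
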
### Training Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 1 |
| Batch size | 1 (× 4 gradient accumulation) |
| Learning rate | 2e-4 |
| Scheduler | Cosine with 50-step warmup |
| Optimizer | Paged AdamW 8-bit |
| Max sequence length | 2048 |
| NEFTune noise alpha | 5 |
| Precision | fp16 |
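
To make the schedule concrete, here is a small pure-Python sketch of linear warmup followed by cosine decay, using the learning rate and warmup length from the table. The total step count is an assumption: 7,500 optimizer steps corresponds to 30K samples at an effective batch size of 4 (1 × 4 accumulation), and the actual trainer may count differently. The function name is illustrative.

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-4, warmup_steps=50):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 30_000 // (1 * 4)  # assumed: samples // effective batch size
    for s in (0, 25, 50, total // 2, total):
        print(f"step {s:>5}: lr = {lr_at_step(s, total):.2e}")
```

The rate climbs linearly to 2e-4 over the first 50 steps, then decays smoothly toward zero by the final step.
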
### Data Preprocessing: The Critical Step

The OpenThoughts-114k dataset contains DeepSeek-R1 reasoning traces with two sections:

- `<|begin_of_thought|>`: thousands of tokens of raw internal reasoning
- `<|begin_of_solution|>`: the clean, structured final answer

**We train only on the extracted solution block.** Training on the full traces causes the model to produce rambling, unfocused output. Extracting only the solution with a simple regex produced dramatically better results: same model, same hyperparameters, completely different output quality.

```python
import re

# Assumes `tokenizer` is the Qwen2.5 tokenizer already loaded in the training script
def formatting_func(example):
    """Convert one OpenThoughts record into a chat-formatted training string."""
    role_map = {"human": "user", "gpt": "assistant"}
    messages = []
    if example.get("system"):
        messages.append({"role": "system", "content": example["system"]})
    for turn in example["conversations"]:
        role = role_map.get(turn["from"], turn["from"])
        content = turn["value"]
        # Extract only the final solution
        if role == "assistant":
            match = re.search(
                r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>",
                content, re.DOTALL,
            )
            if match:
                content = match.group(1).strip()
        messages.append({"role": role, "content": content})
    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
```
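
The extraction step can be tried in isolation, without a tokenizer. The sketch below applies the same regex to a made-up assistant turn. The sample text and the helper name `extract_solution` are illustrative and not taken from the dataset.

```python
import re

SOLUTION_RE = re.compile(
    r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL
)


def extract_solution(text: str) -> str:
    """Return the solution block if present, otherwise the text unchanged."""
    match = SOLUTION_RE.search(text)
    return match.group(1).strip() if match else text


if __name__ == "__main__":
    sample = (
        "<|begin_of_thought|>\nLet me think... maybe 4? No, check again.\n"
        "<|end_of_thought|>\n"
        "<|begin_of_solution|>\n1. Add the numbers.\n2. The answer is 5.\n"
        "<|end_of_solution|>"
    )
    print(extract_solution(sample))
```

Turns without solution markers pass through unchanged, which mirrors the fallback behavior of `formatting_func`.
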
### Response Masking

Labels are set to `-100` on all non-assistant tokens, with `DataCollatorForSeq2Seq` padding the labels using the same value, so the cross-entropy loss is computed only on the tokens the model needs to generate at inference time. This improves sample efficiency: every gradient update is focused on useful generation.
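
A minimal sketch of the masking idea, independent of any tokenizer: labels copy the token ids but replace prompt positions with `-100`, the index that PyTorch's cross-entropy loss ignores by default. The token ids and function names are made up for illustration, and the actual training script may locate the assistant span differently.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, ignoring the first prompt_len positions."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


def pad_labels(labels, max_len):
    """Right-pad labels with IGNORE_INDEX, as a seq2seq-style collator would."""
    return labels + [IGNORE_INDEX] * (max_len - len(labels))


if __name__ == "__main__":
    input_ids = [11, 12, 13, 14, 21, 22, 23]  # 4 prompt tokens + 3 answer tokens
    labels = pad_labels(mask_prompt_labels(input_ids, prompt_len=4), max_len=9)
    print(labels)
```

Only the three answer tokens keep their ids, so only they contribute to the loss.
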
## Usage

### Load with PEFT

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Load base model in 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Load adapter
model = PeftModel.from_pretrained(base_model, "rahmasaber/qwen2.5-iq-Finetuning-qlora")
tokenizer = AutoTokenizer.from_pretrained("rahmasaber/qwen2.5-iq-Finetuning-qlora")
model.eval()
```

### Generate

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant that thinks step-by-step."},
    {"role": "user", "content": "If 5 machines produce 5 widgets in 5 minutes, how many minutes for 100 machines to produce 100 widgets?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

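### Compare Base vs Fine-Tuned

```python
# Helper not shown in the original card: a minimal sketch that reuses `model`,
# `tokenizer`, and `prompt` from above. Greedy decoding is an assumption made
# here so the two runs are directly comparable.
def generate(prompt, max_new_tokens=512):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Disable adapter: base model behavior
model.disable_adapter_layers()
base_response = generate(prompt)

# Enable adapter: fine-tuned behavior
model.enable_adapter_layers()
ft_response = generate(prompt)
```
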
## Evaluation

Tested on 10 handcrafted reasoning prompts across 5 categories:

| Category | # Prompts | What it tests |
|---|---|---|
| Logic Puzzles | 2 | Trick questions, careful reading |
| Math | 3 | Word problems, sequential operations |
| Reasoning | 2 | Formal logic, deductive puzzles |
| Code | 1 | Algorithm complexity analysis |
| Science | 2 | Physics principles, Archimedes |

### Results vs Base Model

| Metric | Base | Fine-Tuned |
|---|---|---|
| Avg response length (tokens) | 314 | 275 (-12%) |
| Correct on "all but 9 sheep" | β | β |
| Correct on average speed (harmonic mean) | β | β |
| Correct on discount stacking (32%) | β | β |
| Correct on 5 machines/5 widgets | β | β |
| Structured step-by-step format | Sometimes | Consistently |
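
For readers who want to re-check rows like these, the sketch below computes reference answers for the kinds of prompts named in the table. Only the widget prompt appears in this card. The discount pair (20% then 15%) and the speeds (60 and 40) are assumed examples chosen to illustrate the arithmetic, not the actual evaluation prompts, and all function names are hypothetical.

```python
def stacked_discount(*discounts):
    """Total fractional discount when discounts are applied one after another."""
    remaining = 1.0
    for d in discounts:
        remaining *= 1.0 - d
    return 1.0 - remaining


def average_speed(v_out, v_back):
    """Average speed over a round trip with equal distances (harmonic mean)."""
    return 2 * v_out * v_back / (v_out + v_back)


def minutes_for(machines, widgets, rate_per_machine_per_min):
    """Minutes for `machines` working in parallel to make `widgets`."""
    return widgets / (machines * rate_per_machine_per_min)


def relative_change(old, new):
    """Fractional change from old to new."""
    return (new - old) / old


if __name__ == "__main__":
    rate = 5 / (5 * 5)  # 5 machines make 5 widgets in 5 minutes
    print(f"stacked 20% + 15%: {stacked_discount(0.20, 0.15):.0%}")
    print(f"round trip at 60 and 40: {average_speed(60, 40):.1f}")
    print(f"100 machines, 100 widgets: {minutes_for(100, 100, rate):.0f} minutes")
    print(f"length change 314 -> 275: {relative_change(314, 275):.1%}")
```

Stacked discounts multiply rather than add, and a round trip averages to the harmonic mean of the two speeds, which is why these prompts work as trick questions.
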
### Held-Out Test Set

200 examples were held out from the training sample to detect overfitting. The gap between train and test loss stayed below 0.5, suggesting that the model generalizes rather than memorizes.
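
A hedged sketch of this kind of check: hold out a fixed number of examples with a seeded shuffle, then compare mean losses. The helper names, the seed, and the loss values in the usage example are illustrative assumptions, not numbers from the training run.

```python
import random


def hold_out_split(n_examples, n_test=200, seed=42):
    """Return (train_indices, test_indices) from a seeded shuffle."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return indices[n_test:], indices[:n_test]


def loss_gap_ok(train_loss, test_loss, max_gap=0.5):
    """True when test loss exceeds train loss by less than max_gap."""
    return (test_loss - train_loss) < max_gap


if __name__ == "__main__":
    train_idx, test_idx = hold_out_split(n_examples=30_000)
    print(len(train_idx), len(test_idx))
    # Placeholder losses for illustration only
    print(loss_gap_ok(train_loss=0.90, test_loss=1.10))
```

Fixing the seed keeps the held-out set identical across runs, so loss curves from different experiments stay comparable.
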
## Limitations

- **Small base model**: 1.5B parameters limits complex multi-hop reasoning
- **1 epoch on 1.2K-3K samples**: more data and epochs would improve accuracy
- **Self-evaluation bias**: LLM-as-judge uses the same model family; use a stronger external model (GPT-4, Claude) for rigorous evaluation
- **Science questions**: the fine-tuned model occasionally gets physics wrong (e.g., feather vs bowling ball on the Moon)
- **No benchmark scores**: not yet evaluated on GSM8K, MATH, or HumanEval

## Files

```
.
├── adapter_config.json          # LoRA configuration
├── adapter_model.safetensors    # LoRA weights (~50MB)
├── tokenizer_config.json        # Tokenizer settings
├── tokenizer.json               # Tokenizer vocabulary
├── special_tokens_map.json      # Special token mappings
└── README.md                    # This file
```

## Citation

```bibtex
@misc{saber2026qwen25qlora,
  title={QLoRA Fine-Tuning Qwen2.5-1.5B-Instruct on OpenThoughts-114k},
  author={Rahma Saber},
  year={2026},
  url={https://huggingface.co/rahmasaber/qwen2.5-iq-Finetuning-qlora}
}
```

## Acknowledgments

- [Qwen Team](https://huggingface.co/Qwen) for the base model
- [OpenThoughts](https://huggingface.co/datasets/open-thoughts/OpenThoughts-114k) for the reasoning dataset
- [Hugging Face](https://huggingface.co/) for PEFT, TRL, and the Hub
- [Google Colab](https://colab.research.google.com/) for free GPU access