---
base_model: M-Alkassem/qwen2.5-coder-3b-unsloth-lora
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
- lora
- peft
- sft
- transformers
- trl
- unsloth
- code
- agentic
- coding-agent
- software-engineering
datasets:
- ernie-research/MEnvData-SWE-Trajectory
---

# qwen2.5-coder-3b-agent-v1

This repository contains a LoRA adapter, not a full standalone model.

It is the second-stage adapter in the project and was created by continuing fine-tuning from:

- `M-Alkassem/qwen2.5-coder-3b-unsloth-lora`

The goal of this stage was to make the model more useful in a constrained tool-using workflow, especially for multi-step coding and debugging behavior.

## What This Model Is

This adapter is the agent-oriented continued fine-tune in the project.

Training goal:

- improve multi-step software-engineering behavior
- improve inspect → reason → edit → test style behavior
- make the model more useful inside a lightweight coding-agent loop

This adapter should be loaded on top of the Qwen2.5-Coder 3B base model.

## Important Context

This adapter was not trained from scratch.

The training path was:

1. base model: `unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit`
2. coding-focused adapter: `M-Alkassem/qwen2.5-coder-3b-unsloth-lora`
3. agent-oriented continued fine-tune: this repository

That means this adapter represents the latest learned state after both fine-tuning stages.

## Dataset

This adapter was trained on a sampled subset of:

- `ernie-research/MEnvData-SWE-Trajectory`

Project training setup:

- sampled rows: `700`
- formatting strategy: tail-capped trajectory formatting to fit the token budget
- max sequence length: `1024`
- training steps: `150`

## Training Summary

This model was trained with supervised fine-tuning (SFT) using LoRA and 4-bit quantization.
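The "tail-capped trajectory formatting" listed in the Dataset section is not spelled out in this card. One plausible reading is that each trajectory keeps only its most recent turns, so that the end of the episode (usually the final edit and test run) survives the `1024`-token limit. The sketch below illustrates that idea under stated assumptions: `tail_cap`, `whitespace_count`, and the sample turns are hypothetical names and data, and the word-count function is only a stand-in for a real tokenizer. The project's actual formatting code may differ, for example by truncating at the token level instead of dropping whole turns.

```python
from typing import Callable, List


def tail_cap(turns: List[str], max_tokens: int,
             count_tokens: Callable[[str], int]) -> List[str]:
    """Keep the most recent turns whose combined token count fits the budget."""
    kept: List[str] = []
    used = 0
    # Walk backwards from the final turn so the end of the trajectory survives.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def whitespace_count(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


# Hypothetical four-turn trajectory, for illustration only.
trajectory = [
    "inspect: list files in the repo",
    "reason: the failing test points at stack.py",
    "edit: guard pop() against an empty list",
    "test: rerun pytest",
]

capped = tail_cap(trajectory, max_tokens=12, count_tokens=whitespace_count)
print(capped)
```

With a 12-word budget, only the last two turns are kept. Note that this sketch returns an empty list when the final turn alone exceeds the budget; a real pipeline would need to decide how to handle that case.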
Key setup:

- continued from the coding adapter
- batch size per device: `1`
- gradient accumulation: `16`
- learning rate: `5e-5`
- optimizer: `adamw_8bit`
- hardware: Google Colab `Tesla T4`

Observed result:

- final training loss: about `1.2940`

## Intended Use

Use this adapter when you want:

- a model that is better suited for a constrained coding-agent workflow
- more agent-style behavior in inspect/edit/test tasks
- a reasoning core for a lightweight tool-using coding agent

This adapter is most meaningful when paired with:

- a controller loop
- file tools
- Python execution tools
- iterative feedback from tool outputs

## Limitations

This adapter is not a standalone merged model.

It also did not perform best in the plain direct-answer benchmark used in the project. In that evaluation, the original base model remained strongest overall.

So this adapter should not be presented as universally better at plain coding Q&A. Its value is more visible in tool-using and multi-step agent-style workflows.

## How To Load

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE_MODEL = "Qwen/Qwen2.5-Coder-3B-Instruct"
ADAPTER_MODEL = "M-Alkassem/qwen2.5-coder-3b-agent-v1"

# Load the base model in 4-bit (NF4) to keep memory use low.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
# Fall back to the EOS token if the tokenizer defines no pad token.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL)
model.eval()
```

## Example Prompt

```python
prompt = (
    "A stack implementation fails a unit test when pop() is called on an empty stack. "
    "Explain how you would debug this step by step and propose a fix."
)
```

## Project Context

This adapter is part of a larger project with:

- a coding-focused fine-tune
- an agent-oriented continued fine-tune
- a direct-answer benchmark comparing base vs coding adapter vs agent adapter
- a constrained agent_v2 prototype with file and Python tools

In the documented agent_v2 run, the model was able to:

- run failing tests
- detect a bug
- rewrite code
- rerun tests
- stop after success

This is the main reason this adapter should be evaluated in both:

- direct-answer mode
- tool-using agent mode

## References

- Coding adapter: https://huggingface.co/M-Alkassem/qwen2.5-coder-3b-unsloth-lora
- Base Qwen2.5-Coder model: https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct
- Unsloth quantized base: https://huggingface.co/unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit
- Dataset card: https://huggingface.co/datasets/ernie-research/MEnvData-SWE-Trajectory

## Citation

If you use this adapter, please cite the upstream Qwen2.5-Coder work and the dataset used for the agent-oriented continued fine-tune.

```bibtex
@article{hui2024qwen2p5coder,
  title={Qwen2.5-Coder Technical Report},
  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jing and Liu, Dayiheng and Zhang, Liqun and Liu, Tianyang and Zhang, Jiawei and Yu, Bo and Lu, Kaican and others},
  journal={arXiv preprint arXiv:2409.12186},
  year={2024}
}