---
base_model: M-Alkassem/qwen2.5-coder-3b-unsloth-lora
library_name: peft
pipeline_tag: text-generation
license: apache-2.0
tags:
- lora
- peft
- sft
- transformers
- trl
- unsloth
- code
- agentic
- coding-agent
- software-engineering
datasets:
- ernie-research/MEnvData-SWE-Trajectory
---

# qwen2.5-coder-3b-agent-v1

This repository contains a LoRA adapter, not a full standalone model.

It is the second-stage adapter in the project and was created by continuing fine-tuning from:

- `M-Alkassem/qwen2.5-coder-3b-unsloth-lora`

The goal of this stage was to make the model more useful in a constrained tool-using workflow, especially for multi-step coding and debugging behavior.

## What This Model Is

This adapter is the agent-oriented continued fine-tune in the project.

Training goal:
- improve multi-step software-engineering behavior
- improve inspect → reason → edit → test style behavior
- make the model more useful inside a lightweight coding-agent loop

This adapter should be loaded on top of the Qwen2.5-Coder 3B base model.

## Important Context

This adapter was not trained from scratch.

The training path was:

1. base model: `unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit`
2. coding-focused adapter: `M-Alkassem/qwen2.5-coder-3b-unsloth-lora`
3. agent-oriented continued fine-tune: this repository

That means this adapter represents the latest learned state after both fine-tuning stages.

## Dataset

This adapter was trained on a sampled subset of:

- `ernie-research/MEnvData-SWE-Trajectory`

Project training setup:
- sampled rows: `700`
- formatting strategy: tail-capped trajectory formatting to fit the token budget
- max sequence length: `1024`
- training steps: `150`

## Training Summary

This model was trained with supervised fine-tuning (SFT) using LoRA and 4-bit quantization.

Key setup:
- continued from the coding adapter
- batch size per device: `1`
- gradient accumulation: `16`
- learning rate: `5e-5`
- optimizer: `adamw_8bit`
- hardware: Google Colab `Tesla T4`

Observed result:
- final training loss: about `1.2940`

## Intended Use

Use this adapter when you want:
- a model that is better suited for a constrained coding-agent workflow
- more agent-style behavior in inspect/edit/test tasks
- a reasoning core for a lightweight tool-using coding agent

This adapter is most meaningful when paired with:
- a controller loop
- file tools
- Python execution tools
- iterative feedback from tool outputs

## Limitations

This adapter is not a standalone merged model.

It also did not perform best in the plain direct-answer benchmark used in the project. In that evaluation, the original base model remained strongest overall.

So this adapter should not be presented as universally better at plain coding Q&A. Its value is more visible in tool-using and multi-step agent-style workflows.

## How To Load

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE_MODEL = "Qwen/Qwen2.5-Coder-3B-Instruct"
ADAPTER_MODEL = "M-Alkassem/qwen2.5-coder-3b-agent-v1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, ADAPTER_MODEL)
model.eval()
```
Example Prompt
```prompt = "A stack implementation fails a unit test when pop() is called on an empty stack. Explain how you would debug this step by step and propose a fix."```


Project Context
This adapter is part of a larger project with:

a coding-focused fine-tune
an agent-oriented continued fine-tune
a direct-answer benchmark comparing base vs coding adapter vs agent adapter
a constrained agent_v2 prototype with file and Python tools
In the documented agent_v2 run, the model was able to:

run failing tests
detect a bug
rewrite code
rerun tests
stop after success
This is the main reason this adapter should be evaluated in both:

direct-answer mode
tool-using agent mode
References
- Coding adapter: https://huggingface.co/M-Alkassem/qwen2.5-coder-3b-unsloth-lora
- Base Qwen2.5-Coder model: https://huggingface.co/Qwen/Qwen2.5-Coder-3B-Instruct
- Unsloth quantized base: https://huggingface.co/unsloth/Qwen2.5-Coder-3B-Instruct-bnb-4bit
- Dataset card: https://huggingface.co/datasets/ernie-research/MEnvData-SWE-Trajectory

## Citation

If you use this adapter, please cite the upstream Qwen2.5-Coder work and the dataset used for the agent-oriented continued fine-tune.

```bibtex
@article{hui2024qwen2p5coder,
  title={Qwen2.5-Coder Technical Report},
  author={Hui, Binyuan and Yang, Jian and Cui, Zeyu and Yang, Jing and Liu, Dayiheng and Zhang, Liqun and Liu, Tianyang and Zhang, Jiawei and Yu, Bo and Lu, Kaican and others},
  journal={arXiv preprint arXiv:2409.12186},
  year={2024}
}