qwen2.5-7b-agent-trajectory-mixed_dbv4_alfv4_1to1
This repository provides a full model fine-tuned for two AgentBench tasks:
ALFWorld and DBBench.
Base model: Qwen/Qwen2.5-7B-Instruct
This repository contains fully merged model weights (LoRA merged into the base model).
Training Objective
This model is optimized for:
- Sequential trajectory planning (ALFWorld)
- Structured reasoning and database querying (DBBench)
- Deterministic action generation
- Reduced invalid action rate
Datasets Used
The model was trained using only officially provided training datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
Mixing strategy:
- ALFWorld (v5) and DBBench (v4) mixed in a 1:1 ratio.
- No validation or test splits were used for training.
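The 1:1 mixing described above can be sketched as follows. This is a minimal illustration, not the author's pipeline: it assumes the ratio is measured in example counts and reached by downsampling the larger source, neither of which the card states. The function name `mix_one_to_one` and the toy records are hypothetical.

```python
import random

def mix_one_to_one(alfworld_examples, dbbench_examples, seed=0):
    """Return a shuffled list with equal numbers of examples from each source."""
    rng = random.Random(seed)
    n = min(len(alfworld_examples), len(dbbench_examples))
    # Assumption: downsample the larger source so both contribute n examples.
    mixed = rng.sample(alfworld_examples, n) + rng.sample(dbbench_examples, n)
    rng.shuffle(mixed)
    return mixed

# Toy stand-ins for the two training sets (made-up records).
alf = [{"source": "alfworld", "id": i} for i in range(5)]
dbb = [{"source": "dbbench", "id": i} for i in range(3)]
mixed = mix_one_to_one(alf, dbb)
```

Fixing the seed keeps the sampled subset and the shuffle order repeatable across runs.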
Fine-tuning Method
- Supervised Fine-Tuning (SFT)
- LoRA-based training
- LoRA weights merged into base model before upload
- Loss applied only to assistant outputs
- No external datasets were used
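Applying loss only to assistant outputs is commonly done by masking the labels of all other tokens. The sketch below shows that idea with plain lists. It assumes a per-token `assistant_mask` is already available. The helper `build_labels` and the token ids are hypothetical, and the actual training code may build its labels differently.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default

def build_labels(input_ids, assistant_mask):
    """Copy token ids as labels where assistant_mask is 1; ignore everything else."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [
        tok if keep else IGNORE_INDEX
        for tok, keep in zip(input_ids, assistant_mask)
    ]

# Toy sequence: 3 prompt tokens followed by 2 assistant tokens (made-up ids).
input_ids = [101, 102, 103, 201, 202]
assistant_mask = [0, 0, 0, 1, 1]
labels = build_labels(input_ids, assistant_mask)
```

Tokens labelled with `IGNORE_INDEX` contribute nothing to the loss, so gradients come only from the assistant's tokens.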
Reproducibility
Base model: Qwen/Qwen2.5-7B-Instruct
Training framework:
- Hugging Face Transformers
- PEFT (LoRA)
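Merging LoRA weights into the base model, as stated under the fine-tuning method, amounts to adding the scaled low-rank update to each adapted weight matrix. The toy example below illustrates the arithmetic with the standard scaling `alpha / rank`. All values are made up, the helpers are hypothetical, and the card does not give the adapter's rank or alpha. In PEFT the merge is typically done with `merge_and_unload()`, though the card does not say which call was used.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]

# Toy 2x2 layer with a rank-1 adapter (all values are made up).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # shape (rank, in_features)
lora_b = [[0.5], [-0.5]]   # shape (out_features, rank)
merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
```

After merging, a single matrix multiply reproduces the base output plus the adapter output, so the adapter is no longer needed at inference time.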
Evaluation decoding configuration:
- do_sample=False
- temperature=0.0 (has no effect when do_sample=False)
- Greedy decoding, so generation is deterministic
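With `do_sample=False`, each step selects the highest-scoring token, so the temperature setting does not influence the result. The toy sketch below illustrates why: rescaling logits by any positive temperature never changes which token scores highest. It is not the Transformers implementation, and the logits and helper names are made up.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(scores):
    """Return the index of the largest score (first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])

logits = [1.2, 3.4, 0.5]  # made-up next-token scores
choice = greedy_pick(logits)

# Rescaling by any positive temperature leaves the argmax unchanged.
same_choice = all(
    greedy_pick(softmax(logits, t)) == choice for t in (0.1, 1.0, 2.0)
)
```

Temperature only reshapes the probabilities that sampling would draw from, which is why it matters only when `do_sample=True`.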
Usage
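from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HamadaMayu/qwen2.5-7b-agent-trajectory-mixed_dbv4_alfv4_1to1"

# Load the tokenizer and the merged weights.
# device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)

prompt = "Your task prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding, matching the evaluation decoding configuration.
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,
    temperature=0.0,  # not used when do_sample=False
)

# The decoded text contains the prompt followed by the generated continuation.
print(tokenizer.decode(output[0], skip_special_tokens=True))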
Intended Use
- AgentBench evaluation
- Research on trajectory learning
- Educational experiments
Limitations
- Performance may degrade outside AgentBench domains.
- Long-horizon planning is limited by context length.
- Invalid actions may still occur under distribution shift.