---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- agent
- alfworld
- sft
- merged
---

# qwen3-4b-advanced-sft-v0-merged

This repository provides a **merged model** fine-tuned from Qwen/Qwen3-4B-Instruct-2507.

- Dataset: u-10bei/sft_alfworld_trajectory_dataset_v5
- Method: LoRA SFT (merged into base model)
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-06
- LoRA: r=64, alpha=128

## vLLM Compatibility

- LoRA adapter has been merged
- No tokenizer vocabulary modification
- Intended for AgentBench Advanced evaluation

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deepkick/qwen3-4b-advanced-sft-v0-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto"
)
```

## Data / License Notes

- Dataset: u-10bei/sft_alfworld_trajectory_dataset_v5
  Please refer to the dataset page for license and terms.
- Base model: Qwen/Qwen3-4B-Instruct-2507
  Please comply with the base model's original terms of use.
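
## Appendix: LoRA merge sketch

The card states that the LoRA adapter (r=64, alpha=128) was merged into the base model. As a rough illustration of what that usually means, the sketch below folds a low-rank update into a weight matrix using the conventional LoRA scaling `alpha / r`. This is an assumption about the merge, not a description of the actual training script: the card does not say whether a variant such as rsLoRA was used, and the tiny matrices, the helper names `matmul` and `merge_lora`, and the rank-1 toy factors are all hypothetical.

```python
# Toy illustration of merging a LoRA update into a base weight.
# Only R and ALPHA come from this model card; everything else is made up.
R = 64                # LoRA rank reported on the card
ALPHA = 128           # LoRA alpha reported on the card
SCALING = ALPHA / R   # conventional LoRA scaling (assumed; rsLoRA would differ)


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w, lora_b, lora_a, scaling):
    """Return W + scaling * (B @ A), the weight a merged checkpoint would store."""
    delta = matmul(lora_b, lora_a)
    return [
        [w[i][j] + scaling * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy shapes: a 2x2 base weight with rank-1 factors (the real adapter uses rank 64).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[0.5], [0.25]]   # shape (out_features, rank)
lora_a = [[1.0, 2.0]]      # shape (rank, in_features)

merged = merge_lora(w, lora_b, lora_a, SCALING)

# The merged weight gives the same output as base + scaled adapter path.
x = [[3.0], [4.0]]
y_merged = matmul(merged, x)
base_out = matmul(w, x)
adapter_out = matmul(lora_b, matmul(lora_a, x))
y_unmerged = [[base_out[i][0] + SCALING * adapter_out[i][0]] for i in range(len(base_out))]
```

Because the update is folded into the stored weights, a merged checkpoint loads like any ordinary model, with no separate adapter file to attach at inference time.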