---
base_model: shinich001/qwen3-4b-lr5e5-ep1-seq2k
datasets:
- u-10bei/dpo-dataset-qwen-cot
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- dpo
- unsloth
- qwen
- alignment
---

# qwen3-4b-seq2k-dpo-merged

This model is a fine-tuned version of **shinich001/qwen3-4b-lr5e5-ep1-seq2k** using **Direct Preference Optimization (DPO)** via the **Unsloth** library.

This repository contains the **full-merged 16-bit weights**. No adapter loading is required.

## Training Configuration

- **Base model**: shinich001/qwen3-4b-lr5e5-ep1-seq2k
- **Method**: DPO
- **Epochs**: 1
- **Learning rate**: 5e-06
- **Beta**: 0.1
- **Max sequence length**: 2048
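
As a rough illustration of what the **Beta** value in the training configuration controls, the sketch below computes the standard sigmoid DPO loss for a single preference pair in plain Python. It is a minimal sketch under stated assumptions, not the training code used for this model: the function name `dpo_loss` and the log-probability values in the usage note are hypothetical, and the actual Unsloth training pipeline may differ in details such as batching, loss variants, or label smoothing.

```python
import math

BETA = 0.1  # matches the Beta listed in the training configuration


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=BETA):
    """Sigmoid DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    (chosen or rejected) under the policy or the frozen reference model.
    """
    # How much more (or less) likely each response became relative to the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Beta scales the implicit reward margin between chosen and rejected.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

With made-up inputs, a policy identical to the reference gives a loss of `math.log(2)`, and the loss drops below that once the policy favors the chosen response more than the reference does, for example `dpo_loss(-12.0, -20.0, -14.0, -15.0)`. A smaller beta shrinks the margin, so the same shift in log-probabilities moves the loss less.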