---
language:
- en
library_name: peft
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
- dialogue-systems
- user-turn-prediction
- qlora
- conversational-ai
- low-rank-adaptation
- qwen
- UserLM
datasets:
- allenai/WildChat-1M
- GEM/schema_guided_dialog
metrics:
- bertscore
- bleurt
- perplexity
pipeline_tag: text-generation
---

# Qwen2.5-3B User Turn Prediction (QLoRA Fine-tuned)

## Model Description

This model is a **QLoRA fine-tuned** version of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), trained specifically for **user turn prediction** in multi-turn dialogues. Unlike traditional dialogue systems, which predict assistant responses, this model predicts the next user utterance given the conversation context.

**Key Innovation**: Inverts the traditional dialogue modeling task by predicting user behavior rather than generating system responses.

### Model Details

- **Base Model**: Qwen2.5-3B-Instruct (3B parameters)
- **Fine-tuning Method**: QLoRA (Quantized Low-Rank Adaptation)
- **Quantization**: 4-bit NF4 with double quantization
- **Training Examples**: 800 conversation pairs
- **Evaluation Examples**: 40 conversation pairs
- **Domains**: Open-domain (WildChat) + Task-oriented (Schema-Guided Dialogue)

## Visual Performance Analysis

### Relative Performance Improvements by Domain

![Performance Improvements](results/figures/figure5_bar_chart.png)

_Figure 1: Relative performance changes across metrics and dialogue domains_

### Baseline Configuration Comparison

![Baseline Comparison](results/figures/figure6_baseline_comparison.png)

_Figure 2: Comparison of fine-tuned model against different baseline configurations_

## Usage

### Installation

```bash
pip install transformers peft torch accelerate bitsandbytes
```

### Quick Start

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from transformers import LogitsProcessorList, NoBadWordsLogitsProcessor

# 4-bit NF4 quantization with double quantization, matching the training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Load base model with 4-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    device_map="auto",
    quantization_config=bnb_config,
    dtype=torch.float16,  # older transformers releases name this argument torch_dtype
)

# Load the LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(model, "path/to/qwen_userturn_lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Prepare conversation context (it should end with an assistant turn)
conversation = [
    {"role": "user", "content": "I'm looking for a restaurant in downtown"},
    {"role": "assistant", "content": "What type of cuisine would you prefer?"},
]

# Render the context without a generation prompt, so no assistant turn is opened
inputs = tokenizer.apply_chat_template(
    conversation,
    return_tensors="pt",
    tokenize=True,
    add_generation_prompt=False,
).to(model.device)

# Open a user turn manually so the model continues as the user
user_open_tokens = tokenizer.encode(
    "<|im_start|>user\n", add_special_tokens=False, return_tensors="pt"
).to(model.device)

# Concatenate the rendered context and the opening of the user turn
input_ids = torch.cat([inputs, user_open_tokens], dim=-1)
attention_mask = torch.ones_like(input_ids)
input_len = input_ids.shape[1]

# Forbid the model from opening another chat turn inside its prediction
bad_words_ids = tokenizer(
    ["<|im_start|>assistant", "<|im_start|>system", "<|im_start|>user"],
    add_special_tokens=False,
)["input_ids"]
logits_processors = LogitsProcessorList(
    [NoBadWordsLogitsProcessor(bad_words_ids, eos_token_id=tokenizer.eos_token_id)]
)

# Generate prediction
with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.4,
        top_p=0.9,
        logits_processor=logits_processors,
    )

# Decode only the newly generated tokens, skipping the prompt
predicted_user_turn = tokenizer.decode(
    outputs[0][input_len:],
    skip_special_tokens=True,
)
print(f"Predicted user turn: {predicted_user_turn}")
```

## Training Details

### Dataset

**Training Set** (800 examples):

- 400 from [WildChat-1M](https://huggingface.co/datasets/allenai/WildChat-1M) (open-domain)
- 400 from [Schema-Guided Dialogue](https://huggingface.co/datasets/GEM/schema_guided_dialog) (task-oriented)

**Evaluation Set** (40
examples):

- 20 from WildChat-1M
- 20 from Schema-Guided Dialogue

**Selection Criteria**:

- Minimum 2 turns per conversation
- English language only (WildChat)
- Valid assistant-user turn pairs

### Training Configuration

```yaml
# QLoRA Configuration
LoRA Rank: 16
LoRA Alpha: 32
LoRA Dropout: 0.01
Target Modules: [
  "q_proj", "k_proj", "v_proj", "o_proj",
  "gate_proj", "up_proj", "down_proj"
]

# Quantization
Load in 4-bit: True
BnB 4-bit Compute Dtype: float16
BnB 4-bit Quant Type: nf4
BnB 4-bit Use Double Quant: True
```

## Evaluation Methodology

### Metrics

1. **BERTScore-F1**: Semantic similarity using contextualized embeddings
2. **BLEURT**: Learned metric trained on human judgments
3. **Perplexity**: Model confidence, measured as the exponentiated average negative log-likelihood per token (lower is better)

## Intended Use

### Primary Use Cases

- ✅ Research on dialogue systems and user behavior modeling
- ✅ User simulation for dialogue system evaluation
- ✅ Conversational AI analysis and understanding
- ✅ Synthetic dialogue generation for training data augmentation
- ✅ User intent prediction in multi-turn contexts

### Out-of-Scope Use Cases

- ❌ Production deployment without safety guardrails
- ❌ Real-time user profiling or surveillance
- ❌ Generating harmful or manipulative content
- ❌ Non-English dialogue prediction (untested)

## Citation

If you use this model in your research, please cite:

```bibtex
@bachelorsthesis{sebastianboehler2025userturn,
  title={To what extent can open-source Large Language Models predict the next user turn in multi-turn dialogues across open-domain and task-oriented settings?},
  author={Sebastian Boehler},
  school={IU International University of Applied Sciences},
  year={2025},
  type={Bachelor's Thesis},
  note={Model: qwen2.5-3b-dialogue-userturn-lora}
}
```

### Base Model Citation

```bibtex
@article{qwen2.5,
  title={Qwen2.5: A Party of Foundation Models},
  author={Qwen Team},
  journal={arXiv preprint},
  year={2024}
}
```

## Model Card Authors

Sebastian Boehler - IU International University of Applied Sciences
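
## Appendix: Turn-Pair Extraction Sketch

The selection criteria above mention "valid assistant-user turn pairs" without spelling out how they are derived. The snippet below is a minimal sketch of one plausible reading, not the preprocessing actually used for this model: every user message that directly follows an assistant message becomes a prediction target, and all earlier messages form its context. The function name `extract_user_turn_pairs`, the `role`/`content` message layout, and the third and fourth example messages are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def extract_user_turn_pairs(
    conversation: List[Message],
) -> List[Tuple[List[Message], str]]:
    """Return (context, next_user_turn) pairs from one conversation.

    Hypothetical helper: a pair is emitted for every non-empty user message
    that directly follows an assistant message. The context is the full
    history before that user message.
    """
    pairs = []
    for i in range(1, len(conversation)):
        prev, cur = conversation[i - 1], conversation[i]
        is_pair = prev["role"] == "assistant" and cur["role"] == "user"
        if is_pair and cur["content"].strip():
            pairs.append((conversation[:i], cur["content"]))
    return pairs


# Illustrative conversation; the last two messages are made up for this sketch.
example = [
    {"role": "user", "content": "I'm looking for a restaurant in downtown"},
    {"role": "assistant", "content": "What type of cuisine would you prefer?"},
    {"role": "user", "content": "Italian, please."},
    {"role": "assistant", "content": "Any price range in mind?"},
]

pairs = extract_user_turn_pairs(example)
for context, target in pairs:
    print(len(context), "context messages ->", target)
```

Under this reading, the opening user message never becomes a target because no assistant turn precedes it, and a conversation needs at least one assistant turn followed by a user turn to yield any example.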