---
language:
- ar
license: apache-2.0
base_model:
- unsloth/Qwen3.5-4B
tags:
- unsloth
- qwen3_5
- trl
- lora
- sft
- arabic
- saudi-dialect
- conversational
- transformers
datasets:
- HeshamHaroon/saudi-dialect-conversations
library_name: transformers
---

# Qwen3.5-4B Saudi Dialect

This model is a Saudi dialect conversational fine-tune of `unsloth/Qwen3.5-4B`, trained from the notebook `qwen3-5-4b-saudi-dialect-sft-modal.ipynb` and pushed to Hugging Face as a merged standalone model:

- Model: https://huggingface.co/AyoubChLin/Qwen3.5-4B-saudi-dialect
- LoRA adapters: https://huggingface.co/AyoubChLin/Qwen3.5-4B-saudi-dialect-lora
- Dataset: https://huggingface.co/datasets/HeshamHaroon/saudi-dialect-conversations
- Base model: https://huggingface.co/unsloth/Qwen3.5-4B

The training setup uses Unsloth + TRL `SFTTrainer` with LoRA adapters and then merges the adapters back into the base model for easier deployment.

## Model Details

- Base model: `unsloth/Qwen3.5-4B`
- Fine-tuning method: LoRA SFT
- Language: Arabic, focused on Saudi dialect conversations
- Training modality in this run: text-only conversational SFT
- Dataset split: `3545` total examples -> `3366` train / `179` eval
- System prompt used in training: `أنت مساعد مفيد يتحدث باللهجة السعودية العامية.`
- Tracking: Weights & Biases
- W&B run: https://wandb.ai/cherguelainea/qwen-saudi-dialect/runs/6udmlaan

## Training Arguments

| Argument | Value |
|---|---:|
| `max_seq_length` | `4096` |
| `load_in_4bit` | `False` |
| `load_in_8bit` | `False` |
| `lora_r` | `16` |
| `lora_alpha` | `16` |
| `lora_dropout` | `0` |
| `target_modules` | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| `use_gradient_checkpointing` | `"unsloth"` |
| `per_device_train_batch_size` | `16` |
| `per_device_eval_batch_size` | `16` |
| `gradient_accumulation_steps` | `4` |
| Effective global batch size | `64` |
| `warmup_steps` | `5` |
| `num_train_epochs` | `4` |
| `learning_rate` | `4e-4` |
| `lr_scheduler_type` | `linear` |
| `optim` | `adamw_8bit` |
| `weight_decay` | `0.01` |
| `dataset_text_field` | `messages` |
| `packing` | `True` in config, but Unsloth reported `Sample packing skipped (vision-language model detected)` |
| `remove_unused_columns` | `False` |
| `save_strategy` | `steps` |
| `save_steps` | `100` |
| `eval_strategy` | `steps` |
| `eval_steps` | `50` |
| `seed` | `3407` |
| `report_to` | `wandb` |
| Precision used in this run | `bf16` |

## Training Results

### Loss and Metrics

| Metric | Value |
|---|---:|
| `eval/loss` | `1.49976` |
| `train/loss` (final W&B summary) | `1.18529` |
| `training_loss` (`trainer_stats`) | `1.4871071903210766` |
| `train_runtime_seconds` | `2490.3044 s` |
| `train_runtime_minutes` | `41.51 min` |
| `train_samples_per_second` | `5.407` |
| `train_steps_per_second` | `0.085` |
| `eval/runtime` | `9.6061 s` |
| `eval/samples_per_second` | `18.53` |
| `eval/steps_per_second` | `1.249` |
| `train/global_step` | `212` |
| `train/epoch` | `4` |
| `train/grad_norm` | `0.69472` |
| `total_flos` | `7.760619536796672e+16` |

### Trainable Parameters

| Item | Value |
|---|---:|
| Total parameters | `4,560,499,200` |
| Trainable LoRA parameters | `21,233,664` |
| Trainable ratio | `0.4656%` |

## Hardware

| Item | Value |
|---|---:|
| GPU | `NVIDIA A100-SXM4-40GB` |
| Number of GPUs | `1` |
| CUDA toolkit | `12.9` |
| Torch | `2.8.0+cu129` |
| Transformers | `5.3.0` |
| Unsloth | `2026.3.6` |
| GPU total memory | `39.494 GB` |
| GPU memory reserved before training | `8.547 GB` |
| Peak reserved GPU memory | `38.455 GB` |
| Peak reserved GPU memory for LoRA training | `29.908 GB` |
| Peak GPU memory usage | `97.37%` of available GPU memory |
| System RAM | Not logged in the notebook outputs |

Recorded memory numbers above are GPU memory / VRAM measurements taken from the training run. The notebook did not record host system RAM.

## Data Preparation

The dataset examples are conversation turns stored under `messages`. During preprocessing, a Saudi Arabic system prompt is prepended to each conversation before fine-tuning. The training notebook keeps only valid conversations and then performs a `5%` evaluation split with seed `3407`.

## Usage

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AyoubChLin/Qwen3.5-4B-saudi-dialect"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."},
    {"role": "user", "content": "كيف حالك اليوم؟"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
)

print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

### Unsloth

*Install*

```python
%%capture
import re, torch
v = re.match(r"[\d]{1,}\.[\d]{1,}", str(torch.__version__)).group(0)
xformers = "xformers==" + {
    "2.10": "0.0.34",
    "2.9": "0.0.33.post1",
    "2.8": "0.0.32.post2",
}.get(v, "0.0.34")
!pip install sentencepiece protobuf "datasets>=2.18.0" "huggingface_hub>=0.34.0" hf_transfer wandb
!pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
!pip install -q "transformers>=5.0.0"
!pip install -q --no-deps "trl>=0.15.0"
```

*Run*

```python
from unsloth import FastLanguageModel

repo_id = "AyoubChLin/Qwen3.5-4B-saudi-dialect"
max_seq_length = 4096

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=repo_id,
    max_seq_length=max_seq_length,
    load_in_4bit=False,  # this repo was pushed as merged_16bit
)
FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "كيف حالك اليوم؟"}
        ],
    },
]

input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    input_ids=input_ids,
    max_new_tokens=200,
    use_cache=True,
    temperature=0.7,
    top_p=0.9,
)

response = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```

## Notes

- This repository contains the merged full model pushed with `save_method="merged_16bit"`.
- A separate LoRA adapter repository is also available: `AyoubChLin/Qwen3.5-4B-saudi-dialect-lora`.
- The base checkpoint is multimodal-capable, but this fine-tune was trained on text-only dialogue data.
- The training data is conversational and dialect-specific, so outputs may reflect biases or stylistic patterns present in the source dataset.