---
language:
- ar
license: apache-2.0
base_model:
- unsloth/Qwen3.5-4B
tags:
- unsloth
- qwen3_5
- trl
- lora
- sft
- arabic
- saudi-dialect
- conversational
- transformers
datasets:
- HeshamHaroon/saudi-dialect-conversations
library_name: transformers
---

# Qwen3.5-4B Saudi Dialect

This model is a Saudi-dialect conversational fine-tune of `unsloth/Qwen3.5-4B`. It was trained with the notebook `qwen3-5-4b-saudi-dialect-sft-modal.ipynb` and pushed to Hugging Face as a merged standalone model. Related links:

- Model: https://huggingface.co/AyoubChLin/Qwen3.5-4B-saudi-dialect
- LoRA adapters: https://huggingface.co/AyoubChLin/Qwen3.5-4B-saudi-dialect-lora
- Dataset: https://huggingface.co/datasets/HeshamHaroon/saudi-dialect-conversations
- Base model: https://huggingface.co/unsloth/Qwen3.5-4B

Training used Unsloth and TRL's `SFTTrainer` with LoRA adapters; the adapters were then merged back into the base model weights for easier deployment.
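
As a rough illustration of what merging adapters into the base weights means, the sketch below folds a low-rank update into a dense matrix using the standard LoRA rule `W + (alpha / r) * B @ A`. It assumes plain LoRA scaling (not rsLoRA). The helper `merge_lora` and the toy matrices are hypothetical and are not taken from the training notebook.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return weight + (alpha / r) * (lora_b @ lora_a). Hypothetical helper."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight with a rank-1 update (the actual run used r=16).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[0.5, -0.5]]   # shape (r, in_features) = (1, 2)
lora_b = [[2.0], [4.0]]  # shape (out_features, r) = (2, 1)

merged = merge_lora(weight, lora_a, lora_b, alpha=1, r=1)
print(merged)
```

With `lora_r = 16` and `lora_alpha = 16`, as listed in the training arguments, the scale factor `alpha / r` is 1, the same as in this toy case.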

## Model Details

- Base model: `unsloth/Qwen3.5-4B`
- Fine-tuning method: LoRA SFT
- Language: Arabic, focused on Saudi dialect conversations
- Training modality in this run: text-only conversational SFT
- Dataset split: `3545` total examples -> `3366` train / `179` eval
- System prompt used in training: `أنت مساعد مفيد يتحدث باللهجة السعودية العامية.`
- Tracking: Weights & Biases
- W&B run: https://wandb.ai/cherguelainea/qwen-saudi-dialect/runs/6udmlaan

## Training Arguments

| Argument | Value |
|---|---:|
| `max_seq_length` | `4096` |
| `load_in_4bit` | `False` |
| `load_in_8bit` | `False` |
| `lora_r` | `16` |
| `lora_alpha` | `16` |
| `lora_dropout` | `0` |
| `target_modules` | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| `use_gradient_checkpointing` | `"unsloth"` |
| `per_device_train_batch_size` | `16` |
| `per_device_eval_batch_size` | `16` |
| `gradient_accumulation_steps` | `4` |
| Effective global batch size | `64` |
| `warmup_steps` | `5` |
| `num_train_epochs` | `4` |
| `learning_rate` | `4e-4` |
| `lr_scheduler_type` | `linear` |
| `optim` | `adamw_8bit` |
| `weight_decay` | `0.01` |
| `dataset_text_field` | `messages` |
| `packing` | `True` in config, but not applied: Unsloth reported `Sample packing skipped (vision-language model detected)` |
| `remove_unused_columns` | `False` |
| `save_strategy` | `steps` |
| `save_steps` | `100` |
| `eval_strategy` | `steps` |
| `eval_steps` | `50` |
| `seed` | `3407` |
| `report_to` | `wandb` |
| Precision used in this run | `bf16` |
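
The effective batch size and the step count reported later in this card can be reproduced from the arguments above. This is a minimal sketch, assuming a single GPU, the `3366` training examples from the dataset split, and that a leftover partial accumulation window still counts as one optimizer step; that assumption is consistent with the logged `train/global_step` of `212` but is not confirmed by the notebook.

```python
import math

train_examples = 3366
per_device_train_batch_size = 16
gradient_accumulation_steps = 4
num_gpus = 1
num_train_epochs = 4

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)

# Micro-batches per epoch; the last one holds fewer than 16 examples.
micro_batches_per_epoch = math.ceil(
    train_examples / (per_device_train_batch_size * num_gpus)
)

# Assumption: a partial final accumulation window still triggers an optimizer step.
steps_per_epoch = math.ceil(micro_batches_per_epoch / gradient_accumulation_steps)
total_steps = steps_per_epoch * num_train_epochs

print(effective_batch_size, micro_batches_per_epoch, steps_per_epoch, total_steps)
```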

## Training Results

### Loss and Metrics

| Metric | Value |
|---|---:|
| `eval/loss` | `1.49976` |
| `train/loss` (final W&B summary) | `1.18529` |
| `training_loss` (`trainer_stats`) | `1.4871071903210766` |
| `train_runtime_seconds` | `2490.3044 s` |
| `train_runtime_minutes` | `41.51 min` |
| `train_samples_per_second` | `5.407` |
| `train_steps_per_second` | `0.085` |
| `eval/runtime` | `9.6061 s` |
| `eval/samples_per_second` | `18.53` |
| `eval/steps_per_second` | `1.249` |
| `train/global_step` | `212` |
| `train/epoch` | `4` |
| `train/grad_norm` | `0.69472` |
| `total_flos` | `7.760619536796672e+16` |
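
The throughput rows can be cross-checked against the runtime. The sketch below assumes that samples per second is computed as training examples times epochs divided by the training runtime; the variable names are illustrative only.

```python
train_examples = 3366
num_train_epochs = 4
global_steps = 212
train_runtime_s = 2490.3044

# Assumption: throughput counts every example once per epoch.
samples_per_second = train_examples * num_train_epochs / train_runtime_s
steps_per_second = global_steps / train_runtime_s
runtime_minutes = train_runtime_s / 60

print(f"{samples_per_second:.3f} samples/s")
print(f"{steps_per_second:.3f} steps/s")
print(f"{runtime_minutes:.2f} min")
```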

### Trainable Parameters

| Item | Value |
|---|---:|
| Total parameters | `4,560,499,200` |
| Trainable LoRA parameters | `21,233,664` |
| Trainable ratio | `0.4656%` |
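
The trainable ratio follows directly from the two parameter counts, as this short check shows:

```python
total_params = 4_560_499_200
trainable_lora_params = 21_233_664

# Share of parameters updated during LoRA fine-tuning, as a percentage.
trainable_pct = 100 * trainable_lora_params / total_params
print(f"{trainable_pct:.4f}%")
```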

## Hardware

| Item | Value |
|---|---:|
| GPU | `NVIDIA A100-SXM4-40GB` |
| Number of GPUs | `1` |
| CUDA toolkit | `12.9` |
| Torch | `2.8.0+cu129` |
| Transformers | `5.3.0` |
| Unsloth | `2026.3.6` |
| GPU total memory | `39.494 GB` |
| GPU memory reserved before training | `8.547 GB` |
| Peak reserved GPU memory | `38.455 GB` |
| Peak reserved GPU memory for LoRA training | `29.908 GB` |
| Peak GPU memory usage | `97.37%` of total GPU memory |
| System RAM | Not logged in the notebook outputs |

All memory figures above are GPU memory (VRAM) readings from the training run. The notebook did not record host system RAM.
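
The derived memory rows relate to the raw readings as sketched below. The three input values come from the table. How the notebook obtained them is not shown in this card; a call such as `torch.cuda.max_memory_reserved()` is a plausible source, but that is an assumption.

```python
gpu_total_gb = 39.494
reserved_before_gb = 8.547
peak_reserved_gb = 38.455

# Memory attributed to training = peak minus what was reserved before training.
training_reserved_gb = round(peak_reserved_gb - reserved_before_gb, 3)

# Peak usage as a share of total GPU memory, and the remaining headroom.
peak_pct = round(100 * peak_reserved_gb / gpu_total_gb, 2)
headroom_gb = round(gpu_total_gb - peak_reserved_gb, 3)

print(training_reserved_gb, peak_pct, headroom_gb)
```

With roughly 1 GB of headroom on a 40 GB card, a larger `per_device_train_batch_size` or `max_seq_length` would likely need a bigger GPU or a lower-memory configuration.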

## Data Preparation

Each dataset example is a conversation stored as a list of turns under `messages`. During preprocessing, the Saudi-dialect system prompt is prepended to each conversation before fine-tuning. The training notebook keeps only valid conversations and then holds out `5%` of them for evaluation, using seed `3407`.
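
The preprocessing steps can be sketched in plain Python. This is not the notebook's code: the validity rule in `is_valid`, the `prepare` helper, and the shuffle-based split are assumptions made for illustration, and the notebook more likely relies on the `datasets` library, so the exact split sizes may differ.

```python
import random

SYSTEM_PROMPT = "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."


def is_valid(messages):
    """Hypothetical validity rule: non-empty, known roles, non-blank text."""
    return bool(messages) and all(
        m.get("role") in {"user", "assistant"} and (m.get("content") or "").strip()
        for m in messages
    )


def prepare(examples, eval_fraction=0.05, seed=3407):
    """Filter conversations, prepend the system prompt, and split train/eval."""
    system = {"role": "system", "content": SYSTEM_PROMPT}
    prepared = [
        {"messages": [system] + ex["messages"]}
        for ex in examples
        if is_valid(ex.get("messages"))
    ]
    random.Random(seed).shuffle(prepared)
    n_eval = max(1, round(len(prepared) * eval_fraction))
    return prepared[n_eval:], prepared[:n_eval]


# Toy data: 40 valid conversations plus one empty conversation that is dropped.
toy = [
    {"messages": [{"role": "user", "content": f"سؤال {i}"},
                  {"role": "assistant", "content": f"جواب {i}"}]}
    for i in range(40)
] + [{"messages": []}]

train, eval_ = prepare(toy)
print(len(train), len(eval_), train[0]["messages"][0]["role"])
```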

## Usage

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AyoubChLin/Qwen3.5-4B-saudi-dialect"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."},
    {"role": "user", "content": "كيف حالك اليوم؟"},
]

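inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_dict=True,  # return input_ids and attention_mask together
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,  # temperature/top_p only take effect when sampling
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))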
```

### Unsloth

*Install* (run as a Jupyter/Colab notebook cell; `%%capture` and `!pip` are IPython syntax)

```python
%%capture
import re, torch

v = re.match(r"[\d]{1,}\.[\d]{1,}", str(torch.__version__)).group(0)
xformers = "xformers==" + {
    "2.10": "0.0.34",
    "2.9": "0.0.33.post1",
    "2.8": "0.0.32.post2",
}.get(v, "0.0.34")

!pip install sentencepiece protobuf "datasets>=2.18.0" "huggingface_hub>=0.34.0" hf_transfer wandb
!pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
!pip install -q "transformers>=5.0.0"
!pip install -q --no-deps "trl>=0.15.0"
```

*Run*

```python
from unsloth import FastLanguageModel

repo_id = "AyoubChLin/Qwen3.5-4B-saudi-dialect"
max_seq_length = 4096

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=repo_id,
    max_seq_length=max_seq_length,
    load_in_4bit=False,  # this repo was pushed as merged_16bit
)

FastLanguageModel.for_inference(model)

messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "أنت مساعد مفيد يتحدث باللهجة السعودية العامية."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "كيف حالك اليوم؟"}
        ],
    },
]

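inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_dict=True,  # get input_ids and attention_mask as a dict
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=200,
    use_cache=True,
    do_sample=True,  # temperature/top_p only apply when sampling
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)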
```

## Notes

- This repository contains the merged full model pushed with `save_method="merged_16bit"`.
- A separate LoRA adapter repository is also available: `AyoubChLin/Qwen3.5-4B-saudi-dialect-lora`.
- The base checkpoint is multimodal-capable, but this fine-tune was trained on text-only dialogue data.
- The training data is conversational and dialect-specific, so outputs may reflect biases or stylistic patterns present in the source dataset.


[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)