---
library_name: transformers
license: apache-2.0
datasets:
- Omar-youssef/English-Egyptian-Arabic-Translation-Pairs
language:
- en
- ar
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-tc-big-en-ar
pipeline_tag: translation
---

# English → Egyptian Arabic Translation Model

A fine-tuned version of [Helsinki-NLP/opus-mt-tc-big-en-ar](https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-ar) specialized for **Egyptian Arabic dialect** translation.

## Model Details

### Model Description

This model translates English text into **Egyptian Arabic (العامية المصرية)**, the most widely spoken Arabic dialect. Because it was fine-tuned on English to Egyptian Arabic translation pairs, it produces colloquial Egyptian Arabic more reliably than the base model, which targets Modern Standard Arabic (MSA).

- **Model type:** Seq2Seq Translation (MarianMT)
- **Language(s):** English (`en`) → Egyptian Arabic (`ar`)
- **License:** Apache 2.0
- **Base model:** [Helsinki-NLP/opus-mt-tc-big-en-ar](https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-ar)
- **Fine-tuned on:** [Omar-youssef/English-Egyptian-Arabic-Translation-Pairs](https://huggingface.co/datasets/Omar-youssef/English-Egyptian-Arabic-Translation-Pairs)

---

## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Omar-youssef/english-egyptian-arabic-translation-v1")
model = AutoModelForSeq2SeqLM.from_pretrained("Omar-youssef/english-egyptian-arabic-translation-v1")

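def translate(text):
    # Tokenize to input_ids + attention_mask; truncate inputs longer than the model's max length
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    # Beam search with 4 beams, stopping as soon as all beams have finished
    outputs = model.generate(**inputs, num_beams=4, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)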

print(translate("A novel is a long prose narrative that usually describes fictional characters and events in the form of a sequential story."))
# Output: الرواية هي رواية نثرية طويلة عادةً بتصف شخصيات وأحداث خيالية في شكل قصة متتابعة.

print(translate("The rapid advancement of artificial intelligence is transforming many industries, creating new opportunities while also raising important ethical and social concerns."))
# Output: التقدم السريع للـ Artificial Intelligence بيغير صناعات كتير، وبيخلق فرص جديدة وفي نفس الوقت بيثير اهتمامات أخلاقية واجتماعية مهمة.
```

---

## Uses

### Direct Use
Translate English sentences or short paragraphs into the Egyptian Arabic dialect. Suitable for:
- Chatbots targeting Egyptian Arabic speakers
- Language learning applications
- Content localization for Egyptian audiences
- Research on Arabic dialect translation


---

## Training Details

### Training Data

Fine-tuned exclusively on the Egyptian Arabic subset of the [Omar-youssef/English-Egyptian-Arabic-Translation-Pairs](https://huggingface.co/datasets/Omar-youssef/English-Egyptian-Arabic-Translation-Pairs) dataset.

- **Language filter:** `Egyptian Arabic` only
- **Split:** 80% train / 20% test (`seed=42`)
- **Source language:** English
- **Target language:** Egyptian Arabic

### Training Procedure

#### Training Hyperparameters

- **Learning rate:** 5e-5
- **Batch size:** 16 (per device)
- **Gradient accumulation steps:** 2 (effective batch size = 32)
- **Epochs:** 3
- **Warmup steps:** 500
- **Weight decay:** 0.01
- **Evaluation strategy:** steps (every 50 steps)
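One way to express these hyperparameters for the Transformers `Seq2SeqTrainer` is sketched below. It assumes training used `Seq2SeqTrainer` on a single GPU (the card lists one T4), which is what makes 16 × 2 = 32 the effective batch size. `predict_with_generate=True` is an added assumption, and the names `HYPERPARAMS`, `effective_batch_size` and `make_training_args` are hypothetical.

```python
# Hyperparameters from this card, keyed by Seq2SeqTrainingArguments field names.
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "num_train_epochs": 3,
    "warmup_steps": 500,
    "weight_decay": 0.01,
    "eval_strategy": "steps",  # called `evaluation_strategy` in older transformers releases
    "eval_steps": 50,
}


def effective_batch_size(hp: dict, num_devices: int = 1) -> int:
    """Number of examples that contribute to each optimizer step."""
    return hp["per_device_train_batch_size"] * hp["gradient_accumulation_steps"] * num_devices


def make_training_args(output_dir: str):
    """Build Seq2SeqTrainingArguments from HYPERPARAMS (transformers imported lazily)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(
        output_dir=output_dir,
        predict_with_generate=True,  # assumption: decode during eval so BLEU can be computed
        **HYPERPARAMS,
    )


print(effective_batch_size(HYPERPARAMS))  # 32
```

Keeping the values in a plain dict makes the batch-size arithmetic checkable without installing `transformers`; `make_training_args` is only needed when actually training.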

---

## Evaluation

### Metrics

| Metric | Description | Range | Better When |
|--------|-------------|-------|-------------|
| **BLEU** | Clipped n-gram precision of predictions against references, with a brevity penalty for short outputs | 0 → 1 | Higher ↑ |
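For reference, the standard corpus-level BLEU (Papineni et al., 2002) with uniform weights over 1- to 4-grams is shown below. The card does not state which implementation, tokenizer or smoothing produced its score, so treat this as the textbook definition rather than the exact scoring script.

$$
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\!\left(\sum_{n=1}^{4}\tfrac{1}{4}\log p_n\right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r\\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
$$

Here $p_n$ is the clipped n-gram precision over the whole test set, $c$ is the total length of the model outputs and $r$ is the effective reference length.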

### Results

| Metric | Score |
|--------|-------|
| **BLEU** | **0.6985** |

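> A BLEU of **0.6985** on the 20% test split (69.85 on the 0 to 100 scale many tools report) indicates strong n-gram overlap with the Egyptian Arabic references. BLEU depends on tokenization and scorer settings, so compare it only with scores computed the same way.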
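To make the 0 to 1 scale concrete, here is a from-scratch corpus BLEU on pre-tokenized text with no smoothing, written for illustration only. It is not necessarily the scorer behind the 0.6985 figure, which the card does not name. For comparison, the `evaluate` library's `bleu` metric reports on a 0 to 1 scale, while sacreBLEU reports 0 to 100.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU with uniform weights and no smoothing.

    hypotheses: list of token lists, one per segment.
    references: list of lists of token lists (one or more references per segment).
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length (shorter wins ties).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            # Max count of each n-gram in any single reference, used for clipping.
            max_ref = Counter()
            for ref_tokens in refs:
                max_ref |= ngram_counts(ref_tokens, n)
            hyp_ngrams = ngram_counts(hyp, n)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any n-gram order with zero matches drives BLEU to 0.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# A correct but truncated output: every n-gram matches, only the brevity penalty applies.
hyp = "the cat sat on".split()
ref = "the cat sat on the mat".split()
score = corpus_bleu([hyp], [[ref]])
print(f"{score:.4f} on the 0-1 scale = {100 * score:.2f} on the 0-100 scale")
# 0.6065 on the 0-1 scale = 60.65 on the 0-100 scale
```

A single short sentence is only a toy case; real BLEU figures are computed over the full test set, where the clipped counts and lengths are pooled before the precisions are taken.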

---

## Technical Specifications

### Model Architecture

- **Architecture:** MarianMT (Encoder-Decoder Transformer)
- **Base:** Helsinki-NLP/opus-mt-tc-big-en-ar
- **Parameters:** ~500M
- **Task:** Sequence-to-Sequence Translation
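The parameter count can be checked directly by summing `numel()` over `model.parameters()`. This is a minimal sketch; `count_parameters` is a hypothetical helper name. `parameters()` yields shared tensors (such as tied input and output embeddings) only once, so they are not double-counted.

```python
import torch


def count_parameters(model: torch.nn.Module, trainable_only: bool = False) -> int:
    """Total number of scalar parameters in a PyTorch module."""
    return sum(
        p.numel() for p in model.parameters() if p.requires_grad or not trainable_only
    )


# Offline sanity check on a tiny layer: 4*3 weights + 3 biases = 15 parameters.
tiny = torch.nn.Linear(4, 3)
print(count_parameters(tiny))  # 15

# To check this model (downloads the weights):
# from transformers import AutoModelForSeq2SeqLM
# model = AutoModelForSeq2SeqLM.from_pretrained("Omar-youssef/english-egyptian-arabic-translation-v1")
# print(f"{count_parameters(model) / 1e6:.0f}M parameters")
```

Transformers models also provide `model.num_parameters()`, which serves the same purpose.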

### Compute Infrastructure

- **Hardware:** Kaggle GPU (NVIDIA T4)
- **Framework:** Hugging Face Transformers
- **Training time:** ~30-60 minutes

---

## Citation

If you use this model, please cite it as follows, and also credit the base model and the dataset linked above:

```bibtex
@misc{omar-youssef-egyptian-translation,
  author    = {Omar Youssef},
  title     = {English to Egyptian Arabic Translation Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Omar-youssef/english-egyptian-arabic-translation-v1}
}
```

---

## Model Card Contact

For questions or feedback, open a discussion on this model's [Hugging Face page](https://huggingface.co/Omar-youssef/english-egyptian-arabic-translation-v1/discussions).