English → Egyptian Arabic Translation Model
A fine-tuned version of Helsinki-NLP/opus-mt-tc-big-en-ar specialized for Egyptian Arabic dialect translation.
Model Details
Model Description
This model translates English text into Egyptian Arabic (العامية المصرية), the most widely spoken Arabic dialect. It was fine-tuned on Egyptian Arabic translation pairs, which makes it more accurate for colloquial Egyptian Arabic than the base model, which targets Modern Standard Arabic (MSA).
- Model type: Seq2Seq Translation (MarianMT)
- Language(s): English (en) → Egyptian Arabic (ar)
- License: Apache 2.0
- Base model: Helsinki-NLP/opus-mt-tc-big-en-ar
- Fine-tuned on: Omar-youssef/English-Egyptian-Arabic-Translation-Pairs
How to Get Started
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Omar-youssef/english-egyptian-arabic-translation-v1")
model = AutoModelForSeq2SeqLM.from_pretrained("Omar-youssef/english-egyptian-arabic-translation-v1")

def translate(text):
    inputs = tokenizer.encode(text, return_tensors="pt")
    outputs = model.generate(inputs, num_beams=4, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(translate("A novel is a long prose narrative that usually describes fictional characters and events in the form of a sequential story."))
# Output: الرواية هي رواية نثرية طويلة عادةً بتصف شخصيات وأحداث خيالية في شكل قصة متتابعة.
print(translate("The rapid advancement of artificial intelligence is transforming many industries, creating new opportunities while also raising important ethical and social concerns."))
# Output: التقدم السريع للـ Artificial Intelligence بيغير صناعات كتير، وبيخلق فرص جديدة وفي نفس الوقت بيثير اهتمامات أخلاقية واجتماعية مهمة.
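
Calling `translate` once per sentence gets slow for longer input lists. The sketch below shows one possible batched variant; it is not part of the original card. `batched` and `translate_batch` are hypothetical helper names, and the batch size of 8 and `max_new_tokens=256` are arbitrary defaults chosen for illustration. It passes the full tokenizer output, including the attention mask, to `generate`, which matters once inputs are padded to a common length.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate_batch(
    texts: Sequence[str],
    tokenizer,
    model,
    batch_size: int = 8,        # assumption: tune to available memory
    max_new_tokens: int = 256,  # assumption: cap on output length
) -> List[str]:
    """Translate many English sentences, one padded batch at a time."""
    results: List[str] = []
    for chunk in batched(texts, batch_size):
        # padding=True pads to the longest sentence in the chunk;
        # the returned attention_mask tells the model to ignore the padding.
        encoded = tokenizer(chunk, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(
            **encoded,
            num_beams=4,
            early_stopping=True,
            max_new_tokens=max_new_tokens,
        )
        results.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return results
```

With the `tokenizer` and `model` loaded as above, `translate_batch(["Good morning.", "Where is the station?"], tokenizer, model)` would return a list with one Egyptian Arabic string per input, in the same order.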
Uses
Direct Use
Translate English sentences or short paragraphs into Egyptian Arabic dialect. Suitable for:
- Chatbots targeting Egyptian Arabic speakers
- Language learning applications
- Content localization for Egyptian audiences
- Research on Arabic dialect translation
Training Details
Training Data
Fine-tuned exclusively on Egyptian Arabic pairs from the Omar-youssef/English-Egyptian-Arabic-Translation-Pairs dataset.
- Language filter: Egyptian Arabic only
- Split: 80% train / 20% test (seed=42)
- Source language: English
- Target language: Egyptian Arabic
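
To make the split concrete, here is a minimal pure-Python sketch of a seeded 80/20 split. The card does not say which tool produced the split; a library call such as `train_test_split(test_size=0.2, seed=42)` from the `datasets` package is a plausible candidate, but that is an assumption, and the function below will not reproduce the exact same rows. The helper name `split_pairs` and the toy sentence pairs are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (English source, Egyptian Arabic target)


def split_pairs(
    pairs: Sequence[Pair],
    test_fraction: float = 0.2,
    seed: int = 42,
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle indices with a fixed seed, then hold out `test_fraction` as test."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    indices = list(range(len(pairs)))
    rng.shuffle(indices)
    n_test = round(len(pairs) * test_fraction)
    test_indices = set(indices[:n_test])
    train = [pair for i, pair in enumerate(pairs) if i not in test_indices]
    test = [pairs[i] for i in indices[:n_test]]
    return train, test


# Toy data standing in for the real dataset rows.
toy_pairs = [(f"sentence {i}", f"جملة {i}") for i in range(10)]
train_set, test_set = split_pairs(toy_pairs)
print(len(train_set), len(test_set))  # 8 2
```

Fixing the seed means the same rows land in the test set on every run, so BLEU scores from different training runs stay comparable.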
Training Procedure
Training Hyperparameters
- Learning rate: 5e-5
- Batch size: 16 (per device)
- Gradient accumulation steps: 2 (effective batch size = 32)
- Epochs: 3
- Warmup steps: 500
- Weight decay: 0.01
- Evaluation strategy: steps (every 50 steps)
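
The arithmetic behind these settings can be sketched as follows. The hyperparameter values come from the list above; everything else is an assumption made for illustration: a single GPU, a linear warmup followed by linear decay (the card does not name the scheduler), and a hypothetical training set of 32,000 pairs, since the card does not give the dataset size. `TrainConfig` is a hypothetical name, not a Transformers class.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 5e-5
    per_device_batch_size: int = 16
    gradient_accumulation_steps: int = 2
    num_epochs: int = 3
    warmup_steps: int = 500
    weight_decay: float = 0.01
    eval_steps: int = 50
    num_devices: int = 1  # assumption: one T4 GPU

    @property
    def effective_batch_size(self) -> int:
        """Examples consumed per optimizer update."""
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * self.num_devices
        )

    def total_steps(self, num_train_examples: int) -> int:
        """Approximate number of optimizer updates over all epochs."""
        steps_per_epoch = math.ceil(num_train_examples / self.effective_batch_size)
        return steps_per_epoch * self.num_epochs

    def lr_at(self, step: int, total_steps: int) -> float:
        """Learning rate under an assumed linear warmup, then linear decay to 0."""
        if step < self.warmup_steps:
            return self.learning_rate * step / self.warmup_steps
        remaining = max(total_steps - step, 0)
        return self.learning_rate * remaining / max(total_steps - self.warmup_steps, 1)


config = TrainConfig()
steps = config.total_steps(num_train_examples=32_000)  # hypothetical dataset size
print(config.effective_batch_size)  # 32
print(steps)                        # 3000
```

Under these assumptions the learning rate reaches its 5e-5 peak at step 500 and an evaluation pass runs every 50 optimizer steps.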
Evaluation
Metrics
| Metric | Description | Range | Better When |
|---|---|---|---|
| BLEU | N-gram overlap between predictions and references | 0 → 1 | Higher ↑ |
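
For readers unfamiliar with BLEU, the sketch below computes a simplified corpus-level score on the same 0 to 1 scale: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. It assumes whitespace tokenization, one reference per prediction, and no smoothing. The card does not state which BLEU implementation produced its scores, so numbers from this function are not directly comparable with the reported result. `corpus_bleu` and `ngram_counts` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(predictions: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified BLEU in [0, 1]: clipped n-gram precision plus brevity penalty."""
    matches = [0] * max_n
    totals = [0] * max_n
    pred_len = ref_len = 0
    for pred, ref in zip(predictions, references):
        pred_tokens, ref_tokens = pred.split(), ref.split()
        pred_len += len(pred_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            pred_counts = ngram_counts(pred_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Clip each n-gram's credit to its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in pred_counts.items())
            totals[n - 1] += sum(pred_counts.values())
    if pred_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize predictions shorter than the references.
    brevity_penalty = 1.0 if pred_len > ref_len else math.exp(1 - ref_len / pred_len)
    return brevity_penalty * math.exp(log_precision)


score = corpus_bleu(["a b c d x"], ["a b c d e"])
print(round(score, 3))  # 0.669
```

In the toy example one wrong token out of five lowers every n-gram precision (4/5, 3/4, 2/3, 1/2), which is why the score drops well below 1 even though the sentences differ by a single word.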
Results
| Metric | Score |
|---|---|
| BLEU | 0.6985 |
A BLEU of 0.6985 (69.85 on the 0-100 scale) indicates strong n-gram overlap between the model's output and the Egyptian Arabic references.
Technical Specifications
Model Architecture
- Architecture: MarianMT (Encoder-Decoder Transformer)
- Base: Helsinki-NLP/opus-mt-tc-big-en-ar
- Parameters: ~500M
- Task: Sequence-to-Sequence Translation
Compute Infrastructure
- Hardware: Kaggle GPU (T4)
- Framework: Hugging Face Transformers
- Training time: ~30-60 minutes
Citation
If you use this model, please cite it as follows, and also credit the base model and dataset:
@misc{omar-youssef-egyptian-translation,
author = {Omar Youssef},
title = {English to Egyptian Arabic Translation Model},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/Omar-youssef/english-egyptian-arabic-translation-v1}
}
Model Card Contact
For questions or feedback, open a discussion on this model's Hugging Face page.