library_name: transformers
license: apache-2.0
datasets:
  - Omar-youssef/English-Egyptian-Arabic-Translation-Pairs
language:
  - en
  - ar
metrics:
  - bleu
base_model:
  - Helsinki-NLP/opus-mt-tc-big-en-ar
pipeline_tag: translation

English → Egyptian Arabic Translation Model

A fine-tuned version of Helsinki-NLP/opus-mt-tc-big-en-ar specialized for Egyptian Arabic dialect translation.

Model Details

Model Description

This model translates English text into Egyptian Arabic (العامية المصرية), the most widely spoken Arabic dialect. It was fine-tuned on English-Egyptian Arabic translation pairs, which makes it more accurate for colloquial Egyptian Arabic than the base model, which targets Modern Standard Arabic (MSA).


How to Get Started

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Omar-youssef/english-egyptian-arabic-translation-v1")
model = AutoModelForSeq2SeqLM.from_pretrained("Omar-youssef/english-egyptian-arabic-translation-v1")

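def translate(text):
    # Tokenize into input IDs plus an attention mask (PyTorch tensors).
    inputs = tokenizer(text, return_tensors="pt")
    # Beam search with 4 beams; stop once every beam has finished.
    outputs = model.generate(**inputs, num_beams=4, early_stopping=True)
    # Decode the best hypothesis and drop special tokens such as </s>.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)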

print(translate("A novel is a long prose narrative that usually describes fictional characters and events in the form of a sequential story."))
# Output: الرواية هي رواية نثرية طويلة عادةً بتصف شخصيات وأحداث خيالية في شكل قصة متتابعة.

print(translate("The rapid advancement of artificial intelligence is transforming many industries, creating new opportunities while also raising important ethical and social concerns."))
# Output: التقدم السريع للـ Artificial Intelligence بيغير صناعات كتير، وبيخلق فرص جديدة وفي نفس الوقت بيثير اهتمامات أخلاقية واجتماعية مهمة.

Uses

Direct Use

Translate English sentences or short paragraphs into the Egyptian Arabic dialect. Suitable for:

  • Chatbots targeting Egyptian Arabic speakers
  • Language learning applications
  • Content localization for Egyptian audiences
  • Research on Arabic dialect translation
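Because the model is intended for sentences and short paragraphs, longer inputs may work better when split first. The sketch below is one way to do that, not part of the model's own tooling: `split_sentences` and `translate_long_text` are hypothetical helpers, the punctuation-based splitter is a naive heuristic, and `translate_fn` stands for any callable that maps an English string to its translation (for example the `translate` function from the quick-start snippet).

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split text on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_long_text(text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate one sentence at a time and join the results with spaces."""
    return " ".join(translate_fn(sentence) for sentence in split_sentences(text))


# Demonstration with a stand-in translator (str.upper) instead of the real model.
print(split_sentences("Hello there. How are you? Fine!"))
print(translate_long_text("Hello there. How are you? Fine!", str.upper))
```

Translating sentence by sentence loses cross-sentence context, so this trade-off is only worth making when the input is clearly longer than a short paragraph.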

Training Details

Training Data

Fine-tuned exclusively on the Egyptian Arabic pairs of the Omar-youssef/English-Egyptian-Arabic-Translation-Pairs dataset.

  • Language filter: Egyptian Arabic only
  • Split: 80% train / 20% test (seed=42)
  • Source language: English
  • Target language: Egyptian Arabic
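The seeded 80/20 split listed above can be illustrated with a small, dependency-free sketch. This is an assumption about the general technique, not the author's actual code: the card does not say which tool produced the split, and the function `split_pairs` and the toy data are hypothetical. It will not reproduce the exact partition used for training, only the idea that a fixed seed makes the split repeatable.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (English source, Egyptian Arabic target)


def split_pairs(pairs: Sequence[Pair], test_size: float = 0.2,
                seed: int = 42) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle indices with a fixed seed, then cut off the test portion."""
    indices = list(range(len(pairs)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(pairs) * test_size))
    test = [pairs[i] for i in indices[:n_test]]
    train = [pairs[i] for i in indices[n_test:]]
    return train, test


# Toy data standing in for the real translation pairs.
toy_pairs = [(f"source {i}", f"target {i}") for i in range(10)]
train, test = split_pairs(toy_pairs)
print(len(train), len(test))
```

With the Hugging Face `datasets` library, `Dataset.train_test_split(test_size=0.2, seed=42)` offers the same kind of seeded split.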

Training Procedure

Training Hyperparameters

  • Learning rate: 5e-5
  • Batch size: 16 (per device)
  • Gradient accumulation steps: 2 (effective batch size = 32)
  • Epochs: 3
  • Warmup steps: 500
  • Weight decay: 0.01
  • Evaluation strategy: steps (every 50 steps)
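To make the arithmetic behind these settings concrete, the sketch below derives the effective batch size and an approximate optimizer-step count. Several things here are assumptions: training on a single device, the hypothetical `TrainingSetup` class, and above all the training-set size of 20,000 examples, which is a placeholder because the card does not state the dataset size. Exact step counts also depend on how the trainer rounds the last partial batch.

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingSetup:
    """Hyperparameters as listed in the model card."""
    per_device_batch_size: int = 16
    gradient_accumulation_steps: int = 2
    epochs: int = 3
    warmup_steps: int = 500
    eval_every_steps: int = 50
    num_devices: int = 1  # assumption: a single GPU

    @property
    def effective_batch_size(self) -> int:
        return (self.per_device_batch_size
                * self.gradient_accumulation_steps
                * self.num_devices)

    def total_optimizer_steps(self, num_train_examples: int) -> int:
        """Approximate number of optimizer updates over all epochs."""
        steps_per_epoch = math.ceil(num_train_examples / self.effective_batch_size)
        return steps_per_epoch * self.epochs


setup = TrainingSetup()
total = setup.total_optimizer_steps(20_000)  # hypothetical training-set size
print(setup.effective_batch_size)            # 32
print(total)                                 # 1875
print(total // setup.eval_every_steps)       # 37 evaluation points
print(round(setup.warmup_steps / total, 3))  # share of steps spent in warmup
```

Running the same calculation with the real number of training examples shows what fraction of training the 500 warmup steps occupy.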

Evaluation

Metrics

| Metric | Description | Range | Better when |
|--------|-------------|-------|-------------|
| BLEU | N-gram overlap between predictions and references | 0 → 1 | Higher ↑ |

Results

| Metric | Score |
|--------|-------|
| BLEU | 0.6985 |

A BLEU of 0.6985 on the 0 to 1 scale (69.85 on the common 0 to 100 scale) indicates strong n-gram overlap with the Egyptian Arabic references.
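For readers unfamiliar with the metric, here is a minimal sketch of corpus-level BLEU on the 0 to 1 scale. It assumes whitespace tokenization, one reference per prediction, uniform weights over 1- to 4-grams, and no smoothing. The card does not say which BLEU implementation or tokenizer produced the reported score, so this function (`corpus_bleu`, a hypothetical name) illustrates the formula and should not be expected to reproduce 0.6985.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(predictions: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus BLEU: geometric mean of clipped n-gram precisions times a brevity penalty."""
    matches = [0] * max_n
    totals = [0] * max_n
    pred_len = ref_len = 0
    for pred, ref in zip(predictions, references):
        p, r = pred.split(), ref.split()
        pred_len += len(p)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            p_counts, r_counts = ngrams(p, n), ngrams(r, n)
            # Counter intersection clips each n-gram count to the reference count.
            matches[n - 1] += sum((p_counts & r_counts).values())
            totals[n - 1] += sum(p_counts.values())
    if pred_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize predictions that are shorter than the references overall.
    brevity_penalty = 1.0 if pred_len > ref_len else math.exp(1 - ref_len / pred_len)
    return brevity_penalty * math.exp(log_precision)


print(corpus_bleu(["a b c d e"], ["a b c d e"]))  # identical strings score 1.0
print(corpus_bleu(["a b c d"], ["a b c d e"]))    # shorter output is penalized
```

Scores from different BLEU implementations are only comparable when tokenization and smoothing match, which matters especially for Arabic text.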



Technical Specifications

Model Architecture

  • Architecture: MarianMT (Encoder-Decoder Transformer)
  • Base: Helsinki-NLP/opus-mt-tc-big-en-ar
  • Parameters: ~500M
  • Task: Sequence-to-Sequence Translation

Compute Infrastructure

  • Hardware: Kaggle GPU (T4)
  • Framework: Hugging Face Transformers
  • Training time: ~30-60 minutes

Citation

If you use this model, please cite it as follows, and also cite the base model and dataset:

@misc{omar-youssef-egyptian-translation,
  author    = {Omar Youssef},
  title     = {English to Egyptian Arabic Translation Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Omar-youssef/english-egyptian-arabic-translation-v1}
}

Model Card Contact

For questions or feedback, open a discussion on this model's Hugging Face page.