---
library_name: transformers
license: apache-2.0
datasets:
- Omar-youssef/English-Egyptian-Arabic-Translation-Pairs
language:
- en
- ar
metrics:
- bleu
base_model:
- Helsinki-NLP/opus-mt-tc-big-en-ar
pipeline_tag: translation
---

# English → Egyptian Arabic Translation Model

A fine-tuned version of [Helsinki-NLP/opus-mt-tc-big-en-ar](https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-ar) specialized for **Egyptian Arabic dialect** translation.

## Model Details

### Model Description

This model translates English text into **Egyptian Arabic (العامية المصرية)**, the most widely spoken Arabic dialect. It was fine-tuned on English-Egyptian Arabic translation pairs, which makes it better suited to colloquial Egyptian Arabic than the base model, which targets Modern Standard Arabic (MSA).

- **Model type:** Seq2Seq Translation (MarianMT)
- **Language(s):** English (`en`) → Egyptian Arabic (`ar`)
- **License:** Apache 2.0
- **Base model:** [Helsinki-NLP/opus-mt-tc-big-en-ar](https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-ar)
- **Fine-tuned on:** [Omar-youssef/English-Egyptian-Arabic-Translation-Pairs](https://huggingface.co/datasets/Omar-youssef/English-Egyptian-Arabic-Translation-Pairs)

---

## How to Get Started

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "Omar-youssef/english-egyptian-arabic-translation-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

def translate(text):
    # Tokenize the input; this also returns the attention mask used by generate()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    # Beam search with 4 beams, stopping as soon as all beams have finished
    outputs = model.generate(**inputs, num_beams=4, early_stopping=True)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(translate("A novel is a long prose narrative that usually describes fictional characters and events in the form of a sequential story."))
# Output: الرواية هي رواية نثرية طويلة عادةً بتصف شخصيات وأحداث خيالية في شكل قصة متتابعة.

print(translate("The rapid advancement of artificial intelligence is transforming many industries, creating new opportunities while also raising important ethical and social concerns."))
# Output: التقدم السريع للـ Artificial Intelligence بيغير صناعات كتير، وبيخلق فرص جديدة وفي نفس الوقت بيثير اهتمامات أخلاقية واجتماعية مهمة.
```

---

## Uses

### Direct Use

Translating English sentences or short paragraphs into the Egyptian Arabic dialect. Suitable applications include:

- Chatbots targeting Egyptian Arabic speakers
- Language learning applications
- Content localization for Egyptian audiences
- Research on Arabic dialect translation

---

## Training Details

### Training Data

Fine-tuned exclusively on the Egyptian Arabic portion of the [Omar-youssef/English-Egyptian-Arabic-Translation-Pairs](https://huggingface.co/datasets/Omar-youssef/English-Egyptian-Arabic-Translation-Pairs) dataset.

- **Language filter:** `Egyptian Arabic` only
- **Split:** 80% train / 20% test (`seed=42`)
- **Source language:** English
- **Target language:** Egyptian Arabic

### Training Procedure

#### Training Hyperparameters

- **Learning rate:** 5e-5
- **Batch size:** 16 (per device)
- **Gradient accumulation steps:** 2 (effective batch size = 16 × 2 = 32 per device)
- **Epochs:** 3
- **Warmup steps:** 500
- **Weight decay:** 0.01
- **Evaluation strategy:** steps (every 50 steps)

---

## Evaluation

### Metrics

| Metric | Description | Range | Better When |
|--------|-------------|-------|-------------|
| **BLEU** | N-gram overlap between predictions and references | 0 → 1 | Higher ↑ |

### Results

| Metric | Score |
|--------|-------|
| **BLEU** | **0.6985** |

> A BLEU score of **0.6985** (69.85 when expressed on a 0-100 scale) indicates strong n-gram overlap between the model's translations and the Egyptian Arabic references.
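To make the 0 → 1 scale above concrete, here is a minimal, dependency-free sketch of corpus-level BLEU: clipped n-gram precisions for n = 1 to 4 are combined with a geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, one reference per sentence, and no smoothing. It is not the script that produced the score above; library implementations such as the `bleu` metric in Hugging Face `evaluate`, or sacreBLEU (which reports on a 0-100 scale), apply their own tokenization and may give different numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-1 scale (illustrative sketch only).

    hypotheses: list of predicted sentences (strings)
    references: list of reference sentences (strings), one per hypothesis
    Tokenization is plain whitespace splitting, an assumption for illustration.
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # n-grams in the hypotheses, summed over the corpus
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp_tokens) - n + 1, 0)

    # Without smoothing, any zero precision makes the geometric mean zero
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n

    # Brevity penalty: penalize output that is shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


print(corpus_bleu(["the cat is on the mat"], ["the cat is on the mat"]))  # 1.0
```

Because this sketch applies no smoothing, a single short sentence with no matching 4-gram scores 0.0 even when most of its words match, which is one reason BLEU is usually reported over a whole test set rather than per sentence.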
---

## Technical Specifications

### Model Architecture

- **Architecture:** MarianMT (Encoder-Decoder Transformer)
- **Base:** Helsinki-NLP/opus-mt-tc-big-en-ar
- **Parameters:** ~500M
- **Task:** Sequence-to-Sequence Translation

### Compute Infrastructure

- **Hardware:** Kaggle GPU (T4)
- **Framework:** Hugging Face Transformers
- **Training time:** ~30-60 minutes

---

## Citation

If you use this model, please cite it as follows, and also cite the base model and the dataset:

```bibtex
@misc{omar-youssef-egyptian-translation,
  author    = {Omar Youssef},
  title     = {English to Egyptian Arabic Translation Model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Omar-youssef/english-egyptian-arabic-translation-v1}
}
```

---

## Model Card Contact

For questions or feedback, open a discussion on this model's [Hugging Face page](https://huggingface.co/Omar-youssef/english-egyptian-arabic-translation-v1/discussions).