---
license: mit
language:
- ar
metrics:
- f1
base_model:
- CAMeL-Lab/readability-arabertv2-d3tok-reg
tags:
- Arabic
---

# MorphoArabia at BAREC 2025 Shared Task: A Hybrid Architecture with Morphological Analysis for Arabic Readability Assessment

<p align="center">
<img src="https://placehold.co/800x200/dbeafe/3b82f6?text=Barec-Readability-Assessment" alt="Barec Readability Assessment">
</p>

This repository contains the official models and results for **MorphoArabia**, the submission to the **[BAREC 2025 Shared Task](https://sites.google.com/view/barec-2025/home)** on Arabic Readability Assessment.

#### By: [Fatimah Mohamed Emad Elden](https://scholar.google.com/citations?user=CfX6eA8AAAAJ&hl=ar)

#### *Cairo University*


[![Paper](https://img.shields.io/badge/arXiv-25XX.XXXXX-b31b1b.svg)](https://arxiv.org/abs/25XX.XXXXX)
[![Code](https://img.shields.io/badge/GitHub-Code-blue)](https://github.com/astral-fate/barec-Arabic-Readability-Assessment)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-Page-F9D371)](https://huggingface.co/collections/FatimahEmadEldin/barec-shared-task-2025-689195853f581b9a60f9bd6c)
[![License](https://img.shields.io/badge/License-MIT-lightgrey)](https://github.com/astral-fate/mentalqa2025/blob/main/LICENSE)

---

## Model Description

This project introduces a **morphologically-aware approach** for assessing the readability of Arabic text. The system is built around a fine-tuned regression model designed to process morphologically analyzed text. For the **Constrained** and **Open** tracks of the shared task, this core model is extended into a hybrid architecture that incorporates seven engineered lexical features.

A key element of the system is its morphological preprocessing pipeline. The base models use the **CAMeL Tools `d3tok` analyzer** to capture linguistic complexity that surface-level tokenization often misses. This approach achieved a peak **Quadratic Weighted Kappa (QWK) score of 84.2** on the Strict-track sentence-level test set.

The model predicts a readability score on a **19-level scale**, from 1 (easiest) to 19 (hardest), for a given Arabic sentence or document.

-----

## 🚀 How to Use the Hybrid Model

This repository contains a fine-tuned hybrid model that combines a transformer's text understanding with explicit lexical features for a more robust readability assessment.

**NOTE:** This is a custom model architecture. You **must** use the `trust_remote_code=True` argument when loading it.

### Step 1: Installation
First, install all the necessary libraries. You will need `arabert` for the specific preprocessing steps this model requires.

```bash
pip install transformers torch pandas arabert
```

### Step 2: Preprocessing and Feature Engineering

To use the model correctly, you must replicate the same preprocessing and feature engineering steps used during training. The input text is first cleaned using the `AraBERT` preprocessor. Then, 7 lexical features are extracted based on the **SAMER Readability Lexicon**.

The 7 features are:

  * **Character Count**: The total number of characters in the preprocessed text.
  * **Word Count**: The total number of words in the text.
  * **Average Word Length**: The average number of characters per word.
  * **Average Word Difficulty**: The mean readability score of all words, based on the SAMER lexicon (defaulting to 3.0 for unknown words).
  * **Maximum Word Difficulty**: The highest readability score of any single word in the text.
  * **Difficult Word Count**: The number of words with a readability score greater than 4.
  * **OOV Ratio**: The ratio of words in the text that are not found in the SAMER lexicon.

### Step 3: Full Inference Example

The following code provides a complete, runnable example for getting a prediction from a single sentence. It includes the necessary preprocessing functions.

```python
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel
from arabert.preprocess import ArabertPreprocessor

# --- 1. Define the Feature Engineering Function ---
# This function must be defined to process your text.
def get_lexical_features(text, lexicon):
    """Calculates the 7 lexical features based on the SAMER lexicon."""
    if not lexicon or not isinstance(text, str):
        return [0.0] * 7

    words = text.split()
    if not words:
        return [0.0] * 7

    # Default difficulty for words not in the lexicon is 3.0
    word_difficulties = [lexicon.get(word, 3.0) for word in words]

    features = [
        float(len(text)),                                # character count
        float(len(words)),                               # word count
        float(np.mean([len(w) for w in words])),         # average word length
        float(np.mean(word_difficulties)),               # average word difficulty
        float(np.max(word_difficulties)),                # maximum word difficulty
        float(sum(d > 4 for d in word_difficulties)),    # difficult word count
        float(sum(w not in lexicon for w in words) / len(words)),  # OOV ratio
    ]
    return features

# --- 2. Initialize Models and Processors ---
repo_id = "FatimahEmadEldin/Constrained-Track-Sentence-Bassline-Readability-Arabertv2-d3tok-reg"
arabert_preprocessor = ArabertPreprocessor(model_name="aubmindlab/bert-large-arabertv2")
tokenizer = AutoTokenizer.from_pretrained(repo_id)
# Load the model with trust_remote_code=True to use the custom HybridRegressionModel
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# --- 3. Prepare Input Text and Lexicon ---
# NOTE: For this example, we use a small, sample lexicon. For best results,
# you should load the full 'SAMER-Readability-Lexicon-v2.tsv' file.
sample_lexicon = {'جملة': 2.5, 'عربية': 3.1, 'بسيطة': 1.8}
text = "هذا مثال لجملة عربية بسيطة."

# --- 4. Run the Full Pipeline ---
# a. Preprocess the text
preprocessed_text = arabert_preprocessor.preprocess(text)

# b. Extract the 7 lexical features
numerical_features_list = get_lexical_features(preprocessed_text, sample_lexicon)
numerical_features = torch.tensor([numerical_features_list], dtype=torch.float)

# c. Tokenize the text
inputs = tokenizer(preprocessed_text, return_tensors="pt", padding=True, truncation=True)

# d. Add numerical features to the model's inputs
inputs['features'] = numerical_features

# --- 5. Perform Inference ---
model.eval() # Set the model to evaluation mode
with torch.no_grad():
    logits = model(**inputs)

# --- 6. Process the Output ---
predicted_score = logits.item()
# Clip the score to the valid 0-18 range, then shift to the 1-19 final level
final_level = round(max(0, min(18, predicted_score))) + 1

print(f"Input Text: '{text}'")
print(f"Preprocessed Text: '{preprocessed_text}'")
print(f"Extracted Features: {numerical_features_list}")
print("-" * 30)
print(f"Raw Regression Score: {predicted_score:.4f}")
print(f"Predicted Readability Level (1-19): {final_level}")
```
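
The example above relies on a three-word sample lexicon. As a minimal sketch, a tab-separated lexicon file could be read into the `{word: difficulty}` dictionary that `get_lexical_features` expects as shown below. The helper `load_lexicon` and the column names `word` and `level` are hypothetical placeholders, not the actual headers of the SAMER file, so inspect the file and adjust them before use.

```python
import csv
import io

def load_lexicon(tsv_file, word_col="word", level_col="level"):
    """Read a tab-separated lexicon into a {word: difficulty} dict.

    `word_col` and `level_col` are hypothetical column names; replace them
    with the headers actually used in your lexicon file.
    """
    lexicon = {}
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        word = (row.get(word_col) or "").strip()
        raw_level = (row.get(level_col) or "").strip()
        if not word or not raw_level:
            continue  # skip incomplete rows
        try:
            lexicon[word] = float(raw_level)
        except ValueError:
            continue  # skip rows whose level is not numeric
    return lexicon

# Usage with an in-memory sample. For a real file, pass
# open(path, encoding="utf-8", newline="") instead of io.StringIO.
sample_tsv = "word\tlevel\nجملة\t2.5\nعربية\t3.1\nبسيطة\t1.8\n"
lexicon = load_lexicon(io.StringIO(sample_tsv))
print(lexicon)
```

The resulting dictionary can be passed to `get_lexical_features` in place of `sample_lexicon`.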

## ⚙️ Training Procedure

The system employs two distinct architectures based on the track's constraints:

  * **Strict Track**: This track uses a base regression model, `CAMeL-Lab/readability-arabertv2-d3tok-reg`, fine-tuned directly on the BAREC dataset.
  * **Constrained and Open Tracks**: These tracks utilize a hybrid model. This architecture combines the deep contextual understanding of the Transformer with explicit numerical features. The final representation for a sentence is created by concatenating the Transformer's `[CLS]` token embedding with a 7-dimensional vector of engineered lexical features derived from the SAMER lexicon.
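
The concatenation described for the hybrid model can be illustrated with a small PyTorch sketch. This is not the repository's `HybridRegressionModel`: the class name `HybridRegressionHead`, the hidden size of 768, the dropout rate, and the single linear output layer are all assumptions made for illustration, and random tensors stand in for the encoder output.

```python
import torch
import torch.nn as nn

class HybridRegressionHead(nn.Module):
    """Maps [CLS embedding ; lexical features] to one regression score.

    Hypothetical sketch: layer sizes and dropout are assumed values.
    """

    def __init__(self, hidden_size=768, num_features=7, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.regressor = nn.Linear(hidden_size + num_features, 1)

    def forward(self, cls_embedding, features):
        # cls_embedding: (batch, hidden_size); features: (batch, num_features)
        combined = torch.cat([cls_embedding, features], dim=-1)
        return self.regressor(self.dropout(combined)).squeeze(-1)

head = HybridRegressionHead()
head.eval()  # disable dropout for a deterministic forward pass

cls_embedding = torch.randn(2, 768)  # stand-in for the encoder's [CLS] output
features = torch.randn(2, 7)         # stand-in for the 7 lexical features
with torch.no_grad():
    scores = head(cls_embedding, features)
print(scores.shape)
```

In practice the lexical features are on very different scales (character counts versus ratios), so a real implementation may normalize them before concatenation; the source does not say whether this was done.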

### Data and Hyperparameters

The model was trained on a combined dataset of **97,874 training records** and validated against **7,310 validation records**. The following key hyperparameters were used during training:

  * **Epochs**: 8
  * **Learning Rate**: 3e-5
  * **Evaluation Batch Size**: 64
  * **Warmup Ratio**: 0.1
  * **Weight Decay**: 0.01
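
For reference, the listed values can be written as keyword arguments using the parameter names of Hugging Face `TrainingArguments`. Whether the original training run used `Trainer` is not stated here, and the training batch size is not reported, so `assumed_train_batch_size` and the helper `estimate_warmup_steps` below are hypothetical. The sketch only estimates how many warmup steps a ratio of 0.1 would imply under that assumption.

```python
import math

# Values stated above, keyed by the matching TrainingArguments parameter names.
training_kwargs = {
    "num_train_epochs": 8,
    "learning_rate": 3e-5,
    "per_device_eval_batch_size": 64,
    "warmup_ratio": 0.1,
    "weight_decay": 0.01,
}

def estimate_warmup_steps(num_examples, train_batch_size, epochs, warmup_ratio):
    """Estimate warmup steps, assuming one optimizer step per batch."""
    steps_per_epoch = math.ceil(num_examples / train_batch_size)
    total_steps = steps_per_epoch * epochs
    return math.ceil(total_steps * warmup_ratio)

# The training batch size is not reported; 16 is a hypothetical value.
assumed_train_batch_size = 16
warmup_steps = estimate_warmup_steps(
    num_examples=97_874,
    train_batch_size=assumed_train_batch_size,
    epochs=training_kwargs["num_train_epochs"],
    warmup_ratio=training_kwargs["warmup_ratio"],
)
print(f"Estimated warmup steps: {warmup_steps}")
```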

-----

## 📊 Evaluation Results

The models were evaluated on the blind test set provided by the BAREC organizers. The primary metric for evaluation is the **Quadratic Weighted Kappa (QWK)**, which penalizes larger disagreements more severely.
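
To make the metric concrete, here is a minimal pure-Python sketch of QWK for integer levels on the 1 to 19 scale. It follows the standard definition (quadratic weights over the observed and chance-expected confusion matrices) and is not the organizers' scoring script; the function name and defaults are illustrative.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_level=1, max_level=19):
    """Quadratic Weighted Kappa between two lists of integer levels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n_levels = max_level - min_level + 1
    n = len(y_true)

    # Observed confusion matrix and the marginal histograms.
    observed = [[0] * n_levels for _ in range(n_levels)]
    hist_true = [0] * n_levels
    hist_pred = [0] * n_levels
    for t, p in zip(y_true, y_pred):
        if not (min_level <= t <= max_level and min_level <= p <= max_level):
            raise ValueError("level outside the allowed range")
        i, j = t - min_level, p - min_level
        observed[i][j] += 1
        hist_true[i] += 1
        hist_pred[j] += 1

    numerator = 0.0
    denominator = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            # The penalty grows with the squared distance between levels.
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected
    if denominator == 0:
        return 1.0  # every rating sits on one shared level: perfect agreement
    return 1.0 - numerator / denominator

gold = [1, 5, 10, 19]
print(quadratic_weighted_kappa(gold, gold))             # exact agreement
print(quadratic_weighted_kappa(gold, [2, 5, 10, 19]))   # one item off by 1 level
print(quadratic_weighted_kappa(gold, [19, 5, 10, 19]))  # one item off by 18 levels
```

The third call scores lower than the second even though both contain a single error, because the weight grows with the squared distance between the predicted and gold levels.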

#### Final Dev and Test Scores (QWK)

All scores are reported as QWK × 100.

| Track | Task | Dev (QWK) | Test (QWK) |
| :--- | :--- | :---: | :---: |
| **Strict** | Sentence | 82.3 | **84.2** |
| | Document | 82.3\* | 79.9 |
| **Constrained** | Sentence | 81.0 | 82.9 |
| | Document | 83.5\* | 75.5 |
| **Open** | Sentence | 82.7 | 83.6 |
| | Document | 82.7\* | **79.2** |

*\*Document-level dev scores are based on the performance of the sentence-level model on the validation set.*

-----

## 📜 Citation

If you use this work, please cite the paper:

```bibtex
@inproceedings{eldin2025morphoarabia,
    title={{MorphoArabia at BAREC 2025 Shared Task: A Hybrid Architecture with Morphological Analysis for Arabic Readability Assessment}},
    author={Eldin, Fatimah Mohamed Emad},
    year={2025},
    booktitle={Proceedings of the BAREC 2025 Shared Task},
    eprint={25XX.XXXXX},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```