---
language: ['ar']
tags:
  - diacritization
  - nlp
  - arabic
metrics:
  - DER
  - WER
  - SER
model_type: to_push
datasets:
  - transformer-text-only
---

# Model Card

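A text-only Transformer model for Arabic text diacritization. The checkpoint comes from the `transformer-text-only` / `tashkeela` run listed under Model Details.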

## Model Details

**Source checkpoint:** `/home/rufael/Projects/forced_alignment/Diac/results/to_push/transformer-text-only/tashkeela/tensorboard/version_0/checkpoints/best_model.ckpt`

**Included config files:**
- `/home/rufael/Projects/forced_alignment/Diac/results/to_push/transformer-text-only/tashkeela/tensorboard/version_0/hparams.yaml`

## Evaluation Results

### Evaluation on ClArTTS

#### DER (Diacritic Error Rate)

| Configuration | With case ending | Without case ending |
|---|---|---|
| **Including no diacritic** | 10.33% | 8.45% |
| **Excluding no diacritic** | 12.72% | 10.33% |

#### WER (Word Error Rate)

| Configuration | With case ending | Without case ending |
|---|---|---|
| **Including no diacritic** | 30.16% | 19.71% |
| **Excluding no diacritic** | 29.91% | 19.60% |

#### SER (Sentence Error Rate)

| Configuration | With case ending | Without case ending |
|---|---|---|
| **Including no diacritic** | 91.62% | 79.34% |
| **Excluding no diacritic** | 91.62% | 79.34% |

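To make the relationship between the word-level and sentence-level numbers concrete, here is a minimal sketch of how WER and SER could be computed. It assumes a word counts as wrong if its diacritized form differs from the reference in any way, and a sentence counts as wrong if it contains at least one wrong word. The function name `wer_and_ser` is hypothetical, and this is not the repository's `eval.py`; it covers only the "with case ending, including no diacritic" configuration.

```python
def wer_and_ser(ref_lines, pred_lines):
    """Return (WER, SER) in percent for parallel lists of sentences.

    Assumes reference and prediction have the same words in the same
    order and differ only in their diacritics.
    """
    word_errors = word_total = sent_errors = 0
    for ref, pred in zip(ref_lines, pred_lines):
        ref_words, pred_words = ref.split(), pred.split()
        # A word is wrong if its diacritized form differs at all.
        wrong = sum(r != p for r, p in zip(ref_words, pred_words))
        word_errors += wrong
        word_total += len(ref_words)
        # A sentence is wrong as soon as one of its words is wrong.
        sent_errors += wrong > 0
    wer = 100.0 * word_errors / word_total if word_total else 0.0
    ser = 100.0 * sent_errors / len(ref_lines) if ref_lines else 0.0
    return wer, ser
```

Under these assumed definitions, a ten-word sentence with a single wrong word adds only 10% to WER for that sentence but counts fully toward SER, which is why SER can be far higher than WER on long sentences.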

## How to Use

### Installation

```bash
pip install torch
```

### Loading the Model

```python
from diac.models import DiacritizationModule

model = DiacritizationModule.from_pretrained(
    "rufaelfekadu/diac-transformer-text-only-tashkeela",
    tokenizer_constants_path="constants/"  # Path to constants directory
)
```

### Running Inference

```python
# Predict diacritization for a text file
model.predict_file(
    input_file="path/to/input.txt",
    output_file="path/to/output.txt"
)

# Or predict for a single text string
diacritized_text = model.predict_text("مرحبا بك")
```
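
If your input text already carries full or partial diacritics, you may want to strip them before prediction. The sketch below is a hypothetical helper (`strip_diacritics` is not part of the `diac` package as far as this card shows), and it assumes the eight standard Arabic marks from fathatan (U+064B) through sukun (U+0652). The model's own preprocessing may already handle this, so treat it as optional.

```python
import re

# Arabic combining diacritics: fathatan (U+064B) through sukun (U+0652).
_DIACRITICS_RE = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic marks, leaving base letters untouched."""
    return _DIACRITICS_RE.sub("", text)
```

The same helper can turn a diacritized reference file into undiacritized model input, one line at a time.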

### Running Evaluation

To evaluate the model on your own test set:

1. **Run inference** to generate predictions:

```bash
python inference.py \
    --config configs/<model>.yml \
    --opts \
    DATA.TEST_PATH path/to/test.txt \
    INFERENCE.MODEL_PATH <path_to_checkpoint> \
    INFERENCE.OUTPUT_PATH path/to/predictions.txt
```

2. **Prepare reference file** (if needed):

```bash
python src/diac/utils/prep_ref.py \
    --input_file path/to/test.txt \
    -o path/to/output_dir
```

3. **Calculate metrics** (DER, WER, SER):

```bash
python src/diac/utils/eval.py \
    -ofp path/to/predictions.txt \
    -tfp path/to/reference.txt \
    --style Fadel
```

The evaluation script reports DER, WER, and SER for each combination of two settings:
- With or without case endings (the diacritic on the last character of each word)
- Including or excluding characters that carry no diacritic
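
As an illustration of what these settings change, here is a minimal sketch of DER under the four configurations. It is not the repository's `eval.py`. The names `split_word` and `der` are hypothetical, and the sketch assumes that reference and prediction are aligned word by word, that words carry no attached punctuation, and that "case ending" means the diacritic on the final character of a word.

```python
# Arabic diacritics: fathatan (U+064B) through sukun (U+0652).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_word(word):
    """Split a word into [base_char, set_of_diacritics] pairs."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                pairs[-1][1].add(ch)
        else:
            pairs.append([ch, set()])
    return pairs


def der(reference, prediction, case_ending=True, include_no_diacritic=True):
    """Diacritic error rate in percent for two word-aligned strings."""
    errors = total = 0
    for ref_word, pred_word in zip(reference.split(), prediction.split()):
        ref_pairs, pred_pairs = split_word(ref_word), split_word(pred_word)
        if not case_ending:
            # Ignore the final character of each word, where case endings sit.
            ref_pairs, pred_pairs = ref_pairs[:-1], pred_pairs[:-1]
        for (_, ref_marks), (_, pred_marks) in zip(ref_pairs, pred_pairs):
            if not include_no_diacritic and not ref_marks:
                continue  # skip characters with no diacritic in the reference
            total += 1
            errors += ref_marks != pred_marks
    return 100.0 * errors / total if total else 0.0


# Example: "kataba al-waladu", with the prediction getting only the
# final case ending wrong (fatha instead of damma).
reference = "\u0643\u064E\u062A\u064E\u0628\u064E \u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064F"
prediction = "\u0643\u064E\u062A\u064E\u0628\u064E \u0627\u0644\u0648\u064E\u0644\u064E\u062F\u064E"

for case_ending in (True, False):
    for include_none in (True, False):
        score = der(reference, prediction, case_ending, include_none)
        print(f"case_ending={case_ending}, include_no_diacritic={include_none}: {score:.2f}%")
```

In this example the single error is a case ending, so DER is 12.50% when case endings and undiacritized characters are both counted (1 error over 8 characters) and drops to 0.00% once case endings are ignored. Excluding undiacritized characters shrinks the denominator, which is why that row tends to show a higher DER for the same errors.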