---
language: ['ar']
tags:
- diacritization
- nlp
- arabic
metrics:
- DER
- WER
- SER
model_type: to_push
datasets:
- transformer-text-only
---

# Model Card

This is a To_Push model for Arabic text diacritization.

## Model Details

**Source checkpoint:** `/home/rufael/Projects/forced_alignment/Diac/results/to_push/transformer-text-only/tashkeela/tensorboard/version_0/checkpoints/best_model.ckpt`

**Included config files:**

- `/home/rufael/Projects/forced_alignment/Diac/results/to_push/transformer-text-only/tashkeela/tensorboard/version_0/hparams.yaml`

## Evaluation Results

### Evaluation on clartts

### DER (Diacritic Error Rate)

| Configuration | With case ending | Without case ending |
|---|---|---|
| **Including no diacritic** | 10.33% | 8.45% |
| **Excluding no diacritic** | 12.72% | 10.33% |

### WER (Word Error Rate)

| Configuration | With case ending | Without case ending |
|---|---|---|
| **Including no diacritic** | 30.16% | 19.71% |
| **Excluding no diacritic** | 29.91% | 19.60% |

### SER (Sentence Error Rate)

| Configuration | With case ending | Without case ending |
|---|---|---|
| **Including no diacritic** | 91.62% | 79.34% |
| **Excluding no diacritic** | 91.62% | 79.34% |

## How to Use

### Installation

```bash
pip install torch
```

### Loading the Model

```python
from diac.models import DiacritizationModule

model = DiacritizationModule.from_pretrained(
    "rufaelfekadu/diac-transformer-text-only-tashkeela",
    tokenizer_constants_path="constants/"  # Path to constants directory
)
```

### Running Inference

```python
# Predict diacritization for a text file
model.predict_file(
    input_file="path/to/input.txt",
    output_file="path/to/output.txt"
)

# Or predict for a single text string
diacritized_text = model.predict_text("مرحبا بك")
```

### Running Evaluation

To evaluate the model on your own test set:

1. **Run inference** to generate predictions:

   ```bash
   python inference.py \
       --config configs/.yml \
       --opts \
       DATA.TEST_PATH path/to/test.txt \
       INFERENCE.MODEL_PATH \
       INFERENCE.OUTPUT_PATH path/to/predictions.txt
   ```

2. **Prepare reference file** (if needed):

   ```bash
   python src/diac/utils/prep_ref.py \
       --input_file path/to/test.txt \
       -o path/to/output_dir
   ```

3. **Calculate metrics** (DER, WER, SER):

   ```bash
   python src/diac/utils/eval.py \
       -ofp path/to/predictions.txt \
       -tfp path/to/reference.txt \
       --style Fadel
   ```

The evaluation script will output DER, WER, and SER metrics with different configurations:

- With/without case ending
- Including/excluding no diacritic
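
To make the four configurations concrete, here is a minimal, self-contained sketch of how DER, WER, and SER could be computed for each of them. It is not the repository's `eval.py`, and it may differ from the `Fadel` style in details. The function names (`split_marks`, `error_rates`) are hypothetical, and the sketch rests on three simplifying assumptions: "case ending" is the diacritic on the last letter of each word, "excluding no diacritic" skips letters that carry no diacritic in the reference, and every non-diacritic character counts as a letter.

```python
# Arabic diacritics from fathatan (U+064B) through sukun (U+0652).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_marks(word):
    """Return [(base_char, set_of_diacritics), ...] for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS and pairs:
            pairs[-1][1].add(ch)          # attach the mark to the preceding letter
        elif ch not in DIACRITICS:
            pairs.append((ch, set()))     # start a new base letter
    return pairs


def error_rates(references, predictions, case_ending=True, include_no_diacritic=True):
    """Compute DER, WER, and SER (as fractions) over parallel sentence lists.

    Assumption: references and predictions share the same base letters and
    word boundaries, differing only in their diacritics.
    """
    if len(references) != len(predictions):
        raise ValueError("sentence counts differ")
    chars = char_err = words = word_err = sent_err = 0
    for ref, pred in zip(references, predictions):
        ref_words, pred_words = ref.split(), pred.split()
        if len(ref_words) != len(pred_words):
            raise ValueError("word counts differ")
        sentence_wrong = False
        for rw, pw in zip(ref_words, pred_words):
            rp, pp = split_marks(rw), split_marks(pw)
            if [b for b, _ in rp] != [b for b, _ in pp]:
                raise ValueError("base letters differ")
            if not case_ending:
                rp, pp = rp[:-1], pp[:-1]  # ignore the last letter of each word
            word_wrong = False
            for (_, r), (_, p) in zip(rp, pp):
                if not include_no_diacritic and not r:
                    continue               # skip letters with no reference diacritic
                chars += 1
                if r != p:
                    char_err += 1
                    word_wrong = True
            words += 1
            word_err += word_wrong
            sentence_wrong = sentence_wrong or word_wrong
        sent_err += sentence_wrong
    return {
        "DER": char_err / chars if chars else 0.0,
        "WER": word_err / words if words else 0.0,
        "SER": sent_err / len(references) if references else 0.0,
    }


# Toy example: two words, with only the final case-ending diacritic mispredicted.
FATHA, DAMMA = "\u064E", "\u064F"
ref = "ك" + FATHA + "ت" + FATHA + "ب" + FATHA + " ال" + "و" + FATHA + "ل" + FATHA + "د" + DAMMA
pred = ref[:-1] + FATHA  # replace the final damma with a fatha

for ce in (True, False):
    for inc in (True, False):
        print(ce, inc, error_rates([ref], [pred], case_ending=ce, include_no_diacritic=inc))
```

In this toy pair the only error sits on the last letter of the second word, so with case endings the DER is 1/8 when undiacritized letters are counted and 1/6 when they are skipped, while all three metrics drop to zero once case endings are ignored. The same pattern explains why the "without case ending" columns in the tables above are consistently lower.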