	---
	language: ['ar']
	tags:
	- diacritization
	- nlp
	- arabic
	- transformer
	metrics:
	- DER
	- WER
	- SER
	---

	# Automatic Restoration of Diacritics for Speech Data Sets

	This is a transformer-based model for Arabic text diacritization, as described [here](https://github.com/rufaelfekadu/Diac.git).

	## Evaluation Results

	### Evaluation on ClArTTS

	DER (Diacritic Error Rate)

	| Configuration | With case ending | Without case ending |
	|---|---|---|
	| Including no diacritic | 10.33% | 8.45% |
	| Excluding no diacritic | 12.72% | 10.33% |


	WER (Word Error Rate)

	| Configuration | With case ending | Without case ending |
	|---|---|---|
	| Including no diacritic | 30.16% | 19.71% |
	| Excluding no diacritic | 29.91% | 19.60% |


	## How to Use

	### Installation

	```bash
	git clone https://github.com/rufaelfekadu/diac.git
	cd diac
	pip install -e .
	```

	### Loading the Model

	```python
	from diac.models import DiacritizationModule

	model = DiacritizationModule.from_pretrained(
	    "rufaelfekadu/diac-transformer-text-only-tashkeela",
	    tokenizer_constants_path="constants/",  # Path to the constants directory
	)
	```

	### Running Inference

	```python
	# Predict diacritization for a text file
	model.predict_file(
	    input_file="path/to/input.txt",
	    output_file="path/to/output.txt",
	)

	# Or predict for a single text string
	diacritized_text = model.predict_text("مرحبا بك")
	```
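
When only diacritized gold text is available, one way to sanity-check the model is to strip the marks, run `predict_text` on the result, and compare against the original. The sketch below is an illustration, not part of the `diac` package: `strip_diacritics` and `DIACRITICS` are hypothetical names, and the mark set is assumed to be the eight common Arabic diacritics (U+064B to U+0652). Whether `predict_text` already removes existing diacritics from its input is not stated here, so stripping first is the conservative choice.

```python
# Hypothetical helper, not part of the diac package.
# Assumed mark set: tanween, short vowels, shadda, sukun (U+064B..U+0652).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic marks, leaving letters, spaces, and punctuation."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


# Gold text written with escapes so the marks are unambiguous: "marhaban"
# with fatha, sukun, fatha, and fathatan.
gold = "\u0645\u064e\u0631\u0652\u062d\u064e\u0628\u064b\u0627"
plain = strip_diacritics(gold)
print(plain)

# With a loaded model, the round trip would then be:
# restored = model.predict_text(plain)
```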

	### Running Evaluation

	To evaluate the model on your own test set:

	1. Run inference to generate predictions:

	```bash
	python inference.py \
	    --config configs/<model>.yml \
	    --opts \
	    DATA.TEST_PATH path/to/test.txt \
	    INFERENCE.MODEL_PATH <path_to_checkpoint> \
	    INFERENCE.OUTPUT_PATH path/to/predictions.txt
	```

	2. Prepare the reference file (if needed):

	```bash
	python src/diac/utils/prep_ref.py \
	    --input_file path/to/test.txt \
	    -o path/to/output_dir
	```

	3. Calculate metrics (DER, WER, SER):

	```bash
	python src/diac/utils/eval.py \
	    -ofp path/to/predictions.txt \
	    -tfp path/to/reference.txt \
	    --style Fadel
	```

	The evaluation script reports DER, WER, and SER for each combination of two settings:
	- With/without case ending
	- Including/excluding no diacritic
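
To make the four configurations concrete, here is a minimal sketch of how DER and WER could be computed under each one. It is not the repository's `eval.py`, and it may differ in detail from the `Fadel` style. It assumes that "without case ending" means ignoring the last letter of every word, that "excluding no diacritic" means skipping letters whose reference carries no mark, and that the diacritic set is U+064B to U+0652. The names `split_word` and `error_rates` are hypothetical.

```python
# Assumed mark set: tanween, short vowels, shadda, sukun (U+064B..U+0652).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_word(word):
    """Return a list of (base_letter, marks) pairs for one word."""
    units = []
    for ch in word:
        if ch not in DIACRITICS:
            units.append((ch, ""))
        elif units:  # attach the mark to the preceding letter
            base, marks = units[-1]
            units[-1] = (base, marks + ch)
    return units


def error_rates(reference, prediction, case_ending=True, include_no_diacritic=True):
    """Return (DER, WER) as fractions for one configuration."""
    ref_words, pred_words = reference.split(), prediction.split()
    if len(ref_words) != len(pred_words):
        raise ValueError("reference and prediction differ in word count")
    char_total = char_errors = word_total = word_errors = 0
    for ref_word, pred_word in zip(ref_words, pred_words):
        ref_units, pred_units = split_word(ref_word), split_word(pred_word)
        if [b for b, _ in ref_units] != [b for b, _ in pred_units]:
            raise ValueError("base letters differ between reference and prediction")
        if not case_ending:  # ignore the final letter of each word
            ref_units, pred_units = ref_units[:-1], pred_units[:-1]
        word_wrong = False
        for (_, ref_marks), (_, pred_marks) in zip(ref_units, pred_units):
            if not include_no_diacritic and not ref_marks:
                continue  # skip letters that are bare in the reference
            char_total += 1
            if set(ref_marks) != set(pred_marks):  # mark order is ignored
                char_errors += 1
                word_wrong = True
        word_total += 1
        word_errors += word_wrong
    der = char_errors / char_total if char_total else 0.0
    wer = word_errors / word_total if word_total else 0.0
    return der, wer


# Two-word example ("kataba al-waladu") written with escapes so the marks
# are unambiguous. The prediction gets only the final case ending wrong
# (fatha instead of damma).
reference = (
    "\u0643\u064e\u062a\u064e\u0628\u064e "
    "\u0627\u0644\u0648\u064e\u0644\u064e\u062f\u064f"
)
prediction = reference[:-1] + "\u064e"

for case_ending in (True, False):
    for include_no_diacritic in (True, False):
        der, wer = error_rates(reference, prediction, case_ending, include_no_diacritic)
        print(case_ending, include_no_diacritic, f"DER={der:.2%}", f"WER={wer:.2%}")
```

In this toy example the only mistake is a case ending, so both rates drop to zero once case endings are ignored. Excluding bare letters shrinks the denominator, so the same single error gives a higher DER.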