---
license: apache-2.0
language:
- ru
- en
base_model:
- Daniil-Domino/trocr-base-ru-dialectic-stackmix
pipeline_tag: image-to-text
tags:
- htr
- trocr
- image-to-text
- transformers
---

# Russian Dialectic HTR using TrOCR

The [TrOCR-base-ru-dialectic-stackmix](https://huggingface.co/Daniil-Domino/trocr-base-ru-dialectic-stackmix) model was fine-tuned on a dataset of nearly 2456 images containing handwritten Russian dialectal texts. For more information, see the [GitHub repository](https://github.com/DialecticalHTR/RuDialect-HTR).

## Model description

TrOCR-base-ru-dialectic-stackmix was fine-tuned for handwritten Russian text recognition on dialectological cards. The model was trained for 10 epochs with a batch size of 4 on an NVIDIA P100 GPU. Fine-tuning took approximately 35 minutes.

### What is a dialectological text?

Linguists at NaRFU go on dialectological expeditions to villages in the Arkhangelsk region. Conversations with locals are transcribed into notebooks, and each dialect word is written on a card together with an example of its usage. A dialectological text is a text that conveys linguistic features using special symbols such as acute accents and apostrophes.


Example of a card:

![Example of a card](https://cdn-uploads.huggingface.co/production/uploads/66f94ad1b720048bbc98aeea/OxbyTr2krLmkYUS7I738p.png)

# Example Usage

```python
# Load libraries
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
import matplotlib.pyplot as plt
from PIL import Image

# Load image
img_path = 'path/to/image'
image = Image.open(img_path).convert("RGB")

# Load model and processor
model_name = "Daniil-Domino/trocr-base-ru-dialectic"
processor = TrOCRProcessor.from_pretrained(model_name)
model = VisionEncoderDecoderModel.from_pretrained(model_name)

# Preprocess and run inference
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Output result
print(generated_text)

# Display input image
plt.axis("off")
plt.imshow(image)
plt.show()
```

# Metrics

Below are the key evaluation metrics on the validation set:

- **CER**: 6.81 %
- **WER**: 27.20 %
- **Accuracy**: 73.74 %
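
For reference, CER and WER are conventionally defined as the edit (Levenshtein) distance between the recognized text and the ground truth, divided by the length of the ground truth in characters or words. The sketch below is a minimal pure-Python illustration of those definitions, not the evaluation script used for this model. The function names `edit_distance`, `cer`, and `wer`, the sample strings, and the corpus-level aggregation (total errors over total reference length) are all assumptions. The card does not state how Accuracy was computed, so it is not covered here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis costs i
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Character error rate in percent, aggregated over all samples (assumed)."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 100.0 * errors / total


def wer(references, hypotheses):
    """Word error rate in percent, with words split on whitespace (assumed)."""
    errors = sum(
        edit_distance(r.split(), h.split())
        for r, h in zip(references, hypotheses)
    )
    total = sum(len(r.split()) for r in references)
    return 100.0 * errors / total


# Hypothetical ground-truth / prediction pairs, for illustration only
references = ["молоко", "на дворе трава"]
hypotheses = ["малоко", "на дворе дрова"]

print(f"CER: {cer(references, hypotheses):.2f} %")  # CER: 15.00 %
print(f"WER: {wer(references, hypotheses):.2f} %")  # WER: 50.00 %
```

In this toy example, three character substitutions over 20 reference characters give a CER of 15 %, while two wrong words out of four give a WER of 50 %. This shows why WER is usually much higher than CER on the same predictions: a single wrong character makes the whole word count as an error.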