TrOCR Fine-tuned on the MultiHTR Ukrainian Dataset
This model is a fine-tuned version of kazars24/trocr-base-handwritten-ru for recognizing Ukrainian handwritten text. It was trained on the datasets used for the MultiHTR Ukrainian Transkribus models (see Tikhonov & Rabus 2024).
Model Description
TrOCR (Transformer-based OCR) is a vision-to-text model using a ViT encoder and a causal language model decoder. This version is fine-tuned specifically on Ukrainian handwriting from the 19th and 20th centuries.
- Base model: kazars24/trocr-base-handwritten-ru
- Fine-tuned on: MultiHTR Ukrainian dataset (Achim Rabus, Aleksej Tikhonov et al., University of Freiburg), the same data used to train the Transkribus models "Ukrainian generic handwriting 1" and "Ukrainian generic handwritten and typed"
- Intended use: OCR/HTR of handwritten Ukrainian manuscripts and documents
Key preprocessing choice: aspect-ratio preservation. Line images are resized to a height of 128 px with their aspect ratio preserved, rather than being squashed to 384×384. Ukrainian manuscript lines typically have aspect ratios between 4:1 and 12:1, so resizing them directly to 384×384 compresses their width by roughly 10×. Preserving the aspect ratio raises the width available per character from ~7 px to ~28 px.
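The resizing step can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the helper names `target_size` and `resize_line` are hypothetical, and rounding the width to the nearest pixel is an assumption. LANCZOS resampling is taken from the preprocessing note under Training Data.

```python
def target_size(width, height, target_height=128):
    """Return (new_width, new_height) that keeps the aspect ratio.

    Rounding to the nearest pixel and the 1 px minimum width are assumptions.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = target_height / height
    return max(1, round(width * scale)), target_height


def resize_line(image, target_height=128):
    """Resize a PIL line image to a fixed height, preserving aspect ratio."""
    # Imported here so that target_size() can be used without Pillow installed.
    from PIL import Image

    new_size = target_size(image.width, image.height, target_height)
    return image.resize(new_size, Image.LANCZOS)
```

For a hypothetical 1200×100 px line crop (12:1), `target_size(1200, 100)` gives 1536×128 px, which is four times the width that a 384×384 squash would leave.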
Training Data
- Sources: Vernadskyi National Library of Ukraine (manuscripts by Taras Shevchenko and Klyment Kvitka); National Museum of the Holodomor-Genocide in Kyiv; GRAC corpus (Lviv Polytechnic University); Prozhito Project; Foundation of the International Memorial Association; for more detailed acknowledgements, see the links above
- Scope: Ukrainian handwritten manuscripts and documents, primarily 19th–20th century
- Size: 19,307 training lines, 4,827 validation lines (773 pages)
- Preprocessing: resize to 128 px height, aspect ratio preserved (LANCZOS); no background normalization
- Train / Val split: 80% / 20%
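An 80/20 split of this kind can be sketched as below. The card does not say how the split was drawn (per line or per page, shuffled or not, with which seed), so the shuffling, the seed, and the function name `split_lines` are all assumptions for illustration.

```python
import random


def split_lines(items, train_fraction=0.8, seed=0):
    """Shuffle items reproducibly and split them into (train, val) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # the seed is an arbitrary assumption
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 19,307 + 4,827 = 24,134 lines in total, as reported under Size.
train, val = split_lines(range(24_134))
print(len(train), len(val))
```

With 24,134 lines in total, truncating 80% yields 19,307 training and 4,827 validation lines, which matches the reported sizes.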
Performance
| Metric | Value |
|---|---|
| CER (validation) | 9.94% |
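Character error rate is conventionally the character-level Levenshtein distance divided by the reference length. The sketch below computes a corpus-level CER in plain Python. Whether the reported 9.94% was aggregated over the whole validation set or averaged per line, and whether any text normalization was applied, is not stated here, so treat those choices as assumptions.

```python
def edit_distance(reference, hypothesis):
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def corpus_cer(references, hypotheses):
    """Total edit distance divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_errors / total_chars
```

Calling `corpus_cer(ground_truth_lines, predicted_lines)` on two equally long lists of strings returns a fraction; multiply by 100 to compare with the percentage in the table.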
Training Details
| Parameter | Value |
|---|---|
| Base model | kazars24/trocr-base-handwritten-ru |
| Optimizer | Adafactor |
| Learning rate | 4e-5 |
| Effective batch size | 96 |
| Epochs | 20 |
| FP16 | Yes |
| Augmentation | Rotation ±2°, brightness/contrast ±0.3 |
| Framework | HuggingFace Transformers, Seq2SeqTrainer |
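The augmentation row can be read as small random rotations plus brightness and contrast jitter. The sketch below is one plausible interpretation using Pillow, not the training code: it assumes that ±0.3 means an enhancement factor drawn uniformly from [0.7, 1.3], that rotation uses bicubic resampling with a white fill, and that the jitter is applied after rotation. The function names are hypothetical.

```python
import random


def sample_augmentation(rng, max_angle=2.0, max_jitter=0.3):
    """Draw one set of augmentation parameters."""
    return {
        "angle": rng.uniform(-max_angle, max_angle),
        "brightness": rng.uniform(1.0 - max_jitter, 1.0 + max_jitter),
        "contrast": rng.uniform(1.0 - max_jitter, 1.0 + max_jitter),
    }


def augment_line(image, rng):
    """Apply rotation, brightness and contrast jitter to an RGB PIL image."""
    # Imported here so that sample_augmentation() works without Pillow.
    from PIL import Image, ImageEnhance

    params = sample_augmentation(rng)
    # expand=False keeps the original canvas size; corners are filled white.
    out = image.rotate(params["angle"], resample=Image.BICUBIC,
                       expand=False, fillcolor=(255, 255, 255))
    out = ImageEnhance.Brightness(out).enhance(params["brightness"])
    out = ImageEnhance.Contrast(out).enhance(params["contrast"])
    return out
```

Passing a seeded generator, for example `augment_line(image, random.Random(42))`, makes a given augmentation reproducible.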
How to Use
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the processor (image preprocessing + tokenizer) and the fine-tuned model.
processor = TrOCRProcessor.from_pretrained("cyrillic-trocr/trocr-ukrainian-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("cyrillic-trocr/trocr-ukrainian-handwritten")

# The input should be a single cropped text line, converted to RGB.
image = Image.open("line_image.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate token IDs and decode them to text.
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
Acknowledgements
Funded by the Ministry of Science, Research and the Arts of Baden-Württemberg as part of the digital@bw digitization strategy.
Citation
If you use this model, please cite:
@article{tikhonov_rabus_2024,
author = {Tikhonov, Aleksej and Rabus, Achim},
title = {Handwritten Text Recognition of Ukrainian Manuscripts in the 21st Century:
Possibilities, Challenges, and the Future of the First Generic AI-based Model},
journal = {Kyiv-Mohyla Humanities Journal},
volume = {11},
year = {2024},
pages = {226--247},
doi = {10.18523/2313-4895.11.2024.226-247}
}
MultiHTR project funded by the Ministry of Science, Research and the Arts of Baden-Württemberg (digital@bw).
TrOCR architecture:
@article{li2021trocr,
title = {TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
author = {Li, Minghao and Lv, Tengchao and Chen, Jingye and Cui, Lei and Lu, Yijuan and
Florencio, Dinei and Zhang, Cha and Li, Zhoujun and Wei, Furu},
journal = {arXiv preprint arXiv:2109.10282},
year = {2021}
}