TrOCR Fine-tuned on the MultiHTR Ukrainian Dataset

This model is a fine-tuned version of kazars24/trocr-base-handwritten-ru for recognizing Ukrainian handwritten text. It was trained on the datasets used for the MultiHTR Ukrainian Transkribus models (see Tikhonov & Rabus 2024).

Model Description

TrOCR (Transformer-based OCR) is a vision-to-text model using a ViT encoder and a causal language model decoder. This version is fine-tuned specifically on Ukrainian handwriting from the 19th and 20th centuries.

Base model: kazars24/trocr-base-handwritten-ru
Fine-tuned on: MultiHTR Ukrainian dataset (Achim Rabus, Aleksej Tikhonov et al., University of Freiburg), the same data used to train the Transkribus models "Ukrainian generic handwriting 1" and "Ukrainian generic handwritten and typed"
Intended use: OCR/HTR of handwritten Ukrainian manuscripts and documents

Key preprocessing choice: aspect-ratio preservation. Line images are resized to a height of 128 px with their aspect ratio preserved, rather than being squashed to 384×384. Ukrainian manuscript lines typically have aspect ratios of 4:1 to 12:1; direct resizing to 384×384 compresses their width by roughly 10×. Preserving the aspect ratio raises the effective character width from about 7 px to about 28 px.
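
The resizing step described above can be sketched with Pillow as follows. This is a minimal illustration, not the authors' preprocessing code: the helper name resize_to_height and the synthetic test image are assumptions, and only the 128 px target height and the LANCZOS filter come from this card.

```python
from PIL import Image

TARGET_HEIGHT = 128  # line height stated in this model card


def resize_to_height(image: Image.Image, target_height: int = TARGET_HEIGHT) -> Image.Image:
    """Resize a line image to a fixed height, keeping its aspect ratio.

    Hypothetical helper for illustration; not part of the published model.
    """
    width, height = image.size
    # Scale the width by the same factor as the height; never go below 1 px.
    new_width = max(1, round(width * target_height / height))
    return image.resize((new_width, target_height), Image.LANCZOS)


# Synthetic 8:1 line image standing in for a real manuscript line.
line = Image.new("RGB", (1600, 200), "white")
resized = resize_to_height(line)
print(resized.size)  # (1024, 128)
```

The card does not state whether the published processor configuration applies this resize automatically, so it may be worth inspecting the processor settings before relying on default preprocessing.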

Training Data

Sources: Vernadskyi National Library of Ukraine (manuscripts by Taras Shevchenko and Klyment Kvitka); National Museum of the Holodomor-Genocide in Kyiv; GRAC corpus (Lviv Polytechnic University); Prozhito Project; Foundation of the International Memorial Association; for more detailed acknowledgements, see the links above
Scope: Ukrainian handwritten manuscripts and documents, primarily 19th–20th century
Size: 19,307 training lines, 4,827 validation lines (773 pages)
Preprocessing: resize to 128 px height, aspect ratio preserved (LANCZOS); no background normalization
Train / Val split: 80% / 20%
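
As a quick consistency check, an 80% / 20% split of the 24,134 lines reported above reproduces the stated counts. The sketch below assumes a seeded random split at line level; the card does not say whether the split was made by line or by page, and the function name split_lines is hypothetical.

```python
import random


def split_lines(items, train_fraction=0.8, seed=0):
    """Shuffle items reproducibly and cut them into train and validation parts.

    Assumption: a random line-level split; the actual procedure is not documented.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Integer IDs stand in for the 19,307 + 4,827 line images.
line_ids = range(19_307 + 4_827)
train, val = split_lines(line_ids)
print(len(train), len(val))  # 19307 4827
```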

Performance

Metric	Value
CER (validation)	9.94%
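
For readers unfamiliar with the metric, the character error rate (CER) is the number of character-level edits (insertions, deletions, substitutions) needed to turn the prediction into the reference, divided by the number of reference characters. The following is a minimal pure-Python sketch. It assumes corpus-level aggregation and no text normalization, neither of which is specified in this card, so it may not reproduce the reported figure exactly.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def cer(references, predictions):
    """Total edits divided by total reference characters (corpus-level CER)."""
    total_edits = sum(levenshtein(r, p) for r, p in zip(references, predictions))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# One substituted character in an 11-character reference line.
score = cer(["привіт світ"], ["привит світ"])
print(f"{score:.2%}")  # 9.09%
```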

Training Details

Parameter	Value
Base model	`kazars24/trocr-base-handwritten-ru`
Optimizer	Adafactor
Learning rate	4e-5
Effective batch size	96
Epochs	20
FP16	Yes
Augmentation	Rotation ±2°, brightness/contrast ±0.3
Framework	HuggingFace Transformers, Seq2SeqTrainer
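
The hyperparameters in the table map onto keyword arguments of Seq2SeqTrainingArguments in Transformers. The sketch below collects them in a plain dictionary so that it runs without a GPU. The split of the effective batch size into per-device batch size and gradient accumulation steps is a hypothetical example, since the card reports only the product (96), and predict_with_generate is likewise an assumption.

```python
# Values taken from the training table in this card.
EFFECTIVE_BATCH_SIZE = 96

# Hypothetical decomposition: the card does not say how 96 was reached.
num_devices = 1
per_device_train_batch_size = 24
gradient_accumulation_steps = 4

# Keys are argument names of transformers.Seq2SeqTrainingArguments;
# required arguments such as output_dir are left out of this sketch.
training_kwargs = {
    "optim": "adafactor",
    "learning_rate": 4e-5,
    "num_train_epochs": 20,
    "fp16": True,
    "per_device_train_batch_size": per_device_train_batch_size,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "predict_with_generate": True,  # assumption: decode text during evaluation
}

# The effective batch size is the product of the three factors.
effective = num_devices * per_device_train_batch_size * gradient_accumulation_steps
assert effective == EFFECTIVE_BATCH_SIZE
print(effective)  # 96
```

Any other factorization of 96 (for example 48 × 2 on one device) gives the same effective batch size; the right choice depends on available GPU memory.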

How to Use

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load the processor (image preprocessing and tokenizer) and the fine-tuned model.
processor = TrOCRProcessor.from_pretrained("cyrillic-trocr/trocr-ukrainian-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("cyrillic-trocr/trocr-ukrainian-handwritten")

# Open a single text-line image and convert it to three-channel RGB.
image = Image.open("line_image.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate token IDs, then decode them into the recognized text.
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)

Acknowledgements

Funded by the Ministry of Science, Research and the Arts of Baden-Württemberg as part of the digital@bw digitization strategy.

Citation

If you use this model, please cite:

@article{tikhonov_rabus_2024,
  author    = {Tikhonov, Aleksej and Rabus, Achim},
  title     = {Handwritten Text Recognition of Ukrainian Manuscripts in the 21st Century:
               Possibilities, Challenges, and the Future of the First Generic AI-based Model},
  journal   = {Kyiv-Mohyla Humanities Journal},
  volume    = {11},
  year      = {2024},
  pages     = {226--247},
  doi       = {10.18523/2313-4895.11.2024.226-247}
}

MultiHTR project funded by the Ministry of Science, Research and the Arts of Baden-Württemberg (digital@bw).

TrOCR architecture:

@article{li2021trocr,
  title   = {TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
  author  = {Li, Minghao and Lv, Tengchao and Chen, Jingye and Cui, Lei and Lu, Yijuan and
             Florencio, Dinei and Zhang, Cha and Li, Zhoujun and Wei, Furu},
  journal = {arXiv preprint arXiv:2109.10282},
  year    = {2021}
}
