---
language:
- en
- uk
license: apache-2.0
library_name: transformers
tags:
- ocr
- icr
- handwriting-recognition
- ukrainian
- ctc
- pytorch
- onnx
pipeline_tag: image-to-text
base_model: DAIR-Group/HTR-ConvText
metrics:
- cer
- wer
---

<div align="center">

# 🇺🇦 Ukrainian OCR / ICR (HTR-ConvText)

**Handwritten & printed text recognition for Ukrainian**

[![Live Demo](https://img.shields.io/badge/🤖_Live_Demo-Open_in_Space-blue?style=for-the-badge&logo=huggingface)](https://huggingface.co/spaces/Valerii02/ukr-htr-convtext-demo)

*Upload an image → Get recognized text*

[English](#quickstart) · [Українська](README.uk.md)
</div>

<div align="left" style="line-height: 1.6">

## 📋 Table of Contents
- [✨ Highlights](#highlights)
- [🚀 Quickstart](#quickstart)
- [📖 Model Description](#model-description)
- [🖼️ Recognition Examples](#recognition-examples)
- [🛠️ Tools & Scripts](#tools--scripts)
- [📊 Evaluation](#evaluation)
- [⚠️ Limitations](#limitations)
- [🙏 Attribution & Citation](#attribution--citation)
- [📄 License](#license)
## ✨ Highlights
| Feature | Description |
|---------|-------------|
| **Language** | Ukrainian (handwritten + printed) |
| **Architecture** | HTR-ConvText (ResNet-18 + MobileViT), CTC decoding |
| **Input** | 64×3072 px, grayscale line images |
| **Training** | 1.7M samples, SAM, EMA, scan simulation |
| **Formats** | PyTorch, ONNX, Hugging Face `AutoModel` |
## 🚀 Quickstart

```python
from transformers import AutoModel, AutoProcessor

processor = AutoProcessor.from_pretrained("Valerii02/ukr-htr-convtext", trust_remote_code=True)
model = AutoModel.from_pretrained("Valerii02/ukr-htr-convtext", trust_remote_code=True)
```

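```python
import torch

inputs = processor(images="sample.png", return_tensors="pt")  # one text-line image
with torch.no_grad():  # inference only, no gradients needed
    logits = model(**inputs).logits
text = processor.batch_decode(logits)[0]  # decode the logits; take the single result
print(text)
```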

> 💡 **Try it now:** [Open the Gradio demo](https://huggingface.co/spaces/Valerii02/ukr-htr-convtext-demo) — no code required!
## 📖 Model Description

This repository packages a **Ukrainian OCR/ICR model** for handwritten and partially printed text with a Hugging Face–native API (`AutoModel` + `AutoProcessor`).
### Architecture

- **Backbone:** ResNet-18 + MobileViT (MVP), hierarchical ConvText encoder (U-Net-like down/upsampling)
- **Decoding:** CTC greedy
- **Vocabulary:** 151 characters (Ukrainian + symbols)
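
Greedy CTC decoding can be illustrated with a minimal sketch: take the most likely class at each time step, merge consecutive repeats, then drop the blank symbol. The blank index (`0`), the toy alphabet, and the function name `ctc_greedy_decode` are assumptions made for illustration; the released processor may order its 151-character vocabulary differently.

```python
from itertools import groupby


def ctc_greedy_decode(frame_scores, alphabet, blank_id=0):
    """Turn per-frame class scores into text with greedy CTC decoding."""
    # 1. Pick the highest-scoring class index at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # 2. Merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop the blank label and map the rest to characters.
    return "".join(alphabet[label] for label in collapsed if label != blank_id)


# Hypothetical toy alphabet; index 0 is assumed to be the CTC blank.
alphabet = ["<blank>", "т", "а", "к"]
frames = [
    [0.1, 0.7, 0.1, 0.1],  # т
    [0.1, 0.6, 0.2, 0.1],  # т (repeat, merged)
    [0.8, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.7, 0.1],  # а
    [0.2, 0.1, 0.6, 0.1],  # а (repeat, merged)
    [0.1, 0.1, 0.1, 0.7],  # к
]
print(ctc_greedy_decode(frames, alphabet))  # так
```

Because repeats are merged before blanks are removed, a doubled letter survives only when a blank frame separates its two occurrences.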
### Training Data
| Source | Contents |
|--------|---------|
| [ukrainian-handwriting-synth](https://github.com/ValeriiSielikhov/ukrainian-handwriting-synth) | Synthetic handwritten lines |
| [Ukrainian Handwritten Text](https://www.kaggle.com/datasets/annyhnatiuk/ukrainian-handwritten-text) | ~37k segmented lines |
| **Total** | **1,696,499** (Train 90% / Val 5% / Test 5%) |
### Training
- 500k iterations, batch size 16 with gradient accumulation over 4 steps
- SAM optimizer, EMA (decay 0.9999), TCM warmup for 40k iterations
- Scan-simulation and detector-error augmentations
- Hardware: NVIDIA B200 (180 GB VRAM)
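
As a rough illustration of two of the settings above, the sketch below shows the standard exponential-moving-average update with decay 0.9999 and the effective batch size implied by gradient accumulation. The helper name `ema_update`, the toy weights, and the single-device assumption are illustrative only and are not taken from the training code.

```python
def ema_update(ema_weights, model_weights, decay=0.9999):
    """Standard EMA step: ema <- decay * ema + (1 - decay) * weight."""
    return [
        decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, model_weights)
    ]


# Batch size 16 accumulated over 4 steps; assumes a single device.
micro_batch, accum_steps = 16, 4
effective_batch = micro_batch * accum_steps
print(effective_batch)  # 64

# Toy run: constant weights, EMA copy initialised at zero (hypothetical values).
ema = [0.0, 0.0]
weights = [1.0, -2.0]
for _ in range(10_000):
    ema = ema_update(ema, weights)
print(round(ema[0], 3))  # 0.632
```

With a decay of 0.9999 the average has an effective horizon of about 1 / (1 - 0.9999) = 10,000 iterations, which is why the toy EMA copy has covered only about 63% of the distance to the target after 10,000 steps.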
## 🖼️ Recognition Examples
| Example | Image | GT | Prediction | CER | WER |
|---------|-------|----|------------|-----|-----|
| 1 | ![example_1](images/example_1.png) | Департаменту патрульної поліції | Департаменту нагрульної поліції | 0.065 | 0.33 |
| 2 | ![example_2](images/example_2.png) | за порушення правил дорожнього руху | за порушення правил дорожнього Дуку | 0.057 | 0.20 |

*Real-world inference on scanned Ukrainian documents. GT = ground truth.*
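
The CER and WER values in the table can be reproduced with a plain Levenshtein distance, as in the sketch below. It is a minimal, unnormalized implementation written for illustration; the function names are assumptions, and the evaluation in this card additionally applies `format_string_for_wer`, which is not reproduced here.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (unit-cost insert/delete/substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,  # deletion
                    curr[j - 1] + 1,  # insertion
                    prev[j - 1] + (r != h),  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return levenshtein(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate: word edits divided by number of reference words."""
    ref_words = ref.split()
    return levenshtein(ref_words, hyp.split()) / len(ref_words)


gt = "Департаменту патрульної поліції"
pred = "Департаменту нагрульної поліції"
print(round(cer(gt, pred), 3), round(wer(gt, pred), 2))  # 0.065 0.33
```

Example 1 contains two character substitutions in a 31-character reference and one wrong word out of three, which gives the 0.065 and 0.33 shown in the table.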
## 🛠️ Tools & Scripts
| File | Purpose |
|------|---------|
| `prepare_hf_artifacts.py` | Convert `.pth` checkpoint → HF artifacts |
| `export_onnx.py` | Export to ONNX |
| `validate_parity.py` | OpenCV vs PIL, PyTorch vs ONNX parity checks |
| `predict.py` | Single-image CLI inference |
### Conversion
```bash
python prepare_hf_artifacts.py \
  --checkpoint-path /path/to/best_CER.pth \
  --alphabet-path /path/to/alphabet.json \
  --output-dir ./release
```
### ONNX Export
```bash
python export_onnx.py --hf-model-dir ./release --output-dir ./onnx
```
## 📊 Evaluation
| Split | CER | WER | Notes |
|---|---:|---:|---|
| real-world (124) | 0.176 | 0.440 | Scanned docs, handwritten + printed |

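*Metrics are micro-averaged over all samples, and text is normalized with `format_string_for_wer` before scoring. CER and WER are given as fractions here and as percentages in the comparison table.*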
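
For readers unfamiliar with the term, the sketch below contrasts micro-averaging (total errors divided by total reference length) with macro-averaging (mean of per-sample rates). It assumes the usual definition of micro-averaging; the function names are hypothetical. The first two pairs correspond to the two recognition examples in this card (2 character errors over 31 and 35 reference characters), and the third pair is an invented short line that is entirely wrong.

```python
def micro_average(pairs):
    """pairs: (edit_distance, reference_length) per sample; pools all errors."""
    total_errors = sum(errors for errors, _ in pairs)
    total_length = sum(length for _, length in pairs)
    return total_errors / total_length


def macro_average(pairs):
    """Mean of per-sample error rates; every sample weighs the same."""
    return sum(errors / length for errors, length in pairs) / len(pairs)


# (character errors, reference characters); the last pair is hypothetical.
samples = [(2, 31), (2, 35), (10, 10)]
print(round(micro_average(samples), 3))  # 0.184
print(round(macro_average(samples), 3))  # 0.374
```

Micro-averaging weights each sample by its length, so a short, badly recognized line moves the score less than it would under macro-averaging.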

### Comparison with other systems

On the same 124 real-world samples, the finetuned Ukrainian HTR-ConvText model (`ukr-htr-convtext`) was compared against several vision–language and HTR baselines.

| Model                   | Samples | CER (%) | WER (%) |
|-------------------------|:-------:|--------:|--------:|
| mamay                   |   124   | 40.15   | 75.28   |
| finetuned-cyrillic-trocr|   124   | 46.45   | 78.96   |
| cyrillic-trocr          |   124   | 51.92   | 97.93   |
| gpt-4o-mini             |   124   | 56.19   | 88.75   |
| hunyuan                 |   124   | 124.80  | 180.78  |
| **ukr-htr-convtext (Ours)** | **124** | **17.63** | **44.04** |

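On this evaluation set, `ukr-htr-convtext` reaches a CER of 17.63%, less than half that of the next-best system (mamay, 40.15%), and a WER of 44.04% versus 75.28%. It also outperforms every other listed baseline on both metrics. Error rates above 100% (hunyuan) are possible because insertions count as errors, so a prediction much longer than the reference can contain more errors than the reference has characters or words.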
## ⚠️ Limitations
- Sensitive to severe blur, low contrast, and non-standard page artifacts
- Performance may drop on long lines far from the training distribution
- CTC decoding can fail where character boundaries are highly ambiguous
## 🙏 Attribution & Citation
This implementation adapts ideas from [DAIR-Group/HTR-ConvText](https://huggingface.co/DAIR-Group/HTR-ConvText). See `NOTICE` and `CITATION.cff` for details.

**Upstream (HTR-ConvText):**
```bibtex
@misc{truc2025htrconvtext,
  title={HTR-ConvText: Leveraging Convolution and Textual Information for Handwritten Text Recognition},
  author={Pham Thach Thanh Truc and Dang Hoai Nam and Huynh Tong Dang Khoa and Vo Nguyen Le Duy},
  year={2025},
  eprint={2512.05021},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.05021},
}
```
**This model:** See `CITATION.cff` for full attribution.
## 📄 License
Apache-2.0. See `LICENSE`.
</div>
<div align="left">
⭐ Star this repo if you find it useful! · <a href="https://huggingface.co/Valerii02/ukr-htr-convtext/discussions">Report issues</a> · <a href="https://huggingface.co/Valerii02/ukr-htr-convtext">Contributions welcome</a>
</div>