Update README.md
README.md CHANGED

@@ -18,103 +18,33 @@ tags:
 - image-to-text
 ---

-# RUKOPYS

-
-
-`AlexandreSheva/rukopys-yolo11m-detector` contains a PEFT/LoRA adapter, not a standalone model. Load it on top of
-[`Qwen/Qwen3-VL-8B-Instruct`](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) to run page-level Ukrainian handwriting
-recognition and document-structure extraction.

 ## What It Does

--
--
--
-
-- Fits into the RUKOPYS pipeline as the page-level vision-language model.
-
-## Release Positioning
-
-Baseline public 8B page adapter for the RUKOPYS HTR pipeline.
-
-The adapter is intended for experimentation, portfolio review, and reproducible HTR pipeline
-development. For production use, validate on your own scans because handwriting style, scan quality,
-page layout, and annotation source can shift model behavior.
-
-## Training Data
-
-Trained on the curated RUKOPYS MVP dataset:
-[`your-hf-username-or-org/rukopys-curated-mvp`](https://huggingface.co/datasets/your-hf-username-or-org/rukopys-curated-mvp).
-
-The dataset is a cleaned derivative of `UkrainianCatholicUniversity/rukopys` prepared for:
-
-- page-to-regions JSON supervised fine-tuning,
-- crop-level text transcription fine-tuning,
-- layout detection experiments,
-- repeatable Kaggle-style evaluation and submission generation.
-
-## Training Setup
-
-- Base model: `Qwen/Qwen3-VL-8B-Instruct`
-- Method: 4-bit QLoRA / PEFT LoRA adapter fine-tuning
-- LoRA rank: `not recorded`
-- LoRA alpha: `not recorded`
-- Max steps: `not recorded`
-- Learning rate: `not recorded`
-- Per-device batch size: `not recorded`
-- Gradient accumulation steps: `not recorded`
-- Effective batch size: `not recorded`
-- Max sequence length: `not recorded`
-- Max image pixels: `not recorded`
-- Minimum quality weight: `not recorded`
-- Weighted sampling: `not recorded`
-- Training examples used: `not recorded`
-- Evaluation examples held out: `not recorded`

 ## Quick Use

 ```python
-from
-from
-
-base_model_id = "Qwen/Qwen3-VL-8B-Instruct"
-adapter_id = "AlexandreSheva/rukopys-yolo11m-detector"

-
-
-
-    device_map="auto",
-    torch_dtype="auto",
 )
-model =
-model.
-```
-
-Use the project inference CLI for end-to-end page prediction and Kaggle submission generation.
-
-## Output Format
-
-The expected assistant response is JSON compatible with the RUKOPYS page schema:
-
-```json
-[
-  {
-    "bbox": [10, 20, 300, 80],
-    "type": "handwritten",
-    "language": "uk",
-    "text": "..."
-  }
-]
 ```

 ## Limitations
-
-- The adapter was trained for Ukrainian handwriting and may not generalize to other languages.
-- It is sensitive to page resolution and preprocessing; match the training pixel budget when
-  possible.
-- Bounding boxes and text should be evaluated together, not as independent OCR text only.
-- The training dataset inherits a non-commercial CC BY-NC-SA 4.0 license from the source data.

 ## Project Context

 - image-to-text
 ---

+# RUKOPYS YOLO 11M Handwriting Region Detector

+`AlexandreSheva/rukopys-yolo11m-detector` contains an Ultralytics YOLO 11M detector trained to localize handwritten regions in RUKOPYS manuscript page images. It is the layout-detection component of the RUKOPYS HTR pipeline and is intended to produce bounding boxes that can be passed to a recognizer or combined with page-level vision-language predictions.

 ## What It Does

+- Detects handwritten text regions on scanned Ukrainian manuscript pages.
+- Outputs YOLO object-detection boxes for one class: `handwritten`.
+- Fits the RUKOPYS pipeline as the detector used before crop-level or page-level transcription.
+- Supports reproducible experiments with the curated RUKOPYS MVP YOLO dataset.
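
As a rough illustration of how detector output might feed a crop-level recognizer, the sketch below pads each detected box by a small margin and clamps it to the page bounds before cropping. The helper name `pad_and_clamp`, the 8-pixel default margin, and the sample boxes are assumptions made for illustration; they are not taken from the RUKOPYS codebase.

```python
import math


def pad_and_clamp(box, page_width, page_height, pad=8):
    """Expand an (x1, y1, x2, y2) pixel box by `pad` and keep it inside the page."""
    x1, y1, x2, y2 = box
    # Round outward so the padded crop never cuts into the detected region.
    left = max(0, math.floor(x1) - pad)
    top = max(0, math.floor(y1) - pad)
    right = min(page_width, math.ceil(x2) + pad)
    bottom = min(page_height, math.ceil(y2) + pad)
    return left, top, right, bottom


# Hypothetical detections on a 1000 x 1400 px page scan.
boxes = [(10.0, 20.0, 300.0, 80.0), (950.0, 1380.0, 1000.0, 1400.0)]
crops = [pad_and_clamp(b, 1000, 1400) for b in boxes]
print(crops)
```

Each tuple follows the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the result could be handed to it directly.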

 ## Quick Use

 ```python
+from huggingface_hub import hf_hub_download
+from ultralytics import YOLO

+model_path = hf_hub_download(
+    repo_id="AlexandreSheva/rukopys-yolo11m-detector",
+    filename="weights/best.pt",
 )
+model = YOLO(model_path)
+results = model.predict("page.jpg", imgsz=1536)
 ```
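
To make the prediction result easier to work with, the detections can be flattened into plain dictionaries. This is a minimal sketch, assuming the box coordinates and confidences have already been pulled out of the Ultralytics result as Python lists (for example via `results[0].boxes.xyxy.tolist()` and `results[0].boxes.conf.tolist()`). The helper name `to_records`, the 0.25 threshold, and the sample values are illustrative choices, not values from this repository.

```python
def to_records(xyxy, conf, min_conf=0.25, label="handwritten"):
    """Turn parallel lists of boxes and scores into dicts sorted top-to-bottom."""
    records = [
        {"bbox": [round(v) for v in box], "type": label, "score": score}
        for box, score in zip(xyxy, conf)
        if score >= min_conf
    ]
    # Sort by top edge, then left edge, as a simple reading-order heuristic.
    records.sort(key=lambda r: (r["bbox"][1], r["bbox"][0]))
    return records


# Hypothetical values standing in for results[0].boxes.xyxy / .conf.
xyxy = [[12.4, 300.2, 640.0, 380.9], [10.0, 20.0, 300.0, 80.0], [5.0, 5.0, 9.0, 9.0]]
conf = [0.91, 0.88, 0.10]
records = to_records(xyxy, conf)
for record in records:
    print(record)
```

Sorting by the top edge is only a crude stand-in for reading order; multi-column pages or marginal notes would likely need a layout-aware ordering.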

 ## Limitations
+This model detects regions only; it does not transcribe text. It was trained for RUKOPYS-style Ukrainian manuscript pages, so validate it on other archives, scan qualities, and layouts before reuse. The detector is based on Ultralytics YOLO11 under AGPL-3.0, and the training data inherits CC BY-NC-SA 4.0 terms from the source dataset.

 ## Project Context
