AlexandreSheva committed on
Commit cb0b15b · verified · 1 Parent(s): 055a781

Update README.md

  1. README.md +14 -84
README.md CHANGED
@@ -18,103 +18,33 @@ tags:
  - image-to-text
  ---
 
- # RUKOPYS Qwen3-VL 8B Page LoRA
 
- Initial public page-level LoRA adapter for Ukrainian handwritten document parsing. It adapts Qwen3-VL 8B to read a full scanned page and return structured text regions.
-
- `AlexandreSheva/rukopys-yolo11m-detector` contains a PEFT/LoRA adapter, not a standalone model. Load it on top of
- [`Qwen/Qwen3-VL-8B-Instruct`](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) to run page-level Ukrainian handwriting
- recognition and document-structure extraction.
 
  ## What It Does
 
- - Takes a full-page manuscript or handwriting image as input.
- - Produces structured JSON regions with bounding boxes, region types, language metadata, and text.
- - Targets Ukrainian handwritten text recognition (HTR), OCR post-processing, and document AI
- workflows.
- - Fits into the RUKOPYS pipeline as the page-level vision-language model.
-
- ## Release Positioning
-
- Baseline public 8B page adapter for the RUKOPYS HTR pipeline.
-
- The adapter is intended for experimentation, portfolio review, and reproducible HTR pipeline
- development. For production use, validate on your own scans because handwriting style, scan quality,
- page layout, and annotation source can shift model behavior.
-
- ## Training Data
-
- Trained on the curated RUKOPYS MVP dataset:
- [`your-hf-username-or-org/rukopys-curated-mvp`](https://huggingface.co/datasets/your-hf-username-or-org/rukopys-curated-mvp).
-
- The dataset is a cleaned derivative of `UkrainianCatholicUniversity/rukopys` prepared for:
-
- - page-to-regions JSON supervised fine-tuning,
- - crop-level text transcription fine-tuning,
- - layout detection experiments,
- - repeatable Kaggle-style evaluation and submission generation.
-
- ## Training Setup
-
- - Base model: `Qwen/Qwen3-VL-8B-Instruct`
- - Method: 4-bit QLoRA / PEFT LoRA adapter fine-tuning
- - LoRA rank: `not recorded`
- - LoRA alpha: `not recorded`
- - Max steps: `not recorded`
- - Learning rate: `not recorded`
- - Per-device batch size: `not recorded`
- - Gradient accumulation steps: `not recorded`
- - Effective batch size: `not recorded`
- - Max sequence length: `not recorded`
- - Max image pixels: `not recorded`
- - Minimum quality weight: `not recorded`
- - Weighted sampling: `not recorded`
- - Training examples used: `not recorded`
- - Evaluation examples held out: `not recorded`
 
  ## Quick Use
 
  ```python
- from peft import PeftModel
- from transformers import AutoModelForImageTextToText, AutoProcessor
-
- base_model_id = "Qwen/Qwen3-VL-8B-Instruct"
- adapter_id = "AlexandreSheva/rukopys-yolo11m-detector"
 
- processor = AutoProcessor.from_pretrained(adapter_id)
- base_model = AutoModelForImageTextToText.from_pretrained(
-     base_model_id,
-     device_map="auto",
-     torch_dtype="auto",
  )
- model = PeftModel.from_pretrained(base_model, adapter_id)
- model.eval()
- ```
-
- Use the project inference CLI for end-to-end page prediction and Kaggle submission generation.
-
- ## Output Format
-
- The expected assistant response is JSON compatible with the RUKOPYS page schema:
-
- ```json
- [
-   {
-     "bbox": [10, 20, 300, 80],
-     "type": "handwritten",
-     "language": "uk",
-     "text": "..."
-   }
- ]
  ```
 
  ## Limitations
-
- - The adapter was trained for Ukrainian handwriting and may not generalize to other languages.
- - It is sensitive to page resolution and preprocessing; match the training pixel budget when
- possible.
- - Bounding boxes and text should be evaluated together, not as independent OCR text only.
- - The training dataset inherits a non-commercial CC BY-NC-SA 4.0 license from the source data.
 
  ## Project Context
 
 
  - image-to-text
  ---
 
+ # RUKOPYS YOLO 11M Handwriting Region Detector
 
+ `AlexandreSheva/rukopys-yolo11m-detector` contains an Ultralytics YOLO 11M detector trained to localize handwritten regions in RUKOPYS manuscript page images. It is the layout-detection component of the RUKOPYS HTR pipeline and is intended to produce bounding boxes that can be passed to a recognizer or combined with page-level vision-language predictions.
 
  ## What It Does
 
+ - Detects handwritten text regions on scanned Ukrainian manuscript pages.
+ - Outputs YOLO object-detection boxes for one class: `handwritten`.
+ - Fits the RUKOPYS pipeline as the detector used before crop-level or page-level transcription.
+ - Supports reproducible experiments with the curated RUKOPYS MVP YOLO dataset.
 
  ## Quick Use
 
  ```python
+ from huggingface_hub import hf_hub_download
+ from ultralytics import YOLO
 
+ model_path = hf_hub_download(
+     repo_id="AlexandreSheva/rukopys-yolo11m-detector",
+     filename="weights/best.pt",
  )
+ model = YOLO(model_path)
+ results = model.predict("page.jpg", imgsz=1536)
  ```
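
The new Quick Use snippet stops at `results`. As a rough sketch of a next step, the helper below converts detector boxes into region dictionaries that a recognizer could consume. The function name `boxes_to_regions`, the `min_conf` threshold, the `confidence` key, and the top-to-bottom ordering are illustrative assumptions, not part of the repository; the `bbox` and `type` keys only mirror the JSON shape shown in the removed README text.

```python
from typing import Iterable, Sequence


def boxes_to_regions(
    xyxy: Iterable[Sequence[float]],
    confidences: Iterable[float],
    min_conf: float = 0.25,  # hypothetical threshold, tune on your own scans
) -> list[dict]:
    """Turn (x1, y1, x2, y2) boxes into region dicts in a naive reading order."""
    regions = []
    for (x1, y1, x2, y2), conf in zip(xyxy, confidences):
        if conf < min_conf:
            continue  # skip low-confidence detections
        regions.append(
            {
                "bbox": [round(x1), round(y1), round(x2), round(y2)],
                "type": "handwritten",  # the detector has a single class
                "confidence": float(conf),
            }
        )
    # Assumed ordering: sort by top edge, then by left edge.
    regions.sort(key=lambda r: (r["bbox"][1], r["bbox"][0]))
    return regions


# With `results` from the Quick Use snippet, Ultralytics exposes boxes as tensors:
# boxes = results[0].boxes
# regions = boxes_to_regions(boxes.xyxy.tolist(), boxes.conf.tolist())

# Stand-alone demo with made-up coordinates:
demo = boxes_to_regions(
    [[12.4, 200.0, 310.6, 260.2], [10.0, 20.0, 300.0, 80.0], [5.0, 5.0, 9.0, 9.0]],
    [0.91, 0.88, 0.10],
)
print(demo)
```

Sorting by the top edge is only a simple heuristic; multi-column or marginal layouts would likely need a more careful reading-order step.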
 
  ## Limitations
+ This model detects regions only; it does not transcribe text. It was trained for RUKOPYS-style Ukrainian manuscript pages, so validate it on other archives, scan qualities, and layouts before reuse. The detector is based on Ultralytics YOLO11 under AGPL-3.0, and the training data inherits CC BY-NC-SA 4.0 terms from the source dataset.
 
  ## Project Context