AlexandreSheva/rukopys-yolo11m-detector

@@ -18,103 +18,33 @@ tags:
 - image-to-text
 ---
-# RUKOPYS Qwen3-VL 8B Page LoRA
-Initial public page-level LoRA adapter for Ukrainian handwritten document parsing. It adapts Qwen3-VL 8B to read a full scanned page and return structured text regions.
-`AlexandreSheva/rukopys-yolo11m-detector` contains a PEFT/LoRA adapter, not a standalone model. Load it on top of
-[`Qwen/Qwen3-VL-8B-Instruct`](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) to run page-level Ukrainian handwriting
-recognition and document-structure extraction.
 ## What It Does
-- Takes a full-page manuscript or handwriting image as input.
-- Produces structured JSON regions with bounding boxes, region types, language metadata, and text.
-- Targets Ukrainian handwritten text recognition (HTR), OCR post-processing, and document AI
-  workflows.
-- Fits into the RUKOPYS pipeline as the page-level vision-language model.
-## Release Positioning
-Baseline public 8B page adapter for the RUKOPYS HTR pipeline.
-The adapter is intended for experimentation, portfolio review, and reproducible HTR pipeline
-development. For production use, validate on your own scans because handwriting style, scan quality,
-page layout, and annotation source can shift model behavior.
-## Training Data
-Trained on the curated RUKOPYS MVP dataset:
-[`your-hf-username-or-org/rukopys-curated-mvp`](https://huggingface.co/datasets/your-hf-username-or-org/rukopys-curated-mvp).
-The dataset is a cleaned derivative of `UkrainianCatholicUniversity/rukopys` prepared for:
-- page-to-regions JSON supervised fine-tuning,
-- crop-level text transcription fine-tuning,
-- layout detection experiments,
-- repeatable Kaggle-style evaluation and submission generation.
-## Training Setup
-- Base model: `Qwen/Qwen3-VL-8B-Instruct`
-- Method: 4-bit QLoRA / PEFT LoRA adapter fine-tuning
-- LoRA rank: `not recorded`
-- LoRA alpha: `not recorded`
-- Max steps: `not recorded`
-- Learning rate: `not recorded`
-- Per-device batch size: `not recorded`
-- Gradient accumulation steps: `not recorded`
-- Effective batch size: `not recorded`
-- Max sequence length: `not recorded`
-- Max image pixels: `not recorded`
-- Minimum quality weight: `not recorded`
-- Weighted sampling: `not recorded`
-- Training examples used: `not recorded`
-- Evaluation examples held out: `not recorded`
 ## Quick Use
 ```python
-from peft import PeftModel
-from transformers import AutoModelForImageTextToText, AutoProcessor
-base_model_id = "Qwen/Qwen3-VL-8B-Instruct"
-adapter_id = "AlexandreSheva/rukopys-yolo11m-detector"
-processor = AutoProcessor.from_pretrained(adapter_id)
-base_model = AutoModelForImageTextToText.from_pretrained(
-    base_model_id,
-    device_map="auto",
-    torch_dtype="auto",
 )
-model = PeftModel.from_pretrained(base_model, adapter_id)
-model.eval()
-```
-Use the project inference CLI for end-to-end page prediction and Kaggle submission generation.
-## Output Format
-The expected assistant response is JSON compatible with the RUKOPYS page schema:
-```json
-[
-  {
-    "bbox": [10, 20, 300, 80],
-    "type": "handwritten",
-    "language": "uk",
-    "text": "..."
-  }
-]
 ```
 ## Limitations
-- The adapter was trained for Ukrainian handwriting and may not generalize to other languages.
-- It is sensitive to page resolution and preprocessing; match the training pixel budget when
-  possible.
-- Bounding boxes and text should be evaluated together, not as independent OCR text only.
-- The training dataset inherits a non-commercial CC BY-NC-SA 4.0 license from the source data.
 ## Project Context

 - image-to-text
 ---
+# RUKOPYS YOLO11m Handwriting Region Detector
+`AlexandreSheva/rukopys-yolo11m-detector` contains an Ultralytics YOLO11m detector trained to localize handwritten regions in RUKOPYS manuscript page images. It is the layout-detection component of the RUKOPYS HTR pipeline: the bounding boxes it produces can be passed to a recognizer or combined with page-level vision-language predictions.
 ## What It Does
+- Detects handwritten text regions on scanned Ukrainian manuscript pages.
+- Outputs YOLO object-detection boxes for one class: `handwritten`.
+- Fits the RUKOPYS pipeline as the detector used before crop-level or page-level transcription.
+- Supports reproducible experiments with the curated RUKOPYS MVP YOLO dataset.
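As an illustration of the single-class box format mentioned above, here is a minimal sketch that converts one label line in the standard Ultralytics detection layout (`class cx cy w h`, normalized to the image size) into pixel corner coordinates. That the curated RUKOPYS YOLO dataset stores labels this way, and that `handwritten` is class index `0`, are assumptions; the helper name `yolo_label_to_pixel_box` is hypothetical and not part of the project.

```python
# Assumption: the only class, `handwritten`, has index 0 in the label files.
CLASS_NAMES = {0: "handwritten"}


def yolo_label_to_pixel_box(line, img_w, img_h):
    """Convert 'class cx cy w h' (normalized) to (class_id, x1, y1, x2, y2) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    cx, cy, w, h = (float(v) for v in parts[1:5])
    # The center/size form is turned into corner form, then scaled to pixels.
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return class_id, x1, y1, x2, y2
```

For example, `yolo_label_to_pixel_box("0 0.5 0.5 0.5 0.25", 1000, 2000)` describes a region centered on a 1000x2000 page that spans half the width and a quarter of the height.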
 ## Quick Use
 ```python
+from huggingface_hub import hf_hub_download
+from ultralytics import YOLO
+model_path = hf_hub_download(
+    repo_id="AlexandreSheva/rukopys-yolo11m-detector",
+    filename="weights/best.pt",
 )
+model = YOLO(model_path)
+results = model.predict("page.jpg", imgsz=1536)
 ```
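Since the detector's boxes are meant to be handed to a recognizer, one possible post-processing step is sketched below: pad each box slightly, clamp it to the page, and order regions top to bottom. The helper `prepare_crop_boxes`, the padding value, and the simple `(top, left)` ordering are assumptions for illustration, not the project's actual pipeline.

```python
def prepare_crop_boxes(boxes, img_w, img_h, pad=4):
    """Pad, clamp, and order (x1, y1, x2, y2) pixel boxes for cropping."""
    prepared = []
    for x1, y1, x2, y2 in boxes:
        # Expand each box by `pad` pixels without leaving the page.
        left = max(0, int(x1) - pad)
        top = max(0, int(y1) - pad)
        right = min(img_w, int(x2) + pad)
        bottom = min(img_h, int(y2) + pad)
        # Skip degenerate boxes that have no area after clamping.
        if right > left and bottom > top:
            prepared.append((left, top, right, bottom))
    # Naive reading order: top edge first, then left edge.
    return sorted(prepared, key=lambda b: (b[1], b[0]))
```

With Ultralytics results, the input boxes would typically come from `results[0].boxes.xyxy.tolist()` and the page size from `results[0].orig_shape` (height, width); each returned tuple can then be passed to an image-cropping call such as Pillow's `Image.crop`. Multi-column pages would need a more careful ordering than this sort.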
 ## Limitations
+This model detects regions only; it does not transcribe text. It was trained for RUKOPYS-style Ukrainian manuscript pages, so validate it on other archives, scan qualities, and layouts before reuse. The detector is based on Ultralytics YOLO11 under AGPL-3.0, and the training data inherits CC BY-NC-SA 4.0 terms from the source dataset.
 ## Project Context