---
license: mit
datasets:
- ai-forever/school_notebooks_RU
- ai-forever/school_notebooks_EN
language:
- ru
metrics:
- f1
pipeline_tag: image-segmentation
model-index:
- name: hwr_text_detection_rus
  results:
  - task:
      type: image-segmentation
    dataset:
      name: >-
        ai-forever/school_notebooks_RU + ai-forever/school_notebooks_EN
        (validation mix)
      type: custom
      split: validation
    metrics:
    - name: F1
      type: f1
      value: 0.72
---

# hwr_text_detection_rus

Handwritten text **detection** model for **Russian** notebook images.

This model is intended to **find text regions** (words / short text fragments) in handwritten notebook images so you can crop them and pass the crops to an OCR model (e.g. `kotmayyaka/hwr_text_ocr_rus`). It is **not** a full OCR pipeline by itself.

## What's inside

- Checkpoint: `hwr_text_detection_rus.pth`
- Inference helper code (placeholders for now):
  - `hwr_detection.py`: detector wrapper class (load + preprocess + postprocess)
  - `inference_detection.py`: CLI example

## Intended use

- ✅ Detect text regions on notebook photos/scans
- ✅ Preprocessing step before word-level OCR
- ❌ Does not output recognized text (only regions)
- ❌ Not guaranteed to generalize to very different handwriting styles, paper types, camera angles, or lighting conditions

## Quickstart (inference)

### 1) Install dependencies

```bash
pip install torch torchvision pillow opencv-python
```

### 2) Run CLI inference

```bash
python inference_detection.py \
  --image /path/to/page_or_crop.jpg \
  --checkpoint hwr_text_detection_rus.pth \
  --out detections.json
```

### 3) Use from Python

```python
from PIL import Image

from hwr_detection import HWRTextDetector

# Load the detector from the local checkpoint.
detector = HWRTextDetector(
    checkpoint_path="hwr_text_detection_rus.pth",
    device="cpu",
    score_thresh=0.1,
    config="COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml",
    num_classes=1,
)

# Run detection on a single image and save the predictions as JSON.
img_path = "sample.jpg"
img = Image.open(img_path).convert("RGB")
preds = detector.predict_polygons(img)
detector.save_custom_json(preds, img_path, "preds.json")
```

### Input recommendations

* Use reasonably high-resolution images (the text should be readable).
* Avoid extreme rotation or perspective distortion; if present, consider deskewing first.
* For best OCR results later, crop detected boxes tightly (optionally expanding them slightly to include ascenders/descenders).

### Output

* The model outputs text region detections (e.g. bounding boxes with confidence scores). You can then crop the regions and send them to an OCR model.

### Evaluation

Metrics reported in the model card header were obtained on an internal mixed validation split based on:

* ai-forever/school_notebooks_RU
* ai-forever/school_notebooks_EN

### License

* MIT
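
### Example: padded crop boxes

The input recommendations suggest expanding detected regions slightly so that ascenders and descenders are not cut off before OCR. The following is a minimal sketch of that step, not part of the shipped helper code. It assumes each detection is available as a polygon given as a sequence of `(x, y)` pixel coordinates; the actual return format of `predict_polygons` is not documented here, so adapt the input accordingly. The names `polygon_to_box` and `polygons_to_boxes` and the `pad` parameter are hypothetical.

```python
def polygon_to_box(polygon, image_size, pad=0.25):
    """Return a padded (left, top, right, bottom) box around a polygon.

    Assumptions (not taken from the model's helper code):
    - `polygon` is a sequence of (x, y) pixel coordinates;
    - `image_size` is (width, height), as returned by PIL's `Image.size`;
    - `pad` is the fraction of the region height added on every side.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    left, top, right, bottom = min(xs), min(ys), max(xs), max(ys)

    # Scale the margin with the text height so small and large words
    # get proportionally similar padding.
    margin = pad * (bottom - top)
    width, height = image_size

    # The right/bottom edges are exclusive in PIL-style boxes, hence the +1.
    # All edges are clamped to the image bounds.
    return (
        max(0, int(left - margin)),
        max(0, int(top - margin)),
        min(width, int(right + margin) + 1),
        min(height, int(bottom + margin) + 1),
    )


def polygons_to_boxes(polygons, image_size, pad=0.25):
    """Convert every polygon to a padded, clamped crop box."""
    return [polygon_to_box(p, image_size, pad) for p in polygons]


if __name__ == "__main__":
    # Hypothetical detection on a 200x100 image.
    detections = [[(10, 20), (60, 20), (60, 40), (10, 40)]]
    for box in polygons_to_boxes(detections, image_size=(200, 100)):
        print(box)
```

Each resulting tuple can be passed directly to Pillow's `Image.crop(box)` to obtain the sub-image for the OCR model. Tying the margin to the region height rather than using a fixed pixel count is one possible design choice; a fixed margin may work just as well if all pages are scanned at the same resolution.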