license: mit
datasets:
  - ai-forever/school_notebooks_RU
  - ai-forever/school_notebooks_EN
language:
  - ru
metrics:
  - f1
pipeline_tag: image-segmentation
model-index:
  - name: hwr_text_detection_rus
    results:
      - task:
          type: image-segmentation
        dataset:
          name: >-
            ai-forever/school_notebooks_RU + ai-forever/school_notebooks_EN
            (validation mix)
          type: custom
          split: validation
        metrics:
          - name: F1
            type: f1
            value: 0.72

hwr_text_detection_rus

Handwritten text detection model for Russian notebook images.

This model is intended to find text regions (words / short text fragments) in handwritten notebook images so you can crop them and pass the crops to an OCR model (e.g. kotmayyaka/hwr_text_ocr_rus).

It is not a full OCR pipeline by itself.

What’s inside

Checkpoint: hwr_text_detection_rus.pth
Inference helper code (placeholders for now):
- hwr_detection.py — detector wrapper class (load + preprocess + postprocess)
- inference_detection.py — CLI example

Intended use

✅ Detect text regions on notebook photos/scans
✅ Preprocessing step before word-level OCR
❌ Does not output recognized text (only regions)
❌ Not guaranteed to generalize to very different handwriting styles, paper types, camera angles, or lighting conditions

Quickstart (inference)

1) Install dependencies

pip install torch torchvision pillow opencv-python

2) Run CLI inference

python inference_detection.py \
  --image /path/to/page_or_crop.jpg \
  --checkpoint hwr_text_detection_rus.pth \
  --out detections.json

3) Use from Python

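from PIL import Image
from hwr_detection import HWRTextDetector

# Load the checkpoint. `config` is a Detectron2 model-zoo config name
# (Mask R-CNN with a ResNeXt-101 32x8d FPN backbone).
detector = HWRTextDetector(
    checkpoint_path="hwr_text_detection_rus.pth",
    device="cpu",
    score_thresh=0.1,  # confidence threshold for detections
    config="COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml",
    num_classes=1,  # single class: text
)

# Read the page as an RGB image.
img_path = "sample.jpg"
img = Image.open(img_path).convert("RGB")

# Detect text regions as polygons and save them to JSON.
preds = detector.predict_polygons(img)
detector.save_custom_json(preds, img_path, "preds.json")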

Input recommendations

Use images with enough resolution that the handwriting is legible at full size.
Avoid strong rotation or perspective distortion; if the page is skewed, deskew it before detection.
For best OCR later, crop detected boxes tightly (optionally expand slightly to include ascenders/descenders).
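The "expand slightly" advice can be sketched as a small helper. This is a minimal illustration, not part of the repository: the function name `expand_box`, the `(left, top, right, bottom)` box layout, and the default `pad_ratio` of 0.1 are all assumptions.

```python
def expand_box(box, image_size, pad_ratio=0.1):
    """Grow a (left, top, right, bottom) box by pad_ratio per side, clamped to the image.

    Hypothetical helper: the box layout and the default padding are assumptions,
    not something the model card specifies.
    """
    left, top, right, bottom = (round(v) for v in box)
    width, height = image_size
    # Padding is proportional to the box size, so small words get small margins.
    pad_x = round((right - left) * pad_ratio)
    pad_y = round((bottom - top) * pad_ratio)
    # Clamp so the expanded box never leaves the image.
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )


padded = expand_box((50, 20, 150, 60), image_size=(200, 100))
print(padded)
```

The returned tuple uses the same order as Pillow's `Image.crop`, so it can be passed as `img.crop(padded)` with `image_size=img.size`.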

Output

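The model outputs text region detections with confidence scores. In the Python example above they are returned as polygons (`predict_polygons`) and can be saved to JSON with `save_custom_json`.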

You can then crop the regions and send them to an OCR model.
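A rough sketch of turning detections into crop boxes follows. The structure of the detector output is not documented here, so the input format is an assumption: a list of dicts with a hypothetical `"polygon"` key (list of `[x, y]` points) and a hypothetical `"score"` key. Adapt the key names to whatever `predict_polygons` actually returns.

```python
def polygon_to_box(polygon):
    """Return the axis-aligned (left, top, right, bottom) box around a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def regions_to_boxes(regions, min_score=0.5):
    """Keep regions scoring at least min_score and convert them to boxes.

    `regions` is assumed to be a list of dicts with "polygon" and "score" keys;
    both key names are hypothetical.
    """
    boxes = [
        polygon_to_box(region["polygon"])
        for region in regions
        if region["score"] >= min_score
    ]
    # Sort top to bottom, then left to right, as a crude reading order.
    return sorted(boxes, key=lambda box: (box[1], box[0]))


example_regions = [
    {"polygon": [[60, 10], [90, 12], [88, 30], [61, 28]], "score": 0.9},
    {"polygon": [[10, 10], [40, 10], [40, 30], [10, 30]], "score": 0.8},
    {"polygon": [[10, 50], [30, 50], [30, 60], [10, 60]], "score": 0.2},
]
print(regions_to_boxes(example_regions))
```

Each resulting box can be cropped from the page image and passed to the OCR model. The score filter is separate from the detector's own `score_thresh` and is only useful if you want a stricter cut-off at crop time.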

Evaluation

The F1 score of 0.72 reported in the model card header was obtained on an internal validation split that mixes:

ai-forever/school_notebooks_RU
ai-forever/school_notebooks_EN
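The card does not state how detections were matched to ground truth when computing F1. As an illustration only, one common convention for detection F1 is to match predicted and ground-truth boxes greedily by IoU and count a match when IoU is at least 0.5. The function names, the box layout, and the 0.5 threshold below are assumptions, not the procedure used for the reported score.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_f1(pred_boxes, gt_boxes, iou_thresh=0.5):
    """F1 with greedy one-to-one matching (an assumed protocol, see above)."""
    unmatched = list(gt_boxes)
    true_pos = 0
    for pred in pred_boxes:
        # Pick the still-unmatched ground-truth box that overlaps this prediction most.
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_thresh:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(pred_boxes) - true_pos
    false_neg = len(gt_boxes) - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0


preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
truth = [(0, 0, 10, 10), (21, 21, 31, 31)]
print(detection_f1(preds, truth))
```

Greedy matching depends on the order of predictions; sorting them by descending confidence first is a common choice when scores are available.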
