---
license: mit
datasets:
  - ai-forever/school_notebooks_RU
  - ai-forever/school_notebooks_EN
language:
  - ru
metrics:
  - f1
pipeline_tag: image-segmentation

model-index:
  - name: hwr_text_detection_rus
    results:
      - task:
          type: image-segmentation
        dataset:
          name: >-
            ai-forever/school_notebooks_RU + ai-forever/school_notebooks_EN
            (validation mix)
          type: custom
          split: validation
        metrics:
          - name: F1
            type: f1
            value: 0.72
---

# hwr_text_detection_rus

Handwritten text **detection** model for **Russian** notebook images.

This model is intended to **find text regions** (words / short text fragments) in handwritten notebook images so you can crop them and pass the crops to an OCR model (e.g. `kotmayyaka/hwr_text_ocr_rus`).  

It is **not** a full OCR pipeline by itself.

## What’s inside

- Checkpoint: `hwr_text_detection_rus.pth`
- Inference helper code (placeholders for now):
  - `hwr_detection.py` — detector wrapper class (load + preprocess + postprocess)
  - `inference_detection.py` — CLI example

## Intended use

- ✅ Detect text regions on notebook photos/scans
- ✅ Preprocessing step before word-level OCR
- ❌ Does not output recognized text (only regions)
- ❌ Not guaranteed to generalize to very different handwriting styles, paper types, camera angles, or lighting conditions

## Quickstart (inference)

### 1) Install dependencies

```bash
pip install torch torchvision pillow opencv-python
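```

The `config` value in the Python example below (`COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml`) follows Detectron2 model-zoo naming, so the helper code may also require Detectron2, which is installed separately (see the Detectron2 installation guide).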

### 2) Run CLI inference

```bash
python inference_detection.py \
  --image /path/to/page_or_crop.jpg \
  --checkpoint hwr_text_detection_rus.pth \
  --out detections.json
```

### 3) Use from Python

```python
from PIL import Image
from hwr_detection import HWRTextDetector

detector = HWRTextDetector(
    checkpoint_path="hwr_text_detection_rus.pth",
    device="cpu",
    score_thresh=0.1,
    config="COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml",
    num_classes=1
)

img_path = "sample.jpg"
img = Image.open(img_path).convert("RGB")

preds = detector.predict_polygons(img)
detector.save_custom_json(preds, img_path, "preds.json")
```

### Input recommendations
* Use reasonably high-resolution images (text should be readable).
* Avoid extreme rotation/perspective; if present, consider deskewing.
* For best OCR later, crop detected boxes tightly (optionally expand slightly to include ascenders/descenders).
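
Here is a minimal sketch of the padding step from the last recommendation. It assumes boxes are `(x0, y0, x1, y1)` pixel tuples. `expand_box` and its `pad_x`/`pad_y` defaults are hypothetical and not part of `hwr_detection.py`. The vertical padding is larger because ascenders and descenders extend above and below the word body.

```python
import math


def expand_box(box, img_w, img_h, pad_x=0.05, pad_y=0.20):
    """Grow an (x0, y0, x1, y1) box by a fraction of its width/height.

    pad_x / pad_y are hypothetical defaults: more vertical padding keeps
    ascenders and descenders that a tight detection box may cut off.
    """
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * pad_x
    dy = (y1 - y0) * pad_y
    # Round outward to whole pixels, then clip to the image bounds.
    nx0 = max(0, math.floor(x0 - dx))
    ny0 = max(0, math.floor(y0 - dy))
    nx1 = min(img_w, math.ceil(x1 + dx))
    ny1 = min(img_h, math.ceil(y1 + dy))
    return nx0, ny0, nx1, ny1


# Example: a 100x30 word box on a 1000x1000 page.
print(expand_box((100, 50, 200, 80), 1000, 1000))  # (95, 44, 205, 86)
```

The result is a `(left, upper, right, lower)` tuple, which is the box format Pillow's `Image.crop` expects.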
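
### Output
* The model outputs text-region detections with confidence scores. The Python helper returns them as polygons (`predict_polygons`). If your OCR step expects rectangles, take each polygon's bounding box.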

You can then crop the regions and send them to an OCR model.
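
Before OCR, you usually filter detections by score, reduce them to axis-aligned boxes and sort them into reading order. The sketch below assumes each detection is a dict with a `polygon` (a list of `[x, y]` points) and a `score`. That schema, the `min_score` and `line_tol` values and the line-grouping heuristic are all assumptions. They are not the documented output format of `predict_polygons`.

```python
def polygon_to_bbox(polygon):
    """Axis-aligned (x0, y0, x1, y1) box enclosing a list of [x, y] points."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def reading_order(detections, min_score=0.5, line_tol=0.5):
    """Drop low-score detections and order boxes top-to-bottom, left-to-right.

    Boxes whose vertical centers lie within line_tol * (median box height)
    of the current line's mean center are grouped into the same text line.
    """
    boxes = [polygon_to_bbox(d["polygon"]) for d in detections if d["score"] >= min_score]
    if not boxes:
        return []
    heights = sorted(b[3] - b[1] for b in boxes)
    tol = line_tol * heights[len(heights) // 2]

    def center_y(b):
        return (b[1] + b[3]) / 2

    # Sweep boxes by vertical center and greedily group them into lines.
    boxes.sort(key=center_y)
    lines = [[boxes[0]]]
    for box in boxes[1:]:
        line = lines[-1]
        line_cy = sum(center_y(b) for b in line) / len(line)
        if abs(center_y(box) - line_cy) <= tol:
            line.append(box)
        else:
            lines.append([box])
    # Within each line, read left to right.
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]


dets = [
    {"polygon": [[300, 12], [380, 10], [380, 40], [300, 42]], "score": 0.90},
    {"polygon": [[20, 10], [120, 10], [120, 40], [20, 40]], "score": 0.95},
    {"polygon": [[25, 80], [140, 80], [140, 110], [25, 110]], "score": 0.80},
    {"polygon": [[200, 85], [260, 85], [260, 105], [200, 105]], "score": 0.05},
]
print(reading_order(dets))
# [(20, 10, 120, 40), (300, 10, 380, 42), (25, 80, 140, 110)]
```

You can pass each returned box to Pillow's `Image.crop`. The greedy line grouping is deliberately simple and will mis-order strongly slanted or curved lines.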

## Evaluation

The F1 score in the model card metadata (0.72) was obtained on an internal mixed validation split built from:

* ai-forever/school_notebooks_RU
* ai-forever/school_notebooks_EN
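
This card does not document how predictions were matched to ground truth when computing F1. For orientation only, the sketch below shows one common protocol for detection F1. Predicted and ground-truth boxes are matched greedily and one-to-one at IoU ≥ 0.5, and then F1 = 2PR / (P + R) is computed. The IoU threshold, the greedy matching and the box format are assumptions, so this is not necessarily how the 0.72 figure was obtained.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds, gts, iou_thresh=0.5):
    """F1 with greedy one-to-one matching.

    preds should be sorted by confidence (highest first) so that stronger
    predictions claim ground-truth boxes first.
    """
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, iou_thresh
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)  # each ground-truth box can be matched once
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gts = [(0, 0, 10, 10), (20, 0, 30, 10)]
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
print(detection_f1(preds, gts))  # 0.5 (1 TP, P = 0.5, R = 0.5)
```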

## License
* MIT