PellelNitram committed on
Commit
8b09fbc
·
verified ·
1 Parent(s): 2c0482e

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +98 -0
README.md ADDED
@@ -0,0 +1,98 @@
+ ---
+ language: en
+ license: mit
+ tags:
+ - word-detection
+ - handwriting
+ - onnx
+ - xournalpp
+ library_name: onnxruntime
+ pipeline_tag: object-detection
+ ---
+
+ # WordDetector: Word-Level Bounding Box Detection for Handwritten Text
+
+ A word-detection model that locates individual handwritten words in document
+ images. It produces axis-aligned bounding boxes only, with no transcription or
+ labels. It is part of the
+ [Xournal++ HTR](https://github.com/PellelNitram/xournalpp_htr) project.
+
+ ## Model details
+
+ | Property | Value |
+ | --- | --- |
+ | Architecture | Modified ResNet-18 encoder + U-Net-style decoder |
+ | Input | Grayscale image, resized to 448×448 |
+ | Output | 7 feature maps at 224×224 (segmentation + geometry) |
+ | Format | ONNX (softmax baked in, opset 17) |
+ | Parameters | ~11.2M |
+ | Training data | [IAM Handwriting Database](https://huggingface.co/datasets/PellelNitram/xournalpp_htr_IAM_DB) |
+ | Best val F1 | 0.88 (lr=0.001, bs=16, 200 epochs) |
+ | License | MIT |
+
+ ## Usage
+
+ ```python
+ from xournalpp_htr.inference_models import WordDetectorModel
+
+ model = WordDetectorModel.from_pretrained()
+ boxes = model.detect(grayscale_image) # list[BoundingBox]
+ ```
+
+ Each `BoundingBox` has `x_min`, `y_min`, `x_max`, `y_max` in the original
+ image's pixel coordinates.
+
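Because the maps are produced at 224×224 while boxes are reported in original-image pixels, some rescaling has to happen after decoding. The sketch below shows one way this could work, assuming a plain resize without padding (as the Limitations section suggests). `Box` and `map_to_original` are hypothetical names for illustration, not the library's API.

```python
from dataclasses import dataclass

MAP_SIZE = 224  # output map resolution stated in the model card


@dataclass
class Box:
    """Hypothetical stand-in for the library's BoundingBox."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def map_to_original(box: Box, orig_width: int, orig_height: int) -> Box:
    """Rescale a box from 224x224 map space to original-image pixels.

    Assumes a plain resize without padding, so each axis gets its own scale.
    """
    sx = orig_width / MAP_SIZE
    sy = orig_height / MAP_SIZE
    return Box(box.x_min * sx, box.y_min * sy, box.x_max * sx, box.y_max * sy)


# A box covering the left half of the map, mapped onto an 896x448 page.
print(map_to_original(Box(0, 0, 112, 224), orig_width=896, orig_height=448))
```

The per-axis scale factors differ whenever the page is not square, which is the aspect-ratio distortion noted under Limitations.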
+ Install with `pip install xournalpp-htr`, which pulls in `onnxruntime` and
+ `huggingface-hub`; PyTorch is not needed.
+
+ ## How it works
+
+ The model outputs 7 maps per image:
+
+ - **Segmentation** (3 channels): word / surrounding margin / background
+ (softmax classification)
+ - **Geometry** (4 channels): per-pixel distance to the top, bottom, left, and
+ right edges of the enclosing word bounding box
+
+ Post-processing decodes these maps into bounding boxes via connected-component
+ analysis and DBSCAN clustering.
+
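The decoding idea can be sketched in plain Python: every pixel classified as "word" votes for a box using its four distances, and the votes are then grouped with DBSCAN using 1 - IoU as the distance. This is an illustration under assumptions rather than the project's code. The channel order (word, margin, background; top, bottom, left, right), the `eps` and `min_samples` values, and the per-cluster median are all assumed, and the connected-component step is left out for brevity.

```python
from statistics import median


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def candidate_boxes(seg, geo):
    """One box vote per pixel whose most likely class is 'word' (channel 0)."""
    boxes = []
    for y in range(len(seg[0])):
        for x in range(len(seg[0][0])):
            if seg[0][y][x] > max(seg[1][y][x], seg[2][y][x]):
                top, bottom, left, right = (geo[c][y][x] for c in range(4))
                boxes.append((x - left, y - top, x + right, y + bottom))
    return boxes


def dbscan(boxes, eps=0.7, min_samples=2):
    """Minimal DBSCAN with 1 - IoU as distance; returns a label per box (-1 = noise)."""
    n = len(boxes)
    near = [[j for j in range(n) if 1 - iou(boxes[i], boxes[j]) <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(near[i]) < min_samples:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(near[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(near[j]) >= min_samples:
                queue.extend(near[j])  # core point: keep expanding
    return labels


def decode(seg, geo):
    """Merge each cluster of votes into one box via the coordinate-wise median."""
    boxes = candidate_boxes(seg, geo)
    labels = dbscan(boxes)
    merged = []
    for c in sorted(set(labels) - {-1}):
        members = [b for b, lab in zip(boxes, labels) if lab == c]
        merged.append(tuple(median(b[k] for b in members) for k in range(4)))
    return merged


# Synthetic 4x8 maps containing two words, given as (x_min, y_min, x_max, y_max).
H, W = 4, 8
words = [(0, 1, 2, 2), (5, 1, 7, 2)]
seg = [[[1.0 if c == 2 else 0.0] * W for _ in range(H)] for c in range(3)]
geo = [[[0.0] * W for _ in range(H)] for _ in range(4)]
for x0, y0, x1, y1 in words:
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            seg[0][y][x], seg[2][y][x] = 1.0, 0.0
            geo[0][y][x], geo[1][y][x] = y - y0, y1 - y
            geo[2][y][x], geo[3][y][x] = x - x0, x1 - x
print(decode(seg, geo))
```

The neighbour search here is quadratic in the number of word pixels, so a real implementation would likely subsample the votes or use a library DBSCAN on a precomputed distance matrix.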
+ ## Training
+
+ The model was trained on the IAM Handwriting Database with an 80/20 random
+ split. The best model was selected by a hyperparameter grid search over learning
+ rates (0.0005, 0.001, 0.002) and batch sizes (16, 32, 64, 128), with early
+ stopping (patience=50).
+
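The selection procedure described above amounts to 3 × 4 = 12 training runs, each cut short once the validation score stops improving. A minimal sketch of that logic follows. `train_and_validate` is a hypothetical callback that yields one validation F1 per epoch, not a function from the project, and the exact stopping rule used in training may differ.

```python
from itertools import product

LEARNING_RATES = (0.0005, 0.001, 0.002)
BATCH_SIZES = (16, 32, 64, 128)


def best_epoch_with_patience(val_scores, patience=50):
    """Return (best_epoch, best_score), stopping after `patience` epochs without improvement."""
    best_epoch, best_score = -1, float("-inf")
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch, best_score


def grid_search(train_and_validate, patience=50):
    """Run every (learning rate, batch size) pair and pick the best validation score.

    `train_and_validate(lr, bs)` is a hypothetical callback that returns an
    iterable of per-epoch validation scores.
    """
    results = {}
    for lr, bs in product(LEARNING_RATES, BATCH_SIZES):
        results[(lr, bs)] = best_epoch_with_patience(train_and_validate(lr, bs), patience)
    best_config = max(results, key=lambda cfg: results[cfg][1])
    return best_config, results
```

If the callback is a generator, breaking out of the loop also stops training for that configuration, so no epochs are wasted after the patience window closes.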
+ | Hyperparameter | Value |
+ | --- | --- |
+ | Optimizer | Adam |
+ | Learning rate | 0.001 |
+ | Batch size | 16 |
+ | Max epochs | 200 |
+ | Loss | Cross-entropy (segmentation) + IoU (geometry) |
+
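To make the two loss terms concrete, here is a per-pixel sketch. It assumes the common -log(IoU) formulation for distance-based boxes (as used in UnitBox and EAST), applies the geometry term only on word pixels, and weights both terms equally. The model card does not state these details, so the project's actual loss may differ.

```python
import math


def cross_entropy(probs, target_class, eps=1e-9):
    """Negative log-probability of the true class for one pixel."""
    return -math.log(probs[target_class] + eps)


def iou_loss(pred, target, eps=1e-9):
    """-log(IoU) for one pixel; pred and target are (top, bottom, left, right) distances."""
    pt, pb, pl, pr = pred
    tt, tb, tl, tr = target
    # Both boxes contain the pixel, so the overlap is the product of the shared extents.
    inter = (min(pt, tt) + min(pb, tb)) * (min(pl, tl) + min(pr, tr))
    union = (pt + pb) * (pl + pr) + (tt + tb) * (tl + tr) - inter
    return -math.log((inter + eps) / (union + eps))


def total_loss(seg_probs, seg_targets, geo_preds, geo_targets, word_class=0):
    """Mean cross-entropy over all pixels plus mean IoU loss over word pixels.

    Inputs are flat per-pixel lists; equal weighting of the two terms is an assumption.
    """
    ce = sum(cross_entropy(p, t) for p, t in zip(seg_probs, seg_targets)) / len(seg_targets)
    word = [i for i, t in enumerate(seg_targets) if t == word_class]
    geo = sum(iou_loss(geo_preds[i], geo_targets[i]) for i in word) / max(len(word), 1)
    return ce + geo
```

A perfect geometry prediction gives IoU = 1 and therefore zero loss, while a predicted box covering half of the union gives -log(0.5) ≈ 0.69.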
+ Full training instructions:
+ [README](https://github.com/PellelNitram/xournalpp_htr/blob/master/xournalpp_htr/training/word_detector/README.md).
+
+ ## Intended use
+
+ This model is the detection stage in a handwriting recognition pipeline. It is
+ designed to run on personal devices (laptops, edge devices) via ONNX Runtime,
+ with no GPU required for inference. A separate transcription model (not yet
+ available) would read the detected word regions.
+
+ ## Limitations
+
+ - Detection only; no text transcription.
+ - Grayscale input required.
+ - Fixed 448×448 resize may distort aspect ratio on non-square images.
+ - No training-time data augmentation (planned improvement).
+ - Validated on IAM-style handwriting; performance on other styles (e.g.
+ historical documents) may vary.
+
+ ## Citation
+
+ The architecture is based on
+ [WordDetectorNN](https://github.com/githubharald/WordDetectorNN) by
+ [Harald Scheidl](https://github.com/githubharald).