jibmaird committed on
Commit 736832f · verified
1 Parent(s): 633f02d

Upload model card

Files changed (1)
  1. README.md +159 -0
README.md ADDED
@@ -0,0 +1,159 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - clinical-nlp
+ - antimicrobial-stewardship
+ - bert
+ - multilabel-classification
+ - hospital
+ - medical
+ pipeline_tag: text-classification
+ library_name: pytorch
+ base_model: emilyalsentzer/Bio_ClinicalBERT
+ ---
+
+ # NCAS Hospital Indication Classifier
+
+ A **BioClinicalBERT**-based multilabel classifier for categorising antimicrobial prescription
+ indication text from hospital electronic medical records (EMR). It was developed as part of a
+ research project at RMIT University / The Royal Melbourne Hospital (RMH) investigating automated
+ antimicrobial stewardship support.
+
+ ## Model description
+
+ | Attribute | Value |
+ |-----------|-------|
+ | Base encoder | [emilyalsentzer/Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT) |
+ | Pooling | Mean pooling over token embeddings |
+ | Classification head | Linear + Sigmoid |
+ | Task | Multilabel classification (8 categories) |
+ | Training data | ~2,000 manually annotated hospital prescription records (RMH 2021) |
+ | Held-out evaluation | 600 records from RMH 2022, 2023, 2024 |
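To make the pooling and classification-head rows concrete, here is a minimal plain-Python sketch of masked mean pooling followed by a linear layer and a sigmoid. It is illustrative only: the released model is a PyTorch module, and the function names, the toy dimensions, and the use of an attention mask to skip padding are assumptions rather than details taken from the repository.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    # max(count, 1) avoids division by zero for an all-padding input.
    return [t / max(count, 1) for t in total]

def linear_sigmoid(pooled, weights, biases):
    """One independent probability per label: sigmoid(w . x + b)."""
    probs = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * x for w, x in zip(w_row, pooled)) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs

# Toy input: 3 tokens (the last is padding), hidden size 2, 2 labels.
embeddings = [[1.0, 3.0], [3.0, 1.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(embeddings, mask)
probs = linear_sigmoid(pooled, [[1.0, -1.0], [0.5, 0.5]], [0.0, -2.0])
```

Because each label gets its own sigmoid, the probabilities do not sum to one, which is what allows several labels to be active at once.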
+
+ ## Label schema (8catb)
+
+ | Label | Description |
+ |-------|-------------|
+ | `respiratory - ioi` | Respiratory infection of indication |
+ | `skin and soft tissue - ioi` | Skin/soft-tissue infection of indication |
+ | `urinary tract - ioi` | Urinary tract infection of indication |
+ | `other` | Other or unspecified indication |
+ | `sepsis` | Sepsis or bacteraemia |
+ | `undifferentiated infection` | Infection without identified source |
+ | `organism only` | Organism identified but no clinical syndrome specified |
+ | `no indication documented` | No clinical indication present in the text |
+
+ A sample can receive one or more labels simultaneously (multilabel).
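Multilabel targets are commonly stored as multi-hot vectors with one position per label. The sketch below assumes the label order of the schema table; the order actually used by the model is whatever `label_columns` in the checkpoint specifies, and the helper name `encode_labels` is hypothetical.

```python
LABELS = [
    "respiratory - ioi",
    "skin and soft tissue - ioi",
    "urinary tract - ioi",
    "other",
    "sepsis",
    "undifferentiated infection",
    "organism only",
    "no indication documented",
]

def encode_labels(assigned, labels=LABELS):
    """Turn a collection of label names into a 0/1 vector aligned with `labels`."""
    unknown = set(assigned) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in assigned else 0 for name in labels]

# A record annotated with two labels at once.
vector = encode_labels({"urinary tract - ioi", "sepsis"})
```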
+
+ ## Post-processing rule
+
+ After model prediction, `sepsis` is removed from any sample that also receives
+ `respiratory - ioi` or `skin and soft tissue - ioi`. If suppression would leave zero
+ labels, the removal is reverted (fallback guarantee).
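A minimal sketch of this rule, assuming predictions arrive as a list of label names. The function name `suppress_sepsis` is hypothetical and the repository's own implementation may differ. Under this reading, suppression only fires when another label is present, so the fallback branch should never trigger; it is kept here to mirror the description.

```python
SUPPRESSORS = {"respiratory - ioi", "skin and soft tissue - ioi"}

def suppress_sepsis(predicted):
    """Drop `sepsis` when a respiratory or skin/soft-tissue label is also predicted."""
    labels = list(predicted)
    if "sepsis" in labels and SUPPRESSORS.intersection(labels):
        filtered = [name for name in labels if name != "sepsis"]
        # Fallback guarantee: never return an empty label set.
        return filtered if filtered else labels
    return labels

cleaned = suppress_sepsis(["sepsis", "respiratory - ioi"])
untouched = suppress_sepsis(["sepsis", "urinary tract - ioi"])
```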
+
+ ## Usage
+
+ ### Quick start
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ from ncas_indication.model import ClinicalBERTClassifier
+ from transformers import AutoTokenizer
+
+ # Download checkpoint
+ model_path = hf_hub_download(
+     repo_id="jibmaird/NCAS-hospital-indication-classifier",
+     filename="indication_classifier_model.pt",
+ )
+
+ # Load model (label names and thresholds are embedded in the checkpoint)
+ model, label_columns, thresholds = ClinicalBERTClassifier.from_checkpoint(model_path)
+ tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
+ ```
+
+ Or using the inference script from the [GitHub repository](https://github.com/jibmaird/NCAS-hospital-indication-classifier):
+
+ ```bash
+ # Single text
+ python inference/predict.py --text "UTI prophylaxis post-renal transplant"
+
+ # CSV file
+ python inference/predict.py --input your_file.csv --output predictions.csv
+ ```
+
+ ### Desktop application
+
+ A cross-platform desktop GUI is available in the `app/` folder of the repository.
+ See [app/README.md](https://github.com/jibmaird/NCAS-hospital-indication-classifier/blob/main/app/README.md).
+
+ ## Training
+
+ ### Hyperparameters
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Learning rate | 1e-5 |
+ | Batch size | 8 |
+ | Epochs | 20 |
+ | Optimizer | AdamW |
+ | Loss function | Weighted BCE (inverse-frequency weights) |
+ | Validation split | 20% of training data |
+ | Threshold selection | Per-label F1 maximisation on validation set |
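The exact weighting formula is not given here, so the following is only one common variant of inverse-frequency weighting, written in plain Python for illustration. The formula `n_samples / (n_labels * positives)` and both function names are assumptions; the training script may compute weights differently (for example as a negative-to-positive ratio).

```python
import math

def inverse_frequency_weights(targets):
    """targets: list of multi-hot rows. Returns one weight per label."""
    n_samples = len(targets)
    n_labels = len(targets[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in targets)
        # Guard against labels that have no positive examples.
        weights.append(n_samples / (n_labels * max(positives, 1)))
    return weights

def weighted_bce(probs, target, weights, eps=1e-7):
    """Mean over labels of the per-label weighted binary cross-entropy."""
    total = 0.0
    for p, y, w in zip(probs, target, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# Label 0 is frequent (3 of 4 rows), label 1 is rare (1 of 4 rows).
targets = [[1, 0], [1, 0], [1, 0], [0, 1]]
weights = inverse_frequency_weights(targets)
loss = weighted_bce([0.9, 0.2], [1, 0], weights)
```

With this variant the rare label receives the larger weight, so mistakes on it contribute more to the loss.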
+
+ ### Training procedure
+
+ 1. The combined dataset of ~2,000 labelled records was split 80/20 for training and validation.
+ 2. Inverse-frequency class weights were applied to the BCE loss to address label imbalance.
+ 3. Per-label decision thresholds were optimised on the validation set by grid search over
+ [0.1, 0.2, …, 0.8] to maximise label-specific F1.
+ 4. The model with the best weighted-macro F1 across epochs was retained.
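Step 3 can be sketched as follows for a single label; in practice the search would be repeated once per label on the validation predictions. The helper names, the `>=` comparison, and the tie-breaking rule (the lowest threshold wins) are assumptions, not details confirmed by the training code.

```python
def f1_score(y_true, y_pred):
    """Binary F1 from 0/1 lists; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Grid 0.1, 0.2, ..., 0.8 as described in step 3.
GRID = [round(0.1 * k, 1) for k in range(1, 9)]

def best_threshold(y_true, probs, grid=GRID):
    """Return (threshold, F1) for the grid value with the highest F1."""
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        score = f1_score(y_true, [1 if p >= t else 0 for p in probs])
        if score > best_f1:  # strict comparison keeps the first best value
            best_t, best_f1 = t, score
    return best_t, best_f1

# Toy validation data for one label.
y_true = [1, 1, 0, 0]
probs = [0.9, 0.55, 0.45, 0.05]
threshold, f1 = best_threshold(y_true, probs)
```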
+
+ ## Checkpoint format
+
+ The `.pt` file is a standard PyTorch checkpoint dict with the following keys:
+
+ ```python
+ {
+     "model_state_dict": ...,      # nn.Module weights
+     "label_columns": [...],       # ordered label names
+     "optimal_thresholds": [...],  # per-label decision thresholds
+     "n_labels": 8,
+     "base_model": "emilyalsentzer/Bio_ClinicalBERT",
+ }
+ ```
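As an illustration of how `label_columns` and `optimal_thresholds` might be combined at inference time, the sketch below applies per-label thresholds to a list of probabilities. The dict is a stand-in with made-up values, not the real checkpoint, and the `decode` helper and its `>=` comparison are assumptions.

```python
def decode(probs, label_columns, thresholds):
    """Return the names of labels whose probability meets its own threshold."""
    if not (len(probs) == len(label_columns) == len(thresholds)):
        raise ValueError("probs, label_columns and thresholds must align")
    return [name for name, p, t in zip(label_columns, probs, thresholds) if p >= t]

# Stand-in for the relevant checkpoint entries; the values are invented.
checkpoint = {
    "label_columns": ["urinary tract - ioi", "sepsis", "other"],
    "optimal_thresholds": [0.3, 0.5, 0.4],
}
probs = [0.62, 0.10, 0.40]
predicted = decode(probs, checkpoint["label_columns"], checkpoint["optimal_thresholds"])
```

Keeping the label names and thresholds inside the checkpoint means the two lists stay aligned with the order of the model's output units.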
+
+ ## Limitations and intended use
+
+ - The model was trained and evaluated on de-identified records from a single Australian
+ tertiary hospital (RMH). Performance may differ on records from other hospitals,
+ health systems, or clinical workflows.
+ - This model is intended for **research purposes** and is not a validated clinical decision
+ support tool. Clinical decisions must remain with qualified healthcare professionals.
+ - The training data cannot be shared due to privacy restrictions; the annotation schema
+ and data format are documented in the companion GitHub repository.
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @article{ncas_indication_classifier_2025,
+   title = {Automated Classification of Antimicrobial Prescription Indications
+            Using BioClinicalBERT},
+   author = {...},
+   journal = {...},
+   year = {2025},
+   note = {Under review}
+ }
+ ```
+
+ ## Repository
+
+ Source code, training scripts, and the desktop application are available at:
+ [https://github.com/jibmaird/NCAS-hospital-indication-classifier](https://github.com/jibmaird/NCAS-hospital-indication-classifier)
+
+ ## License
+
+ Apache 2.0. See [LICENSE](https://github.com/jibmaird/NCAS-hospital-indication-classifier/blob/main/LICENSE).