+---
+license: apache-2.0
+language:
+- en
+tags:
+- clinical-nlp
+- antimicrobial-stewardship
+- bert
+- multilabel-classification
+- hospital
+- medical
+pipeline_tag: text-classification
+library_name: pytorch
+base_model: emilyalsentzer/Bio_ClinicalBERT
+---
+# NCAS Hospital Indication Classifier
+A **BioClinicalBERT**-based multilabel classifier for categorising antimicrobial prescription
+indication text from hospital electronic medical records (EMR).  Developed as part of a research
+project at RMIT University / The Royal Melbourne Hospital (RMH) investigating automated
+antimicrobial stewardship support.
+## Model description
+| Attribute | Value |
+|-----------|-------|
+| Base encoder | [emilyalsentzer/Bio_ClinicalBERT](https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT) |
+| Pooling | Mean pooling over token embeddings |
+| Classification head | Linear + Sigmoid |
+| Task | Multilabel classification (8 categories) |
+| Training data | ~2,000 manually annotated hospital prescription records (RMH 2021) |
+| Held-out evaluation | 600 records from RMH 2022, 2023, 2024 |
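To make the pooling and classification-head rows of the table concrete, here is a minimal pure-Python sketch of masked mean pooling followed by a linear layer and a sigmoid. Skipping padding positions via an attention mask is an assumption (the card only says "mean pooling over token embeddings"), and the function names, toy dimensions, and weights are hypothetical rather than taken from the repository.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # avoid division by zero on an all-padding input
    return [t / count for t in total]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def classify(pooled, weights, biases):
    """Linear layer + sigmoid: one independent probability per label."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, pooled)) + b)
        for row, b in zip(weights, biases)
    ]

# Toy example: 3 tokens (the last is padding), embedding size 2, 2 labels.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)  # [2.0, 3.0]
probs = classify(pooled, [[1.0, -1.0], [0.0, 0.0]], [1.0, 0.0])  # [0.5, 0.5]
```

Because each label has its own sigmoid, the probabilities need not sum to 1, which is what lets a record receive several labels at once.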
+## Label schema (8catb)
+| Label | Description |
+|-------|-------------|
+| `respiratory - ioi` | Respiratory infection of indication |
+| `skin and soft tissue - ioi` | Skin/soft-tissue infection of indication |
+| `urinary tract - ioi` | Urinary tract infection of indication |
+| `other` | Other or unspecified indication |
+| `sepsis` | Sepsis or bacteraemia |
+| `undifferentiated infection` | Infection without identified source |
+| `organism only` | Organism identified but no clinical syndrome specified |
+| `no indication documented` | No clinical indication present in the text |
+A sample can receive one or more labels simultaneously (multilabel).
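One common way to represent multilabel targets is a multi-hot vector with one position per label. The sketch below assumes the label order shown in the table above; the authoritative order is whatever is stored with the checkpoint. `encode_labels` is a hypothetical helper, not part of the repository.

```python
# Label names copied from the schema table; the ordering is an assumption.
LABELS = [
    "respiratory - ioi",
    "skin and soft tissue - ioi",
    "urinary tract - ioi",
    "other",
    "sepsis",
    "undifferentiated infection",
    "organism only",
    "no indication documented",
]

def encode_labels(assigned, labels=LABELS):
    """Turn a set of label names into a 0/1 vector aligned with `labels`."""
    unknown = set(assigned) - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in assigned else 0 for name in labels]

vector = encode_labels({"urinary tract - ioi", "sepsis"})
# -> [0, 0, 1, 0, 1, 0, 0, 0]
```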
+## Post-processing rule
+After model prediction, `sepsis` is removed from any sample that also receives
+`respiratory - ioi` or `skin and soft tissue - ioi`. If removal would leave the sample
+with zero labels, the removal is reverted (fallback guarantee).
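The rule can be sketched in a few lines of Python. This is an illustration of the logic as described, not the repository's implementation; `suppress_sepsis` and `SUPPRESSORS` are hypothetical names.

```python
# Labels whose presence causes `sepsis` to be dropped.
SUPPRESSORS = {"respiratory - ioi", "skin and soft tissue - ioi"}

def suppress_sepsis(predicted):
    """Apply the post-processing rule to one sample's predicted label set."""
    labels = set(predicted)
    if "sepsis" in labels and labels & SUPPRESSORS:
        reduced = labels - {"sepsis"}
        # Fallback guarantee: keep the original set if nothing would remain.
        # A suppressing label always survives here, so this check is defensive.
        return reduced if reduced else labels
    return labels

a = suppress_sepsis({"sepsis", "respiratory - ioi"})    # {"respiratory - ioi"}
b = suppress_sepsis({"sepsis", "urinary tract - ioi"})  # unchanged
c = suppress_sepsis({"sepsis"})                         # unchanged
```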
+## Usage
+### Quick start
+```python
+from huggingface_hub import hf_hub_download
+from ncas_indication.model import ClinicalBERTClassifier
+from transformers import AutoTokenizer
+# Download checkpoint
+model_path = hf_hub_download(
+    repo_id="jibmaird/NCAS-hospital-indication-classifier",
+    filename="indication_classifier_model.pt",
+)
+# Load model (label names and thresholds are embedded in the checkpoint)
+model, label_columns, thresholds = ClinicalBERTClassifier.from_checkpoint(model_path)
+tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
+```
+Alternatively, use the inference script from the [GitHub repository](https://github.com/jibmaird/NCAS-hospital-indication-classifier):
+```bash
+# Single text
+python inference/predict.py --text "UTI prophylaxis post-renal transplant"
+# CSV file
+python inference/predict.py --input your_file.csv --output predictions.csv
+```
+### Desktop application
+A cross-platform desktop GUI is available in the `app/` folder of the repository.
+See [app/README.md](https://github.com/jibmaird/NCAS-hospital-indication-classifier/blob/main/app/README.md).
+## Training
+### Hyperparameters
+| Parameter | Value |
+|-----------|-------|
+| Learning rate | 1e-5 |
+| Batch size | 8 |
+| Epochs | 20 |
+| Optimizer | AdamW |
+| Loss function | Weighted BCE (inverse-frequency weights) |
+| Validation split | 20% of training data |
+| Threshold selection | Per-label F1 maximisation on validation set |
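The card does not give the exact weighting formula, so the sketch below shows one common formulation of inverse-frequency weights, `n_samples / (n_labels * positives)`, applied to the whole per-label BCE term. Both the formula and where the weight is applied are assumptions, and the function names are hypothetical.

```python
import math

def inverse_frequency_weights(label_matrix):
    """label_matrix: list of multi-hot rows. Returns one weight per label."""
    n_samples = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_matrix)
        # Guard against labels that have no positive examples.
        weights.append(n_samples / (n_labels * max(positives, 1)))
    return weights

def weighted_bce(probs, targets, weights, eps=1e-7):
    """Mean over labels of the weighted binary cross-entropy for one sample."""
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# Toy data: label 0 is frequent (3 of 4 rows), label 1 is rare (1 of 4 rows).
rows = [[1, 0], [1, 0], [1, 0], [0, 1]]
weights = inverse_frequency_weights(rows)  # rare label gets the larger weight
loss = weighted_bce([0.9, 0.2], [1, 0], weights)
```

Rarer labels receive larger weights, so errors on them contribute more to the loss than errors on frequent labels.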
+### Training procedure
+1. The combined dataset of ~2,000 labelled records was split 80/20 for training and validation.
+2. Inverse-frequency class weights were applied to the BCE loss to address label imbalance.
+3. Per-label decision thresholds were optimised on the validation set by grid search over
+   [0.1, 0.2, …, 0.8] to maximise label-specific F1.
+4. The model with the best weighted-macro F1 across epochs was retained.
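Step 3 can be sketched for a single label as follows. The grid values come from the step above; treating a probability equal to the threshold as positive (`>=`) and keeping the lowest threshold on ties are assumptions, and the helper names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """Binary F1 from 0/1 lists; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

GRID = [round(0.1 * k, 1) for k in range(1, 9)]  # 0.1, 0.2, ..., 0.8

def best_threshold(y_true, probs, grid=GRID):
    """Return (threshold, f1) maximising F1 for one label on validation data."""
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, score
    return best_t, best_f1

# Toy validation data for one label.
y_true = [1, 1, 0, 0, 1]
probs = [0.9, 0.65, 0.55, 0.1, 0.75]
threshold, f1 = best_threshold(y_true, probs)  # (0.6, 1.0)
```

Running this once per label yields the per-label threshold list described in the hyperparameter table.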
+## Checkpoint format
+The `.pt` file is a standard PyTorch checkpoint dict with keys:
+```python
+{
+    "model_state_dict":   ...,   # nn.Module weights
+    "label_columns":      [...], # ordered label names
+    "optimal_thresholds": [...], # per-label decision thresholds
+    "n_labels":           8,
+    "base_model":         "emilyalsentzer/Bio_ClinicalBERT",
+}
+```
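As an illustration of how `label_columns` and `optimal_thresholds` fit together, the sketch below turns a vector of per-label probabilities into label names. The `decode` helper, the three-label subset, and the threshold values are all made up for the example; real values come from the checkpoint, and the `>=` comparison is an assumption.

```python
def decode(probs, label_columns, thresholds):
    """Return the names of labels whose probability reaches its own threshold."""
    if not (len(probs) == len(label_columns) == len(thresholds)):
        raise ValueError("probs, label_columns and thresholds must align")
    return [
        name
        for name, p, t in zip(label_columns, probs, thresholds)
        if p >= t
    ]

# Illustrative subset with invented thresholds (not the checkpoint's values).
label_columns = ["sepsis", "urinary tract - ioi", "other"]
thresholds = [0.4, 0.3, 0.5]
probs = [0.35, 0.80, 0.10]
predicted = decode(probs, label_columns, thresholds)  # ["urinary tract - ioi"]
```

The lists are matched by position, so the order of `label_columns` in the checkpoint determines which threshold applies to which model output.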
+## Limitations and intended use
+- The model was trained and evaluated on de-identified records from a single Australian
+  tertiary hospital (RMH).  Performance may differ on records from other hospitals,
+  health systems, or clinical workflows.
+- This model is intended for **research purposes** and is not a validated clinical decision
+  support tool.  Clinical decisions must remain with qualified healthcare professionals.
+- The training data cannot be shared due to privacy restrictions; the annotation schema
+  and data format are documented in the companion GitHub repository.
+## Citation
+If you use this model in your research, please cite:
+```bibtex
+@article{ncas_indication_classifier_2025,
+  title   = {Automated Classification of Antimicrobial Prescription Indications
+             Using BioClinicalBERT},
+  author  = {...},
+  journal = {...},
+  year    = {2025},
+  note    = {Under review}
+}
+```
+## Repository
+Source code, training scripts, and the desktop application are available at:
+[https://github.com/jibmaird/NCAS-hospital-indication-classifier](https://github.com/jibmaird/NCAS-hospital-indication-classifier)
+## License
+Apache 2.0; see [LICENSE](https://github.com/jibmaird/NCAS-hospital-indication-classifier/blob/main/LICENSE).