MAUDE Adverse Event Severity Classifier: Bio_ClinicalBERT (Phase 2)

Model Description

Fine-tuned Bio_ClinicalBERT classifier that assigns one of four severity labels, Death (D), Injury (I), Malfunction (M), or Other (O), to free-text adverse event narratives from the FDA's MAUDE database.

This is Phase 2 of a two-phase project. Phase 1 established a TF-IDF + Logistic Regression baseline; this model improves on that baseline, most notably on the highest-stakes class (Death).

Training Data

  • 154,776 MAUDE records (post-cleaning, UNKNOWN labels dropped)
  • Source: openFDA MAUDE API
  • Class distribution is imbalanced (Death prevalence ≈8%); handled via inverse-frequency class weights in the loss function
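
The card does not state how the inverse-frequency weights were normalized. The following is a minimal sketch, assuming the common N / (K * n_c) convention (the same heuristic scikit-learn calls "balanced"); the function name and the toy label counts are hypothetical and do not reflect the real MAUDE class distribution.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = N / (K * n_c).

    N = number of samples, K = number of classes, n_c = samples in class c.
    Assumed normalization; the card only says "inverse-frequency".
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Hypothetical toy labels, NOT the real MAUDE distribution.
toy_labels = ["M"] * 50 + ["I"] * 35 + ["D"] * 10 + ["O"] * 5
weights = inverse_frequency_weights(toy_labels)

# Rare classes receive larger weights than frequent ones.
for cls in sorted(weights, key=weights.get):
    print(f"{cls}: {weights[cls]:.3f}")
```

In a PyTorch training loop, weights like these would typically be arranged in class-index order and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.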

Training Procedure

| Parameter | Value |
| --- | --- |
| Base model | emilyalsentzer/Bio_ClinicalBERT |
| Pooling | cls_mean_concat ([CLS] + mean-pooled, 1536-dim head) |
| Max token length | 512 |
| Batch size | 16 (×2 GPUs, effective 32 with grad accumulation = 2) |
| Learning rate | 2e-5 (AdamW, 10% linear warmup) |
| Epochs | 3 |
| Class weights | Inverse-frequency, applied in CrossEntropyLoss |
| Hardware | Kaggle, 2× T4 GPU (DataParallel) |
| CV strategy | 5-fold StratifiedKFold, splits SHA1-fingerprinted for reproducibility |
| Early stopping | Patience = 1 epoch |
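
The card says the CV splits were SHA1-fingerprinted but does not describe how. One plausible approach, sketched here under that caveat, is to hash the sorted train and validation indices of each fold so that a later run can confirm it reproduced the same partition. The function name and the toy indices are hypothetical; in practice the indices would come from the fold splitter.

```python
import hashlib


def fingerprint_split(train_idx, val_idx):
    """SHA1 hex digest over the sorted train and validation indices of one fold.

    Sorting makes the fingerprint depend on fold membership only,
    not on the order in which indices happen to be listed.
    """
    h = hashlib.sha1()
    for name, idx in (("train", train_idx), ("val", val_idx)):
        h.update(name.encode("utf-8"))
        h.update(",".join(str(i) for i in sorted(idx)).encode("utf-8"))
    return h.hexdigest()


# Hypothetical toy fold (6 records).
fp_a = fingerprint_split([0, 1, 2, 3], [4, 5])
fp_b = fingerprint_split([3, 2, 1, 0], [5, 4])  # same membership, different order
fp_c = fingerprint_split([0, 1, 2, 4], [3, 5])  # different membership

print(fp_a == fp_b)  # identical membership gives an identical fingerprint
print(fp_a == fp_c)  # changed membership gives a different fingerprint
```

Storing one digest per fold next to the results gives a cheap check that two runs evaluated on the same partitions.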

Evaluation Results (5-fold CV)

| Metric | Mean | Std |
| --- | --- | --- |
| F1 weighted | 0.8691 | 0.0010 |
| F1 macro | 0.7727 | 0.0020 |
| F1 (Death) | 0.8318 | 0.0076 |
| F1 (Injury) | 0.8728 | 0.0015 |
| F1 (Malfunction) | 0.9069 | 0.0016 |
| F1 (Other) | 0.4794 | 0.0069 |

Improvement over Phase 1 baseline: weighted F1 +0.0207, Death-class F1 +0.0617 (+6.2 pp). The Death-class lift is the primary result this phase targeted.

Per-Fold Breakdown

| Fold | F1 Death | F1 Injury | F1 Malfunction | F1 Other | F1 Macro | F1 Weighted |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.8346 | 0.8713 | 0.9048 | 0.4730 | 0.7709 | 0.8671 |
| 2 | 0.8364 | 0.8711 | 0.9092 | 0.4826 | 0.7748 | 0.8700 |
| 3 | 0.8418 | 0.8749 | 0.9056 | 0.4764 | 0.7747 | 0.8693 |
| 4 | 0.8249 | 0.8727 | 0.9081 | 0.4735 | 0.7698 | 0.8693 |
| 5 | 0.8212 | 0.8740 | 0.9065 | 0.4915 | 0.7733 | 0.8698 |
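
The 5-fold summary can be recomputed from the per-fold values above. This is a small sketch using only the standard library; the dictionary keys are illustrative names. The reported Std column appears consistent with the population standard deviation (`pstdev`) rather than the sample one, and small last-digit differences in the means are expected because the per-fold values here are already rounded to four decimals.

```python
from statistics import fmean, pstdev

# Per-fold F1 scores copied from the per-fold breakdown (folds 1 to 5).
per_fold = {
    "death":       [0.8346, 0.8364, 0.8418, 0.8249, 0.8212],
    "injury":      [0.8713, 0.8711, 0.8749, 0.8727, 0.8740],
    "malfunction": [0.9048, 0.9092, 0.9056, 0.9081, 0.9065],
    "other":       [0.4730, 0.4826, 0.4764, 0.4735, 0.4915],
    "macro":       [0.7709, 0.7748, 0.7747, 0.7698, 0.7733],
    "weighted":    [0.8671, 0.8700, 0.8693, 0.8693, 0.8698],
}

# Mean and population standard deviation across the five folds.
summary = {name: (fmean(v), pstdev(v)) for name, v in per_fold.items()}

for name, (mean, std) in summary.items():
    print(f"{name:<12} mean={mean:.4f}  std={std:.4f}")
```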

Deployed Checkpoint

Fold 3 is the checkpoint pushed to this Hub repo. It was selected because it achieved the highest Death-class F1 (0.8418) across all five folds, not the highest weighted or macro F1 (fold 2 leads on both of those). This is a deliberate choice: in this application, a false negative on the Death class (a death narrative misclassified as Injury/Malfunction/Other) is the most clinically consequential error, so fold selection was optimized against that specific risk rather than an aggregate score.
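
The selection rule described above amounts to an argmax over per-fold Death-class F1. The sketch below uses the per-fold scores from this card and shows how the chosen fold differs depending on the criterion; the `FoldScore` type and variable names are illustrative, not taken from the project's code.

```python
from collections import namedtuple

FoldScore = namedtuple("FoldScore", "fold f1_death f1_macro f1_weighted")

# Scores copied from the per-fold breakdown in this card.
folds = [
    FoldScore(1, 0.8346, 0.7709, 0.8671),
    FoldScore(2, 0.8364, 0.7748, 0.8700),
    FoldScore(3, 0.8418, 0.7747, 0.8693),
    FoldScore(4, 0.8249, 0.7698, 0.8693),
    FoldScore(5, 0.8212, 0.7733, 0.8698),
]

# Criterion used for the deployed checkpoint: highest Death-class F1.
best_by_death = max(folds, key=lambda s: s.f1_death).fold

# Aggregate criteria, shown for comparison only.
best_by_macro = max(folds, key=lambda s: s.f1_macro).fold
best_by_weighted = max(folds, key=lambda s: s.f1_weighted).fold

print("best by Death F1:   ", best_by_death)
print("best by macro F1:   ", best_by_macro)
print("best by weighted F1:", best_by_weighted)
```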

Provenance note: this selection rationale was reconstructed and documented retroactively. At the time of the original push to this Hub repo, no model card or commit message recorded the selection criterion. Future pushes will include this documentation at push time, not after the fact.

Honest Limitations

  • All folds reached their best validation score at epoch 3, the training cap. Validation performance was still improving when training ended, so training stopped at the cap rather than through early stopping (patience = 1). Extending training may yield further gains; the current numbers are conservative, not a ceiling.
  • Other is the weakest class by a wide margin (F1 ≈0.48). This class is a small, semantically heterogeneous catch-all bucket rather than a coherent label, which limits how learnable it is regardless of further tuning. It is treated as a known limitation, not an active optimization target for this phase.
  • This is a research prototype. It is not validated for, and must not be used for, clinical decision-making.

Author

Mukund Padmanabha (LinkedIn)
