---
language: en
license: mit
tags:
- tabular-classification
- healthcare
- healthcare-ai
- clinical-ai
- lightgbm
- explainable-ai
- fairness
datasets:
- mimic-iv
metrics:
- roc_auc
- brier_score
library_name: lightgbm
pipeline_tag: tabular-classification
authors:
- Isaac Tosin Adisa
model-index:
- name: Hospital Readmission LightGBM
  results:
  - task:
      type: tabular-classification
      name: Tabular Classification
    dataset:
      name: MIMIC-IV
      type: physionet/mimic-iv
    metrics:
    - type: roc_auc
      value: 0.689
      name: ROC AUC
    - type: brier_score
      value: 0.146
      name: Brier Score
---
# 🏥 Hospital Readmission Prediction (LightGBM)
**Author:** Isaac Tosin Adisa
## Overview
This model predicts **30-day hospital readmission risk** using structured clinical features derived from the MIMIC-IV dataset. It is part of an integrated multi-model comparative framework (including Logistic Regression and XGBoost baselines) designed to balance predictive performance, calibration quality, explainability, and subgroup fairness.
The model outputs calibrated probabilities. LightGBM was selected as the **best-calibrated model among those evaluated in this framework** (Brier score: 0.146), which makes it well suited to downstream clinical risk stratification workflows, where probability estimates matter as much as ranking accuracy.
This model is released alongside a fully reproducible pipeline and open-source implementation to facilitate independent validation and reuse.
## Dataset
| Property | Value |
|---|---|
| Source | MIMIC-IV (v2.2) |
| Total admissions | 415,231 |
| 30-day readmission prevalence | ~18% |
| Feature count | 26 structured clinical features |
| Split | Train / Validation / Test (temporal split) |

Features include demographics, admission type, primary diagnosis category, comorbidity burden (Elixhauser), length of stay, lab value summaries, procedure counts, and prior utilization history.
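
The temporal split listed in the table can be sketched as follows. This is a minimal illustration rather than the repository's implementation: the `admittime` key, the 70/15/15 fractions, and the toy records are assumptions made for the example.

```python
from datetime import datetime


def temporal_split(records, time_key="admittime", train_frac=0.7, val_frac=0.15):
    """Split chronologically: oldest rows train, middle rows validate, newest rows test."""
    ordered = sorted(records, key=lambda r: r[time_key])
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Toy admissions (hypothetical values, not MIMIC-IV data)
admissions = [
    {"hadm_id": i, "admittime": datetime(2015 + i % 5, 1 + i % 12, 1)}
    for i in range(20)
]

train, val, test = temporal_split(admissions)
print(len(train), len(val), len(test))
```

Splitting by time rather than at random keeps later admissions out of training, so the test set behaves more like future data.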
## ⚙️ Training
| Setting | Value |
|---|---|
| Framework | LightGBM 4.x |
| Objective | Binary cross-entropy |
| Class imbalance | `scale_pos_weight` tuned to prevalence |
| Hyperparameter tuning | Optuna (Bayesian search) |
| Calibration | Platt scaling (post-hoc) |
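
To illustrate two of the settings above, the sketch below computes a negatives-to-positives ratio as a starting value for LightGBM's `scale_pos_weight` and fits Platt scaling, p = sigmoid(a * s + b), by plain gradient descent. It is a dependency-free illustration under stated assumptions, not the released pipeline: the function names, learning rate, and toy scores are hypothetical, and the ratio is a common heuristic rather than the tuned value.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.5, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Heuristic starting point for scale_pos_weight: negatives / positives
prevalence = 0.18  # approximate prevalence from the dataset table
scale_pos_weight = (1 - prevalence) / prevalence

# Toy held-out raw scores that overstate risk (hypothetical values)
val_scores = [2.0, 1.5, 1.0, 0.5, 0.0, -0.5, 2.5, 1.2, 0.8, -1.0]
val_labels = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0]

a, b = fit_platt(val_scores, val_labels)
calibrated = [sigmoid(a * s + b) for s in val_scores]
print(f"scale_pos_weight ~ {scale_pos_weight:.2f}, a = {a:.2f}, b = {b:.2f}")
```

Reweighting the positive class shifts raw scores upward, which is one reason a post-hoc calibration step is useful. After fitting, the mean calibrated probability matches the observed positive rate on the data used for calibration.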
## Performance
| Metric | Value | Notes |
|---|---|---|
| AUC-ROC | ~0.689 | Discrimination performance |
| Brier Score | **0.146** | Best calibrated in the framework |
> ✅ LightGBM achieves the **best calibration among all models evaluated in this framework**. Well-calibrated probabilities are critical in clinical settings where risk thresholds drive care decisions.
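
For readers unfamiliar with the two metrics, the sketch below computes them from scratch on toy data. The function names and values are illustrative assumptions; the reported numbers come from the project's own evaluation code, not from this snippet.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Chance that a random positive outranks a random negative (ties count as 0.5)."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and predicted probabilities (hypothetical values)
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.10, 0.30, 0.35, 0.20, 0.80, 0.40, 0.05, 0.25]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(f"AUC-ROC: {auc:.3f}, Brier score: {brier:.4f}")
```

AUC-ROC depends only on how cases are ranked, while the Brier score also penalizes probabilities that are too high or too low. As a reference point, a constant prediction equal to a prevalence p scores p(1 - p), about 0.148 at p = 0.18.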
## Explainability
Per-patient explanations are generated using **SHAP TreeExplainer**, which is exact and computationally efficient for tree-based models.
- Global feature importance via SHAP summary plots
- Local patient-level force plots for individual predictions
- Compatible with standard clinical decision support workflows
## ⚖️ Fairness Evaluation
The model was evaluated across **16 demographic and clinical subgroups**, including stratifications by age group, sex, race/ethnicity, insurance type, and admission source.
All subgroups satisfy the following thresholds:
| Metric | Threshold |
|---|---|
| ΔAUC (vs. overall) | ≤ 0.05 |
| ΔFNR (vs. overall) | ≤ 0.10 |

No subgroup exhibited clinically meaningful performance degradation under these criteria.
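
The threshold check above can be sketched as a small audit loop. This is an illustration under assumptions, not the repository's auditing code: it treats each delta as an absolute difference from the overall metric and uses a hypothetical decision threshold of 0.5 for the false negative rate. All names and toy values are invented for the example.

```python
def roc_auc(y_true, y_prob):
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def false_negative_rate(y_true, y_prob, threshold):
    """Share of true positives scored below the decision threshold."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    return sum(1 for p in pos if p < threshold) / len(pos)


def audit_subgroups(y_true, y_prob, groups, threshold=0.5, max_dauc=0.05, max_dfnr=0.10):
    """Compare each subgroup's AUC and FNR with the overall values."""
    overall_auc = roc_auc(y_true, y_prob)
    overall_fnr = false_negative_rate(y_true, y_prob, threshold)
    report = {}
    for g in sorted(set(groups)):
        idx = [i for i, name in enumerate(groups) if name == g]
        yt = [y_true[i] for i in idx]
        yp = [y_prob[i] for i in idx]
        d_auc = abs(roc_auc(yt, yp) - overall_auc)
        d_fnr = abs(false_negative_rate(yt, yp, threshold) - overall_fnr)
        report[g] = {
            "delta_auc": d_auc,
            "delta_fnr": d_fnr,
            "passes": d_auc <= max_dauc and d_fnr <= max_dfnr,
        }
    return report


# Toy data with two subgroups (hypothetical values)
y_true = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.6, 0.2, 0.4, 0.1, 0.3, 0.8, 0.3, 0.5, 0.7, 0.2, 0.1]
groups = ["A"] * 6 + ["B"] * 6

report = audit_subgroups(y_true, y_prob, groups)
for name, row in report.items():
    print(name, round(row["delta_auc"], 3), round(row["delta_fnr"], 3), row["passes"])
```

On a toy sample this small, both groups exceed at least one threshold, which shows how the audit flags a subgroup. Each subgroup needs at least one positive and one negative case for the metrics to be defined.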
## Usage
```python
import joblib
import numpy as np

# Load the trained model (assumes lightgbm.pkl is in the working directory)
model = joblib.load("lightgbm.pkl")

# Placeholder row: replace the zeros with your 26 clinical features, in training order
X = np.zeros((1, 26))

# Probability of 30-day readmission (positive class)
pred = model.predict_proba(X)[0][1]
print(f"Readmission risk: {pred:.3f}")
```
> ⚠️ Input features must match the 26 clinical variables used during training. See the [repository](https://github.com/Tomisin92/readmission-prediction) for the full feature schema and preprocessing pipeline.
## 🎯 Intended Use
- Research and reproducibility
- Clinical ML benchmarking
- Demonstration of explainable and fair AI systems
## Reproducibility
All results are fully reproducible using the open-source pipeline at [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction), which includes data preprocessing, feature engineering, model training, SHAP explainability, and fairness auditing.
## ⚠️ Limitations
- **Retrospective validation only:** the model was trained and evaluated on historical MIMIC-IV data; prospective validation has not been performed.
- **Single institution:** MIMIC-IV reflects one academic medical center (BIDMC); generalizability to other institutions requires local validation.
- **No causal claims:** feature associations do not imply clinical causation.
- **Requires local validation** before any deployment in a clinical decision support context.
- **Credentialed dataset:** MIMIC-IV requires PhysioNet credentialing; this model card does not distribute the underlying data.
## Links
- **Paper:** [arXiv:2604.22535](https://doi.org/10.48550/arXiv.2604.22535)
- 💻 **Code:** [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction)
## Citation
```bibtex
@misc{adisa2025readmission,
title={Hospital Readmission Prediction with Explainability and Fairness},
author={Adisa, Isaac Tosin},
year={2026},
eprint={2604.22535},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
## License
This model is released under the [MIT License](https://github.com/Tomisin92/readmission-prediction/blob/main/LICENSE). The underlying MIMIC-IV dataset is subject to its own [PhysioNet credentialed access agreement](https://physionet.org/content/mimiciv/).