---
language: en
license: mit
tags:
  - tabular-classification
  - healthcare
  - healthcare-ai
  - clinical-ai
  - lightgbm
  - explainable-ai
  - fairness
datasets:
  - mimic-iv
metrics:
  - roc_auc
  - brier_score
library_name: lightgbm
pipeline_tag: tabular-classification
authors:
  - Isaac Tosin Adisa
model-index:
  - name: Hospital Readmission LightGBM
    results:
      - task:
          type: tabular-classification
          name: Tabular Classification
        dataset:
          name: MIMIC-IV
          type: physionet/mimic-iv
        metrics:
          - type: roc_auc
            value: 0.689
            name: ROC AUC
          - type: brier_score
            value: 0.146
            name: Brier Score
---

# 🏥 Hospital Readmission Prediction (LightGBM)

**Author:** Isaac Tosin Adisa

## 📌 Overview

This model predicts **30-day hospital readmission risk** using structured clinical features derived from the MIMIC-IV dataset. It is part of a multi-model comparative framework, which includes Logistic Regression and XGBoost baselines, designed to balance predictive performance, calibration quality, explainability, and subgroup fairness.

The model outputs calibrated probabilities for downstream clinical risk stratification workflows. LightGBM was selected as the **best-calibrated model among those evaluated in this framework** (Brier Score: 0.146), which matters where probability estimates carry as much weight as ranking accuracy.

This model is released alongside a fully reproducible pipeline and open-source implementation to facilitate independent validation and reuse.

## 📊 Dataset

| Property | Value |
|---|---|
| Source | MIMIC-IV (v2.2) |
| Total admissions | 415,231 |
| 30-day readmission prevalence | ~18% |
| Feature count | 26 structured clinical features |
| Split | Train / Validation / Test (temporal split) |

Features include demographics, admission type, primary diagnosis category, comorbidity burden (Elixhauser), length of stay, lab value summaries, procedure counts, and prior utilization history.
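
The temporal split listed in the table can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `admit_date` field name, the toy records, and both cutoff dates are assumptions made for the example. The property it demonstrates is that every training admission precedes every validation admission, which in turn precedes every test admission.

```python
from datetime import date


def temporal_split(records, train_end, val_end):
    """Split records chronologically using two cutoff dates.

    Admissions before train_end go to train, those from train_end up to
    (but excluding) val_end go to validation, and the rest go to test.
    """
    train = [r for r in records if r["admit_date"] < train_end]
    val = [r for r in records if train_end <= r["admit_date"] < val_end]
    test = [r for r in records if r["admit_date"] >= val_end]
    return train, val, test


# Hypothetical toy admissions: two per year from 2010 through 2019.
records = [
    {"id": f"{year}-{month:02d}", "admit_date": date(year, month, 15)}
    for year in range(2010, 2020)
    for month in (3, 9)
]

# Hypothetical cutoffs; the real pipeline's boundaries are not stated here.
train, val, test = temporal_split(
    records, train_end=date(2017, 1, 1), val_end=date(2018, 7, 1)
)

print(len(train), len(val), len(test))
```

Splitting on time rather than at random keeps later admissions out of training, so the held-out metrics better resemble performance on future patients.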

## ⚙️ Training

| Setting | Value |
|---|---|
| Framework | LightGBM 4.x |
| Objective | Binary cross-entropy |
| Class imbalance | `scale_pos_weight` tuned to prevalence |
| Hyperparameter tuning | Optuna (Bayesian search) |
| Calibration | Platt scaling (post-hoc) |
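
As a rough illustration of the class-imbalance and calibration settings in the table, the sketch below computes a `scale_pos_weight` from class counts and fits a Platt-style sigmoid by plain gradient descent. The negative-to-positive ratio is a common starting heuristic, not necessarily the tuned value used for this model. The counts, toy scores, learning rate, and epoch count are all assumptions, and the fitting loop is a simplification of Platt's original procedure.

```python
import math


def scale_pos_weight(n_negative, n_positive):
    """Common heuristic: weight positives by the negative-to-positive ratio."""
    return n_negative / n_positive


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical counts matching roughly 18% prevalence.
weight = scale_pos_weight(n_negative=820, n_positive=180)

# Hypothetical raw scores and outcomes from a held-out validation set.
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
labels = [0, 0, 0, 1, 0, 0, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]

print(f"scale_pos_weight: {weight:.2f}")
print(f"Platt parameters: a={a:.3f}, b={b:.3f}")
```

Reweighting the positive class tends to shift raw scores away from observed frequencies, which is one reason a post-hoc calibration step fitted on held-out data is useful.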

## 📈 Performance

| Metric | Value | Notes |
|---|---|---|
| AUC-ROC | 0.689 | Discrimination performance |
| Brier Score | **0.146** | Best calibrated in the framework |

> ✅ LightGBM achieves the **best calibration among all models evaluated in this framework**. Well-calibrated probabilities are critical in clinical settings where risk thresholds drive care decisions.
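
For readers less familiar with the two metrics, here is a minimal sketch of how each is defined, using only the standard library. The labels and probabilities are toy values invented for illustration, not model outputs, and the baseline helper assumes a prevalence of exactly 0.18, which this card gives only approximately.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    sample size, so it suits small illustrations only.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def constant_baseline_brier(prevalence):
    """Brier score of always predicting the base rate: p * (1 - p)."""
    return prevalence * (1.0 - prevalence)


# Toy data, invented for this example.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.10, 0.30, 0.35, 0.20, 0.80, 0.40, 0.05, 0.60]

auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)
baseline = constant_baseline_brier(0.18)

print(f"AUC: {auc:.3f}  Brier: {brier:.4f}  base-rate Brier: {baseline:.4f}")
```

With a prevalence of about 18%, a constant prediction at the base rate would score roughly 0.148, which gives a reference point for reading Brier values on this task.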

## 🔍 Explainability

Per-patient explanations are generated using **SHAP TreeExplainer**, which is exact and computationally efficient for tree-based models.

- Global feature importance via SHAP summary plots
- Local patient-level force plots for individual predictions
- Compatible with standard clinical decision support workflows
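
To make the idea behind the per-patient attributions concrete, the sketch below computes exact Shapley values by brute-force enumeration over feature subsets. This is a swapped-in technique for illustration and is not how TreeExplainer works internally: TreeExplainer exploits tree structure to avoid the exponential enumeration. The `toy_risk` function, its three feature names, and the single reference patient used as a baseline are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline point.

    Features outside a coalition take their baseline value. Cost grows as
    2^n in the number of features, so this is for tiny examples only.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_risk(z):
    """Hypothetical risk score with one interaction term."""
    age, prior_admits, los_days = z
    return 0.02 * age + 0.3 * prior_admits + 0.05 * los_days + 0.01 * prior_admits * los_days


x = [70, 2, 5]          # hypothetical patient
baseline = [50, 0, 3]   # hypothetical reference patient

phis = shapley_values(toy_risk, x, baseline)
for name, phi in zip(["age", "prior_admits", "los_days"], phis):
    print(f"{name}: {phi:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes force plots readable as a decomposition of one prediction.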

## ⚖️ Fairness Evaluation

The model was evaluated across **16 demographic and clinical subgroups**, including stratifications by age group, sex, race/ethnicity, insurance type, and admission source.

All subgroups satisfy the following thresholds:

| Metric | Threshold |
|---|---|
| ΔAUC (vs. overall) | ≤ 0.05 |
| ΔFNR (vs. overall) | ≤ 0.10 |

No subgroup exhibited clinically meaningful performance degradation under these criteria.
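
A subgroup check against these two thresholds might look like the sketch below. Several details are assumptions because this card does not state them: the deltas are taken as absolute differences from the overall value, the false-negative rate uses a hypothetical decision threshold of 0.5, and the two-group toy data is invented (and deliberately fails the check to show what a violation looks like).

```python
def roc_auc(y_true, y_prob):
    """Pairwise AUC with ties counted as half a win."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0 for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def false_negative_rate(y_true, y_prob, threshold):
    """Share of true positives scored below the decision threshold."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    return sum(1 for p in positives if p < threshold) / len(positives)


def audit_subgroups(rows, threshold=0.5, max_delta_auc=0.05, max_delta_fnr=0.10):
    """Compare each subgroup's AUC and FNR with the overall values."""
    y_all = [r["y"] for r in rows]
    p_all = [r["p"] for r in rows]
    overall_auc = roc_auc(y_all, p_all)
    overall_fnr = false_negative_rate(y_all, p_all, threshold)

    groups = {}
    for r in rows:
        groups.setdefault(r["group"], []).append(r)

    report = {}
    for name, members in groups.items():
        y = [r["y"] for r in members]
        p = [r["p"] for r in members]
        delta_auc = abs(roc_auc(y, p) - overall_auc)
        delta_fnr = abs(false_negative_rate(y, p, threshold) - overall_fnr)
        report[name] = {
            "delta_auc": delta_auc,
            "delta_fnr": delta_fnr,
            "passes": delta_auc <= max_delta_auc and delta_fnr <= max_delta_fnr,
        }
    return report


# Invented toy data: (group, outcome, predicted probability).
toy = [
    ("A", 1, 0.8), ("A", 1, 0.6), ("A", 0, 0.3), ("A", 0, 0.2), ("A", 1, 0.4), ("A", 0, 0.5),
    ("B", 1, 0.7), ("B", 1, 0.3), ("B", 0, 0.4), ("B", 0, 0.1), ("B", 1, 0.2), ("B", 0, 0.6),
]
rows = [{"group": g, "y": y, "p": p} for g, y, p in toy]

report = audit_subgroups(rows)
for name, result in report.items():
    print(name, result)
```

Each subgroup needs at least one positive and one negative case for both metrics to be defined, so very small subgroups would need separate handling.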

## 🚀 Usage

```python
import joblib
import numpy as np

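# Load the serialized model (expects lightgbm.pkl in the working directory)
model = joblib.load("lightgbm.pkl")

# Replace the placeholder with one row of the 26 clinical features, shape (1, 26)
X = np.array([[...]])

# Column 1 of predict_proba is the 30-day readmission probability
pred = model.predict_proba(X)[0, 1]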

print(f"Readmission risk: {pred:.3f}")
```

> ⚠️ Input features must match the 26 clinical variables used during training. See the [repository](https://github.com/Tomisin92/readmission-prediction) for the full feature schema and preprocessing pipeline.

## 🎯 Intended Use

- Research and reproducibility
- Clinical ML benchmarking
- Demonstration of explainable and fair AI systems

## 🔁 Reproducibility

All results are fully reproducible using the open-source pipeline at [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction), which includes data preprocessing, feature engineering, model training, SHAP explainability, and fairness auditing.

## ⚠️ Limitations

- **Retrospective validation only:** the model was trained and evaluated on historical MIMIC-IV data; prospective validation has not been performed.
- **Single institution:** MIMIC-IV reflects one academic medical center (BIDMC); generalizability to other institutions requires local validation.
- **No causal claims:** feature associations do not imply clinical causation.
- **Requires local validation** before any deployment in a clinical decision support context.
- **Credentialed dataset:** MIMIC-IV requires PhysioNet credentialing; this model card does not distribute the underlying data.

## 🔗 Links

- 📄 **Paper:** [arXiv:2604.22535](https://doi.org/10.48550/arXiv.2604.22535)
- 💻 **Code:** [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction)

## 📜 Citation

```bibtex
@misc{adisa2025readmission,
  title={Hospital Readmission Prediction with Explainability and Fairness},
  author={Adisa, Isaac Tosin},
  year={2026},
  eprint={2604.22535},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

## License

This model is released under the [MIT License](https://github.com/Tomisin92/readmission-prediction/blob/main/LICENSE). The underlying MIMIC-IV dataset is subject to its own [PhysioNet credentialed access agreement](https://physionet.org/content/mimiciv/).