---
language: en
license: mit
tags:
  - tabular-classification
  - healthcare
  - healthcare-ai
  - clinical-ai
  - lightgbm
  - explainable-ai
  - fairness
datasets:
  - mimic-iv
metrics:
  - roc_auc
  - brier_score
library_name: lightgbm
pipeline_tag: tabular-classification
authors:
  - Isaac Tosin Adisa
model-index:
  - name: Hospital Readmission LightGBM
    results:
      - task:
          type: tabular-classification
          name: Tabular Classification
        dataset:
          name: MIMIC-IV
          type: physionet/mimic-iv
        metrics:
          - type: roc_auc
            value: 0.689
            name: ROC AUC
          - type: brier_score
            value: 0.146
            name: Brier Score
---

# 🏥 Hospital Readmission Prediction (LightGBM)

**Author:** Isaac Tosin Adisa

## 📌 Overview

This model predicts **30-day hospital readmission risk** using structured clinical features derived from the MIMIC-IV dataset. It is part of an integrated multi-model comparative framework, which also includes Logistic Regression and XGBoost baselines, designed to balance predictive performance, calibration quality, explainability, and subgroup fairness.

The model outputs calibrated probabilities. LightGBM was the **best-calibrated model among those evaluated in this framework** (Brier Score: 0.146), which makes it well suited to clinical risk stratification, where probability estimates matter as much as ranking accuracy.

This model is released alongside a fully reproducible pipeline and open-source implementation to facilitate independent validation and reuse.

## 📊 Dataset

| Property | Value |
|---|---|
| Source | MIMIC-IV (v2.2) |
| Total admissions | 415,231 |
| 30-day readmission prevalence | ~18% |
| Feature count | 26 structured clinical features |
| Split | Train / Validation / Test (temporal split) |

Features include demographics, admission type, primary diagnosis category, comorbidity burden (Elixhauser), length of stay, lab value summaries, procedure counts, and prior utilization history.

## ⚙️ Training

| Setting | Value |
|---|---|
| Framework | LightGBM 4.x |
| Objective | Binary cross-entropy |
| Class imbalance | `scale_pos_weight` tuned to prevalence |
| Hyperparameter tuning | Optuna (Bayesian search) |
| Calibration | Platt scaling (post-hoc) |
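
The class-imbalance and calibration rows interact: up-weighting positives with `scale_pos_weight` shifts raw predicted probabilities upward, which is one reason a post-hoc calibration step is useful. The sketch below illustrates both ideas with NumPy only. It assumes the common negatives-to-positives heuristic for the weight (the card only says the weight was tuned to prevalence) and fits Platt scaling as a plain logistic regression on held-out raw scores, without the target smoothing from Platt's original paper. The function names are hypothetical and are not taken from the repository.

```python
import numpy as np

def pos_weight_from_prevalence(prevalence: float) -> float:
    """Assumed heuristic: ratio of negatives to positives."""
    return (1.0 - prevalence) / prevalence

def fit_platt(scores, labels, n_iter=50, ridge=1e-8):
    """Fit p = sigmoid(a * score + b) by Newton's method on the log loss."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    X = np.column_stack([s, np.ones_like(s)])  # columns: score, intercept
    w = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (p - y)
        # Tiny ridge on the Hessian keeps the linear solve well conditioned
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(2)
        step = np.linalg.solve(hess, grad)
        w -= step
        if np.max(np.abs(step)) < 1e-10:
            break
    return float(w[0]), float(w[1])

def apply_platt(scores, a, b):
    """Map raw scores to calibrated probabilities."""
    s = np.asarray(scores, dtype=float)
    return 1.0 / (1.0 + np.exp(-(a * s + b)))

# With the ~18% prevalence reported above, the heuristic weight is about 4.56
print(round(pos_weight_from_prevalence(0.18), 2))
```

In practice the scaler would be fitted on the validation split and applied to test-set scores, so that the calibration map is never fitted on the data used to report the Brier score.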

## 📈 Performance

| Metric | Value | Notes |
|---|---|---|
| AUC-ROC | 0.689 | Discrimination performance |
| Brier Score | **0.146** | Best calibrated in the framework |

> ✅ LightGBM achieves the **best calibration among all models evaluated in this framework**. Well-calibrated probabilities are critical in clinical settings where risk thresholds drive care decisions.
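
For readers who want to recompute the two reported metrics on their own predictions, here is a minimal NumPy sketch of their definitions. The Brier score is the mean squared difference between predicted probability and outcome (lower is better), and AUC-ROC is the probability that a randomly chosen readmitted patient receives a higher score than a randomly chosen non-readmitted one. The pairwise AUC below is written for clarity, not speed; on a full test set, `sklearn.metrics.roc_auc_score` and `sklearn.metrics.brier_score_loss` are the usual choices. The toy labels and probabilities are illustrative and are not model outputs.

```python
import numpy as np

def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_prob, dtype=float)
    return float(np.mean((p - y) ** 2))

def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(y_score, dtype=float)
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]  # O(n_pos * n_neg) memory
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))

# Toy example (illustrative values only)
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, p))                 # 0.75
print(round(brier_score(y, p), 6))   # 0.158125
```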

## 🔍 Explainability

Per-patient explanations are generated using **SHAP TreeExplainer**, which is exact and computationally efficient for tree-based models.

- Global feature importance via SHAP summary plots
- Local patient-level force plots for individual predictions
- Compatible with standard clinical decision support workflows
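
A patient-level explanation usually boils down to the handful of features with the largest SHAP values. The helper below is a minimal sketch of that ranking step. It assumes you already have one row of SHAP values, for example from `shap.TreeExplainer(model).shap_values(X)`; the exact return shape for binary classifiers differs between `shap` versions, so check it before indexing. The function name, feature names, and numbers are hypothetical and are not taken from the trained model.

```python
import numpy as np

def top_contributions(shap_row, feature_names, k=5):
    """Return the k features with the largest absolute SHAP value for one patient."""
    values = np.asarray(shap_row, dtype=float)
    if values.ndim != 1 or values.size != len(feature_names):
        raise ValueError("expected exactly one SHAP value per feature")
    order = np.argsort(-np.abs(values))[:k]  # largest magnitude first
    return [(feature_names[i], float(values[i])) for i in order]

# Illustrative feature names and values only
names = ["age", "length_of_stay", "prior_admissions", "elixhauser_score"]
row = [0.02, -0.10, 0.31, 0.07]
for name, value in top_contributions(row, names, k=2):
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {value:+.2f} ({direction} predicted risk)")
```

With `TreeExplainer` defaults, values for a gradient-boosted classifier are in raw margin (log-odds) units, so the sign gives the direction of the effect and the magnitude is not a probability change.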

## ⚖️ Fairness Evaluation

The model was evaluated across **16 demographic and clinical subgroups**, including stratifications by age group, sex, race/ethnicity, insurance type, and admission source.

All subgroups satisfy the following thresholds:

| Metric | Threshold |
|---|---|
| ΔAUC (vs. overall) | ≤ 0.05 |
| ΔFNR (vs. overall) | ≤ 0.10 |

No subgroup exhibited clinically meaningful performance degradation under these criteria.
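
The audit logic can be sketched as follows: compute AUC and false negative rate (FNR) overall and per subgroup, then compare the gaps with the thresholds in the table. This is a minimal NumPy illustration, not the repository's auditing code. It assumes that the gaps are absolute differences and that FNR uses a single shared decision threshold; the card does not state the threshold, so the default of 0.5 is a placeholder. All function names are hypothetical.

```python
import numpy as np

def roc_auc(y, s):
    """Pairwise AUC; ties between a positive and a negative count 0.5."""
    pos, neg = s[y], s[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs both classes in every subgroup")
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size))

def false_negative_rate(y, s, threshold):
    """Share of true positives scored below the decision threshold."""
    if not y.any():
        raise ValueError("FNR needs at least one positive")
    return float(np.sum(y & (s < threshold)) / np.sum(y))

def subgroup_gaps(y_true, y_prob, groups, threshold=0.5):
    """Absolute AUC and FNR gaps between each subgroup and the full cohort."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(y_prob, dtype=float)
    g = np.asarray(groups)
    overall_auc = roc_auc(y, s)
    overall_fnr = false_negative_rate(y, s, threshold)
    report = {}
    for name in np.unique(g):
        m = g == name
        report[str(name)] = {
            "delta_auc": abs(roc_auc(y[m], s[m]) - overall_auc),
            "delta_fnr": abs(false_negative_rate(y[m], s[m], threshold) - overall_fnr),
        }
    return report

def passes_thresholds(report, max_delta_auc=0.05, max_delta_fnr=0.10):
    """True only if every subgroup stays within both limits from the table."""
    return all(
        r["delta_auc"] <= max_delta_auc and r["delta_fnr"] <= max_delta_fnr
        for r in report.values()
    )
```

Calling `subgroup_gaps(y_test, p_test, insurance_type)` followed by `passes_thresholds(...)` would reproduce the pass/fail check for one stratification, where `y_test`, `p_test`, and `insurance_type` stand in for your own arrays.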

## 🚀 Usage

```python
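import joblib
import numpy as np

# Load the serialized model
model = joblib.load("lightgbm.pkl")

# Replace with your preprocessed features: shape (n_patients, 26),
# with columns in the same order as the training schema
X = np.array([[...]])

# Probability of the positive class (30-day readmission) for the first row
pred = model.predict_proba(X)[0, 1]

print(f"Readmission risk: {pred:.3f}")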
```

> ⚠️ Input features must match the 26 clinical variables used during training. See the [repository](https://github.com/Tomisin92/readmission-prediction) for the full feature schema and preprocessing pipeline.
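
Because a wrong column count or a leftover placeholder is an easy mistake to make, a small guard before calling `predict_proba` can help. The sketch below only checks that the input is numeric and has 26 columns; it cannot verify column order or preprocessing, which still have to follow the schema in the repository. The helper name is hypothetical.

```python
import numpy as np

N_FEATURES = 26  # feature count stated in this model card

def validate_features(X, n_features=N_FEATURES):
    """Return X as a 2-D float array, or raise if it cannot match the schema."""
    arr = np.asarray(X)
    if not np.issubdtype(arr.dtype, np.number):
        raise ValueError("features must be numeric (is a placeholder still in place?)")
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)  # treat a flat vector as a single patient
    if arr.ndim != 2 or arr.shape[1] != n_features:
        raise ValueError(f"expected shape (n_patients, {n_features}), got {arr.shape}")
    return arr.astype(float)

# A single all-zero row is shape-valid, though not a clinically meaningful input
print(validate_features(np.zeros(N_FEATURES)).shape)  # (1, 26)
```

Missing values are passed through unchanged, since LightGBM handles `NaN` natively; whether that matches the training-time imputation is something to confirm against the preprocessing pipeline.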

## 🎯 Intended Use

- Research and reproducibility
- Clinical ML benchmarking
- Demonstration of explainable and fair AI systems

## 🔁 Reproducibility

All results are fully reproducible using the open-source pipeline at [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction), which includes data preprocessing, feature engineering, model training, SHAP explainability, and fairness auditing.

## ⚠️ Limitations

- **Retrospective validation only:** the model was trained and evaluated on historical MIMIC-IV data; prospective validation has not been performed.
- **Single institution:** MIMIC-IV reflects one academic medical center (BIDMC), so performance at other institutions may differ.
- **No causal claims:** feature associations do not imply clinical causation.
- **Requires local validation** before any deployment in a clinical decision support context.
- **Credentialed dataset:** MIMIC-IV requires PhysioNet credentialing; this model card does not distribute the underlying data.

## 🔗 Links

- 📄 **Paper:** [arXiv:2604.22535](https://doi.org/10.48550/arXiv.2604.22535)
- 💻 **Code:** [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction)

## 📜 Citation

```bibtex
@misc{adisa2026readmission,
  title={Hospital Readmission Prediction with Explainability and Fairness},
  author={Adisa, Isaac Tosin},
  year={2026},
  eprint={2604.22535},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

## License

This model is released under the [MIT License](https://github.com/Tomisin92/readmission-prediction/blob/main/LICENSE). The underlying MIMIC-IV dataset is subject to its own [PhysioNet credentialed access agreement](https://physionet.org/content/mimiciv/).