---
language: en
license: mit
tags:
  - tabular-classification
  - healthcare
  - healthcare-ai
  - clinical-ai
  - xgboost
  - explainable-ai
  - fairness
datasets:
  - mimic-iv
metrics:
  - roc_auc
  - brier_score
library_name: xgboost
pipeline_tag: tabular-classification
authors:
  - Isaac Tosin Adisa
model-index:
  - name: Hospital Readmission XGBoost
    results:
      - task:
          type: tabular-classification
          name: Tabular Classification
        dataset:
          name: MIMIC-IV
          type: physionet/mimic-iv
        metrics:
          - type: roc_auc
            value: 0.696
            name: ROC AUC
          - type: brier_score
            value: 0.217
            name: Brier Score
---

# 🏥 Hospital Readmission Prediction (XGBoost)

**Author:** Isaac Tosin Adisa · Florida State University

## 📌 Overview

This model predicts **30-day hospital readmission risk** using structured clinical features derived from the MIMIC-IV dataset. It is part of an integrated multi-model comparative framework alongside Logistic Regression and LightGBM, designed to address three major barriers to clinical AI deployment: lack of explainability, inadequate fairness evaluation, and absence of production reliability infrastructure.

The model outputs calibrated probabilities suitable for downstream clinical risk stratification workflows. This model is released alongside a fully reproducible pipeline and open-source implementation to facilitate independent validation and reuse.

## 📊 Dataset

| Property | Value |
|---|---|
| Source | MIMIC-IV (Beth Israel Deaconess Medical Center) |
| Total admissions | 415,231 adult hospital admissions |
| 30-day readmission prevalence | ~18% |
| Feature count | 26 clinically derived features |
| Split | Train / Validation / Test (temporal split) |

Features include demographics, prior utilization, primary diagnosis category, comorbidity burden, medication count, lab value summaries, and length of stay.

> ⚠️ Raw data is not included and requires credentialed access via [PhysioNet](https://physionet.org/content/mimiciv/).
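
A temporal split assigns each admission to train, validation, or test by date, so the model is always evaluated on admissions later than the ones it was trained on. The sketch below illustrates the idea only: the `admit_date` field, the cutoff dates, and the helper `temporal_split` are hypothetical and are not taken from the repository.

```python
from datetime import date

def temporal_split(admissions, train_end, val_end):
    """Split by date: train < train_end <= validation < val_end <= test."""
    train, val, test = [], [], []
    for rec in admissions:
        d = rec["admit_date"]  # hypothetical field name
        if d < train_end:
            train.append(rec)
        elif d < val_end:
            val.append(rec)
        else:
            test.append(rec)
    return train, val, test

# Toy records for illustration only (not MIMIC-IV data)
admissions = [
    {"id": 1, "admit_date": date(2015, 3, 1)},
    {"id": 2, "admit_date": date(2016, 7, 15)},
    {"id": 3, "admit_date": date(2017, 1, 10)},
    {"id": 4, "admit_date": date(2018, 5, 20)},
]

train, val, test = temporal_split(
    admissions, train_end=date(2017, 1, 1), val_end=date(2018, 1, 1)
)
print(len(train), len(val), len(test))
```

Unlike a random split, no admission in the test set predates any training admission, which avoids leaking future information into training.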

## ⚙️ Training

| Setting | Value |
|---|---|
| Framework | XGBoost |
| Objective | Binary logistic |
| Class imbalance | `scale_pos_weight` tuned to prevalence |
| Hyperparameter tuning | Optuna (Bayesian search) |
| Calibration | Platt scaling (post-hoc) |
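
The two settings that most affect the predicted probabilities are the class weight and the post-hoc calibration. The sketch below shows the common negative-to-positive ratio as a starting value for `scale_pos_weight`, and a simplified Platt scaling that fits `sigmoid(a * score + b)` to held-out scores by gradient descent. It is a standard-library illustration under stated assumptions: the helper names and toy data are hypothetical, Platt's original target smoothing is omitted, and the repository's actual tuning and calibration code may differ.

```python
import math

def scale_pos_weight(labels):
    """Common starting value: number of negatives divided by number of positives."""
    pos = sum(labels)
    neg = len(labels) - pos
    return neg / pos

def fit_platt(scores, labels, lr=0.1, n_iter=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def apply_platt(score, a, b):
    """Map a raw score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

# Toy labels at roughly the prevalence reported in this card (18 of 100 positive)
toy_labels = [0] * 82 + [1] * 18
spw = scale_pos_weight(toy_labels)

# Toy held-out raw scores and outcomes (hypothetical, not model output)
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
a, b = fit_platt(scores, labels)
calibrated = [apply_platt(s, a, b) for s in scores]
print(f"scale_pos_weight ~ {spw:.2f}, a = {a:.3f}, b = {b:.3f}")
```

Fitting the calibrator on data the booster did not train on (for example the validation split) keeps the calibration from inheriting the model's overconfidence on its own training set.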

## 📈 Performance

| Metric | Value | Notes |
|---|---|---|
| AUC-ROC | 0.696 (95% CI: 0.691–0.701) | Moderate discrimination; slightly above the LACE Index range |
| Brier Score | ~0.217 | Mean squared error of predicted probabilities; lower is better |
| Benchmark | Comparable to LACE Index (0.60–0.68) | Validated clinical tool |

> 📊 XGBoost delivers the **strongest discrimination** among tree-based models in this framework. LightGBM achieves better calibration (Brier Score: 0.146), making the two complementary depending on the clinical use case.
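
For readers who want to recompute these metrics on their own predictions, the sketch below implements both from their definitions using only the standard library. AUC is the probability that a randomly chosen readmitted patient receives a higher score than a randomly chosen non-readmitted one, and the Brier score is the mean squared difference between predicted probability and outcome. The function names and toy data are illustrative, and the pairwise AUC is quadratic in sample size, so a library implementation is preferable at the scale of MIMIC-IV.

```python
def roc_auc(y_true, y_prob):
    """Fraction of positive/negative pairs ranked correctly (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Toy example (not model output)
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_prob)
brier = brier_score(y_true, y_prob)

# Reference point: Brier score of always predicting the prevalence
prevalence = 0.18
constant_baseline = prevalence * (1 - prevalence)
print(f"AUC = {auc:.3f}, Brier = {brier:.4f}, baseline = {constant_baseline:.4f}")
```

The constant-prediction baseline (about 0.148 at 18% prevalence) gives a scale for reading Brier values: scores are only informative about calibration when compared against it.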

## 🔍 Explainability

Per-patient explanations are generated using **SHAP TreeExplainer**, which is exact and computationally efficient for tree-based models.

- Global feature importance via SHAP summary plots
- Local patient-level force plots for individual predictions
- Compatible with standard clinical decision support workflows

**Top predictors identified by SHAP:**

| Rank | Feature |
|---|---|
| 1 | Prior hospital admissions (12 months) |
| 2 | Medication count |
| 3 | Diagnosis count |
| 4 | Length of stay |
| 5 | Charlson Comorbidity Index |
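
To make the meaning of a SHAP value concrete, the sketch below computes exact Shapley values by brute force for a hypothetical three-feature risk function. This is not the TreeExplainer algorithm and not this model: TreeExplainer obtains the same kind of attribution with a method specialised to tree ensembles, while the enumeration here is only feasible for a handful of features. The function `toy_risk`, its coefficients, and the baseline values are invented for illustration.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for j in order:
            current[j] = x[j]  # switch feature j from baseline to patient value
            new = predict(current)
            phi[j] += new - prev
            prev = new
    return [v / len(orders) for v in phi]

def toy_risk(features):
    """Hypothetical risk score with one interaction term (not the released model)."""
    prior_adm, med_count, los = features
    return 0.05 * prior_adm + 0.01 * med_count + 0.02 * los + 0.01 * prior_adm * los

x = [2, 10, 5]         # patient: prior admissions, medication count, length of stay
baseline = [0, 5, 3]   # reference patient
phi = shapley_values(toy_risk, x, baseline)
print([round(v, 4) for v in phi])
```

The attributions sum to the difference between the patient's score and the baseline score, which is the property that lets a per-patient force plot decompose a single prediction into feature contributions.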

## ⚖️ Fairness Evaluation

The model was evaluated across **16 demographic and clinical subgroups**, including stratifications by race/ethnicity, age group, sex, and insurance type.

All subgroups satisfy the following thresholds:

| Metric | Threshold | Result |
|---|---|---|
| ΔAUC (vs. overall) | ≤ 0.05 | ✅ Met |
| ΔFNR (vs. overall) | ≤ 0.10 | ✅ Met |

No subgroup exhibited clinically meaningful performance degradation under these criteria. No post-processing bias correction was required.
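
A subgroup check of this kind can be sketched as follows for the false negative rate. The code assumes that ΔFNR is the absolute difference between a subgroup's FNR and the overall FNR, and that predictions are binarised at a threshold of 0.5; neither choice is stated in this card, and the group labels and data are toy values. The same pattern applies to ΔAUC by substituting an AUC function for the FNR function.

```python
def false_negative_rate(y_true, y_pred):
    """Share of actual positives that were predicted negative."""
    preds_for_positives = [yhat for y, yhat in zip(y_true, y_pred) if y == 1]
    missed = sum(1 for yhat in preds_for_positives if yhat == 0)
    return missed / len(preds_for_positives)

def fnr_gaps(y_true, y_prob, groups, threshold=0.5):
    """Return overall FNR and each subgroup's absolute FNR gap (assumed definition)."""
    y_pred = [int(p >= threshold) for p in y_prob]
    overall = false_negative_rate(y_true, y_pred)
    gaps = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        fnr = false_negative_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])
        gaps[g] = abs(fnr - overall)
    return overall, gaps

# Toy data with two hypothetical subgroups
y_true = [1, 1, 0, 1, 1, 0, 1, 0]
y_prob = [0.9, 0.3, 0.2, 0.7, 0.6, 0.4, 0.2, 0.1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

overall_fnr, gaps = fnr_gaps(y_true, y_prob, groups)
for g, gap in gaps.items():
    print(f"group {g}: FNR gap = {gap:.3f}")
```

Small subgroups produce noisy rates, so in practice each gap is best read together with the subgroup's sample size or a confidence interval.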

## 🚀 Usage

```python
import joblib
import numpy as np

# Load model
model = joblib.load("xgboost.pkl")

# Placeholder: replace with a 2-D array of shape (n_samples, 26),
# with columns in the same order as the training features
X = np.array([[...]])

# Probability of 30-day readmission for the first row
pred = model.predict_proba(X)[0, 1]

print(f"Readmission risk: {pred:.3f}")
```

> ⚠️ Input features must match the 26 clinical variables used during training. See the [repository](https://github.com/Tomisin92/readmission-prediction) for the full feature schema and preprocessing pipeline.
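
A lightweight guard before calling the model can catch the most common input mistake, a row with the wrong number of columns. The sketch below is a hypothetical helper, not part of the repository: it checks only row length and numeric type, and it cannot verify column order or preprocessing, which still have to follow the repository's schema.

```python
from numbers import Real

N_FEATURES = 26  # feature count stated in this card

def validate_rows(rows, n_features=N_FEATURES):
    """Raise ValueError if any row has the wrong length or a non-numeric value."""
    for i, row in enumerate(rows):
        if len(row) != n_features:
            raise ValueError(
                f"row {i}: expected {n_features} features, got {len(row)}"
            )
        for j, value in enumerate(row):
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, Real):
                raise ValueError(f"row {i}, column {j}: non-numeric value {value!r}")
    return rows

# Toy input: one row of 26 zeros (placeholder values, not a real patient)
rows = [[0.0] * N_FEATURES]
checked = validate_rows(rows)
print(f"{len(checked)} row(s) passed validation")
```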

## 🎯 Intended Use

- Research and reproducibility
- Clinical ML benchmarking
- Demonstration of explainable and fair AI systems
- Benchmarking against validated clinical tools (e.g. LACE Index)

## 🧾 Ethical & Regulatory Considerations

This model is **not a medical device** and is **not approved for clinical use**. Deployment in any clinical setting requires:

- Prospective validation on local patient populations
- Institutional review and governance approval
- Applicable regulatory compliance

This work is aligned with:

- **ONC HTI-1**: algorithm transparency requirements for certified health IT
- **Section 1557 of the Affordable Care Act (HHS)**: non-discrimination standards that extend to AI-based decision support in healthcare

## 🔁 Reproducibility

All results are fully reproducible using the open-source pipeline at [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction), which includes data preprocessing, feature engineering, model training, SHAP explainability, and fairness auditing.

## ⚠️ Limitations

- **Retrospective validation only** — model was trained and evaluated on historical MIMIC-IV data; prospective validation has not been performed.
- **Single institution** — MIMIC-IV reflects one academic medical center (BIDMC); generalizability to other institutions requires local validation.
- **No causal claims** — feature associations do not imply clinical causation.
- **No real-time EHR integration** — this model operates on static feature vectors; live deployment would require additional infrastructure.
- **Requires local validation** before any deployment in a clinical decision support context.
- **Credentialed dataset** — MIMIC-IV requires PhysioNet credentialing; this model card does not distribute the underlying data.

## 🔗 Links

- 📄 **Paper:** [arXiv:2604.22535](https://doi.org/10.48550/arXiv.2604.22535)
- 💻 **Code:** [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction)

## 📜 Citation

```bibtex
@misc{adisa2026readmission,
  title={Hospital Readmission Prediction with Explainability and Fairness},
  author={Adisa, Isaac Tosin},
  year={2026},
  eprint={2604.22535},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

## License

This model is released under the [MIT License](https://github.com/Tomisin92/readmission-prediction/blob/main/LICENSE). The underlying MIMIC-IV dataset is subject to its own [PhysioNet credentialed access agreement](https://physionet.org/content/mimiciv/).