---
language: en
license: mit
tags:
- tabular-classification
- healthcare
- healthcare-ai
- clinical-ai
- lightgbm
- explainable-ai
- fairness
datasets:
- mimic-iv
metrics:
- roc_auc
- brier_score
library_name: lightgbm
pipeline_tag: tabular-classification
authors:
- Isaac Tosin Adisa
model-index:
- name: Hospital Readmission LightGBM
  results:
  - task:
      type: tabular-classification
      name: Tabular Classification
    dataset:
      name: MIMIC-IV
      type: physionet/mimic-iv
    metrics:
    - type: roc_auc
      value: 0.689
      name: ROC AUC
    - type: brier_score
      value: 0.146
      name: Brier Score
---

# 🏥 Hospital Readmission Prediction (LightGBM)

**Author:** Isaac Tosin Adisa

## 📌 Overview

This model predicts **30-day hospital readmission risk** from structured clinical features derived from the MIMIC-IV dataset. It is part of a multi-model comparative framework, including Logistic Regression and XGBoost baselines, designed to balance predictive performance, calibration quality, explainability, and subgroup fairness.

LightGBM was selected as the **best-calibrated model among those evaluated in this framework** (Brier Score: 0.146). It outputs calibrated probabilities, which makes it well suited to downstream clinical risk stratification, where probability estimates matter as much as ranking accuracy.

This model is released alongside a fully reproducible pipeline and open-source implementation to facilitate independent validation and reuse.

## 📊 Dataset

| Property | Value |
|---|---|
| Source | MIMIC-IV (v2.2) |
| Total admissions | 415,231 |
| 30-day readmission prevalence | ~18% |
| Feature count | 26 structured clinical features |
| Split | Train / Validation / Test (temporal split) |

Features include demographics, admission type, primary diagnosis category, comorbidity burden (Elixhauser), length of stay, lab value summaries, procedure counts, and prior utilization history.
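The 30-day readmission target is typically derived by comparing each discharge with the same patient's next admission. The following is a minimal sketch, assuming a pandas DataFrame shaped like the MIMIC-IV `admissions` table (`subject_id`, `hadm_id`, `admittime`, `dischtime`). The helper `add_readmission_label` is hypothetical, and the paper's exact cohort rules (for example, handling of in-hospital deaths or transfers) are not reproduced here.

```python
import pandas as pd


def add_readmission_label(admissions: pd.DataFrame, window_days: int = 30) -> pd.DataFrame:
    """Hypothetical helper: flag stays followed by another admission of the
    same patient within `window_days` of discharge (inclusive)."""
    df = admissions.sort_values(["subject_id", "admittime"]).copy()
    # Admission time of the same patient's next stay (NaT for the last stay)
    df["next_admittime"] = df.groupby("subject_id")["admittime"].shift(-1)
    gap = df["next_admittime"] - df["dischtime"]
    # Negative gaps (overlapping stays) and NaT (no later stay) are labeled 0
    in_window = gap.between(pd.Timedelta(0), pd.Timedelta(days=window_days))
    df["readmit_30d"] = in_window.astype(int)
    return df.drop(columns="next_admittime")


# Toy example: patient 1 returns 15 days after discharge; patient 2 never returns
adm = pd.DataFrame({
    "subject_id": [1, 1, 2],
    "hadm_id": [10, 11, 20],
    "admittime": pd.to_datetime(["2150-01-01", "2150-01-20", "2150-03-01"]),
    "dischtime": pd.to_datetime(["2150-01-05", "2150-01-25", "2150-03-04"]),
})
labeled = add_readmission_label(adm)
print(labeled[["hadm_id", "readmit_30d"]])
```

Only the first stay of patient 1 is labeled as a readmission; the final stay of each patient has no follow-up admission and is labeled 0.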
## ⚙️ Training

| Setting | Value |
|---|---|
| Framework | LightGBM 4.x |
| Objective | Binary cross-entropy |
| Class imbalance | `scale_pos_weight` tuned to prevalence |
| Hyperparameter tuning | Optuna (Bayesian search) |
| Calibration | Platt scaling (post-hoc) |

## 📈 Performance

| Metric | Value | Notes |
|---|---|---|
| AUC-ROC | ~0.689 | Discrimination performance |
| Brier Score | **0.146** | Best calibrated in the framework |

> ✅ LightGBM achieves the **best calibration among all models evaluated in this framework**. Well-calibrated probabilities are critical in clinical settings where risk thresholds drive care decisions.

## 🔍 Explainability

Per-patient explanations are generated using **SHAP TreeExplainer**, which computes exact SHAP values efficiently for tree-based models.

- Global feature importance via SHAP summary plots
- Local patient-level force plots for individual predictions
- Compatible with standard clinical decision support workflows

## ⚖️ Fairness Evaluation

The model was evaluated across **16 demographic and clinical subgroups**, including stratifications by age group, sex, race/ethnicity, insurance type, and admission source. All subgroups satisfy the following thresholds:

| Metric | Threshold |
|---|---|
| ΔAUC (vs. overall) | ≤ 0.05 |
| ΔFNR (vs. overall) | ≤ 0.10 |

No subgroup exhibited clinically meaningful performance degradation under these criteria.

## 🚀 Usage

```python
import joblib
import numpy as np

# Load the trained model
model = joblib.load("lightgbm.pkl")

# One row per admission with the 26 training features, in training column order.
# Placeholder zeros only: replace with real, preprocessed feature values.
X = np.zeros((1, 26))

# predict_proba returns [[P(no readmission), P(readmission)]] for each row
pred = model.predict_proba(X)[0, 1]
print(f"30-day readmission risk: {pred:.3f}")
```

> ⚠️ Input features must match the 26 clinical variables used during training. See the [repository](https://github.com/Tomisin92/readmission-prediction) for the full feature schema and preprocessing pipeline.
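Before relying on the model at a new site, the subgroup criteria from the Fairness Evaluation can be re-checked on local data. The sketch below assumes scikit-learn is installed, reads ΔAUC and ΔFNR as absolute gaps from the overall value, and uses a 0.5 decision threshold to compute FNR; the card does not state the operating threshold, so substitute the one used in practice. `subgroup_audit` and `false_negative_rate` are hypothetical helper names, not functions from the released repository.

```python
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score


def false_negative_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of true readmissions that the model fails to flag."""
    positives = y_true == 1
    return float(np.mean(y_pred[positives] == 0)) if positives.any() else float("nan")


def subgroup_audit(y_true, y_prob, groups, threshold=0.5, max_dauc=0.05, max_dfnr=0.10):
    """Hypothetical helper: compare each subgroup's AUC and FNR with the overall values."""
    y_true, y_prob, groups = map(np.asarray, (y_true, y_prob, groups))
    y_pred = (y_prob >= threshold).astype(int)  # assumed operating threshold
    overall_auc = roc_auc_score(y_true, y_prob)
    overall_fnr = false_negative_rate(y_true, y_pred)
    report = {"overall": {"auc": overall_auc, "fnr": overall_fnr,
                          "brier": brier_score_loss(y_true, y_prob)}}
    for g in np.unique(groups):
        mask = groups == g
        if np.unique(y_true[mask]).size < 2:
            continue  # AUC is undefined when a subgroup contains only one class
        d_auc = abs(roc_auc_score(y_true[mask], y_prob[mask]) - overall_auc)
        d_fnr = abs(false_negative_rate(y_true[mask], y_pred[mask]) - overall_fnr)
        report[str(g)] = {"delta_auc": d_auc, "delta_fnr": d_fnr,
                          "passes": bool(d_auc <= max_dauc and d_fnr <= max_dfnr)}
    return report


# Toy example with perfectly separated predictions in both subgroups
report = subgroup_audit(
    y_true=[0, 1, 0, 1],
    y_prob=[0.1, 0.9, 0.2, 0.8],
    groups=["a", "a", "b", "b"],
)
print(report)
```

Subgroups containing a single outcome class are skipped rather than scored, since AUC cannot be computed for them; in a real audit these should be reported separately instead of silently passing.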
## 🎯 Intended Use

- Research and reproducibility
- Clinical ML benchmarking
- Demonstration of explainable and fair AI systems

## 🔁 Reproducibility

All results are fully reproducible using the open-source pipeline at [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction), which includes data preprocessing, feature engineering, model training, SHAP explainability, and fairness auditing.

## ⚠️ Limitations

- **Retrospective validation only:** the model was trained and evaluated on historical MIMIC-IV data; prospective validation has not been performed.
- **Single institution:** MIMIC-IV reflects one academic medical center (BIDMC); generalizability to other institutions requires local validation.
- **No causal claims:** feature associations do not imply clinical causation.
- **Requires local validation** before any deployment in a clinical decision support context.
- **Credentialed dataset:** MIMIC-IV requires PhysioNet credentialing; this model card does not distribute the underlying data.

## 🔗 Links

- 📄 **Paper:** [arXiv:2604.22535](https://doi.org/10.48550/arXiv.2604.22535)
- 💻 **Code:** [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction)

## 📜 Citation

```bibtex
@misc{adisa2025readmission,
  title={Hospital Readmission Prediction with Explainability and Fairness},
  author={Adisa, Isaac Tosin},
  year={2026},
  eprint={2604.22535},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

## License

This model is released under the [MIT License](https://github.com/Tomisin92/readmission-prediction/blob/main/LICENSE). The underlying MIMIC-IV dataset is subject to its own [PhysioNet credentialed access agreement](https://physionet.org/content/mimiciv/).