---
language: en
license: mit
tags:
- tabular-classification
- healthcare
- healthcare-ai
- clinical-ai
- xgboost
- explainable-ai
- fairness
datasets:
- mimic-iv
metrics:
- roc_auc
- brier_score
library_name: xgboost
pipeline_tag: tabular-classification
authors:
- Isaac Tosin Adisa
model-index:
- name: Hospital Readmission XGBoost
  results:
  - task:
      type: tabular-classification
      name: Tabular Classification
    dataset:
      name: MIMIC-IV
      type: physionet/mimic-iv
    metrics:
    - type: roc_auc
      value: 0.696
      name: ROC AUC
    - type: brier_score
      value: 0.217
      name: Brier Score
---

# 🏥 Hospital Readmission Prediction (XGBoost)

**Author:** Isaac Tosin Adisa · Florida State University

## 📌 Overview

This model predicts **30-day hospital readmission risk** from structured clinical features derived from the MIMIC-IV dataset. It is one of three models, alongside Logistic Regression and LightGBM, in an integrated comparative framework designed to address three major barriers to clinical AI deployment: lack of explainability, inadequate fairness evaluation, and the absence of production reliability infrastructure.

The model outputs probabilities calibrated post hoc with Platt scaling, intended for downstream clinical risk-stratification workflows.

It is released alongside a fully reproducible pipeline and open-source implementation to facilitate independent validation and reuse.

## 📊 Dataset

| Property | Value |
|---|---|
| Source | MIMIC-IV (Beth Israel Deaconess Medical Center) |
| Total admissions | 415,231 adult hospital admissions |
| 30-day readmission prevalence | ~18% |
| Feature count | 26 clinically derived features |
| Split | Train / validation / test (temporal split) |

Features include demographics, prior utilization, primary diagnosis category, comorbidity burden, medication count, lab value summaries, and length of stay.

> ⚠️ Raw data is not included and requires credentialed access via [PhysioNet](https://physionet.org/content/mimiciv/).

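One common way to build the 30-day label and the temporal split is sketched below. It is not the repository's code, and it assumes: a readmission is the same patient's next admission starting within 30 days after discharge, with no exclusions (in-hospital deaths, planned readmissions, and transfers are not handled); the split simply orders admissions by `admittime` with placeholder fractions; and `Admission`, `label_30day_readmission`, and `temporal_split` are hypothetical names. MIMIC-IV shifts each patient's dates by a patient-specific offset for de-identification, so ordering by `admittime` only approximates calendar time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby

# Assumed label definition: the patient's next admission starts within
# 30 days after this discharge. No exclusions are applied.
READMIT_WINDOW = timedelta(days=30)


@dataclass
class Admission:
    # Field names mirror columns of the MIMIC-IV `admissions` table.
    subject_id: int
    hadm_id: int
    admittime: datetime
    dischtime: datetime


def label_30day_readmission(admissions):
    """Return {hadm_id: 0 or 1} for every admission (hypothetical helper)."""
    labels = {}
    ordered = sorted(admissions, key=lambda a: (a.subject_id, a.admittime))
    for _, group in groupby(ordered, key=lambda a: a.subject_id):
        stays = list(group)
        # Pair each stay with the same patient's next stay (None for the last one).
        for current, nxt in zip(stays, stays[1:] + [None]):
            if nxt is None:
                labels[current.hadm_id] = 0
            else:
                gap = nxt.admittime - current.dischtime
                labels[current.hadm_id] = int(timedelta(0) <= gap <= READMIT_WINDOW)
    return labels


def temporal_split(admissions, train_frac=0.70, val_frac=0.15):
    """Order admissions by admittime and cut them into train/val/test blocks,
    so evaluation admissions start no earlier than training ones
    (hypothetical helper; the fractions are placeholders)."""
    ordered = sorted(admissions, key=lambda a: a.admittime)
    n_train = int(len(ordered) * train_frac)
    n_val = int(len(ordered) * val_frac)
    return (
        ordered[:n_train],
        ordered[n_train:n_train + n_val],
        ordered[n_train + n_val:],
    )


# Tiny example with made-up dates.
t0 = datetime(2150, 1, 1)
demo = [
    Admission(1, 101, t0, t0 + timedelta(days=3)),
    Admission(1, 102, t0 + timedelta(days=20), t0 + timedelta(days=25)),
    Admission(2, 201, t0 + timedelta(days=5), t0 + timedelta(days=9)),
]
print(label_30day_readmission(demo))  # {101: 1, 102: 0, 201: 0}
```

Admission 101 is labeled 1 because patient 1 returns 17 days after discharge. A production pipeline would usually add exclusions (for example, admissions ending in death); the repository defines the exact rules.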
## ⚙️ Training

| Setting | Value |
|---|---|
| Framework | XGBoost |
| Objective | Binary logistic (`binary:logistic`) |
| Class imbalance | `scale_pos_weight` tuned to class prevalence |
| Hyperparameter tuning | Optuna (Bayesian search) |
| Calibration | Platt scaling (post hoc) |

## 📈 Performance

| Metric | Value | Notes |
|---|---|---|
| AUC-ROC | 0.696 (95% CI: 0.691–0.701) | Moderate discriminative performance |
| Brier score | ~0.217 | Calibration reference; lower is better (a constant prediction at the ~18% prevalence scores ≈ 0.148) |
| Benchmark | LACE Index AUC 0.60–0.68 | Validated clinical tool; this model's AUC is comparable to or slightly above this range |

> 📊 Among the tree-based models in this framework, XGBoost delivers the **strongest discrimination**, while LightGBM achieves better calibration (Brier score: 0.146). Because discrimination measures how well patients are ranked by risk, and calibration measures how closely predicted probabilities match observed readmission rates, the two models are complementary; the better choice depends on the clinical use case.

## 🔍 Explainability

Per-patient explanations are generated with **SHAP TreeExplainer**, which computes exact SHAP values efficiently for tree-based models.

- Global feature importance via SHAP summary plots
- Local, patient-level force plots for individual predictions
- Compatible with standard clinical decision support workflows

**Top predictors identified by SHAP:**

| Rank | Feature |
|---|---|
| 1 | Prior hospital admissions (previous 12 months) |
| 2 | Medication count |
| 3 | Diagnosis count |
| 4 | Length of stay |
| 5 | Charlson Comorbidity Index |

## ⚖️ Fairness Evaluation

The model was evaluated across **16 demographic and clinical subgroups**, including stratifications by race/ethnicity, age group, sex, and insurance type. Every subgroup satisfies both of the following thresholds:

| Metric | Threshold | Result |
|---|---|---|
| ΔAUC (subgroup vs. overall) | ≤ 0.05 | ✅ Met |
| ΔFNR (subgroup vs. overall) | ≤ 0.10 | ✅ Met |

Under these criteria, no subgroup showed clinically meaningful performance degradation, and no post-processing bias correction was required.

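The subgroup criteria in the fairness table can be checked with a short audit loop. This is a minimal NumPy sketch, not the author's auditing code, and it assumes: Δ means the absolute difference between a subgroup's metric and the overall metric; FNR is measured at a single decision threshold supplied by the caller (the card does not state the operating threshold); and `roc_auc`, `false_negative_rate`, and `audit_subgroups` are hypothetical helpers. The default tolerances match the thresholds in the table.

```python
import numpy as np


def roc_auc(y_true, scores):
    """Rank-based (Mann-Whitney U) AUC; tied scores get average ranks."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC is undefined when a subgroup has one class
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(scores.size, dtype=float)
    # Runs of equal values in the sorted scores share their average 1-based rank.
    _, starts, counts = np.unique(scores[order], return_index=True, return_counts=True)
    for start, count in zip(starts, counts):
        ranks[order[start:start + count]] = start + (count + 1) / 2.0
    u_stat = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u_stat / (n_pos * n_neg))


def false_negative_rate(y_true, scores, threshold):
    """Share of true readmissions scored below the decision threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    flagged = np.asarray(scores, dtype=float) >= threshold
    n_pos = int(y_true.sum())
    if n_pos == 0:
        return float("nan")
    return float((y_true & ~flagged).sum()) / n_pos


def audit_subgroups(y_true, scores, groups, threshold, max_dauc=0.05, max_dfnr=0.10):
    """Compare each subgroup's AUC and FNR with the overall values.

    A subgroup passes when |delta AUC| <= max_dauc and |delta FNR| <= max_dfnr.
    NaN deltas (single-class subgroups) compare False, so those subgroups
    fail and surface for manual review instead of passing silently.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    overall_auc = roc_auc(y_true, scores)
    overall_fnr = false_negative_rate(y_true, scores, threshold)
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        d_auc = abs(roc_auc(y_true[mask], scores[mask]) - overall_auc)
        d_fnr = abs(false_negative_rate(y_true[mask], scores[mask], threshold) - overall_fnr)
        report[str(g)] = {
            "delta_auc": d_auc,
            "delta_fnr": d_fnr,
            "passes": bool(d_auc <= max_dauc and d_fnr <= max_dfnr),
        }
    return report


# Synthetic example (random data, not MIMIC-IV results).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
s = np.clip(0.3 * y + rng.normal(0.35, 0.2, size=2000), 0.0, 1.0)
sex = rng.choice(["F", "M"], size=2000)
for name, row in audit_subgroups(y, s, sex, threshold=0.5).items():
    print(name, row)
```

Because the 16 subgroups span four overlapping axes, the audit would be run once per axis (race/ethnicity, age group, sex, insurance type). The rank-based AUC avoids a scikit-learn dependency; `sklearn.metrics.roc_auc_score` should return the same value for binary labels.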
## 🚀 Usage

```python
import joblib
import numpy as np

# Load the serialized model
model = joblib.load("xgboost.pkl")

# Placeholder input: one row of the 26 clinical features. Replace the zeros
# with real values in the same column order and preprocessing used in training.
X = np.zeros((1, 26))

# predict_proba returns one [P(no readmission), P(readmission)] row per input
pred = model.predict_proba(X)[0, 1]
print(f"30-day readmission risk: {pred:.3f}")
```

> ⚠️ Input features must match the 26 clinical variables used during training, in the same order and with the same preprocessing. See the [repository](https://github.com/Tomisin92/readmission-prediction) for the full feature schema and preprocessing pipeline.

## 🎯 Intended Use

- Research and reproducibility
- Clinical ML benchmarking
- Demonstration of explainable and fair AI systems
- Benchmarking against validated clinical tools (e.g., the LACE Index)

## 🧾 Ethical & Regulatory Considerations

This model is **not a medical device** and is **not approved for clinical use**. Deployment in any clinical setting requires:

- Prospective validation on local patient populations
- Institutional review and governance approval
- Compliance with applicable regulations

This work is intended to align with:

- **ONC HTI-1**: AI transparency requirements for health IT
- **HHS Section 1557**: non-discrimination standards in healthcare AI

## 🔁 Reproducibility

Given credentialed MIMIC-IV access, all results can be reproduced with the open-source pipeline at [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction), which covers data preprocessing, feature engineering, model training, SHAP explainability, and fairness auditing.

## ⚠️ Limitations

- **Retrospective validation only**: the model was trained and evaluated on historical MIMIC-IV data; prospective validation has not been performed.
- **Single institution**: MIMIC-IV reflects one academic medical center (BIDMC), so generalizability to other institutions requires local validation.
- **No causal claims**: feature associations do not imply clinical causation.
- **No real-time EHR integration**: this model operates on static feature vectors; live deployment would require additional infrastructure.
- **Requires local validation** before any deployment in a clinical decision support context.
- **Credentialed dataset**: MIMIC-IV requires PhysioNet credentialing; this model card does not distribute the underlying data.

## 🔗 Links

- 📄 **Paper:** [arXiv:2604.22535](https://doi.org/10.48550/arXiv.2604.22535)
- 💻 **Code:** [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction)

## 📜 Citation

```bibtex
@misc{adisa2025readmission,
  title={Hospital Readmission Prediction with Explainability and Fairness},
  author={Adisa, Isaac Tosin},
  year={2026},
  eprint={2604.22535},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

## License

This model is released under the [MIT License](https://github.com/Tomisin92/readmission-prediction/blob/main/LICENSE). The underlying MIMIC-IV dataset is subject to its own [PhysioNet credentialed access agreement](https://physionet.org/content/mimiciv/).