---
language: en
license: mit
tags:
- tabular-classification
- healthcare
- healthcare-ai
- clinical-ai
- logistic-regression
- explainable-ai
- fairness
datasets:
- mimic-iv
metrics:
- roc_auc
library_name: scikit-learn
pipeline_tag: tabular-classification
authors:
- Isaac Tosin Adisa
model-index:
- name: Hospital Readmission Logistic Regression
  results:
  - task:
      type: tabular-classification
      name: Tabular Classification
    dataset:
      name: MIMIC-IV
      type: physionet/mimic-iv
    metrics:
    - type: roc_auc
      value: 0.67
      name: ROC AUC
---

# 🏥 Hospital Readmission Prediction (Logistic Regression)

**Author:** Isaac Tosin Adisa

## 📌 Overview

This model predicts **30-day hospital readmission risk** using structured clinical features derived from the MIMIC-IV dataset. It serves as the **linear baseline** in an integrated multi-model comparative framework alongside XGBoost and LightGBM, designed to evaluate the trade-offs between model complexity, predictive performance, calibration quality, explainability, and subgroup fairness.

The model outputs calibrated probabilities suitable for downstream clinical risk stratification workflows. As a logistic regression model, it is interpretable by design: coefficients map directly to feature-level log-odds, making it transparent and auditable without post-hoc explanation tools.

This model is released alongside a fully reproducible pipeline and open-source implementation to facilitate independent validation and reuse.

## 📊 Dataset

| Property | Value |
|---|---|
| Source | MIMIC-IV (v2.2) |
| Total admissions | 415,231 |
| 30-day readmission prevalence | ~18% |
| Feature count | 26 structured clinical features |
| Split | Train / Validation / Test (temporal split) |

Features include demographics, admission type, primary diagnosis category, comorbidity burden (Elixhauser), length of stay, lab value summaries, procedure counts, and prior utilization history.

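
The temporal split listed in the Dataset table can be illustrated with a minimal sketch: order admissions by time, then cut the ordered list so that validation and test data come strictly after the training data. The 70/15/15 proportions, the `temporal_split` helper, and the use of `hadm_id` / `admittime` as field names are assumptions made for illustration; the card does not state the actual cut points, and the toy records below are made up rather than drawn from MIMIC-IV.

```python
from datetime import datetime


def temporal_split(records, time_key="admittime", train_frac=0.70, val_frac=0.15):
    """Split records chronologically into (train, validation, test) lists.

    The fractions are illustrative assumptions, not the values used by the
    released pipeline.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a non-empty test set")

    # Oldest admissions first, so later splits never precede earlier ones
    ordered = sorted(records, key=lambda r: r[time_key])
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Toy admissions in reverse chronological order (made-up data)
admissions = [
    {"hadm_id": i, "admittime": datetime(2020, 1, 1 + i)} for i in reversed(range(20))
]

train, val, test = temporal_split(admissions)
print(len(train), len(val), len(test))
```

Unlike a random split, this keeps every evaluation record later in time than every training record, which is closer to how a readmission model would be used on future admissions.
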
## ⚙️ Training

| Setting | Value |
|---|---|
| Framework | scikit-learn |
| Solver | lbfgs |
| Regularization | L2 (tuned via cross-validation) |
| Class imbalance | class_weight="balanced" |
| Feature scaling | StandardScaler (applied pre-fit) |
| Calibration | Platt scaling (post-hoc) |

## 📈 Performance

| Metric | Value | Notes |
|---|---|---|
| AUC-ROC | ~0.67 | Linear baseline discrimination |

> 📊 Logistic Regression serves as the **interpretable linear baseline** in this framework. Its performance provides a lower-bound reference for evaluating the marginal gains of tree-based models (XGBoost, LightGBM) against the cost of reduced transparency.

## 🔍 Explainability

Logistic Regression is **inherently interpretable**: no post-hoc explanation method is required.

- Feature coefficients directly encode the direction and magnitude of each variable's contribution
- Odds ratios can be derived directly from model weights
- Compatible with standard clinical audit and regulatory review workflows

## ⚖️ Fairness Evaluation

The model was evaluated across **16 demographic and clinical subgroups**, including stratifications by age group, sex, race/ethnicity, insurance type, and admission source.

All subgroups satisfy the following thresholds:

| Metric | Threshold |
|---|---|
| ΔAUC (vs. overall) | ≤ 0.05 |
| ΔFNR (vs. overall) | ≤ 0.10 |

No subgroup exhibited clinically meaningful performance degradation under these criteria.

## 🚀 Usage

```python
import joblib
import numpy as np

# Load the trained model
model = joblib.load("logreg.pkl")

# Placeholder row only: replace with your 26 clinical features,
# already transformed with the StandardScaler fitted during training
X = np.zeros((1, 26))

# Probability of the positive class (30-day readmission)
pred = model.predict_proba(X)[0, 1]
print(f"Readmission risk: {pred:.3f}")
```

> ⚠️ Input features must be scaled using the same `StandardScaler` fitted during training before inference. See the [repository](https://github.com/Tomisin92/readmission-prediction) for the full feature schema and preprocessing pipeline.

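
The Explainability section notes that odds ratios can be derived directly from model weights. A minimal sketch of that conversion follows: each coefficient is a log-odds change, so `exp(coefficient)` gives the odds ratio. The `odds_ratios` helper, the feature names, and the coefficient values are all hypothetical and are not the released model's weights. With a fitted scikit-learn `LogisticRegression`, the coefficients would normally be read from `model.coef_[0]`; whether `logreg.pkl` exposes that attribute directly, or wraps the estimator in a calibrator or pipeline, is not stated here and should be checked against the repository.

```python
import math


def odds_ratios(feature_names, coefficients):
    """Return (feature, odds ratio) pairs, strongest effect first.

    A logistic regression coefficient is a change in log-odds per unit of the
    feature, so exp(coefficient) is the corresponding odds ratio.
    """
    if len(feature_names) != len(coefficients):
        raise ValueError("need exactly one coefficient per feature")

    # Rank by absolute log-odds so protective and risk factors are comparable
    ranked = sorted(zip(feature_names, coefficients), key=lambda p: abs(p[1]), reverse=True)
    return [(name, math.exp(float(coef))) for name, coef in ranked]


# Hypothetical names and weights, for illustration only
names = ["length_of_stay", "prior_admissions", "age"]
coefs = [0.10, 0.40, -0.05]

for name, ratio in odds_ratios(names, coefs):
    # Inputs are standardized, so each ratio is per one standard deviation
    print(f"{name}: odds ratio per 1 SD = {ratio:.2f}")
```

Because the inputs are standardized, each odds ratio describes a one-standard-deviation increase in the feature rather than a one-unit increase. Platt scaling rescales the model's log-odds after fitting, so ratios computed from the raw coefficients describe the uncalibrated score.
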
## 🎯 Intended Use

- Linear baseline benchmarking against tree-based models
- Clinical ML interpretability research
- Demonstration of explainable and fair AI systems
- Reproducibility and model comparison

## ⚠️ Limitations

- **Linear model:** logistic regression cannot capture non-linear feature interactions present in complex clinical data; tree-based models may outperform it on discrimination metrics.
- **Retrospective validation only:** the model was trained and evaluated on historical MIMIC-IV data; prospective validation has not been performed.
- **Single institution:** MIMIC-IV reflects one academic medical center (BIDMC); generalizability to other institutions requires local validation.
- **No causal claims:** feature associations do not imply clinical causation.
- **Requires local validation** before any deployment in a clinical decision support context.
- **Credentialed dataset:** MIMIC-IV requires PhysioNet credentialing; this model card does not distribute the underlying data.

## 🔗 Links

- 📄 **Paper:** [arXiv:2604.22535](https://doi.org/10.48550/arXiv.2604.22535)
- 💻 **Code:** [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction)

## 📜 Citation

```bibtex
@misc{adisa2025readmission,
  title={Hospital Readmission Prediction with Explainability and Fairness},
  author={Adisa, Isaac Tosin},
  year={2026},
  eprint={2604.22535},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

## License

This model is released under the [MIT License](https://github.com/Tomisin92/readmission-prediction/blob/main/LICENSE). The underlying MIMIC-IV dataset is subject to its own [PhysioNet credentialed access agreement](https://physionet.org/content/mimiciv/).