---
language: en
license: mit
tags:
  - tabular-classification
  - healthcare
  - healthcare-ai
  - clinical-ai
  - logistic-regression
  - explainable-ai
  - fairness
datasets:
  - mimic-iv
metrics:
  - roc_auc
library_name: scikit-learn
pipeline_tag: tabular-classification
authors:
  - Isaac Tosin Adisa
model-index:
  - name: Hospital Readmission Logistic Regression
    results:
      - task:
          type: tabular-classification
          name: Tabular Classification
        dataset:
          name: MIMIC-IV
          type: physionet/mimic-iv
        metrics:
          - type: roc_auc
            value: 0.67
            name: ROC AUC
---

# 🏥 Hospital Readmission Prediction (Logistic Regression)

**Author:** Isaac Tosin Adisa

## 📌 Overview

This model predicts **30-day hospital readmission risk** using structured clinical features derived from the MIMIC-IV dataset. It serves as the **linear baseline** in an integrated multi-model comparative framework alongside XGBoost and LightGBM, designed to evaluate the trade-offs between model complexity, predictive performance, calibration quality, explainability, and subgroup fairness.

The model outputs calibrated probabilities suitable for downstream clinical risk stratification workflows. As a logistic regression model, it is interpretable by design: each coefficient is the change in the log-odds of readmission per one-unit increase in the corresponding standardized feature, so the model can be audited without post-hoc explanation tools.

This model is released alongside a fully reproducible pipeline and open-source implementation to facilitate independent validation and reuse.

## 📊 Dataset

| Property | Value |
|---|---|
| Source | MIMIC-IV (v2.2) |
| Total admissions | 415,231 |
| 30-day readmission prevalence | ~18% |
| Feature count | 26 structured clinical features |
| Split | Train / Validation / Test (temporal split) |

Features include demographics, admission type, primary diagnosis category, comorbidity burden (Elixhauser), length of stay, lab value summaries, procedure counts, and prior utilization history.
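
The temporal split listed in the table can be sketched as follows. This is a minimal illustration rather than the repository's code: the `admit_time` and `readmitted_30d` column names, the toy data, and the 70/15/15 proportions are all assumptions made for the example.

```python
import pandas as pd

def temporal_split(df, time_col, train_frac=0.70, val_frac=0.15):
    """Split rows chronologically: oldest rows train, newest rows test."""
    ordered = df.sort_values(time_col, kind="stable").reset_index(drop=True)
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return (ordered.iloc[:train_end],
            ordered.iloc[train_end:val_end],
            ordered.iloc[val_end:])

# Toy admissions table; both column names are hypothetical.
admissions = pd.DataFrame({
    "admit_time": pd.date_range("2020-01-01", periods=100, freq="D"),
    "readmitted_30d": [i % 5 == 0 for i in range(100)],
})

train, val, test = temporal_split(admissions, "admit_time")
print(len(train), len(val), len(test))
```

Splitting by time rather than at random keeps every evaluation row later than every training row, which is closer to how a deployed model would be used.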

## ⚙️ Training

| Setting | Value |
|---|---|
| Framework | scikit-learn |
| Solver | `lbfgs` |
| Regularization | L2 (tuned via cross-validation) |
| Class imbalance | `class_weight="balanced"` |
| Feature scaling | `StandardScaler` (applied pre-fit) |
| Calibration | Platt scaling (post-hoc) |

## 📈 Performance

| Metric | Value | Notes |
|---|---|---|
| AUC-ROC | ~0.67 | Linear baseline discrimination |

> 📊 Logistic Regression serves as the **interpretable linear baseline** in this framework. Its performance provides a lower-bound reference for evaluating the marginal gains of tree-based models (XGBoost, LightGBM) against the cost of reduced transparency.

## 🔍 Explainability

Logistic Regression is **inherently interpretable**: no post-hoc explanation method is required.

- The sign of each coefficient gives the direction of a feature's association with readmission, and its magnitude gives the strength on the standardized feature scale
- Odds ratios are obtained by exponentiating the coefficients (`exp(β)`)
- Compatible with standard clinical audit and regulatory review workflows
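
As an illustration of the points above, the sketch below reads odds ratios off a fitted scikit-learn logistic regression. The three feature names and the synthetic data are hypothetical and chosen only to make the example runnable; they are not the model's actual features or coefficients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["age", "length_of_stay", "prior_admissions"]  # hypothetical

# Synthetic, already-standardized features with a known generating model.
X = rng.standard_normal((500, 3))
true_logit = -1.5 + 0.3 * X[:, 1] + 0.8 * X[:, 2]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

model = LogisticRegression().fit(X, y)

# exp(coefficient) is the multiplicative change in odds per +1 standard
# deviation of the feature, holding the other features fixed.
coefficients = model.coef_[0]
odds_ratios = np.exp(coefficients)
for name, beta, ratio in zip(feature_names, coefficients, odds_ratios):
    print(f"{name:>18}: beta={beta:+.3f}  odds ratio={ratio:.2f}")
```

If a Platt-scaling layer sits on top of the model, it rescales the log-odds by a fitted slope, so ratios computed this way describe the uncalibrated model.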

## ⚖️ Fairness Evaluation

The model was evaluated across **16 demographic and clinical subgroups**, including stratifications by age group, sex, race/ethnicity, insurance type, and admission source.

All subgroups satisfy the following thresholds:

| Metric | Threshold |
|---|---|
| ΔAUC (vs. overall) | ≤ 0.05 |
| ΔFNR (vs. overall) | ≤ 0.10 |

No subgroup exhibited clinically meaningful performance degradation under these criteria.
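
A subgroup check against the thresholds above might look like the following sketch. It assumes the gaps are absolute differences from the overall cohort and that FNR is computed at a fixed decision threshold; the card does not state either detail, and the group labels, synthetic scores, and 0.2 threshold are illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def false_negative_rate(y_true, y_pred):
    """Share of true positives that the classifier labels negative."""
    positives = y_true == 1
    return float(np.mean(y_pred[positives] == 0))

def subgroup_gaps(y_true, y_prob, groups, threshold):
    """Absolute AUC and FNR gaps between each subgroup and the full cohort."""
    y_pred = (y_prob >= threshold).astype(int)
    overall_auc = roc_auc_score(y_true, y_prob)
    overall_fnr = false_negative_rate(y_true, y_pred)
    gaps = {}
    for g in np.unique(groups):
        mask = groups == g
        auc = roc_auc_score(y_true[mask], y_prob[mask])
        fnr = false_negative_rate(y_true[mask], y_pred[mask])
        gaps[str(g)] = {"delta_auc": abs(auc - overall_auc),
                        "delta_fnr": abs(fnr - overall_fnr)}
    return gaps

# Synthetic cohort: two hypothetical subgroups, about 18% positives.
rng = np.random.default_rng(0)
n = 4000
groups = rng.choice(["A", "B"], size=n)
y_true = (rng.random(n) < 0.18).astype(int)
y_prob = np.clip(0.18 + 0.15 * (y_true - 0.18) + rng.normal(0, 0.1, n), 0, 1)

gaps = subgroup_gaps(y_true, y_prob, groups, threshold=0.2)
for g, d in gaps.items():
    passed = d["delta_auc"] <= 0.05 and d["delta_fnr"] <= 0.10
    print(g, {k: round(v, 3) for k, v in d.items()}, "pass" if passed else "fail")
```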

## 🚀 Usage

```python
import joblib
import numpy as np

# Load model
model = joblib.load("logreg.pkl")

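# Placeholder: supply rows of the 26 clinical features, in the schema's order,
# already transformed with the StandardScaler fitted during training.
X = np.array([[...]])

# Probability of the positive class (30-day readmission) for the first row
pred = model.predict_proba(X)[0, 1]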

print(f"Readmission risk: {pred:.3f}")
```

> ⚠️ Input features must be scaled using the same `StandardScaler` fitted during training before inference. See the [repository](https://github.com/Tomisin92/readmission-prediction) for the full feature schema and preprocessing pipeline.
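
The scale-then-predict step described in the note above can be wrapped in a small helper, sketched here. The helper name `predict_readmission_risk` is hypothetical, and the toy scaler and model exist only so the example runs on its own; in practice the released model and the scaler produced by the training pipeline would take their place.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

N_FEATURES = 26  # feature count stated in this card

def predict_readmission_risk(model, scaler, X_raw):
    """Scale raw feature rows with the training-time scaler, then score them."""
    X_raw = np.asarray(X_raw, dtype=float)
    if X_raw.ndim != 2 or X_raw.shape[1] != N_FEATURES:
        raise ValueError(f"expected shape (n_rows, {N_FEATURES}), got {X_raw.shape}")
    return model.predict_proba(scaler.transform(X_raw))[:, 1]

# Toy stand-ins (not the released artifacts) fitted on random data.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(300, N_FEATURES))
y_train = (X_train[:, 0] > 50.0).astype(int)
toy_scaler = StandardScaler().fit(X_train)
toy_model = LogisticRegression(max_iter=1000).fit(
    toy_scaler.transform(X_train), y_train
)

risk = predict_readmission_risk(toy_model, toy_scaler, X_train[:2])
print(risk)
```

Checking the input shape up front turns a silent feature-order or feature-count mistake into an explicit error before any probability is reported.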

## 🎯 Intended Use

- Linear baseline benchmarking against tree-based models
- Clinical ML interpretability research
- Demonstration of explainable and fair AI systems
- Reproducibility and model comparison

## ⚠️ Limitations

- **Linear model:** logistic regression does not capture non-linear effects or feature interactions unless they are explicitly engineered as inputs; tree-based models may outperform it on discrimination metrics.
- **Retrospective validation only:** the model was trained and evaluated on historical MIMIC-IV data; prospective validation has not been performed.
- **Single institution:** MIMIC-IV reflects one academic medical center, Beth Israel Deaconess Medical Center (BIDMC); generalizability to other institutions requires local validation.
- **No causal claims:** feature associations do not imply clinical causation.
- **Requires local validation** before any deployment in a clinical decision support context.
- **Credentialed dataset:** MIMIC-IV requires PhysioNet credentialing; this model card does not distribute the underlying data.

## 🔗 Links

- 📄 **Paper:** [arXiv:2604.22535](https://doi.org/10.48550/arXiv.2604.22535)
- 💻 **Code:** [github.com/Tomisin92/readmission-prediction](https://github.com/Tomisin92/readmission-prediction)

## 📜 Citation

```bibtex
@misc{adisa2026readmission,
  title={Hospital Readmission Prediction with Explainability and Fairness},
  author={Adisa, Isaac Tosin},
  year={2026},
  eprint={2604.22535},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
```

## License

This model is released under the [MIT License](https://github.com/Tomisin92/readmission-prediction/blob/main/LICENSE). The underlying MIMIC-IV dataset is subject to its own [PhysioNet credentialed access agreement](https://physionet.org/content/mimiciv/).