---
tags:
- healthcare
- clinical-ml
- diabetes
- readmission-prediction
- lightgbm
- lgbm
library_name: lightgbm
pipeline_tag: tabular-classification
---

# hospital-readmission-phase2-lgbm - Hospital Readmission Risk Prediction

## Model Description

The hospital-readmission-phase2-lgbm model predicts the risk of 30-day hospital readmission for diabetic patients. It was trained on the UCI Diabetes 130-US Hospitals dataset. Hyperparameters were selected with 5-fold cross-validation, and final performance was measured on a stratified holdout test set.

**Task:** Hospital 30-Day Readmission Risk Prediction
**Model Type:** Gradient Boosting Machine (LightGBM)
**Training Date:** 2025-12-15 10:55:05
**Environment:** Kaggle (GPU)

## Performance Metrics

### Cross-Validation Results (5-Fold CV)

| Metric | Value |
|--------|-------|
| Mean ROC-AUC | 0.8399 ± 0.0055 |

### Final Test Set Results

#### Primary Metrics

| Metric | Value |
|--------|-------|
| ROC-AUC | 0.8424 |
| PR-AUC | 0.4000 |
| F1 Score | 0.1053 |

#### Classification Metrics

| Metric | Value |
|--------|-------|
| Precision | 0.6978 |
| Recall | 0.0569 |

#### Clinical Metrics

| Metric | Value |
|--------|-------|
| Sensitivity (TPR) | 0.0569 |
| Specificity (TNR) | 0.9969 |

## Model Visualizations

### ROC Curve
![ROC Curve](./roc_curve.png)

### Precision-Recall Curve
![Precision-Recall Curve](./precision_recall_curve.png)

### Confusion Matrix
![Confusion Matrix](./confusion_matrix.png)

### Calibration Curve
![Calibration Curve](./calibration_curve.png)

### Feature Importance
![Feature Importance](./feature_importance.png)

### Learning Curves
![Learning Curves](./learning_curves.png)

### Validation Curves
![Validation Curves](./validation_curves.png)

### Cross-Fold Metrics Comparison
![Metrics Comparison](./metrics_comparison_across_folds.png)

## Dataset Information

| Property | Value |
|----------|-------|
| Total Samples | 101,766 |
| Features | 113 |
| Development Set | 86,501 |
| Final Test Set | 15,265 |

## Training Configuration

### Evaluation Pipeline

- **Final Holdout Split:** Stratified split into development and test sets
- **Hyperparameter Search:** Grid search with 5-fold cross-validation
- **Nested Early Stopping:** Inner validation split within each fold
- **Final Evaluation:** Untouched holdout test set

### Best Hyperparameters

```python
{
    "n_estimators": 150,
    "learning_rate": 0.05,
    "num_leaves": 31,
    "max_depth": -1,
    "subsample": 0.7,
    "colsample_bytree": 0.7,
    "reg_alpha": 0.0,
    "reg_lambda": 0.1
}
```

LightGBM applies `subsample` (row bagging) only when `subsample_freq` is greater than 0. Because `subsample_freq` is not listed above, check the saved model's parameters to confirm whether row subsampling was active.

## Training Details

- **Total Training Time:** 214.37 minutes
- **Hyperparameter Search Time:** 128.14 minutes
- **Cross-Validation Folds:** 5
- **Early Stopping:** Yes
- **Device:** GPU

## Usage

### Loading the Model

```python
import joblib
import pandas as pd

# Load the trained model
model = joblib.load('gradient_boosting_model.joblib')

# Load your preprocessed features (must contain the same feature columns used in training)
X_new = pd.read_csv('your_features.csv')

# Predicted class labels (for an LGBMClassifier, this applies a 0.5 probability cutoff)
predictions = model.predict(X_new)

# Predicted probability of 30-day readmission (positive class)
probabilities = model.predict_proba(X_new)[:, 1]
```

### Feature Requirements

The model expects preprocessed features derived from the UCI Diabetes 130-US Hospitals dataset. These include:

- Patient demographics (age, gender, race)
- Admission details (admission type, admission source, length of stay)
- Medical history (number of diagnoses, procedures)
- Medication information
- Lab results (HbA1c test result, max glucose serum test result)
- Previous utilization (outpatient, inpatient, and emergency visits)

See `feature_importance.csv` for the complete feature list and importance scores.
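
The `model.predict` call in the usage example returns labels at a fixed cutoff (0.5 for a scikit-learn-style `LGBMClassifier`), and the reported test recall of 0.0569 shows that the operating point behind those metrics flags only a small fraction of readmissions. Where missed readmissions are costly, a lower cutoff can be applied to `probabilities` instead. The following is a minimal plain-Python sketch of choosing such a cutoff for a target recall. It is not part of this model's released code; the helper names `threshold_for_recall` and `precision_recall_at`, the toy labels and scores, and the 0.9 recall target are all hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall_at(
    y_true: Sequence[int], y_prob: Sequence[float], cutoff: float
) -> Tuple[float, float]:
    """Precision and recall when patients with y_prob >= cutoff are flagged."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        flagged = prob >= cutoff
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def threshold_for_recall(
    y_true: Sequence[int], y_prob: Sequence[float], target_recall: float
) -> float:
    """Highest cutoff whose recall (flagging y_prob >= cutoff) meets the target."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    n_pos = sum(1 for label in y_true if label == 1)
    if n_pos == 0:
        raise ValueError("y_true contains no positive (readmitted) cases")

    # Sort scores from highest to lowest; recall can only grow as the cutoff drops.
    pairs = sorted(zip(y_prob, y_true), key=lambda pair: pair[0], reverse=True)
    tp = 0
    i = 0
    while i < len(pairs):
        cutoff = pairs[i][0]
        # Count every sample tied at this score, since ">= cutoff" flags them all.
        while i < len(pairs) and pairs[i][0] == cutoff:
            tp += int(pairs[i][1] == 1)
            i += 1
        if tp / n_pos >= target_recall:
            return cutoff
    # Not reached: the lowest score flags everyone, which gives recall 1.0.
    raise RuntimeError("no cutoff reached the target recall")


if __name__ == "__main__":
    # Hypothetical toy labels (1 = readmitted) and predicted probabilities.
    y_true = [0, 0, 1, 0, 1, 1, 0, 0]
    y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.15, 0.40]

    for cutoff in (0.5, threshold_for_recall(y_true, y_prob, target_recall=0.9)):
        precision, recall = precision_recall_at(y_true, y_prob, cutoff)
        print(f"cutoff={cutoff:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Choose the cutoff on development-set or out-of-fold predictions rather than on the final test set, so the holdout estimate stays untouched, then flag patients with `probabilities >= cutoff`. Lowering the cutoff trades specificity and precision for sensitivity, so the recall target should reflect the clinical cost of a missed readmission relative to an unnecessary intervention.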
## Limitations and Biases

- **Domain-Specific:** The model is trained specifically for diabetic patient readmissions.
- **Dataset Bias:** Training data from 130 US hospitals (1999-2008) may not generalize to other healthcare settings.
- **Class Imbalance:** Readmissions are the minority class. At the reported operating point the model has high specificity (0.9969) but low sensitivity (0.0569), so it misses most readmissions unless a lower decision threshold is used.
- **Temporal Drift:** Healthcare practices have evolved since the data were collected.
- **Geographic Limitation:** A US-based dataset may not apply to other healthcare systems.

## Ethical Considerations

This model is intended to assist healthcare providers in identifying patients at risk of readmission. It should:

- **NOT** be used as the sole basis for treatment decisions
- Be validated on your specific patient population before deployment
- Be monitored for fairness across different demographic groups
- Be regularly retrained with recent data to account for changing patterns

## Citation

```bibtex
@misc{hospital-readmission-phase2-lgbm,
  author = {Your Name},
  title = {LightGBM Model for Hospital Readmission Prediction},
  year = {2025},
  url = {https://huggingface.co/your-repo}
}
```

## Dataset Citation

```bibtex
@article{strack2014impact,
  title = {Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records},
  author = {Strack, Beata and DeShazo, Jonathan P and Gennings, Chris and Olmo, Juan L and Ventura, Sebastian and Cios, Krzysztof J and Clore, John N},
  journal = {BioMed Research International},
  volume = {2014},
  year = {2014},
  publisher = {Hindawi}
}
```

## License

This model is released under the MIT License. The underlying dataset has its own license terms.

## Contact

For questions or issues, please open an issue in the repository.

---

**Disclaimer:** This model is for research and educational purposes. Always consult healthcare professionals for medical decisions.