Classification of Imbalanced Data Using Machine Learning Algorithms to Predict the Risk of Renal Graft Failures in Ethiopia

Overview

Journal: BMC Med Inform Decis Mak

Publisher: BioMed Central

Specialty: Medical Informatics

Date: 2023 May 22

PMID: 37217892

Authors

Getahun Mulugeta

Temesgen Zewotir

Awoke Seyoum Tegegne

Leja Hamza Juhar

Mahteme Bekele Muleta

Abstract

Introduction: The rising prevalence of end-stage renal disease has increased the need for renal replacement therapy over recent decades. Although a kidney transplant offers a better quality of life and a lower cost of care than dialysis, graft failure remains possible after transplantation. This study therefore aimed to predict the risk of graft failure among post-transplant recipients in Ethiopia using selected machine learning prediction models.

Methodology: Data were extracted from a retrospective cohort of kidney transplant recipients at the Ethiopian National Kidney Transplantation Center, covering September 2015 to February 2022. To address the imbalanced nature of the data, we performed hyperparameter tuning, probability threshold moving, tree-based ensemble learning, stacking ensemble learning, and probability calibration to improve the prediction results. Probabilistic models (logistic regression, naive Bayes, and artificial neural network) and tree-based ensemble models (random forest, bagged tree, and stochastic gradient boosting), selected on merit, were applied. Models were compared in terms of discrimination and calibration performance. The best-performing model was then used to predict the risk of graft failure.
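The abstract does not state which criterion was used to move the probability threshold. As an illustration only, the Python sketch below picks the cutoff that maximizes Youden's J (sensitivity + specificity - 1) over the observed predicted probabilities. The function names, the toy labels and probabilities, and the choice of Youden's J are all assumptions made for this example, not the authors' procedure.

```python
def sens_spec(y_true, y_prob, threshold):
    """Sensitivity and specificity when p >= threshold is called a failure."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing Youden's J over observed probabilities."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(y_prob)):
        sens, spec = sens_spec(y_true, y_prob, t)
        j = sens + spec - 1.0
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical toy data: 1 = graft failure (rare class), 0 = functioning graft.
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_probs = [0.05, 0.10, 0.10, 0.20, 0.05, 0.30, 0.15, 0.60, 0.70, 0.40]

chosen_t, chosen_j = youden_threshold(toy_labels, toy_probs)
default_sens, default_spec = sens_spec(toy_labels, toy_probs, 0.5)
print("data-driven threshold:", chosen_t, "J:", chosen_j)
print("at 0.5 -> sensitivity:", default_sens, "specificity:", default_spec)
```

On this toy sample the default cutoff of 0.5 misses one of the two failures, whereas the data-driven cutoff of 0.4 detects both at the same specificity. In practice the threshold would be chosen on training or validation data, not on the data used to report performance.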

Results: A total of 278 complete cases were analyzed, with 21 graft failures and 3 events per predictor. Of the recipients, 74.8% were male and 25.2% were female; the median age was 37 years. Among the individual models, the bagged tree and random forest had the best, and equal, discrimination performance (AUC-ROC = 0.84), while the random forest had the best calibration performance (Brier score = 0.045). When each individual model was tested as the meta-learner for stacking ensemble learning, stochastic gradient boosting gave the best discrimination (AUC-ROC = 0.88) and calibration (Brier score = 0.048) among the stacked models. In terms of feature importance, chronic rejection, blood urea nitrogen, number of post-transplant admissions, phosphorus level, acute rejection, and urological complications were the top predictors of graft failure.
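For readers less familiar with the two reported metrics, the following Python sketch shows how AUC-ROC and the Brier score can be computed from labels and predicted probabilities. The function names and the toy data are hypothetical. The last helper gives one reference point that follows directly from the counts in the abstract: a model that assigns every patient the overall event rate (21 of 278) has a Brier score of p(1 - p), about 0.070, so lower values indicate an improvement over that uninformative baseline.

```python
def auc_roc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def prevalence_brier(n_events, n_total):
    """Brier score of a constant prediction equal to the event rate."""
    rate = n_events / n_total
    return rate * (1.0 - rate)


# Hypothetical toy data: 1 = graft failure, 0 = functioning graft.
demo_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
demo_probs = [0.05, 0.10, 0.10, 0.20, 0.05, 0.30, 0.15, 0.60, 0.70, 0.40]

demo_auc = auc_roc(demo_labels, demo_probs)
demo_brier = brier_score(demo_labels, demo_probs)
baseline_brier = prevalence_brier(21, 278)  # counts taken from the abstract
print("AUC-ROC:", demo_auc)
print("Brier score:", round(demo_brier, 5))
print("constant-rate baseline Brier:", round(baseline_brier, 4))
```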

Conclusions: Bagging, boosting, and stacking, combined with probability calibration, are good choices for clinical risk prediction on imbalanced data. A data-driven probability threshold improves prediction from imbalanced data more than the default threshold of 0.5. Integrating these techniques in a systematic framework is an effective strategy for improving prediction results from imbalanced data. Clinical experts in kidney transplantation are encouraged to use the final calibrated model as a decision support system to predict the risk of graft failure for individual patients.
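One simple way to inspect whether predicted risks are calibrated is a reliability table, which groups patients by predicted probability and compares the mean predicted risk with the observed event rate in each group. The Python sketch below is an illustration under stated assumptions: the function name, the equal-width binning, the number of bins, and the toy data are hypothetical and are not taken from the study.

```python
def reliability_table(y_true, y_prob, n_bins=5):
    """Group predictions into equal-width probability bins.

    Returns one row per non-empty bin:
    (bin_lower, bin_upper, count, mean_predicted, observed_rate).
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    rows = []
    for i, members in enumerate(bins):
        if not members:
            continue
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        rows.append((i / n_bins, (i + 1) / n_bins, len(members),
                     mean_pred, observed))
    return rows


# Hypothetical toy data: 1 = graft failure, 0 = functioning graft.
cal_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
cal_probs = [0.03, 0.07, 0.12, 0.15, 0.18, 0.33, 0.35, 0.55, 0.62, 0.81]

cal_rows = reliability_table(cal_labels, cal_probs)
for lower, upper, count, mean_pred, observed in cal_rows:
    print(f"[{lower:.1f}, {upper:.1f}) n={count} "
          f"predicted={mean_pred:.2f} observed={observed:.2f}")
```

For a well-calibrated model the predicted and observed columns are close in every bin. With as few events as in this cohort, individual bins contain very few failures, so such a table should be read with caution.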

Citing Articles

Developing clinical prognostic models to predict graft survival after renal transplantation: comparison of statistical and machine learning models.

Mulugeta G, Zewotir T, Tegegne A, Muleta M, Juhar L. BMC Med Inform Decis Mak. 2025; 25(1):54.

PMID: 39901148 PMC: 11792663. DOI: 10.1186/s12911-025-02906-y.

Comparative study of ten machine learning algorithms for short-term forecasting in gas warning systems.

Wu R, Shafiabady N, Zhang H, Lu H, Gide E, Liu J. Sci Rep. 2024; 14(1):21969.

PMID: 39304669 PMC: 11415518. DOI: 10.1038/s41598-024-67283-4.

A machine learning approach towards assessing consistency and reproducibility: an application to graft survival across three kidney transplantation eras.

Achilonu O, Obaido G, Ogbuokiri B, Aruleba K, Musenge E, Fabian J. Front Digit Health. 2024; 6:1427845.

PMID: 39290362 PMC: 11405382. DOI: 10.3389/fdgth.2024.1427845.

The transformative potential of artificial intelligence in solid organ transplantation.

Moussawy M, Lakkis Z, Ansari Z, Cherukuri A, Abou-Daya K. Front Transplant. 2024; 3:1361491.

PMID: 38993779 PMC: 11235281. DOI: 10.3389/frtra.2024.1361491.

The predictive power of data: machine learning analysis for Covid-19 mortality based on personal, clinical, preclinical, and laboratory variables in a case-control study.

Seyedtabib M, Najafi-Vosough R, Kamyari N. BMC Infect Dis. 2024; 24(1):411.

PMID: 38637727 PMC: 11025285. DOI: 10.1186/s12879-024-09298-w.
