
Conceptualizing Bias in EHR Data: A Case Study in Performance Disparities by Demographic Subgroups for a Pediatric Obesity Incidence Classifier

Overview
Date 2024 Oct 23
PMID 39441784
Abstract

Electronic Health Records (EHRs) are increasingly used to develop machine learning models in predictive medicine. There has been limited research on using machine learning to predict childhood obesity, and on the disparities in classifier performance that may arise among vulnerable patient subpopulations. In this work, classification models are developed to recognize pediatric obesity using temporal condition patterns obtained from patient EHR data in a U.S. study population. We trained four machine learning algorithms (logistic regression, random forest, gradient boosted trees, and neural networks) to classify cases and controls as obesity positive or negative, and optimized hyperparameter settings through a bootstrapping methodology. To assess the classifiers for bias, we studied model performance by population subgroup and then used permutation analysis to identify the most predictive features for each model and the demographic characteristics of patients with those features. Mean AUC-ROC values were consistent across classifiers, ranging from 0.72 to 0.80. Some evidence of bias was identified, although it took the form of the models performing better for minority subgroups (African Americans and patients enrolled in Medicaid). Permutation analysis revealed that patients from vulnerable population subgroups were over-represented among patients with the most predictive diagnostic patterns. We hypothesize that our models performed better on these groups because the features most strongly associated with obesity were more commonly observed among minority patients. These findings highlight the complex ways bias may arise in machine learning models; they can inform future research toward a thorough analytical approach for identifying and mitigating bias that arises from features within EHR datasets when developing more equitable models.
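As a rough illustration of the two bias checks the abstract describes, the minimal sketch below trains a single classifier on synthetic data, reports AUC-ROC stratified by demographic subgroup, and ranks features by permutation importance. The synthetic cohort, the column names ("race", "insurance", "cond_pattern_*"), and the scikit-learn workflow are assumptions for illustration only; they do not reproduce the authors' temporal condition patterns, models, or dataset.

```python
# Minimal sketch: subgroup-stratified AUC-ROC plus permutation-based
# feature importance. All data and column names here are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for EHR-derived condition-pattern features.
X = pd.DataFrame(rng.normal(size=(n, 5)),
                 columns=[f"cond_pattern_{i}" for i in range(5)])
demo = pd.DataFrame({
    "race": rng.choice(["White", "African American"], size=n),
    "insurance": rng.choice(["Private", "Medicaid"], size=n),
})
# Outcome driven mostly by the first feature, plus noise.
y = (X["cond_pattern_0"] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te, demo_tr, demo_te = train_test_split(
    X, y, demo, test_size=0.3, random_state=0, stratify=y)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

# Bias check 1: performance stratified by demographic subgroup.
for col in demo_te.columns:
    for grp in demo_te[col].unique():
        m = (demo_te[col] == grp).to_numpy()
        print(col, grp, round(roc_auc_score(y_te[m], scores[m]), 3))

# Bias check 2: permutation importance, i.e. how much AUC-ROC drops
# when each feature's values are shuffled in the test set.
imp = permutation_importance(clf, X_te, y_te, scoring="roc_auc",
                             n_repeats=20, random_state=0)
for name, mean in sorted(zip(X.columns, imp.importances_mean),
                         key=lambda t: -t[1]):
    print(name, round(mean, 3))
```

In the paper's analysis, the second step would be followed by examining which demographic subgroups carry the highest-importance features, which is how the authors link feature prevalence to the observed performance differences.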
