
Predicting Asthma Using Imbalanced Data Modeling Techniques: Evidence from 2019 Michigan BRFSS Data

Overview
Journal PLoS One
Date 2023 Dec 7
PMID 38060576
Abstract

Past studies have examined asthma prevalence and its associated risk factors in the United States using data from national surveys. However, their findings may not apply to individual states, because environmental and socioeconomic factors vary across regions. The 2019 Behavioral Risk Factor Surveillance System (BRFSS) showed that Michigan had a higher asthma prevalence than the national average. We therefore employed several modern machine learning techniques to predict asthma and to identify risk factors associated with asthma among Michigan adults using the 2019 BRFSS data. After data cleaning, a sample of 10,337 individuals was selected for analysis, of whom 1,118 (10.8%) reported having asthma during the survey period. Standard machine learning classifiers often perform poorly on imbalanced data such as these, because they tend to favor the majority class. To address this challenge, we applied two synthetic data generation techniques, Random Over-Sampling Examples (ROSE) and the Synthetic Minority Over-Sampling Technique (SMOTE), and compared their performance. Both methods improved the overall performance of the machine learning algorithms, with ROSE performing better than SMOTE. Among the ROSE-adjusted models, logistic regression, partial least squares, gradient boosting, LASSO, and elastic net had comparable performance, with a sensitivity of around 50% and an area under the curve (AUC) of around 63%. Because of its ease of interpretation, logistic regression was chosen for further exploration of risk factors. The following were identified as asthma risk factors: presence of chronic obstructive pulmonary disease, lower income, female sex, a cost-related barrier to seeing a doctor, having received a flu shot or spray in the past 12 months, age 18-24 years, non-Hispanic Black race/ethnicity, and presence of diabetes.
This study demonstrates the potential of machine learning combined with imbalanced data modeling approaches for predicting asthma from a large survey dataset. We conclude that the findings could guide early screening of individuals at risk of asthma and the design of appropriate interventions to improve care practices.
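The abstract names the two resampling schemes and the two headline metrics without describing them. The dependency-free Python sketch below illustrates the general ideas under stated assumptions; it is not the study's implementation, and `smote`, `rose`, `sensitivity`, and `auc` are hypothetical helpers written only for illustration. SMOTE creates synthetic minority rows by interpolating between a minority row and one of its nearest minority neighbours. ROSE draws a smoothed bootstrap: it picks a class with equal probability, resamples a row from that class, and perturbs it with kernel noise. The Gaussian kernel and the normal-reference bandwidth used here are assumptions, not settings reported in the paper. Sensitivity is the true-positive rate, and AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (0.63 corresponds to the "around 63%" above).

```python
import math
import random
import statistics


def smote(minority, n_new, k=5, rng=None):
    """Simplified SMOTE-style oversampling: each synthetic row lies on the
    segment between a random minority row and one of its k nearest minority
    neighbours (Euclidean distance). Needs at least 2 minority rows."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours, excluding row i itself.
        others = [j for j in range(len(minority)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(x, minority[j]))[:k]
        nb = minority[rng.choice(nearest)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def rose(X, y, n_out, rng=None):
    """Simplified ROSE-style smoothed bootstrap: choose a class with equal
    probability, resample one of its rows, then add Gaussian kernel noise.
    ASSUMPTION: per-feature bandwidth = normal-reference factor * class SD.
    Each class needs at least 2 rows."""
    rng = rng or random.Random(0)
    classes = sorted(set(y))
    d = len(X[0])
    rows = {c: [x for x, lab in zip(X, y) if lab == c] for c in classes}
    scale = {}
    for c, r in rows.items():
        h = (4 / ((d + 2) * len(r))) ** (1 / (d + 4))
        scale[c] = [h * statistics.stdev([row[j] for row in r]) for j in range(d)]
    X_new, y_new = [], []
    for _ in range(n_out):
        c = rng.choice(classes)  # balances the classes in expectation
        seed = rng.choice(rows[c])
        X_new.append([v + rng.gauss(0.0, s) for v, s in zip(seed, scale[c])])
        y_new.append(c)
    return X_new, y_new


def sensitivity(y_true, y_pred):
    """True-positive rate: TP / (TP + FN), with 1 as the positive label."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


def auc(y_true, scores):
    """ROC AUC as the Mann-Whitney statistic: share of (positive, negative)
    pairs where the positive scores higher, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic toy data with a 10% minority class (label 1); not BRFSS data.
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
    X += [[rng.gauss(1, 1), rng.gauss(1, 1)] for _ in range(10)]
    y = [0] * 90 + [1] * 10
    extra = smote(X[90:], n_new=80, k=5, rng=rng)
    print("SMOTE minority size:", 10 + len(extra))  # prints 90
    X_rose, y_rose = rose(X, y, n_out=200, rng=rng)
    print("ROSE class counts:", y_rose.count(0), y_rose.count(1))
    print("sensitivity:", sensitivity([1, 1, 0, 0], [1, 0, 0, 1]))  # 0.5
    print("AUC:", auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The two schemes differ in what they touch: this SMOTE sketch leaves the majority class unchanged and adds minority rows on line segments between existing ones, while the ROSE sketch regenerates a training set for both classes by spreading new rows around resampled seeds with kernel noise.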

Citing Articles

The association between behavioral habits and physical health status in prostate cancer patients: a large US national health-related survey.

Chen C, Briggs L, Koelker M, Stone B, Alkhatib K, Labban M. Prostate Int. 2024; 12(4):207-212.

PMID: 39735196 PMC: 11681352. DOI: 10.1016/j.prnil.2024.08.001.
