
An Ensemble-based Feature Selection Framework to Select Risk Factors of Childhood Obesity for Policy Decision Making

Overview
Publisher Biomed Central
Date 2021 Jul 22
PMID 34289843
Citations 3
Abstract

Background: The increasing prevalence of childhood obesity makes it essential to study its risk factors using a population-representative sample that covers a broader range of health topics, in order to inform better preventive policies and interventions. This study aimed to develop an ensemble feature selection framework for large-scale data to identify risk factors of childhood obesity with good interpretability and clinical relevance.

Methods: We analyzed data collected from 426,813 children under 18 between 2000 and 2019. Overweight was defined as a BMI above the 90th percentile for children of the same age and gender. We proposed an ensemble feature selection framework, the Bagging-based Feature Selection framework integrating MapReduce (BFSMR), to identify risk factors. The framework comprises 5 models (a filter with mutual information, SVM-RFE, Lasso, Ridge, and Random Forest) drawn from filter, wrapper, and embedded feature selection methods. Each feature selection model identified 10 variables based on variable importance. Based on accuracy, F-score, and model characteristics, the models were classified into 3 levels with different weights: Lasso/Ridge, Filter/SVM-RFE, and Random Forest. A voting strategy was applied to aggregate the selected features, taking both feature weights and model weights into consideration. We compared our voting strategy with two others for selecting top-ranked features across 6 dimensions of interpretability.
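The weighted voting step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the numeric model weights, the ordering of the three levels, or the feature-weight formula, so `MODEL_WEIGHTS`, the rank-based feature weight, the function name `weighted_vote`, and the toy feature names below are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical weights for the three model levels named in the abstract.
# The real values (and which level ranks highest) are not stated there.
MODEL_WEIGHTS = {
    "lasso": 3.0,
    "ridge": 3.0,
    "filter_mi": 2.0,
    "svm_rfe": 2.0,
    "random_forest": 1.0,
}


def weighted_vote(rankings, model_weights, top_n=10):
    """Aggregate per-model ranked feature lists into one ranked list.

    rankings maps a model name to its selected features, best first.
    Assumed feature weight: a feature at 0-based position i in a list
    of length k receives weight k - i.
    """
    scores = defaultdict(float)
    for model, features in rankings.items():
        k = len(features)
        for position, feature in enumerate(features):
            feature_weight = k - position
            scores[feature] += model_weights[model] * feature_weight
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [feature for feature, _ in ranked[:top_n]]


# Toy input: three features per model instead of ten, with placeholder names.
toy_rankings = {
    "lasso": ["f_a", "f_b", "f_c"],
    "ridge": ["f_a", "f_c", "f_b"],
    "filter_mi": ["f_b", "f_a", "f_d"],
    "svm_rfe": ["f_b", "f_d", "f_a"],
    "random_forest": ["f_d", "f_c", "f_b"],
}

selected = weighted_vote(toy_rankings, MODEL_WEIGHTS, top_n=3)
print(selected)  # ['f_a', 'f_b', 'f_c']
```

In this toy run, `f_a` scores 24, `f_b` 22, `f_c` 11, and `f_d` 9, so a feature ranked highly by the more heavily weighted models outranks one that only a lightly weighted model favours.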

Results: Our method performed best at selecting features with good interpretability and clinical relevance. The top 10 features selected by BFSMR are age, sex, birth year, breastfeeding type, smoking habit and diet-related knowledge of both children and mothers, exercise, and mother's systolic blood pressure.

Conclusion: Our framework provides a way to identify, from large-scale data, a diverse and interpretable feature set without model bias. It can help identify risk factors of childhood obesity, and potentially of some other diseases, for future interventions or policies.

Citing Articles

Advancing precision public health for obesity in children.

Baker J, Bjerregaard L. Rev Endocr Metab Disord. 2023; 24(5):1003-1010.

PMID: 37055611 PMC: 10101815. DOI: 10.1007/s11154-023-09802-8.


Evaluating the risk of hypertension in residents in primary care in Shanghai, China with machine learning algorithms.

Chen N, Fan F, Geng J, Yang Y, Gao Y, Jin H. Front Public Health. 2022; 10:984621.

PMID: 36267989 PMC: 9577109. DOI: 10.3389/fpubh.2022.984621.


System Architecture of a European Platform for Health Policy Decision Making: MIDAS.

Shi X, Nikolic G, Fischaber S, Black M, Rankin D, Epelde G. Front Public Health. 2022; 10:838438.

PMID: 35433572 PMC: 9008448. DOI: 10.3389/fpubh.2022.838438.
