
Omics Feature Selection with the Extended SIS R Package: Identification of a Body Mass Index Epigenetic Multimarker in the Strong Heart Study

Overview
Journal: Am J Epidemiol
Specialty: Public Health
Date: 2024 Feb 20
PMID: 38375692
Abstract

The statistical analysis of omics data poses a great computational challenge given their ultra-high-dimensional nature and frequent between-feature correlation. In this work, we extended the iterative sure independence screening (ISIS) algorithm by pairing ISIS with elastic-net (Enet) and 2 versions of adaptive elastic-net, namely adaptive elastic-net (AEnet) and multistep adaptive elastic-net (MSAEnet), to efficiently improve feature selection and effect estimation in omics research. We subsequently used genome-wide human blood DNA methylation data from American Indian participants in the Strong Heart Study (n = 2235 participants; measured in 1989-1991) to compare the performance (predictive accuracy, coefficient estimation, and computational efficiency) of ISIS-paired regularization methods with that of a Bayesian shrinkage approach and traditional linear regression in identifying an epigenomic multimarker of body mass index (BMI). ISIS-AEnet outperformed the other methods in prediction. In biological pathway enrichment analysis of genes annotated to BMI-related differentially methylated positions, ISIS-AEnet captured most of the enriched pathways shared by at least 2 of the evaluated methods. ISIS-AEnet can favor biological discovery because it identifies the most robust biological pathways while achieving an optimal balance between bias and efficient feature selection. In the extended SIS R package, we also implemented ISIS paired with Cox and logistic regression for time-to-event and binary endpoints, respectively, and a bootstrap approach for the estimation of regression coefficients.
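
The general idea of pairing a screening step with elastic-net can be illustrated with a minimal Python sketch. It is not the SIS R package's API and not the authors' implementation: it performs a single (non-iterative) marginal-correlation screen followed by a plain elastic-net fit via coordinate descent, on simulated data. All function names, the penalty values (`lam`, `l1_ratio`), and the simulated design are assumptions made for illustration only.

```python
import math
import random


def standardize(column):
    """Center a column and scale it to unit (population) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / sd for v in column]


def marginal_screen(columns, y, keep):
    """Screening step: keep the features with the largest |correlation| with y."""
    n = len(y)
    y_std = standardize(y)
    scores = []
    for j, col in enumerate(columns):
        corr = sum(a * b for a, b in zip(standardize(col), y_std)) / n
        scores.append((abs(corr), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:keep])


def soft_threshold(value, threshold):
    """Soft-thresholding operator used by lasso-type penalties."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(columns, y, lam=0.1, l1_ratio=0.5, sweeps=200):
    """Coordinate descent for elastic-net on standardized features, centered y."""
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]
    cols = [standardize(c) for c in columns]
    beta = [0.0] * len(cols)
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            # Correlation of feature j with the partial residual
            # (features have unit variance, so adding beta[j] restores its own fit).
            rho = sum(c * r for c, r in zip(col, residual)) / n + beta[j]
            new = soft_threshold(rho, lam * l1_ratio) / (1.0 + lam * (1.0 - l1_ratio))
            if new != beta[j]:
                delta = new - beta[j]
                residual = [r - delta * c for r, c in zip(residual, col)]
                beta[j] = new
    return beta


# Simulated data (assumption): 80 samples, 300 features, only features 0 and 1 matter.
random.seed(0)
n_samples, n_features = 80, 300
columns = [[random.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
y = [3 * columns[0][i] - 2 * columns[1][i] + random.gauss(0, 0.5)
     for i in range(n_samples)]

# Step 1: screen down to a small candidate set. Step 2: penalized fit on that set.
selected = marginal_screen(columns, y, keep=20)
beta = elastic_net([columns[j] for j in selected], y)
coefficients = dict(zip(selected, beta))
print(sorted(coefficients.items(), key=lambda kv: -abs(kv[1]))[:5])
```

Screening first reduces the penalized regression from 300 candidate features to 20, which is the computational motivation for this family of methods. The iterative variant described in the abstract repeats screening on residual information, a step this sketch omits.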

Citing Articles

Multicohort Epigenome-Wide Association Study of All-Cause Cardiovascular Disease and Cancer Incidence: A Cardio-Oncology Approach.

Domingo-Relloso A, Riffo-Campos A, Zhao N, Ayala G, Haack K, Manterola C JACC CardioOncol. 2024; 6(5):731-742.

PMID: 39479324 PMC: 11520201. DOI: 10.1016/j.jaccao.2024.07.014.
