
Interaction Screening for Ultra-High Dimensional Data

Overview
Journal: J Am Stat Assoc
Specialty: Public Health
Date: 2014 Nov 12
PMID: 25386043
Citations: 22
Abstract

In ultra-high dimensional data analysis, it is extremely challenging to identify important interaction effects, and a top concern in practice is computational feasibility. For a data set with n observations and p predictors, the augmented design matrix including all linear and order-2 terms is of size n × (p² + 3p)/2. When p is large, say more than tens of hundreds, the number of interactions is enormous and beyond the capacity of standard machines and software tools for storage and analysis. In theory, the interaction selection consistency is hard to achieve in high dimensional settings. Interaction effects have heavier tails and more complex covariance structures than main effects in a random design, making theoretical analysis difficult. In this article, we propose to tackle these issues by forward-selection based procedures called iFOR, which identify interaction effects in a greedy forward fashion while maintaining the natural hierarchical model structure. Two algorithms, iFORT and iFORM, are studied. Computationally, the iFOR procedures are designed to be simple and fast to implement. No complex optimization tools are needed, since only OLS-type calculations are involved; the iFOR algorithms avoid storing and manipulating the whole augmented matrix, so the memory and CPU requirement is minimal; the computational complexity is linear in p for sparse models, hence feasible for p ≫ n. Theoretically, we prove that they possess sure screening property for ultra-high dimensional settings. Numerical examples are used to demonstrate their finite sample performance.
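
To make the abstract's computational idea concrete, here is a minimal Python sketch of greedy forward selection under a hierarchy constraint. It is not the authors' iFORT or iFORM algorithm, whose details the abstract does not give. The function names `n_augmented_columns` and `forward_hierarchical`, the fixed step count `n_steps`, and the rule that an order-2 term becomes a candidate only after both of its parent main effects are selected are all assumptions made for illustration. Each candidate column is built on the fly and scored by the drop in residual sum of squares it would produce, so the full augmented matrix is never stored.

```python
import numpy as np


def n_augmented_columns(p):
    """Count linear plus order-2 terms: p + p(p+1)/2 = (p^2 + 3p)/2."""
    return (p * p + 3 * p) // 2


def forward_hierarchical(X, y, n_steps):
    """Greedy forward selection with an assumed hierarchy rule.

    Terms are tuples: (j,) is main effect j, (j, k) with j <= k is the
    product x_j * x_k. A product is eligible only after both parents are in.
    """
    n, p = X.shape
    r = y - y.mean()                 # residual after fitting the intercept
    basis = []                       # orthonormal basis of selected columns
    selected, mains = [], []
    candidates = {(j,) for j in range(p)}

    def column(term):
        # Build one centered candidate column on demand.
        if len(term) == 1:
            col = X[:, term[0]]
        else:
            col = X[:, term[0]] * X[:, term[1]]
        return col - col.mean()

    for _ in range(n_steps):
        best = None
        for term in sorted(candidates):
            c = column(term)
            for q in basis:          # remove what the current model explains
                c = c - (q @ c) * q
            norm = np.linalg.norm(c)
            if norm < 1e-10:         # numerically collinear with the model
                continue
            gain = (c @ r) ** 2 / norm ** 2   # RSS reduction if term is added
            if best is None or gain > best[0]:
                best = (gain, term, c / norm)
        if best is None:
            break
        _, term, q = best
        basis.append(q)
        r = r - (q @ r) * q          # OLS residual update
        selected.append(term)
        candidates.remove(term)
        if len(term) == 1:
            j = term[0]
            mains.append(j)
            for k in mains:          # open x_j^2 and x_j * x_k for selected k
                candidates.add((min(j, k), max(j, k)))
    return selected
```

In this sketch the candidate set at any step holds at most p main effects plus the order-2 terms among the main effects already selected, so the per-step cost grows linearly in p when few terms are selected. A practical version would replace the fixed `n_steps` with a data-driven stopping rule, which is left out here.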

Citing Articles

Conditional Variable Screening for Ultra-High Dimensional Longitudinal Data With Time Interactions.

Bratsberg A, Ghosh A, Thoresen M. Biom J. 2024; 66(8):e70005.

PMID: 39579050 PMC: 11585226. DOI: 10.1002/bimj.70005.


The Kendall interaction filter for variable interaction screening in high dimensional classification problems.

Anzarmou Y, Mkhadri A, Oualkacha K. J Appl Stat. 2023; 50(7):1496-1514.

PMID: 37197752 PMC: 10184587. DOI: 10.1080/02664763.2022.2031125.


Integrating and optimizing genomic, weather, and secondary trait data for multiclass classification.

Manthena V, Jarquin D, Howard R. Front Genet. 2023; 13:1032691.

PMID: 37065625 PMC: 10090538. DOI: 10.3389/fgene.2022.1032691.


Mapping the Genetic-Imaging-Clinical Pathway with Applications to Alzheimer's Disease.

Yu D, Wang L, Kong D, Zhu H. J Am Stat Assoc. 2023; 117(540):1656-1668.

PMID: 37009529 PMC: 10062702. DOI: 10.1080/01621459.2022.2087658.


Unified model-free interaction screening via CV-entropy filter.

Xiong W, Chen Y, Ma S. Comput Stat Data Anal. 2023; 180.

PMID: 36910335 PMC: 9997997. DOI: 10.1016/j.csda.2022.107684.

