Sparse Sliced Inverse Regression for High Dimensional Data Analysis

Overview
Publisher Biomed Central
Specialty Biology
Date 2022 May 7
PMID 35525975
Abstract

Background: Dimension reduction and variable selection play a critical role in the analysis of contemporary high-dimensional data. The semi-parametric multi-index model often serves as a reasonable model for such data. The sliced inverse regression (SIR) method, which can be formulated as a generalized eigenvalue decomposition problem, offers a model-free approach to estimating the indices in the semi-parametric multi-index model. The eigenvectors form the basis matrix used to construct the indices, so sparse estimates of these eigenvectors are desirable: they enable variable selection, which in turn improves interpretability and model parsimony.
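To make the generalized eigenvalue formulation concrete, the following is a minimal sketch of classical (non-sparse) SIR, not the sparse method proposed in the paper. It assumes more samples than variables so that the sample covariance is invertible. The function name `sir_directions`, the slice count, and the simulated single-index data are illustrative assumptions that do not come from the article.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=1):
    """Classical SIR sketch: solve M v = lambda * Sigma v (assumes n > p)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # center the predictors
    Sigma = Xc.T @ Xc / n            # sample covariance of X

    # Slice observations by the sorted response and accumulate the
    # weighted covariance of the slice means, an estimate of Cov(E[X | y]).
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        slice_mean = Xc[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(slice_mean, slice_mean)

    # Reduce the generalized problem to a standard symmetric one using
    # the Cholesky factor Sigma = L L^T: (L^-1 M L^-T) u = lambda u.
    L = np.linalg.cholesky(Sigma)
    L_inv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(L_inv @ M @ L_inv.T)
    top = np.argsort(evals)[::-1][:n_directions]

    # Back-transform to the original predictor scale: v = L^-T u.
    return evals[top], L_inv.T @ evecs[:, top]

# Simulated single-index example (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n, p = 2000, 6
beta = np.zeros(p)
beta[:2] = 1 / np.sqrt(2)            # only the first two variables matter
X = rng.standard_normal((n, p))
index = X @ beta
y = index + 0.5 * index**3 + 0.1 * rng.standard_normal(n)

evals, B = sir_directions(X, y, n_slices=10, n_directions=1)
b = B[:, 0] / np.linalg.norm(B[:, 0])
cosine = abs(b @ beta)               # alignment with the true direction
```

In this sketch the estimated direction is dense: every variable receives a nonzero loading, which is the limitation that motivates sparse estimates.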

Results: To this end, we propose a group-Dantzig selector type formulation that induces row-sparsity in the sliced inverse regression dimension reduction vectors. Extensive simulation studies are carried out to assess the performance of the proposed method and to compare it with other state-of-the-art methods in the literature.
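The link between row-sparsity and variable selection can be illustrated with a short sketch. It does not implement the group-Dantzig selector itself, whose details are not given in the abstract. It only shows how a row-sparse basis matrix is read as a set of selected variables. The helper `selected_variables`, the tolerance, and the matrix `B_hat` are hypothetical.

```python
import numpy as np

def selected_variables(B, tol=1e-8):
    """Return indices of variables whose row in the basis matrix is nonzero."""
    row_norms = np.linalg.norm(np.asarray(B, dtype=float), axis=1)
    return np.flatnonzero(row_norms > tol)

# Hypothetical row-sparse basis matrix: 5 variables, 2 directions.
# A variable is dropped only if its loading is zero in every direction.
B_hat = np.array([
    [0.8,  0.0],
    [0.0,  0.0],
    [0.0,  0.6],
    [0.0,  0.0],
    [0.6, -0.8],
])
active = selected_variables(B_hat)
```

Because a whole row must vanish for a variable to be excluded, the same variables are removed from all indices at once, rather than separately for each direction.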

Conclusion: The proposed method is shown to yield competitive estimation, prediction, and variable selection performance. Three real data applications, including a metabolomics depression study, are presented to demonstrate the method's effectiveness in practice.
