
Efficient Regularized Regression with L0 Penalty for Variable Selection and Network Construction

Overview
Publisher: Hindawi
Date: 2016 Nov 16
PMID: 27843486
Citations: 6
Abstract

Variable selection for regression with high-dimensional big data has found many applications in bioinformatics and computational biology. One appealing approach is L0 regularized regression, which directly penalizes the number of nonzero features in the model. However, L0 optimization is well known to be NP-hard and computationally challenging. In this paper, we propose efficient EM (L0EM) and dual L0EM (DL0EM) algorithms that directly approximate the L0 optimization problem. While L0EM is efficient with large sample sizes, DL0EM is efficient with high-dimensional (n ≪ m) data. They also provide a natural solution to all Lp problems with p ∈ [0,2], including lasso with p = 1 and elastic net with p ∈ [1,2]. The regularization parameter λ can be determined through cross validation or through AIC and BIC. We demonstrate our methods on simulated and high-dimensional genomic data. The results indicate that L0 performs better than lasso, SCAD, and MC+, and that L0 with AIC or BIC performs similarly to the computationally intensive cross validation. The proposed algorithms are efficient in identifying the nonzero variables with less bias and in constructing biologically important networks with high-dimensional big data.
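
The idea of approximating the L0 penalty directly can be illustrated with a short sketch. The Python code below is not the authors' implementation. It assumes an iteratively reweighted ridge scheme in which each step solves a ridge problem whose penalty weights are the inverse squares of the previous coefficients, so that the penalty term approaches a count of the nonzero coefficients. The function name `l0_reweighted_ridge`, the default tolerances, the value of `lam`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def l0_reweighted_ridge(X, y, lam, n_iter=100, tol=1e-8, zero_tol=1e-6):
    """Approximate L0-penalized least squares by iteratively reweighted ridge.

    Each step minimizes ||y - X b||^2 + lam * sum_j b_j^2 / eta_j^2, where eta
    is the previous estimate. Writing b = eta * g turns this into an ordinary
    ridge problem in g with design matrix X @ diag(eta), which avoids dividing
    by coefficients that are shrinking toward zero.
    """
    m = X.shape[1]
    # Start from a plain ridge estimate (an assumed initialization).
    beta = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)
    for _ in range(n_iter):
        Xd = X * beta  # scales column j by beta_j, i.e. X @ diag(beta)
        g = np.linalg.solve(Xd.T @ Xd + lam * np.eye(m), Xd.T @ y)
        new_beta = beta * g
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    # Coefficients that have collapsed toward zero are set exactly to zero.
    beta[np.abs(beta) < zero_tol] = 0.0
    return beta


# Synthetic example (assumed data): 3 of 10 features are truly nonzero.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
true_beta = np.zeros(10)
true_beta[[0, 1, 4]] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.5 * rng.standard_normal(200)

beta_hat = l0_reweighted_ridge(X, y, lam=5.0)
print(np.flatnonzero(beta_hat))  # indices of the selected features
```

In this sketch `lam` is fixed by hand. In practice it would be chosen by cross validation or an information criterion such as AIC or BIC, as the abstract describes. Each iteration solves an m-by-m system, so the sketch suits the large-sample setting rather than the n ≪ m case that the dual algorithm targets.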

Citing Articles

Variable selection for recurrent event data with broken adaptive ridge regression.

Zhao H, Sun D, Li G, Sun J. Can J Stat. 2020; 46(3):416-428.

PMID: 32999527 PMC: 7523880. DOI: 10.1002/cjs.11459.


Simultaneous Estimation and Variable Selection for Interval-Censored Data with Broken Adaptive Ridge Regression.

Zhao H, Wu Q, Li G, Sun J. J Am Stat Assoc. 2020; 115(529):204-216.

PMID: 32742044 PMC: 7394486. DOI: 10.1080/01621459.2018.1537922.


Variable selection for high-dimensional partly linear additive Cox model with application to Alzheimer's disease.

Wu Q, Zhao H, Zhu L, Sun J. Stat Med. 2020; 39(23):3120-3134.

PMID: 32652699 PMC: 7936877. DOI: 10.1002/sim.8594.


Sparse support vector machines with L0 approximation for ultra-high dimensional omics data.

Liu Z, Elashoff D, Piantadosi S. Artif Intell Med. 2019; 96:134-141.

PMID: 31164207 PMC: 6553498. DOI: 10.1016/j.artmed.2019.04.004.


Broken adaptive ridge regression and its asymptotic properties.

Dai L, Chen K, Sun Z, Liu Z, Li G. J Multivar Anal. 2019; 168:334-351.

PMID: 30911202 PMC: 6430210. DOI: 10.1016/j.jmva.2018.08.007.

