Precision Lasso: Accounting for Correlations and Linear Dependencies in High-dimensional Genomic Data

Overview
Journal: Bioinformatics
Specialty: Biology
Date: 2018 Sep 6
PMID: 30184048
Citations: 80
Abstract

Motivation: Association studies to discover links between genetic markers and phenotypes are central to bioinformatics. Regularized regression methods, such as variants of the Lasso, are popular for this task. Although these methods predict well in the average case, they select unstably among correlated variables and inconsistently among linearly dependent variables. Unfortunately, as we demonstrate empirically, correlated and linearly dependent variables are common in genomic datasets and cause classical variable selection methods to underperform.
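The inconsistency with linearly dependent variables can be seen in a toy setting. The following is a minimal sketch, not the authors' experiment or code: it fits a plain Lasso by cyclic coordinate descent (an assumed solver choice) on invented data in which x2 is an exact copy of x1 and only x1 generates the response. All names, values, and the penalty strength are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, y, lam, n_sweeps=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    columns is a list of feature columns, each a list of n floats.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # y - X beta, with beta starting at zero
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            norm_sq = sum(v * v for v in col) / n
            if norm_sq == 0.0:
                continue
            # Covariance of column j with the residual that excludes j's own fit.
            rho = sum(c * r for c, r in zip(col, residual)) / n + norm_sq * beta[j]
            new_beta = soft_threshold(rho, lam) / norm_sq
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                beta[j] = new_beta
    return beta


def selected_features(names, data, y, lam=0.1, tol=1e-6):
    """Return the names whose fitted coefficient is non-negligible."""
    beta = lasso_coordinate_descent([data[name] for name in names], y, lam)
    return [name for name, b in zip(names, beta) if abs(b) > tol]


# Invented toy data: x2 duplicates x1 (linear dependence); x3 is orthogonal noise.
x1 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
data = {"x1": x1, "x2": list(x1), "x3": [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]}
y = [2.0 * v for v in x1]  # the response depends on x1 only

order_a = selected_features(["x1", "x2", "x3"], data, y)
order_b = selected_features(["x2", "x1", "x3"], data, y)
print(order_a, order_b)
```

In this sketch the selected variable changes with nothing more than the column order, because any split of the coefficient between the two identical columns gives the same objective value. The selection is therefore an artifact of the solver rather than of the data.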

Results: To address these challenges, we propose the Precision Lasso. Precision Lasso is a Lasso variant that promotes sparse variable selection by regularization governed by the covariance and inverse covariance matrices of explanatory variables. We illustrate its capacity for stable and consistent variable selection in simulated data with highly correlated and linearly dependent variables. We then demonstrate the effectiveness of the Precision Lasso to select meaningful variables from transcriptomic profiles of breast cancer patients. Our results indicate that in settings with correlated and linearly dependent variables, the Precision Lasso outperforms popular methods of variable selection such as the Lasso, the Elastic Net and Minimax Concave Penalty (MCP) regression.

Availability and Implementation: Software is available at https://github.com/HaohanWang/thePrecisionLasso.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

Machine learning and metabolomics identify biomarkers associated with the disease extent of ulcerative colitis.

Ge C, Lu Y, Shen Z, Lu Y, Liu X, Zhang M J Crohns Colitis. 2025; 19(2).

PMID: 39903649 PMC: 11829215. DOI: 10.1093/ecco-jcc/jjaf020.


The Clinical Prediction Value of the Ubiquitination Model Reflecting the Microenvironment Infiltration and Drug Sensitivity in Breast Cancer.

Ma H, Cao J, Zhang Y, Yang J, Wang X, Yu Y J Cancer. 2025; 16(3):784-801.

PMID: 39781342 PMC: 11705054. DOI: 10.7150/jca.101525.


An integrative analysis reveals cancer risk associated with artificial sweeteners.

Xie J, Zhu Y, Yang Z, Yu Z, Yang M, Wang Q J Transl Med. 2025; 23(1):32.

PMID: 39780215 PMC: 11708064. DOI: 10.1186/s12967-024-06047-0.


Prognostic features of bladder cancer based on five neddylation-related genes.

Guo J, Zhang Y, He L, Wang X, Chen Z, Yao C Am J Clin Exp Urol. 2024; 12(5):240-254.

PMID: 39584004 PMC: 11578774. DOI: 10.62347/RWCH7802.


Enhanced understanding of cinnamaldehyde's therapeutic potential in osteoarthritis through bioinformatics and mechanistic validation of its anti-apoptotic effect.

Sheng Y, Zhai R, Li S, Wang X, Wang Y, Cui Z Front Med (Lausanne). 2024; 11:1448937.

PMID: 39376659 PMC: 11456544. DOI: 10.3389/fmed.2024.1448937.

