
TESTING SIGNIFICANCE OF FEATURES BY LASSOED PRINCIPAL COMPONENTS

Overview
Journal Ann Appl Stat
Date 2009 Sep 17
PMID 19756232
Citations 8
Abstract

We consider the problem of testing the significance of features in high-dimensional settings. In particular, we test for differentially expressed genes in a microarray experiment. We wish to identify genes that are associated with some type of outcome, such as survival time or cancer type. We propose a new procedure, called Lassoed Principal Components (LPC), that builds upon existing methods and can provide a sizable improvement. For instance, in the case of two-class data, a standard (albeit simple) approach might be to compute a two-sample t-statistic for each gene. The LPC method involves projecting these conventional gene scores onto the eigenvectors of the gene expression data covariance matrix and then applying an L1 penalty in order to de-noise the resulting projections. We present a theoretical framework under which LPC is the logical choice for identifying significant genes, and we show that LPC can provide a marked reduction in false discovery rates over the conventional methods on both real and simulated data. Moreover, this flexible procedure can be applied to a variety of types of data and can be used to improve many existing methods for the identification of significant features.
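The projection and de-noising step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a genes-by-samples matrix `X`, 0/1 class labels `y`, a Welch-type t-statistic as the conventional gene score, and a hand-picked penalty `lam`; the function names are hypothetical. Because the eigenvectors are orthonormal, an L1-penalized least-squares fit of the scores onto them reduces to soft-thresholding the projection coefficients, which is what the sketch uses.

```python
import numpy as np

def two_sample_t(X, y):
    """Welch-type t-statistic per gene (assumed score); X is genes x samples."""
    a, b = X[:, y == 0], X[:, y == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (b.mean(axis=1) - a.mean(axis=1)) / se

def soft_threshold(z, lam):
    """Closed-form lasso solution when the design has orthonormal columns."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lpc_scores(X, y, lam):
    """Project gene scores onto covariance eigenvectors, then shrink with L1."""
    t = two_sample_t(X, y)
    Xc = X - X.mean(axis=1, keepdims=True)        # center each gene across samples
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    U = U[:, s > 1e-10 * s[0]]                    # drop numerically null directions
    # Columns of U are eigenvectors of the gene-by-gene covariance matrix.
    beta = soft_threshold(U.T @ t, lam)           # L1-penalized projection coefficients
    return U @ beta, t                            # de-noised scores, raw scores

# Usage on synthetic data (hypothetical sizes and penalty).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 12))
y = np.array([0] * 6 + [1] * 6)
scores, t = lpc_scores(X, y, lam=1.0)
```

In practice the penalty would be chosen by a data-driven rule rather than fixed by hand; the abstract does not specify one, so `lam` is left as a free parameter here.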

Citing Articles

Pancreatic mucinous adenocarcinoma has different clinical characteristics and better prognosis compared to non-specific PDAC: A retrospective observational study.

Liu J, Zhang Y, Zhou J, Zhang Z, Wen Y. Heliyon. 2024; 10(9):e30268.

PMID: 38720717 PMC: 11076975. DOI: 10.1016/j.heliyon.2024.e30268.


Thresholding Gini variable importance with a single-trained random forest: An empirical Bayes approach.

Dunne R, Reguant R, Ramarao-Milne P, Szul P, Sng L, Lundberg M. Comput Struct Biotechnol J. 2023; 21:4354-4360.

PMID: 37711185 PMC: 10497997. DOI: 10.1016/j.csbj.2023.08.033.


Hypertrophic Cardiomyopathy Registry: The rationale and design of an international, observational study of hypertrophic cardiomyopathy.

Kramer C, Appelbaum E, Desai M, Desvigne-Nickens P, DiMarco J, Friedrich M. Am Heart J. 2015; 170(2):223-30.

PMID: 26299218 PMC: 4548277. DOI: 10.1016/j.ahj.2015.05.013.


Identification of significant features in DNA microarray data.

Bair E. Wiley Interdiscip Rev Comput Stat. 2013; 5(4).

PMID: 24244802 PMC: 3826574. DOI: 10.1002/wics.1260.


Transcriptomic profiles of high and low antibody responders to smallpox vaccine.

Kennedy R, Oberg A, Ovsyannikova I, Haralambieva I, Grill D, Poland G. Genes Immun. 2013; 14(5):277-85.

PMID: 23594957 PMC: 3723701. DOI: 10.1038/gene.2013.14.

