
Principal Components Analysis and the Reported Low Intrinsic Dimensionality of Gene Expression Microarray Data

Overview
Journal Sci Rep
Specialty Science
Date 2016 Jun 3
PMID 27254731
Citations 30
Abstract

Principal components analysis (PCA) is a common unsupervised method for the analysis of gene expression microarray data, providing information on the overall structure of the analyzed dataset. In recent years, it has been applied to very large datasets involving many different tissues and cell types to create a low-dimensional global map of human gene expression. Here, we reevaluate this approach and show that the linear intrinsic dimensionality of this global map is higher than previously reported. Furthermore, we analyze the cases in which PCA fails to detect biologically relevant information and point the reader to methods that overcome these limitations. Our results refine the current understanding of the overall structure of gene expression spaces and show that PCA critically depends on the effect size of the biological signal as well as on the fraction of samples containing this signal.
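The abstract's two technical points, that a linear intrinsic dimensionality can be read off the PCA variance spectrum and that PCA's ability to find a signal depends on both its effect size and the fraction of samples carrying it, can be illustrated with a small NumPy sketch. This is not the authors' code. The cumulative-variance threshold in `n_components_for` is one common heuristic and is an assumption here, not necessarily the criterion used in the article, and `pc1_label_correlation` runs a hypothetical simulation (Gaussian noise plus a mean shift in a subset of genes and samples) rather than real microarray data.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component.

    X is a samples x genes expression matrix (e.g. log-scale values).
    Each gene is centered; PCA is computed via the SVD of the centered matrix.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                              # PC variances up to a constant factor
    return var / var.sum()


def n_components_for(ratio, threshold):
    """Smallest number of leading PCs whose cumulative variance >= threshold.

    Assumption: a generic heuristic for linear intrinsic dimensionality,
    not necessarily the criterion used in the article.
    """
    return int(np.searchsorted(np.cumsum(ratio), threshold) + 1)


def pc1_label_correlation(n_samples, n_genes, n_signal_genes, frac, effect, seed=0):
    """|corr(PC1 score, group label)| in a hypothetical simulated dataset.

    Gaussian noise, plus a mean shift of size `effect` in the first
    `n_signal_genes` genes for a fraction `frac` of the samples.
    Values near 1 mean PC1 recovers the sample group.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_genes))
    n_pos = int(round(frac * n_samples))
    labels = np.zeros(n_samples)
    labels[:n_pos] = 1.0
    X[:n_pos, :n_signal_genes] += effect      # signal only in the first n_pos samples
    Xc = X - X.mean(axis=0, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    pc1_scores = U[:, 0] * s[0]               # sample scores on the first PC
    return abs(np.corrcoef(pc1_scores, labels)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    noise = rng.standard_normal((100, 500))
    ratio = explained_variance_ratio(noise)
    print("PCs for 50% variance (pure noise):", n_components_for(ratio, 0.5))
    for frac in (0.5, 0.1, 0.02):
        for effect in (2.0, 0.5):
            r = pc1_label_correlation(200, 1000, 50, frac, effect)
            print(f"frac={frac:<5} effect={effect:<4} |r|={r:.2f}")
```

In this simulated setting, the same mean shift tends to be recovered by PC1 when half of the samples carry it, but can fall below the noise level when the effect is small and only a few percent of samples carry it, which mirrors the dependence on effect size and sample fraction stated above.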

Citing Articles

A diagnostic model for sepsis using an integrated machine learning framework approach and its therapeutic drug discovery.

Zhang W, Shi H, Peng J BMC Infect Dis. 2025; 25(1):219.

PMID: 39953444 PMC: 11827343. DOI: 10.1186/s12879-025-10616-z.


Statistically principled feature selection for single cell transcriptomics.

Dollinger E, Silkwood K, Atwood S, Nie Q, Lander A bioRxiv. 2024.

PMID: 39463971 PMC: 11507810. DOI: 10.1101/2024.10.11.617709.


Serum N-Glycan Changes in Rats Chronically Exposed to Glyphosate-Based Herbicides.

Adeniyi M, Gutierrez Reyes C, Chavez-Reyes J, Marichal-Cancino B, Solomon J, Fowowe M Biomolecules. 2024; 14(9).

PMID: 39334844 PMC: 11430009. DOI: 10.3390/biom14091077.


Genetic and Physiological Insights into Salt Resistance in Rice through Analysis of Germination, Seedling Traits, and QTL Identification.

Yuan J, Wang Q, Wang X, Yuan B, Wang G, Wang F Life (Basel). 2024; 14(8).

PMID: 39202773 PMC: 11355933. DOI: 10.3390/life14081030.


Characteristics of the Dynamic Evolutionary Pathway of ADSCs Induced Differentiation into Astrocytes Based on scRNA-Seq Analysis.

Yuan X, Long Q, Li W, Yan Q, Zhang P Mol Neurobiol. 2024; 62(3):2926-2944.

PMID: 39190264 DOI: 10.1007/s12035-024-04414-y.

