Simultaneous Dimension Reduction and Adjustment for Confounding Variation

Overview
Specialty: Science
Date: 2016 Dec 9
PMID: 27930330
Citations: 20
Abstract

Dimension reduction methods are commonly applied to high-throughput biological datasets. However, the results can be distorted by confounding factors, either biological or technical in origin. In this study, we extend principal component analysis (PCA) to propose AC-PCA for simultaneous dimension reduction and adjustment for confounding (AC) variation. We show that AC-PCA can adjust for (i) variation across individual donors in a human brain exon array dataset and (ii) variation across species in a model organism ENCODE RNA sequencing dataset. Our approach is able to recover the anatomical structure of neocortical regions and to capture the shared variation among species during embryonic development. For gene selection purposes, we extend AC-PCA with sparsity constraints and propose and implement an efficient algorithm. The methods developed in this paper can also be applied to more general settings. The R package and MATLAB source code are available at https://github.com/linzx06/AC-PCA.
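
The abstract does not spell out the objective, so the following is only a minimal sketch of the general idea, under an assumption: that the loading vector v maximizes the projected variance minus a penalty on the variation explained by the confounders, i.e. v'X'Xv - lam * v'X'YY'Xv with ||v|| = 1, where Y is a confounder design matrix. The function name `acpca_sketch`, the donor-indicator encoding of Y, the value of `lam`, and the synthetic data are all illustrative choices; this is not the API of the AC-PCA package linked above.

```python
import numpy as np

def acpca_sketch(X, Y, lam, n_components=2):
    """Penalized PCA sketch (assumed objective, not the package's API).

    X: (n, p) data matrix; Y: (n, q) confounder design matrix; lam >= 0.
    Returns the (p, n_components) loadings and the (n, n_components) scores.
    """
    Xc = X - X.mean(axis=0)                 # center each feature
    YtX = Y.T @ Xc                          # (q, p) confounder-aggregated data
    M = Xc.T @ Xc - lam * (YtX.T @ YtX)     # symmetric (p, p) matrix
    eigvals, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    V = eigvecs[:, order]
    return V, Xc @ V

# Synthetic example: 4 donors x 5 regions, 20 features.
# Feature 0 carries the region signal; feature 1 carries a larger donor effect.
rng = np.random.default_rng(0)
n_donors, n_regions, p = 4, 5, 20
donor = np.repeat(np.arange(n_donors), n_regions)
region = np.tile(np.arange(n_regions), n_donors)
X = 0.1 * rng.standard_normal((n_donors * n_regions, p))
X[:, 0] += np.linspace(-2.0, 2.0, n_regions)[region]
X[:, 1] += np.array([-6.0, -2.0, 2.0, 6.0])[donor]

Y = np.eye(n_donors)[donor]                 # one-hot donor indicators (assumed encoding)

V_pca, scores_pca = acpca_sketch(X, Y, lam=0.0)  # lam = 0 reduces to ordinary PCA
V_ac, scores_ac = acpca_sketch(X, Y, lam=1.0)    # penalize donor-driven variation

print("top PCA loading, |donor feature| :", abs(V_pca[1, 0]))
print("top penalized loading, |region feature|:", abs(V_ac[0, 0]))
```

In this toy setting, ordinary PCA puts its first component on the donor effect because that effect has the largest variance, while the penalized version shifts the first component to the region signal. The penalty weight would need to be tuned on real data, and the sparsity-constrained variant mentioned in the abstract is not covered by this sketch.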

Citing Articles

PARE: A framework for removal of confounding effects from any distance-based dimension reduction method.

Chen A, Clark K, Dewey B, DuVal A, Pellegrini N, Nair G PLoS Comput Biol. 2024; 20(7):e1012241.

PMID: 38985831 PMC: 11262650. DOI: 10.1371/journal.pcbi.1012241.


INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis.

Zhao K, Huang S, Lin C, Sham P, So H, Lin Z PLoS Genet. 2024; 20(3):e1011189.

PMID: 38484017 PMC: 10965063. DOI: 10.1371/journal.pgen.1011189.


Evaluating machine learning algorithms to Predict 30-day Unplanned REadmission (PURE) in Urology patients.

Welvaars K, van den Bekerom M, Doornberg J, van Haarst E BMC Med Inform Decis Mak. 2023; 23(1):108.

PMID: 37312177 PMC: 10262129. DOI: 10.1186/s12911-023-02200-9.


PLIN2-induced ectopic lipid accumulation promotes muscle ageing in gregarious locusts.

Guo S, Hou L, Dong L, Nie X, Kang L, Wang X Nat Ecol Evol. 2023; 7(6):914-926.

PMID: 37156891 DOI: 10.1038/s41559-023-02059-z.


Prediction of drug sensitivity based on multi-omics data using deep learning and similarity network fusion approaches.

Liu X, Mei X Front Bioeng Biotechnol. 2023; 11:1156372.

PMID: 37139048 PMC: 10150883. DOI: 10.3389/fbioe.2023.1156372.

