FDR-Corrected Sparse Canonical Correlation Analysis With Applications to Imaging Genomics
Overview
Reducing the number of false discoveries is one of the most pressing issues in the life sciences. It is especially important in neuroimaging and genomics, where data sets are typically high-dimensional, meaning that the number of explanatory variables exceeds the sample size. The false discovery rate (FDR) is a criterion that addresses this issue, and it has therefore become a popular tool for testing multiple hypotheses. Canonical correlation analysis (CCA) is a statistical technique for making sense of the cross-correlation between two sets of measurements collected on the same set of samples (e.g., brain imaging and genomic data for the same patients with mental illness), and sparse CCA extends the classical method to high-dimensional settings. Here, we propose a way of applying the FDR concept to sparse CCA, together with a method to control the FDR. The proposed FDR correction directly influences the sparsity of the solution, adapting it to the unknown true sparsity level. Theoretical derivations and simulation studies show that our procedure keeps the FDR of the canonical vectors below a user-specified target level. We apply the proposed method to an imaging genomics data set from the Philadelphia Neurodevelopmental Cohort. Our results link brain connectivity profiles, derived from brain activity measured by functional magnetic resonance imaging during an emotion identification task, to the corresponding subjects' genomic data.
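To make the FDR idea concrete, the sketch below shows the generic Benjamini-Hochberg step-up rule together with a helper that computes the false discovery proportion of a selected set of variables. It is not the FDR correction for sparse CCA proposed in the paper, whose details the abstract does not give. The function names, the target level `q`, and the idea of comparing a selected support against a known true support (as one could in a simulation) are illustrative assumptions.

```python
import numpy as np


def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg rule.

    Generic illustration of FDR control at level q; NOT the paper's
    sparse-CCA procedure.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    thresholds = q * np.arange(1, m + 1) / m   # q * k / m for k = 1..m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up: find the largest k with p_(k) <= q * k / m and
        # reject every hypothesis with a p-value up to p_(k).
        k = np.max(np.nonzero(below)[0])
        rejected[order[:k + 1]] = True
    return rejected


def false_discovery_proportion(selected, truly_nonzero):
    """Fraction of selected variables that are not truly nonzero.

    The FDR is the expectation of this quantity; by convention it is 0
    when nothing is selected. The true support is only known in simulations.
    """
    selected = np.asarray(selected, dtype=bool)
    truly_nonzero = np.asarray(truly_nonzero, dtype=bool)
    n_selected = selected.sum()
    if n_selected == 0:
        return 0.0
    return float((selected & ~truly_nonzero).sum() / n_selected)
```

In a simulation with a known true support, averaging `false_discovery_proportion` over many replicates gives an empirical estimate of the FDR, which can then be compared with the target level.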