Subject Clustering by IF-PCA and Several Recent Methods

Overview
Journal Front Genet
Date 2023 Jun 8
PMID 37287536
Abstract

Subject clustering (i.e., the use of measured features to cluster subjects, such as patients or cells, into multiple groups) is a problem of significant interest. In recent years, many approaches have been proposed, among which unsupervised deep learning (UDL) has received much attention. Two interesting questions are 1) how to combine the strengths of UDL and other approaches and 2) how these approaches compare to each other. We combine the variational auto-encoder (VAE), a popular UDL approach, with the recent idea of influential feature-principal component analysis (IF-PCA) and propose IF-VAE as a new method for subject clustering. We study IF-VAE and compare it with several other methods (including IF-PCA, VAE, Seurat, and SC3) on 10 gene microarray data sets and 8 single-cell RNA-seq data sets. We find that IF-VAE shows significant improvement over VAE, but still underperforms compared to IF-PCA. We also find that IF-PCA is quite competitive, slightly outperforming Seurat and SC3 over the 8 single-cell data sets. IF-PCA is conceptually simple and permits delicate analysis. We demonstrate that IF-PCA is capable of achieving the phase transition in a rare/weak model. By comparison, Seurat and SC3 are more complex and theoretically difficult to analyze; for these reasons, their optimality remains unclear.
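The abstract does not spell out the steps of IF-PCA, so the following is only a minimal sketch of the general idea its name suggests: score each feature, keep the influential ones, then run PCA and cluster. It assumes a Kolmogorov-Smirnov score against the standard normal for feature screening. The function names, the fixed `n_keep` cutoff (used here in place of a data-driven threshold), and the sign-based split for two groups (used in place of k-means) are illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np


def ks_scores(X):
    """Scaled Kolmogorov-Smirnov distance of each standardized column from N(0, 1)."""
    n, _ = X.shape
    # Standardize every feature, then sort each column independently.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Z = np.sort(Z, axis=0)
    # Standard normal CDF evaluated at the sorted values.
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(Z / math.sqrt(2.0)))
    # Largest gap between the empirical CDF and the normal CDF, from both sides.
    upper = np.arange(1, n + 1)[:, None] / n - cdf
    lower = cdf - np.arange(0, n)[:, None] / n
    return math.sqrt(n) * np.maximum(upper.max(axis=0), lower.max(axis=0))


def if_pca_two_groups(X, n_keep):
    """Screen features by KS score, then split subjects with the leading PC.

    n_keep is a hypothetical fixed cutoff chosen by the caller.
    Returns (labels, indices of the retained features).
    """
    scores = ks_scores(X)
    keep = np.argsort(scores)[-n_keep:]  # the n_keep highest-scoring features
    Z = X[:, keep]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    # Leading left singular vector of the screened, standardized data.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    # Simplification for two groups: split subjects by the sign of that vector.
    labels = (U[:, 0] > 0).astype(int)
    return labels, keep
```

Called as `labels, keep = if_pca_two_groups(X, n_keep=10)` on a subjects-by-features matrix `X`, this returns one group label per subject together with the indices of the retained features. With more than two groups, the sign split would have to be replaced by a clustering step on several leading singular vectors.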
