
A Benchmarking Study of Random Projections and Principal Components for Dimensionality Reduction Strategies in Single Cell Analysis

Overview
Journal bioRxiv
Date 2025 Feb 20
PMID 39974925
Abstract

Principal Component Analysis (PCA) has long been a cornerstone in dimensionality reduction for high-dimensional data, including single-cell RNA sequencing (scRNA-seq). However, PCA's performance typically degrades with increasing data size, can be sensitive to outliers, and assumes linearity. Recently, Random Projection (RP) methods have emerged as promising alternatives, addressing some of these limitations. This study systematically and comprehensively evaluates PCA and RP approaches, including Singular Value Decomposition (SVD) and randomized SVD, alongside Sparse and Gaussian Random Projection algorithms, with a focus on computational efficiency and downstream analysis effectiveness. We benchmark performance using multiple scRNA-seq datasets including labeled and unlabeled publicly available datasets. We apply Hierarchical Clustering and Spherical K-Means clustering algorithms to assess downstream clustering quality. For labeled datasets, clustering accuracy is measured using the Hungarian algorithm and Mutual Information. For unlabeled datasets, the Dunn Index and Gap Statistic capture cluster separation. Across both dataset types, the Within-Cluster Sum of Squares (WCSS) metric is used to assess variability. Additionally, locality preservation is examined, with RP outperforming PCA in several of the evaluated metrics. Our results demonstrate that RP not only surpasses PCA in computational speed but also rivals and, in some cases, exceeds PCA in preserving data variability and clustering quality. By providing a thorough benchmarking of PCA and RP methods, this work offers valuable insights into selecting optimal dimensionality reduction techniques, balancing computational performance, scalability, and the quality of downstream analyses.
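The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name a software library, so scikit-learn and SciPy are assumed here, and synthetic Gaussian blobs stand in for scRNA-seq data. Spherical K-Means is approximated by unit-normalizing each embedded row before ordinary k-means. Hierarchical clustering, the Dunn Index, the Gap Statistic, and the locality-preservation measures are omitted. The helper names `hungarian_accuracy` and `benchmark` are hypothetical.

```python
import time

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.metrics import normalized_mutual_info_score
from sklearn.preprocessing import normalize
from sklearn.random_projection import (
    GaussianRandomProjection,
    SparseRandomProjection,
)


def hungarian_accuracy(y_true, y_pred):
    """Clustering accuracy under the best one-to-one cluster/label matching."""
    true_ids, true_idx = np.unique(y_true, return_inverse=True)
    pred_ids, pred_idx = np.unique(y_pred, return_inverse=True)
    # Contingency table: rows are predicted clusters, columns are true labels.
    counts = np.zeros((pred_ids.size, true_ids.size), dtype=np.int64)
    np.add.at(counts, (pred_idx, true_idx), 1)
    # The Hungarian algorithm picks the matching that maximizes agreement.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(y_true)


def benchmark(X, y, n_components=20, seed=0):
    """Time each reducer, then cluster its embedding and score the result."""
    reducers = {
        "PCA (full SVD)": PCA(n_components=n_components, svd_solver="full"),
        "Randomized SVD": TruncatedSVD(
            n_components=n_components, algorithm="randomized", random_state=seed
        ),
        "Gaussian RP": GaussianRandomProjection(
            n_components=n_components, random_state=seed
        ),
        "Sparse RP": SparseRandomProjection(
            n_components=n_components, random_state=seed
        ),
    }
    n_clusters = np.unique(y).size
    results = {}
    for name, reducer in reducers.items():
        start = time.perf_counter()
        Z = reducer.fit_transform(X)
        elapsed = time.perf_counter() - start
        # Assumption: unit-normalized rows + Euclidean k-means as a stand-in
        # for spherical k-means.
        Z_unit = normalize(Z)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        labels = km.fit_predict(Z_unit)
        results[name] = {
            "seconds": elapsed,
            "accuracy": hungarian_accuracy(y, labels),
            "nmi": normalized_mutual_info_score(y, labels),
            "wcss": km.inertia_,  # within-cluster sum of squares
        }
    return results


if __name__ == "__main__":
    # Synthetic labeled data (an assumption; not one of the paper's datasets).
    X, y = make_blobs(n_samples=600, n_features=500, centers=4, random_state=0)
    for name, scores in benchmark(X, y).items():
        print(
            f"{name:>15}: {scores['seconds']:.4f} s, "
            f"acc={scores['accuracy']:.3f}, nmi={scores['nmi']:.3f}, "
            f"wcss={scores['wcss']:.3f}"
        )
```

Timings from a toy matrix of this size say little about the scalability differences the study reports. A fair comparison would use large, sparse count matrices and repeat each run several times.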
