PMID: 35944930

Resolution of the Curse of Dimensionality in Single-cell RNA Sequencing Data Analysis

Abstract

Single-cell RNA sequencing (scRNA-seq) can determine gene expression in numerous individual cells simultaneously, promoting progress in the biomedical sciences. However, scRNA-seq data are high-dimensional and carry substantial technical noise, including dropouts. During analysis of scRNA-seq data, such noise engenders a statistical problem known as the curse of dimensionality (COD). Based on high-dimensional statistics, we herein formulate a noise reduction method, RECODE (resolution of the curse of dimensionality), for high-dimensional data with random sampling noise. We show that RECODE consistently resolves COD in relevant scRNA-seq data with unique molecular identifiers. RECODE does not involve dimension reduction and recovers expression values for all genes, including lowly expressed genes, which enables precise delineation of cell fate transitions and identification of rare cells using all gene information. Compared with representative imputation methods, RECODE employs different principles and exhibits superior overall performance in cell clustering, expression value recovery, and single-cell-level analysis. The RECODE algorithm is parameter-free, data-driven, deterministic, and high-speed, and its applicability can be predicted based on the variance normalization performance. We propose RECODE as a powerful strategy for preprocessing noisy high-dimensional data.
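To make the curse of dimensionality mentioned above concrete, the following is a minimal illustrative sketch. It is not the RECODE algorithm and is not taken from the paper. It simulates two hypothetical cell groups that differ in only a few "informative" genes, adds independent Gaussian noise to every gene (an assumption made for simplicity; scRNA-seq noise is count-based sampling noise), and reports how the contrast between between-group and within-group distances fades as noisy dimensions accumulate. The function name `distance_contrast` and all of its parameters are hypothetical.

```python
import math
import random


def distance_contrast(n_genes, n_informative=10, n_cells=20, shift=3.0, seed=0):
    """Return mean between-group distance divided by mean within-group distance.

    Two simulated groups differ only in the first `n_informative` genes
    (by `shift`); every gene also receives unit-variance Gaussian noise.
    A ratio near 1 means the groups are nearly indistinguishable by distance.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def make_cell(group):
        # Group 1 is shifted in the informative genes; all genes are noisy.
        return [
            (shift if (group == 1 and g < n_informative) else 0.0)
            + rng.gauss(0.0, 1.0)
            for g in range(n_genes)
        ]

    group_a = [make_cell(0) for _ in range(n_cells)]
    group_b = [make_cell(1) for _ in range(n_cells)]

    # Euclidean distances between all pairs inside group A ...
    within = [
        math.dist(group_a[i], group_a[j])
        for i in range(n_cells)
        for j in range(i + 1, n_cells)
    ]
    # ... and between every cell of group A and every cell of group B.
    between = [math.dist(a, b) for a in group_a for b in group_b]

    return (sum(between) / len(between)) / (sum(within) / len(within))


if __name__ == "__main__":
    for n_genes in (10, 100, 1000):
        print(n_genes, round(distance_contrast(n_genes), 3))
```

Under these assumptions the expected squared within-group distance grows like 2n with the number of genes n, while the group difference contributes a fixed amount, so the ratio behaves roughly like sqrt(1 + 45/n) for the default settings and drifts toward 1. That loss of contrast is the kind of distortion that noise accumulating across many dimensions introduces into distance-based analyses such as clustering.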

Citing Articles

Artificial Intelligence and Neuroscience: Transformative Synergies in Brain Research and Clinical Applications.

Onciul R, Tataru C, Dumitru A, Crivoi C, Serban M, Covache-Busuioc R J Clin Med. 2025; 14(2).

PMID: 39860555 PMC: 11766073. DOI: 10.3390/jcm14020550.


From multi-omics to predictive biomarker: AI in tumor microenvironment.

Hai L, Jiang Z, Zhang H, Sun Y Front Immunol. 2025; 15:1514977.

PMID: 39763649 PMC: 11701166. DOI: 10.3389/fimmu.2024.1514977.


scEGOT: single-cell trajectory inference framework based on entropic Gaussian mixture optimal transport.

Yachimura T, Wang H, Imoto Y, Yoshida M, Tasaki S, Kojima Y BMC Bioinformatics. 2024; 25(1):388.

PMID: 39710672 PMC: 11665215. DOI: 10.1186/s12859-024-05988-z.


Fine construction of gene coexpression network analysis using GTOM and RECODE detected a critical module of neuroblastoma stages 4 and 4S.

Nakamura F, Nakano Y, Yamada S Hereditas. 2024; 161(1):44.

PMID: 39538286 PMC: 11562103. DOI: 10.1186/s41065-024-00342-y.


Integrated multi-omics with machine learning to uncover the intricacies of kidney disease.

Liu X, Shi J, Jiao Y, An J, Tian J, Yang Y Brief Bioinform. 2024; 25(5).

PMID: 39082652 PMC: 11289682. DOI: 10.1093/bib/bbae364.

