An Introduction to Representation Learning for Single-cell Data Analysis

Overview
Specialty Cell Biology
Date 2023 Sep 6
PMID 37671013
Abstract

Single-cell-resolved systems biology methods, including omics- and imaging-based measurement modalities, generate a wealth of high-dimensional data characterizing the heterogeneity of cell populations. Representation learning methods are routinely used to analyze these complex, high-dimensional data by projecting them into lower-dimensional embeddings. This facilitates the interpretation and interrogation of the structures, dynamics, and regulation of cell heterogeneity. Reflecting their central role in analyzing diverse single-cell data types, a myriad of representation learning methods exist, with new approaches continually emerging. Here, we contrast general features of representation learning methods spanning statistical, manifold learning, and neural network approaches. We consider key steps involved in representation learning with single-cell data, including data pre-processing, hyperparameter optimization, downstream analysis, and biological validation. Interdependencies and contingencies linking these steps are also highlighted. This overview is intended to guide researchers in the selection, application, and optimization of representation learning strategies for current and future single-cell research applications.
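The projection of high-dimensional measurements into a lower-dimensional embedding can be illustrated with a minimal sketch. The example below is not taken from the article: it assumes principal component analysis as one representative of the statistical family of methods, uses library-size normalization with a log transform as an assumed pre-processing step, and runs on simulated count data. The function name `pca_embed` and the parameters `n_components` and `target_sum` are hypothetical, chosen for illustration only.

```python
import numpy as np


def pca_embed(counts, n_components=2, target_sum=1e4):
    """Project a cells-by-features count matrix onto its top principal components.

    Hypothetical helper for illustration; not the method of any specific paper.
    """
    counts = np.asarray(counts, dtype=float)

    # Pre-processing (assumed): scale each cell to a common total, then log-transform.
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    normalized = np.log1p(counts / totals * target_sum)

    # Center each feature so the SVD yields principal components.
    centered = normalized - normalized.mean(axis=0, keepdims=True)

    # Thin SVD: rows of vt are principal axes, u * s are the cell coordinates.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    embedding = u[:, :n_components] * s[:n_components]

    # Fraction of total variance captured by each retained component.
    explained = (s ** 2) / np.sum(s ** 2)
    return embedding, explained[:n_components]


# Simulated data (an assumption): two populations of 100 cells, 50 features each,
# drawn from Poisson distributions with different per-feature means.
rng = np.random.default_rng(0)
group_a = rng.poisson(lam=rng.uniform(1, 5, size=50), size=(100, 50))
group_b = rng.poisson(lam=rng.uniform(1, 5, size=50), size=(100, 50))
counts = np.vstack([group_a, group_b])

embedding, explained = pca_embed(counts)
print(embedding.shape)
```

Here `n_components` plays the role of a hyperparameter: in practice it would be tuned alongside the pre-processing choices, since the normalization applied before the projection changes which directions of variation dominate the embedding.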

Citing Articles

Addressing scalability and managing sparsity and dropout events in single-cell representation identification with ZIGACL.

Shi M, Li X Brief Bioinform. 2025; 26(1.

PMID: 39775477 PMC: 11705091. DOI: 10.1093/bib/bbae703.


Application of a novel numerical simulation to biochemical reaction systems.

Sato T Front Cell Dev Biol. 2024; 12:1351974.

PMID: 39310225 PMC: 11412882. DOI: 10.3389/fcell.2024.1351974.


Advances in AI and machine learning for predictive medicine.

Sharma A, Lysenko A, Jia S, Boroevich K, Tsunoda T J Hum Genet. 2024; 69(10):487-497.

PMID: 38424184 PMC: 11422165. DOI: 10.1038/s10038-024-01231-y.
