
Machine Learning for Cell Type Classification from Single Nucleus RNA Sequencing Data

Overview
Journal PLoS One
Date 2022 Sep 23
PMID 36149937
Abstract

With the advent of single-cell/nucleus RNA sequencing (sc/snRNA-seq), cell phenotyping has become a data-driven exercise that provides statistical evidence to support cell type/state categorization. However, classifying cells into specific, well-defined categories from sc/snRNA-seq data remains nontrivial. Related cell types with close transcriptional similarity are difficult to tell apart, which makes it hard to match cell types identified in separate experiments. To investigate possible approaches to overcome these obstacles, we explored supervised machine learning methods (logistic regression, support vector machines, random forests, neural networks, and light gradient boosting machine [LightGBM]) for classifying cell types in snRNA-seq datasets from the human brain middle temporal gyrus (MTG) and the human kidney. Classification accuracy was evaluated using an F-beta score weighted in favor of precision, to account for the technical artifact of gene expression dropout. We examined the impact of hyperparameter optimization and feature selection methods on F-beta score performance. We found that the best-performing model for granular cell type classification in both datasets was a multinomial logistic regression classifier, and that an effective feature selection step was the most influential factor in optimizing the performance of the machine learning pipelines.
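The F-beta score combines precision P and recall R as F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). Choosing beta < 1 weights precision more heavily, so false negatives (missed cells, which dropout can cause) cost less than false positives (cells assigned to the wrong type). The pure-Python sketch below illustrates that effect. The abstract does not state the beta value or the averaging scheme used; beta = 0.5 and unweighted macro-averaging over cell types are assumptions for illustration, and the helper names are hypothetical.

```python
# Hypothetical helpers illustrating a precision-weighted F-beta score for
# multi-class cell type labels. beta = 0.5 and macro averaging are
# assumptions for illustration, not settings reported in the abstract.


def fbeta_per_class(y_true, y_pred, label, beta=0.5):
    """One-vs-rest F-beta for a single cell type label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrong calls
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed cells
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    # beta < 1 shrinks the weight on recall relative to precision
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def macro_fbeta(y_true, y_pred, beta=0.5):
    """Unweighted mean of per-class F-beta over the labels present in y_true."""
    labels = sorted(set(y_true))
    scores = [fbeta_per_class(y_true, y_pred, lab, beta) for lab in labels]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    y_true = ["A", "A", "A", "A", "B", "B", "B", "B"]
    # Cautious model: under-calls type A (precision 1.0, recall 0.5 for A),
    # as might happen when dropout hides A's marker genes in some nuclei.
    cautious = ["A", "A", "B", "B", "B", "B", "B", "B"]
    # Liberal model: over-calls type A (precision 0.5, recall 1.0 for A).
    liberal = ["A"] * 8
    for name, pred in [("cautious", cautious), ("liberal", liberal)]:
        f05 = fbeta_per_class(y_true, pred, "A", beta=0.5)
        f1 = fbeta_per_class(y_true, pred, "A", beta=1.0)
        print(name, round(f05, 3), round(f1, 3))
    # cautious 0.833 0.667
    # liberal 0.556 0.667
```

With beta = 1 both models score the same for type A, while beta = 0.5 favors the cautious one; that asymmetry is what "weighted in favor of precision" means. In practice, scikit-learn's `fbeta_score(y_true, y_pred, beta=0.5, average="macro")` computes a comparable macro average, although it averages over the union of labels in `y_true` and `y_pred`.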

Citing Articles

AnnoGCD: a generalized category discovery framework for automatic cell type annotation.

Ceccarelli F, Lio P, Holden S NAR Genom Bioinform. 2024; 6(4):lqae166.

PMID: 39660254 PMC: 11629990. DOI: 10.1093/nargab/lqae166.


Identification of kidney cell types in scRNA-seq and snRNA-seq data using machine learning algorithms.

Tisch A, Madapoosi S, Blough S, Rosa J, Eddy S, Mariani L Heliyon. 2024; 10(19):e38567.

PMID: 39403515 PMC: 11471582. DOI: 10.1016/j.heliyon.2024.e38567.


scGAA: a general gated axial-attention model for accurate cell-type annotation of single-cell RNA-seq data.

Kong T, Yu T, Zhao J, Hu Z, Xiong N, Wan J Sci Rep. 2024; 14(1):22308.

PMID: 39333739 PMC: 11436728. DOI: 10.1038/s41598-024-73356-1.


Predictive biomarkers for embryotoxicity: a machine learning approach to mitigating multicollinearity in RNA-Seq.

Quah Y, Jung S, Chan J, Ham O, Jeong J, Kim S Arch Toxicol. 2024; 98(12):4093-4105.

PMID: 39242367 DOI: 10.1007/s00204-024-03852-w.


AITeQ: a machine learning framework for Alzheimer's prediction using a distinctive five-gene signature.

Ahammad I, Lamisa A, Bhattacharjee A, Jamal T, Arefin M, Chowdhury Z Brief Bioinform. 2024; 25(4).

PMID: 38877887 PMC: 11179120. DOI: 10.1093/bib/bbae291.

