
A Reference-free Approach for Cell Type Classification with scRNA-seq

Overview
Journal iScience
Publisher Cell Press
Date 2021 Aug 12
PMID 34381979
Citations 3
Abstract

Single-cell RNA sequencing (scRNA-seq) has become a revolutionary technology for characterizing cells under different biological conditions. Unlike bulk RNA-seq, scRNA-seq gene expression data are highly sparse because of the limited sequencing depth per cell. This sparsity is worsened by discarding a significant portion of reads that cannot be assigned to genes during quantification. To overcome data sparsity and fully utilize the original reads, we propose scSimClassify, a reference-free and alignment-free approach that classifies cell types using k-mer-level features. Compressed k-mer groups (CKGs), identified with the simhash method, contain k-mers with similar abundance profiles and serve as the cells' features. Our experiments demonstrate that CKG features yield higher scRNA-seq classification accuracy than gene expression features in the majority of experimental cases. Because CKGs are derived from raw reads without alignment to a reference genome, scSimClassify offers an effective alternative to existing methods, especially when the reference genome is incomplete or insufficient to represent the subject genomes.
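The abstract describes CKG construction only at a high level. The Python sketch below illustrates the general idea under our own assumptions, not scSimClassify's actual implementation: k-mers are counted per cell directly from raw reads, each k-mer's abundance profile across cells is hashed with random-hyperplane simhash (the sign-of-projection scheme), k-mers sharing a signature form one group, and a cell's value for a group is the summed count of that group's k-mers. The function names, the parameters `k`, `n_bits` and `seed`, and the summation step are hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical illustration of simhash-based k-mer grouping; not scSimClassify's code.

def count_kmers(reads, k):
    """Count every k-mer occurring in a list of read strings (no alignment needed)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def simhash_signature(profile, hyperplanes):
    """Random-hyperplane simhash: one bit per hyperplane, set to 1 when the
    abundance profile has a non-negative projection onto that hyperplane."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * h for p, h in zip(profile, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits

def ckg_features(cells, k=5, n_bits=8, seed=0):
    """Build a cells-by-groups feature matrix from per-cell lists of reads.

    Assumption: a group's feature value is the sum of its k-mers' counts.
    Returns (group_ids, matrix), where matrix[i][j] belongs to cell i, group j.
    """
    per_cell = [count_kmers(reads, k) for reads in cells]
    kmers = sorted(set().union(*per_cell))
    rng = random.Random(seed)
    # One random Gaussian hyperplane per signature bit, one coordinate per cell.
    hyperplanes = [[rng.gauss(0, 1) for _ in cells] for _ in range(n_bits)]
    groups = defaultdict(list)
    for kmer in kmers:
        profile = [c[kmer] for c in per_cell]  # abundance of this k-mer in each cell
        groups[simhash_signature(profile, hyperplanes)].append(kmer)
    group_ids = sorted(groups)
    matrix = [[sum(c[km] for km in groups[g]) for g in group_ids] for c in per_cell]
    return group_ids, matrix

if __name__ == "__main__":
    cells = [["ACGTACGTAC", "TTGACGTACG"], ["ACGTACGTAC"], ["GGGCCCAAAT", "GGGCCCAAAT"]]
    group_ids, matrix = ckg_features(cells, k=4, n_bits=6)
    for row in matrix:
        print(row)
```

Because the sign of a projection does not change when a profile is multiplied by a positive constant, k-mers whose abundance profiles are proportional across cells always fall into the same group. Using more signature bits splits the k-mers into finer, smaller groups.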

Citing Articles

Investigating biological nitrogen fixation via single-cell transcriptomics.

Pereira W, Conde D, Perron N, Schmidt H, Dervinis C, Venado R. J Exp Bot. 2024; 76(4):931-949.

PMID: 39563004 PMC: 11850973. DOI: 10.1093/jxb/erae454.


Sex-biased gene expression at single-cell resolution: cause and consequence of sexual dimorphism.

Darolti I, Mank J. Evol Lett. 2023; 7(3):148-156.

PMID: 37251587 PMC: 10210449. DOI: 10.1093/evlett/qrad013.


Statistical and Computational Methods for Proteogenomic Data Analysis.

Song X. Methods Mol Biol. 2023; 2629:271-303.

PMID: 36929082 DOI: 10.1007/978-1-0716-2986-4_13.


BLEND: a fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis.

Firtina C, Park J, Alser M, Kim J, Cali D, Shahroodi T. NAR Genom Bioinform. 2023; 5(1):lqad004.

PMID: 36685727 PMC: 9853099. DOI: 10.1093/nargab/lqad004.
