Characterizing Efficient Feature Selection for Single-cell Expression Analysis

Overview
Journal: Brief Bioinform
Specialty: Biology
Date: 2024 Jul 8
PMID: 38975891
Abstract

Unsupervised feature selection is a critical step for efficient and accurate analysis of single-cell RNA-seq data. Previous benchmarks used two different criteria to compare feature selection methods: (i) the proportion of ground-truth marker genes included in the selected features and (ii) the accuracy of cell clustering measured against ground-truth cell types. Here, we systematically compare the performance of 11 feature selection methods under both criteria. We first demonstrate that the two criteria are discordant and suggest using the latter. We then compare, across feature selection methods, the distribution of mean expression of the selected genes. We show that lowly expressed genes exhibit very high coefficients of variation and are mostly excluded by high-performance methods. In particular, high-deviation- and high-expression-based methods outperform the method widely used in the Seurat package in both cell clustering and data visualization. We further show that they also enable a clear separation of the same cell type from different tissues as well as accurate estimation of cell trajectories.
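
The relationship between mean expression and coefficient of variation, and criterion (i), can be illustrated with a small sketch. This is not the authors' implementation or any of the 11 benchmarked methods. It assumes Poisson-distributed toy counts (for which the coefficient of variation is roughly one over the square root of the mean), and the function names `coefficient_of_variation`, `select_high_expression`, `marker_proportion`, and `simulate_counts` are hypothetical, introduced only for illustration.

```python
import numpy as np


def coefficient_of_variation(counts):
    """Per-gene CV (std / mean) for a cells x genes count matrix."""
    mean = counts.mean(axis=0)
    std = counts.std(axis=0)
    # Genes with zero mean have an undefined CV; report NaN for them.
    return np.divide(std, mean, out=np.full(mean.shape, np.nan), where=mean > 0)


def select_high_expression(counts, n_features):
    """Indices of the n_features genes with the highest mean count."""
    mean = counts.mean(axis=0)
    return np.argsort(mean)[::-1][:n_features]


def marker_proportion(selected, markers):
    """Criterion (i): fraction of ground-truth markers among the selected genes."""
    markers = set(markers)
    selected = {int(g) for g in selected}
    return len(markers & selected) / len(markers)


def simulate_counts(n_cells=500, seed=0):
    """Toy data (an assumption): independent Poisson counts, one mean per gene."""
    rng = np.random.default_rng(seed)
    gene_means = np.array([0.05, 0.1, 0.5, 2.0, 10.0, 50.0])
    counts = rng.poisson(gene_means, size=(n_cells, gene_means.size))
    return counts, gene_means


if __name__ == "__main__":
    counts, gene_means = simulate_counts()
    cv = coefficient_of_variation(counts)
    for m, c in zip(gene_means, cv):
        print(f"true mean {m:>5}: CV {c:.2f}")
    print("high-expression selection:", select_high_expression(counts, 2))
```

In this toy setting the lowest-mean genes show the largest coefficients of variation even though they carry no biological signal, which is one way to see why ranking genes by raw variability can favor lowly expressed genes. Criterion (ii) would additionally require clustering the cells on the selected genes and scoring the result against known cell types, which is omitted here.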
