Feature Selection Methods Affect the Performance of scRNA-seq Data Integration and Querying

Overview
Journal: Nat Methods
Date: 2025 Mar 14
PMID: 40082610
Abstract

The availability of single-cell transcriptomics has allowed the construction of reference cell atlases, but their usefulness depends on the quality of dataset integration and the ability to map new samples. Previous benchmarks have compared integration methods and suggested that feature selection improves performance, but have not explored how best to select features. Here, we benchmark feature selection methods for single-cell RNA sequencing integration using metrics beyond batch correction and preservation of biological variation to assess query mapping, label transfer and the detection of unseen populations. We reinforce common practice by showing that highly variable feature selection is effective for producing high-quality integrations, and we provide further guidance on the effect of the number of features selected, on batch-aware feature selection, on lineage-specific feature selection and integration, and on the interaction between feature selection and integration models. These results are informative for analysts working on large-scale tissue atlases, using atlases or integrating their own data to tackle specific biological questions.
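
To make the idea of batch-aware highly variable feature selection concrete, here is a minimal sketch in plain Python. It is not the benchmark's implementation: the function names, the variance-to-mean dispersion score, the aggregation rule (count the batches in which a gene ranks in the top n, break ties by summed dispersion) and the toy count matrix are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pvariance


def dispersion_per_gene(cells):
    """Variance-to-mean ratio of each gene across the given cells (assumed score)."""
    n_genes = len(cells[0])
    scores = []
    for g in range(n_genes):
        values = [cell[g] for cell in cells]
        mu = mean(values)
        scores.append(pvariance(values) / mu if mu > 0 else 0.0)
    return scores


def batch_aware_features(counts, batches, n_features):
    """Pick n_features genes, favouring genes that are variable in many batches.

    counts  : list of cells, each a list of per-gene counts
    batches : batch label for each cell
    Returns the selected gene indices in ascending order.
    """
    # Group cells by batch so variability is scored within each batch.
    by_batch = defaultdict(list)
    for cell, batch in zip(counts, batches):
        by_batch[batch].append(cell)

    n_genes = len(counts[0])
    hits = [0] * n_genes          # batches in which the gene is in the top n
    total_disp = [0.0] * n_genes  # summed dispersion, used as a tie-breaker
    for cells in by_batch.values():
        disp = dispersion_per_gene(cells)
        top = sorted(range(n_genes), key=lambda g: disp[g], reverse=True)[:n_features]
        for g in top:
            hits[g] += 1
        for g in range(n_genes):
            total_disp[g] += disp[g]

    order = sorted(range(n_genes), key=lambda g: (hits[g], total_disp[g]), reverse=True)
    return sorted(order[:n_features])


# Toy data (hypothetical): 6 cells x 4 genes, two batches of 3 cells each.
counts = [
    [0, 5, 1, 2], [10, 5, 1, 3], [5, 5, 1, 4],   # batch "A"
    [1, 4, 1, 9], [9, 4, 1, 1], [5, 4, 2, 5],    # batch "B"
]
batches = ["A", "A", "A", "B", "B", "B"]
selected = batch_aware_features(counts, batches, n_features=2)
print(selected)
```

Scoring variability within each batch, rather than on the pooled cells, keeps genes that differ mainly between batches from dominating the selection. In practice an analyst would more likely use an established toolkit and normalized data; the `n_features` argument corresponds to the "number of features selected" that the abstract highlights as a factor to consider.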
