SwarmMAP: Swarm Learning for Decentralized Cell Type Annotation in Single Cell Sequencing Data
Rapid technological advances have made it possible to generate single-cell data at large scale, and laboratories around the world now produce single-cell transcriptomic data from many different tissues. Unsupervised clustering, followed by cell-type annotation of the identified clusters, is a crucial step in single-cell analysis. However, there is no consensus on which marker genes to use for annotation, and cell-type annotation is still mostly done by manual inspection of marker genes, which is irreproducible and scales poorly. Patient privacy is a further concern with human datasets. There is therefore a critical need to standardize and automate cell-type annotation across datasets in a privacy-preserving manner. Here, we developed SwarmMAP, which uses Swarm Learning to train machine learning models for cell-type classification on single-cell sequencing data in a decentralized way. SwarmMAP does not require any exchange of raw data between data centers. SwarmMAP achieves F1-scores of 0.93, 0.98, and 0.88 for cell-type classification in human heart, lung, and breast datasets, respectively. Swarm Learning-based models yield an average performance on par with that of models trained on centralized data (Mann-Whitney test). We also find that increasing the number of datasets increases cell-type prediction accuracy and enables handling of higher cell-type diversity. Together, these findings demonstrate that Swarm Learning is a viable approach to automating cell-type annotation. SwarmMAP is available at https://github.com/hayatlab/SwarmMAP.
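To illustrate the idea that only model parameters, and never raw data, leave a data center, the following is a minimal sketch of one parameter-merging step. It is not SwarmMAP's implementation: the function name `merge_parameters`, the dictionary layout of the parameters, and the weighting by local cell count are all assumptions made for illustration, and a real Swarm Learning deployment additionally coordinates peers without a central server.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def merge_parameters(local_params: List[Params], n_cells: List[int]) -> Params:
    """Average per-node parameters, weighted by each node's local cell count.

    Only parameters are passed in; the expression matrices used to fit them
    stay on the node that owns them.
    """
    if not local_params or len(local_params) != len(n_cells):
        raise ValueError("need exactly one cell count per node")
    total = sum(n_cells)
    if total <= 0:
        raise ValueError("total cell count must be positive")

    merged: Params = {}
    for name in local_params[0]:
        length = len(local_params[0][name])
        merged[name] = [
            sum(p[name][i] * n for p, n in zip(local_params, n_cells)) / total
            for i in range(length)
        ]
    return merged


# Made-up parameters from three nodes (assumed values, not real results).
node_a = {"w": [1.0, 2.0], "b": [0.0]}
node_b = {"w": [3.0, 4.0], "b": [1.0]}
node_c = {"w": [5.0, 6.0], "b": [2.0]}

merged = merge_parameters([node_a, node_b, node_c], n_cells=[100, 100, 200])
print(merged)
```

Weighting by cell count is one common choice so that larger datasets contribute proportionally; an unweighted mean would be obtained by passing equal counts.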