SwarmMAP: Swarm Learning for Decentralized Cell Type Annotation in Single Cell Sequencing Data

Overview

Journal bioRxiv

Date 2025 Jan 27

PMID 39868099

Authors

Oliver Lester Saldanha

Vivien Goepp

Kevin Pfeiffer

Hyojin Kim

Jie Fu Zhu

Rafael Kramann

Sikander Hayat

Jakob Nikolas Kather

Abstract

Rapid technological advancements have made it possible to generate single-cell data at a large scale. Several laboratories around the world can now generate single-cell transcriptomic data from different tissues. Unsupervised clustering, followed by annotation of the cell type of the identified clusters, is a crucial step in single-cell analyses. However, there is no consensus on the marker genes to use for annotation, and cell-type annotation is currently done mostly by manual inspection of marker genes, which is irreproducible and scales poorly. Additionally, patient privacy is a critical issue with human datasets. There is a critical need to standardize and automate cell-type annotation across datasets in a privacy-preserving manner. Here, we developed SwarmMAP, which uses Swarm Learning to train machine learning models for cell-type classification based on single-cell sequencing data in a decentralized way. SwarmMAP does not require any exchange of raw data between data centers. SwarmMAP has an F1-score of 0.93, 0.98, and 0.88 for cell-type classification in human heart, lung, and breast datasets, respectively. Swarm Learning-based models yield an average performance on par with that achieved by models trained on centralized data (Mann-Whitney test). We also find that increasing the number of datasets increases cell-type prediction accuracy and enables handling higher cell-type diversity. Together, these findings demonstrate that Swarm Learning is a viable approach to automate cell-type annotation. SwarmMAP is available at https://github.com/hayatlab/SwarmMAP.
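The decentralized idea in the abstract, that sites share model parameters rather than raw data, can be illustrated with a minimal sketch. This is not the SwarmMAP implementation: the function name merge_parameters, the weighting by local cell count, and the toy numbers are all assumptions made for illustration, and the peer-to-peer coordination that Swarm Learning adds on top of parameter merging is left out entirely.

```python
from typing import List


def merge_parameters(local_params: List[List[float]], n_cells: List[int]) -> List[float]:
    """Merge per-site parameter vectors by a weighted average.

    Hypothetical helper: each site trains on its own cells and contributes
    only its parameter vector and its local cell count, never the raw data.
    """
    if not local_params or len(local_params) != len(n_cells):
        raise ValueError("need one parameter vector and one cell count per site")
    dim = len(local_params[0])
    if any(len(p) != dim for p in local_params):
        raise ValueError("all parameter vectors must have the same length")
    total = sum(n_cells)
    if total <= 0:
        raise ValueError("total cell count must be positive")

    merged = [0.0] * dim
    for params, n in zip(local_params, n_cells):
        weight = n / total  # sites with more cells contribute more
        for i, value in enumerate(params):
            merged[i] += weight * value
    return merged


# Toy example: two sites with 1 and 3 cells respectively (made-up values).
site_params = [[1.0, 2.0], [3.0, 4.0]]
site_sizes = [1, 3]
merged = merge_parameters(site_params, site_sizes)
print(merged)
```

In a full training loop, each site would alternate local training steps with a merge step like this one, so that every participant ends up with a shared model while its expression matrices stay on site.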
