
scCobra Allows Contrastive Cell Embedding Learning with Domain Adaptation for Single Cell Data Integration and Harmonization

Overview
Journal: Commun Biol
Specialty: Biology
Date: 2025 Feb 13
PMID: 39948393
Abstract

The rapid advancement of single-cell technologies has created an urgent need for effective methods to integrate and harmonize single-cell data. Technical and biological variations across studies complicate data integration, while conventional tools often rely on assumptions about gene expression distributions and are prone to over-correction. Here, we present scCobra, a deep generative neural network designed to overcome these challenges through contrastive learning with domain adaptation. scCobra effectively mitigates batch effects, minimizes over-correction, and ensures biologically meaningful data integration without assuming specific gene expression distributions. It enables online label transfer across datasets with batch effects, allowing continuous integration of new data without retraining. Additionally, scCobra supports batch effect simulation, advanced multi-omic integration, and scalable processing of large datasets. By integrating and harmonizing datasets from similar studies, scCobra expands the data available for investigating specific biological problems, improves cross-study comparability, and reveals insights that may be obscured in isolated datasets.
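
The abstract does not specify the contrastive objective, so as a rough illustration only, the sketch below shows one common choice, an InfoNCE-style loss over paired cell embeddings. This is not scCobra's implementation: the function names, the temperature value, and the toy data are all assumptions introduced for the example, and the domain adaptation component is not shown.

```python
import math
import random


def _normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss (hypothetical helper, not taken from scCobra).

    Row i of view_a and row i of view_b are treated as two views of the
    same cell (a positive pair); every other row of view_b is a negative.
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, row in enumerate(a):
        # Temperature-scaled cosine similarity to every embedding in view_b.
        logits = [sum(x * y for x, y in zip(row, other)) / temperature for other in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability assigned to the positive pair.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy data (assumed): 8 "cells" with 16-dimensional embeddings.
random.seed(0)
cells = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
noisy = [[x + random.gauss(0, 0.01) for x in v] for v in cells]

aligned = info_nce_loss(cells, noisy)           # positives correctly paired
mismatched = info_nce_loss(cells, noisy[::-1])  # positives deliberately mispaired
print(aligned < mismatched)
```

The loss is lower when each embedding is paired with a slightly perturbed copy of itself than when the pairing is scrambled, which is the property a contrastive objective exploits to pull views of the same cell together.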
