Coupled Co-clustering-based Unsupervised Transfer Learning for the Integrative Analysis of Single-cell Genomic Data
Unsupervised methods, such as clustering, are essential to the analysis of single-cell genomic data. Most current clustering methods are designed for a single data type, such as single-cell RNA sequencing (scRNA-seq), single-cell ATAC sequencing (scATAC-seq) or sc-methylation data alone, and only a few have been developed for the integrative analysis of multiple data types. The integrative analysis of multimodal single-cell genomic datasets leverages the information contained in multiple datasets and can deepen biological insight. In this paper, we propose a coupled co-clustering-based unsupervised transfer learning algorithm (coupleCoC) for the integrative analysis of multimodal single-cell data. Our proposed coupleCoC builds upon the information-theoretic co-clustering framework, in which both the cells and the genomic features are clustered simultaneously. Clustering similar genomic features reduces the noise in single-cell data and facilitates the transfer of knowledge across single-cell datasets. We applied coupleCoC to the integrative analysis of scATAC-seq and scRNA-seq data, sc-methylation and scRNA-seq data, and scRNA-seq data from mouse and human. We demonstrate that coupleCoC improves the overall clustering performance and matches cell subpopulations across multimodal single-cell genomic datasets. Our method, coupleCoC, is also computationally efficient and can scale to large datasets. Availability: The software and datasets are available at https://github.com/cuhklinlab/coupleCoC.
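The co-clustering idea described above can be illustrated with a small sketch. The code below is a minimal NumPy implementation of single-dataset information-theoretic co-clustering in the formulation of Dhillon, Mallela and Modha (KDD 2003), assuming that is the framework coupleCoC builds on. It is not coupleCoC itself: it omits the coupling between datasets that the method uses for transfer. A cells-by-features count matrix is normalized into a joint distribution p(x, y), and cell (row) and feature (column) labels are updated alternately so as to decrease KL(p || q), where q(x, y) = p(x̂, ŷ) p(x | x̂) p(y | ŷ). The names `itcc`, `_kl_loss`, `EPS`, `n_row_clusters`, `n_col_clusters`, `init_rows` and `init_cols` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # hypothetical guard against division by empty-cluster mass and log(0)


def _kl_loss(p, rows, cols, k, l):
    """ITCC objective KL(p || q) for row labels `rows` and column labels `cols`."""
    p_hat = np.eye(k)[rows].T @ p @ np.eye(l)[cols]    # p(x_hat, y_hat)
    px, py = p.sum(axis=1), p.sum(axis=0)               # p(x), p(y)
    pxh = np.maximum(p_hat.sum(axis=1), EPS)            # p(x_hat)
    pyh = np.maximum(p_hat.sum(axis=0), EPS)            # p(y_hat)
    # q(x, y) = p(x_hat, y_hat) * p(x | x_hat) * p(y | y_hat)
    q = (p_hat[np.ix_(rows, cols)]
         * (px / pxh[rows])[:, None]
         * (py / pyh[cols])[None, :])
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def itcc(counts, n_row_clusters, n_col_clusters, n_iter=50, seed=0,
         init_rows=None, init_cols=None):
    """Hypothetical sketch: alternate cell and feature reassignments to decrease KL(p || q)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()                                     # joint distribution p(x, y)
    px, py = p.sum(axis=1), p.sum(axis=0)
    k, l = n_row_clusters, n_col_clusters
    rng = np.random.default_rng(seed)
    rows = (np.asarray(init_rows) if init_rows is not None
            else rng.integers(k, size=p.shape[0]))
    cols = (np.asarray(init_cols) if init_cols is not None
            else rng.integers(l, size=p.shape[1]))
    p_y_given_x = p / np.maximum(px, EPS)[:, None]      # row x holds p(Y | x)
    p_x_given_y = (p / np.maximum(py, EPS)[None, :]).T  # row y holds p(X | y)
    losses = [_kl_loss(p, rows, cols, k, l)]
    for _ in range(n_iter):
        # Row step: q(y | x_hat) = p(y | y_hat) * p(y_hat | x_hat)
        p_hat = np.eye(k)[rows].T @ p @ np.eye(l)[cols]
        pxh = np.maximum(p_hat.sum(axis=1), EPS)
        pyh = np.maximum(p_hat.sum(axis=0), EPS)
        q_y = (p_hat[:, cols] / pxh[:, None]) * (py / pyh[cols])[None, :]
        # Cross-entropy differs from KL(p(Y|x) || q(Y|x_hat)) by a per-row constant
        new_rows = np.argmin(-p_y_given_x @ np.log(q_y + EPS).T, axis=1)
        # Column step with the updated cell clusters:
        # q(x | y_hat) = p(x | x_hat) * p(x_hat | y_hat)
        p_hat = np.eye(k)[new_rows].T @ p @ np.eye(l)[cols]
        pxh = np.maximum(p_hat.sum(axis=1), EPS)
        pyh = np.maximum(p_hat.sum(axis=0), EPS)
        q_x = (p_hat[new_rows, :] / pyh[None, :]) * (px / pxh[new_rows])[:, None]
        new_cols = np.argmin(-p_x_given_y @ np.log(q_x + EPS), axis=1)
        converged = (np.array_equal(new_rows, rows)
                     and np.array_equal(new_cols, cols))
        rows, cols = new_rows, new_cols
        losses.append(_kl_loss(p, rows, cols, k, l))
        if converged:
            break
    return rows, cols, losses


if __name__ == "__main__":
    # Toy data: 6 cells x 6 features with two planted co-clusters.
    counts = np.kron([[10, 1], [1, 10]], np.ones((3, 3)))
    # Alternating updates reach a local minimum, so keep the best of several restarts.
    runs = [itcc(counts, 2, 2, seed=s) for s in range(10)]
    rows, cols, losses = min(runs, key=lambda run: run[2][-1])
    print("cell clusters:", rows, "feature clusters:", cols, "KL:", losses[-1])
```

Each reassignment never increases the objective, so the loop stops at a local minimum that depends on the initialization; this is why the usage block keeps the best of several seeded restarts. Minimizing the cross-entropy -Σ_y p(y | x) log q(y | x̂) is equivalent to minimizing KL(p(Y | x) || q(Y | x̂)), because the entropy of p(Y | x) does not depend on the candidate cluster.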