
Coupled Co-clustering-based Unsupervised Transfer Learning for the Integrative Analysis of Single-cell Genomic Data

Overview
Journal Brief Bioinform
Specialty Biology
Date 2020 Dec 6
PMID 33279962
Citations 8
Abstract

Unsupervised methods, such as clustering methods, are essential to the analysis of single-cell genomic data. Most current clustering methods are designed for a single data type, such as single-cell RNA sequencing (scRNA-seq), single-cell ATAC sequencing (scATAC-seq) or sc-methylation data alone, and only a few have been developed for the integrative analysis of multiple data types. The integrative analysis of multimodal single-cell genomic data sets leverages the power of multiple data sets and can deepen biological insight. In this paper, we propose a coupled co-clustering-based unsupervised transfer learning algorithm (coupleCoC) for the integrative analysis of multimodal single-cell data. Our proposed coupleCoC builds upon the information-theoretic co-clustering framework. In co-clustering, both the cells and the genomic features are clustered simultaneously. Clustering similar genomic features reduces the noise in single-cell data and facilitates the transfer of knowledge across single-cell datasets. We applied coupleCoC to the integrative analysis of scATAC-seq and scRNA-seq data, of sc-methylation and scRNA-seq data, and of scRNA-seq data from mouse and human. We demonstrate that coupleCoC improves the overall clustering performance and matches the cell subpopulations across multimodal single-cell genomic datasets. Our method coupleCoC is also computationally efficient and can scale up to large datasets. Availability: The software and datasets are available at https://github.com/cuhklinlab/coupleCoC.
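The abstract says coupleCoC builds on the information-theoretic co-clustering framework, in which cells (rows) and genomic features (columns) are clustered simultaneously. The sketch below illustrates only that single-dataset base framework, using the alternating row/column reassignment of information-theoretic co-clustering (Dhillon, Mallela and Modha). It does not implement coupleCoC's coupled objective or its cross-dataset transfer. It assumes the input is a nonnegative cells-by-features matrix that is normalized into a joint distribution p(x, y). The helper names `block_joint`, `coclustering_loss` and `itcc` are hypothetical and are not taken from the coupleCoC repository.

```python
import numpy as np


def block_joint(P, rows, cols, k, l):
    """p(x_hat, y_hat): total probability mass in each (row cluster, column cluster) block."""
    R = np.eye(k)[rows]  # n x k one-hot row-cluster (cell-cluster) indicators
    C = np.eye(l)[cols]  # m x l one-hot column-cluster (feature-cluster) indicators
    return R.T @ P @ C


def coclustering_loss(P, rows, cols, k, l):
    """KL(p || q), which equals I(X;Y) - I(X_hat;Y_hat) for hard co-cluster labels."""
    P = P / P.sum()
    px, py = P.sum(axis=1), P.sum(axis=0)
    Phat = block_joint(P, rows, cols, k, l)
    pxhat, pyhat = Phat.sum(axis=1), Phat.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        # q(x, y) = p(x_hat, y_hat) * p(x | x_hat) * p(y | y_hat)
        Q = (Phat[np.ix_(rows, cols)]
             * (px / pxhat[rows])[:, None]
             * (py / pyhat[cols])[None, :])
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))


def itcc(X, k, l, n_iter=20, seed=0, eps=1e-12):
    """Hypothetical sketch of information-theoretic co-clustering on one dataset."""
    P = np.asarray(X, dtype=float)
    P = P / P.sum()                    # treat the nonnegative matrix as p(x, y)
    n, m = P.shape
    rng = np.random.default_rng(seed)
    rows = rng.integers(k, size=n)     # random initial cell clusters
    cols = rng.integers(l, size=m)     # random initial feature clusters
    px, py = P.sum(axis=1), P.sum(axis=0)
    history = [coclustering_loss(P, rows, cols, k, l)]
    for _ in range(n_iter):
        # Row step: move each x to argmin_x_hat KL(p(Y|x) || q(Y|x_hat)),
        # with q(y | x_hat) = p(y_hat | x_hat) * p(y | y_hat).
        Phat = block_joint(P, rows, cols, k, l)
        pxhat, pyhat = Phat.sum(axis=1), Phat.sum(axis=0)
        q_y_xhat = (Phat / (pxhat[:, None] + eps))[:, cols] * (py / (pyhat[cols] + eps))
        p_y_x = P / (px[:, None] + eps)
        # Minimising the KL is the same as maximising sum_y p(y|x) log q(y|x_hat).
        rows = np.argmax(p_y_x @ np.log(q_y_xhat + eps).T, axis=1)

        # Column step: move each y to argmin_y_hat KL(p(X|y) || q(X|y_hat)),
        # with q(x | y_hat) = p(x_hat | y_hat) * p(x | x_hat).
        Phat = block_joint(P, rows, cols, k, l)
        pxhat, pyhat = Phat.sum(axis=1), Phat.sum(axis=0)
        q_x_yhat = ((Phat / (pyhat[None, :] + eps))[rows, :]
                    * (px / (pxhat[rows] + eps))[:, None]).T
        p_x_y = P / (py[None, :] + eps)
        cols = np.argmax(p_x_y.T @ np.log(q_x_yhat + eps).T, axis=1)
        history.append(coclustering_loss(P, rows, cols, k, l))
    return rows, cols, history
```

Each row step and each column step cannot increase the loss I(X;Y) - I(X̂;Ŷ), so `history` is non-increasing. The objective is nonconvex, however, and a random start can settle in a poor local optimum. In practice one would run several seeds and keep the labels with the lowest final loss.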

Citing Articles

Single-cell omics: experimental workflow, data analyses and applications.

Sun F, Li H, Sun D, Fu S, Gu L, Shao X Sci China Life Sci. 2024; 68(1):5-102.

PMID: 39060615 DOI: 10.1007/s11427-023-2561-0.


Fast clustering and cell-type annotation of scATAC data using pre-trained embeddings.

LeRoy N, Smith J, Zheng G, Rymuza J, Gharavi E, Brown D NAR Genom Bioinform. 2024; 6(3):lqae073.

PMID: 38974799 PMC: 11224678. DOI: 10.1093/nargab/lqae073.


scGAL: unmask tumor clonal substructure by jointly analyzing independent single-cell copy number and scRNA-seq data.

Li R, Shi F, Song L, Yu Z BMC Genomics. 2024; 25(1):393.

PMID: 38649804 PMC: 11034052. DOI: 10.1186/s12864-024-10319-w.


CDSKNN: a novel clustering framework for large-scale single-cell data based on a stable graph structure.

Ren J, Lyu X, Guo J, Shi X, Zhou Y, Li Q J Transl Med. 2024; 22(1):233.

PMID: 38433205 PMC: 10910752. DOI: 10.1186/s12967-024-05009-w.


iPoLNG-An unsupervised model for the integrative analysis of single-cell multiomics data.

Zhang W, Lin Z Front Genet. 2023; 14:998504.

PMID: 36865385 PMC: 9972291. DOI: 10.3389/fgene.2023.998504.