Multi-omic and Multi-view Clustering Algorithms: Review and Cancer Benchmark

Overview

Journal: Nucleic Acids Res

Publisher: Oxford University Press

Specialty: Biochemistry

Date: 2018 Oct 9

PMID: 30295871

Citations: 166

Authors

Nimrod Rappoport

Ron Shamir


Abstract

Recent high-throughput experimental methods have been used to collect large biomedical omics datasets. Clustering of single omic datasets has proven invaluable for biological and medical research. The decreasing cost and development of additional high-throughput methods now enable measurement of multi-omic data. Clustering multi-omic data has the potential to reveal further systems-level insights, but raises computational and biological challenges. Here, we review algorithms for multi-omics clustering, and discuss key issues in applying these algorithms. Our review covers methods developed specifically for omic data as well as generic multi-view methods developed in the machine learning community for joint clustering of multiple data types. In addition, using cancer data from TCGA, we perform an extensive benchmark spanning ten different cancer types, providing the first systematic comparison of leading multi-omics and multi-view clustering algorithms. The results highlight key issues regarding the use of single- versus multi-omics, the choice of clustering strategy, the power of generic multi-view methods and the use of approximated p-values for gauging solution quality. Due to the growing use of multi-omics data, we expect these issues to be important for future progress in the field.

Citing Articles

Advancements in proteogenomics for preclinical targeted cancer therapy research.

Suo Y, Song Y, Wang Y, Liu Q, Rodriguez H, Zhou H. Biophys Rep. 2025; 11(1):56-76.

PMID: 40070661 PMC: 11891078. DOI: 10.52601/bpr.2024.240053.

An Attention-Aware Multi-Task Learning Framework Identifies Candidate Targets for Drug Repurposing in Sarcopenia.

Reza M, Qiu C, Lin X, Su K, Liu A, Zhang X. J Cachexia Sarcopenia Muscle. 2025; 16(2):e13661.

PMID: 40045692 PMC: 11883102. DOI: 10.1002/jcsm.13661.

Integrative Analysis of Metabolome and Proteome in the Cerebrospinal Fluid of Patients with Multiple System Atrophy.

George N, Kwon M, Jang Y, Kim S, Hwang J, Lee S. Cells. 2025; 14(4).

PMID: 39996738 PMC: 11853536. DOI: 10.3390/cells14040265.

Mapping the knowledge of omics in myocardial infarction: A scientometric analysis in R Studio, VOSviewer, Citespace, and SciMAT.

Wei X, Wang M, Yu S, Han Z, Li C, Zhong Y. Medicine (Baltimore). 2025; 104(7):e41368.

PMID: 39960900 PMC: 11835070. DOI: 10.1097/MD.0000000000041368.

Feature graphs for interpretable unsupervised tree ensembles: centrality, interaction, and application in disease subtyping.

Sirocchi C, Urschler M, Pfeifer B. BioData Min. 2025; 18(1):15.

PMID: 39955586 PMC: 11829558. DOI: 10.1186/s13040-025-00430-3.
