Consensus Clustering for Bayesian Mixture Models

Overview

Journal BMC Bioinformatics

Publisher BioMed Central

Specialty Biology

Date 2022 Jul 21

PMID 35864476

Authors

Stephen Coleman

Paul D W Kirk

Chris Wallace


Abstract

Background: Cluster analysis is an integral part of precision medicine and systems biology, used to define groups of patients or biomolecules. Consensus clustering is an ensemble approach that is widely used in these areas, which combines the output from multiple runs of a non-deterministic clustering algorithm. Here we consider the application of consensus clustering to a broad class of heuristic clustering algorithms that can be derived from Bayesian mixture models (and extensions thereof) by adopting an early stopping criterion when performing sampling-based inference for these models. While the resulting approach is non-Bayesian, it inherits the usual benefits of consensus clustering, particularly in terms of computational scalability and providing assessments of clustering stability/robustness.

Results: In simulation studies, we show that our approach can successfully uncover the target clustering structure, while also exploring different plausible clusterings of the data. We show that, when a parallel computation environment is available, our approach offers significant reductions in runtime compared to performing sampling-based Bayesian inference for the underlying model, while retaining many of the practical benefits of the Bayesian approach, such as exploring different numbers of clusters. We propose a heuristic to decide upon ensemble size and the early stopping criterion, and then apply consensus clustering to a clustering algorithm derived from a Bayesian integrative clustering method. We use the resulting approach to perform an integrative analysis of three 'omics datasets for budding yeast and find clusters of co-expressed genes with shared regulatory proteins. We validate these clusters using data external to the analysis.

Conclusions: Our approach can be used as a wrapper for essentially any existing sampling-based Bayesian clustering implementation, and enables meaningful clustering analyses to be performed using such implementations, even when computational Bayesian inference is not feasible, e.g. due to poor exploration of the target density (often as a result of increasing numbers of features) or a limited computational budget that does not allow sufficient samples to be drawn from a single chain. This enables researchers to straightforwardly extend the applicability of existing software to much larger datasets, including implementations of sophisticated models such as those that jointly model multiple datasets.
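
The ensemble step described in the abstract (many short, independent runs of a non-deterministic clustering algorithm, combined into a single summary) can be illustrated with a minimal sketch. This is not the authors' implementation: the `short_run` function below is a hypothetical stand-in for one early-stopped chain (a few nearest-mean sweeps from a random start, not a Bayesian sampler), and the names `short_run`, `consensus_matrix`, `n_clusters`, and `n_iter` are assumptions made for illustration. The consensus matrix records, for each pair of items, the fraction of runs in which the two items were placed in the same cluster.

```python
import numpy as np

def short_run(x, n_clusters, n_iter, rng):
    """Hypothetical stand-in for one short, early-stopped chain.

    Starts from a random allocation and performs only n_iter
    nearest-mean sweeps, returning the final allocation.
    """
    labels = rng.integers(n_clusters, size=x.shape[0])
    for _ in range(n_iter):
        # Empty clusters are re-seeded at a randomly chosen data point.
        means = np.array([
            x[labels == k].mean() if np.any(labels == k) else rng.choice(x)
            for k in range(n_clusters)
        ])
        labels = np.argmin(np.abs(x[:, None] - means[None, :]), axis=1)
    return labels

def consensus_matrix(partitions):
    """Fraction of runs in which each pair of items shares a cluster."""
    partitions = np.asarray(partitions)  # shape: (n_runs, n_items)
    same = partitions[:, :, None] == partitions[:, None, :]
    return same.mean(axis=0)

# Toy 1-D data with two well-separated groups (assumed for illustration).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.5, 20), rng.normal(5.0, 0.5, 20)])

# Each run is independent, so the ensemble could be built in parallel.
ensemble = [short_run(x, n_clusters=3, n_iter=2, rng=rng) for _ in range(50)]
cm = consensus_matrix(ensemble)
```

Entries of `cm` near 1 indicate pairs that are consistently co-clustered across runs, while intermediate values flag unstable allocations. In this sketch the ensemble size (50) and the number of sweeps per run (2) are arbitrary choices, not the values or heuristic proposed in the paper.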

