
Consensus Clustering for Bayesian Mixture Models

Overview
Publisher Biomed Central
Specialty Biology
Date 2022 Jul 21
PMID 35864476
Abstract

Background: Cluster analysis is an integral part of precision medicine and systems biology, used to define groups of patients or biomolecules. Consensus clustering is an ensemble approach that is widely used in these areas, which combines the output from multiple runs of a non-deterministic clustering algorithm. Here we consider the application of consensus clustering to a broad class of heuristic clustering algorithms that can be derived from Bayesian mixture models (and extensions thereof) by adopting an early stopping criterion when performing sampling-based inference for these models. While the resulting approach is non-Bayesian, it inherits the usual benefits of consensus clustering, particularly in terms of computational scalability and providing assessments of clustering stability/robustness.
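The ensemble idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`short_chain_clustering`, `consensus_matrix`) are hypothetical, and a few iterations of one-dimensional k-means from a random start stand in for an early-stopped sampler of a Bayesian mixture model. The sketch only shows how many short, independent runs might be combined into a consensus matrix whose entries record how often two items are clustered together.

```python
import random


def short_chain_clustering(data, n_clusters, n_iters, seed):
    """Hypothetical stand-in for one early-stopped run of a stochastic
    clustering algorithm: a few k-means iterations on 1-D data, started
    from randomly chosen centres. The paper uses short MCMC chains instead."""
    rng = random.Random(seed)
    centers = rng.sample(data, n_clusters)
    labels = [0] * len(data)
    for _ in range(n_iters):  # small n_iters plays the role of early stopping
        # Assign each point to its nearest centre.
        labels = [
            min(range(n_clusters), key=lambda k: abs(x - centers[k]))
            for x in data
        ]
        # Move each centre to the mean of its members (if it has any).
        for k in range(n_clusters):
            members = [x for x, lab in zip(data, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def consensus_matrix(partitions):
    """Fraction of partitions in which items i and j share a cluster label."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    width = len(partitions)
    return [[c / width for c in row] for row in counts]


# Toy usage: an ensemble of 20 short runs; each run is independent of the
# others, so in practice they could be executed in parallel.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
ensemble = [short_chain_clustering(data, 2, 3, seed) for seed in range(20)]
cm = consensus_matrix(ensemble)
```

Entries of `cm` near 0 or 1 indicate pairings on which the runs agree, while intermediate values flag unstable assignments. A final clustering could then be derived from the matrix, for example by hierarchical clustering on `1 - cm`, although that step is omitted here.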

Results: In simulation studies, we show that our approach can successfully uncover the target clustering structure, while also exploring different plausible clusterings of the data. We show that, when a parallel computation environment is available, our approach offers significant reductions in runtime compared to performing sampling-based Bayesian inference for the underlying model, while retaining many of the practical benefits of the Bayesian approach, such as exploring different numbers of clusters. We propose a heuristic to decide upon ensemble size and the early stopping criterion, and then apply consensus clustering to a clustering algorithm derived from a Bayesian integrative clustering method. We use the resulting approach to perform an integrative analysis of three 'omics datasets for budding yeast and find clusters of co-expressed genes with shared regulatory proteins. We validate these clusters using data external to the analysis.

Conclusions: Our approach can be used as a wrapper for essentially any existing sampling-based Bayesian clustering implementation, and enables meaningful clustering analyses to be performed using such implementations, even when computational Bayesian inference is not feasible, e.g. due to poor exploration of the target density (often as a result of increasing numbers of features) or a limited computational budget that does not allow sufficient samples to be drawn from a single chain. This enables researchers to straightforwardly extend the applicability of existing software to much larger datasets, including implementations of sophisticated models such as those that jointly model multiple datasets.

Citing Articles

NDUFA11 may be the disulfidptosis-related biomarker of ischemic stroke based on integrated bioinformatics, clinical samples, and experimental analyses.

Li S, Chen N, He J, Luo X, Lin W Front Neurosci. 2025; 18:1505493.

PMID: 39877656 PMC: 11772302. DOI: 10.3389/fnins.2024.1505493.


Machine Learning-based Framework Develops a Tumor Thrombus Coagulation Signature in Multicenter Cohorts for Renal Cancer.

Feng T, Wang Y, Zhang W, Cai T, Tian X, Su J Int J Biol Sci. 2024; 20(9):3590-3620.

PMID: 38993563 PMC: 11234220. DOI: 10.7150/ijbs.94555.


Identification of disulfidptosis-associated genes and characterization of immune cell infiltration in thyroid carcinoma.

Song S, Zhou J, Zhang L, Sun Y, Zhang Q, Tan Y Aging (Albany NY). 2024; 16(11):9753-9783.

PMID: 38836761 PMC: 11210228. DOI: 10.18632/aging.205897.


Identification of cuproptosis-related gene clusters and immune cell infiltration in major burns based on machine learning models and experimental validation.

Wang X, Xiong Z, Hong W, Liao X, Yang G, Jiang Z Front Immunol. 2024; 15:1335675.

PMID: 38410514 PMC: 10894925. DOI: 10.3389/fimmu.2024.1335675.


Development and implementation of a prognostic model for clear cell renal cell carcinoma based on heterogeneous TLR4 expression.

Zhou Q, Sun Q, Shen Q, Li X, Qian J Heliyon. 2024; 10(4):e25571.

PMID: 38380017 PMC: 10877190. DOI: 10.1016/j.heliyon.2024.e25571.

