Biclustering Sparse Binary Genomic Data
Overview
Molecular Biology
Genomic datasets often consist of large, sparse, binary data matrices. In such a dataset, one is often interested in finding contiguous blocks that (mostly) contain ones. This is a biclustering problem. Many biclustering algorithms have been proposed for gene expression data, but only two have been proposed that specifically target binary matrices. None of the gene expression biclustering algorithms can handle the large number of zeros in sparse binary matrices, and the two binary algorithms failed to produce meaningful results. In this article, we present a new algorithm that extracts biclusters from sparse, binary datasets. A powerful feature is that it can detect biclusters of different shapes, ranging from many rows and few columns to few rows and many columns, and it allows the user to guide the search towards biclusters of specific dimensions. When applying our algorithm to an input matrix derived from TRANSFAC, we find transcription factors with distinctly dissimilar binding motifs but a clear set of common targets that are significantly enriched for Gene Ontology (GO) categories.
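To make the problem statement concrete, the sketch below shows what "a block that mostly contains ones" can mean for a sparse binary matrix. It is not the algorithm presented in the article, which the abstract does not describe. It is a naive greedy pruning heuristic, and every name in it (`density`, `greedy_prune`, `min_density`, the toy matrix) is an assumption introduced only for illustration. The matrix is stored sparsely as a set of (row, column) positions that hold a one.

```python
def density(rows, cols, ones):
    """Fraction of cells in the rows x cols submatrix that contain a one.

    `ones` is a set of (row, col) pairs, a sparse encoding of the matrix.
    """
    if not rows or not cols:
        return 0.0
    hits = sum((r, c) in ones for r in rows for c in cols)
    return hits / (len(rows) * len(cols))


def greedy_prune(rows, cols, ones, min_density=0.9):
    """Illustrative heuristic (an assumption, not the article's method).

    Repeatedly drops the row or column with the lowest fraction of ones
    until the remaining submatrix reaches `min_density` or becomes empty.
    """
    rows, cols = set(rows), set(cols)
    while rows and cols and density(rows, cols, ones) < min_density:
        row_score = {r: sum((r, c) in ones for c in cols) / len(cols) for r in rows}
        col_score = {c: sum((r, c) in ones for r in rows) / len(rows) for c in cols}
        # sorted() makes tie-breaking deterministic (lowest index first).
        worst_row = min(sorted(rows), key=row_score.get)
        worst_col = min(sorted(cols), key=col_score.get)
        if row_score[worst_row] <= col_score[worst_col]:
            rows.remove(worst_row)
        else:
            cols.remove(worst_col)
    return rows, cols


# Toy 5 x 6 matrix: a planted block of ones on rows {0, 1, 2} and
# columns {1, 3, 4}, plus two isolated ones acting as noise.
toy_ones = {(r, c) for r in (0, 1, 2) for c in (1, 3, 4)} | {(3, 5), (4, 0)}
found_rows, found_cols = greedy_prune(range(5), range(6), toy_ones)
print(sorted(found_rows), sorted(found_cols))
```

A heuristic like this returns a single block and has no way to steer the search towards a particular shape, which is the gap the abstract says the new algorithm addresses.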