Robust Biclustering by Sparse Singular Value Decomposition Incorporating Stability Selection
Motivation: Over the past decade, several biclustering approaches have been published in the field of gene expression data analysis. Despite the huge diversity in the mathematical concepts behind the different biclustering methods, many of them can be related to the singular value decomposition (SVD). Recently, a sparse SVD approach (SSVD) has been proposed to reveal biclusters in gene expression data. In this article, we propose to incorporate stability selection to improve this method. Stability selection is a subsampling-based variable selection method that allows control of Type I error rates. The proposed S4VD algorithm incorporates this subsampling approach to find stable biclusters and to estimate the selection probabilities with which genes and samples belong to the biclusters.
Results: To date, the S4VD method is the first biclustering approach that takes into account the stability of clusters under perturbations of the data. Application of the S4VD algorithm to a lung cancer microarray dataset revealed biclusters that correspond to coregulated genes associated with cancer subtypes. Marker genes for different lung cancer subtypes showed high selection probabilities for the corresponding biclusters. Moreover, the genes associated with the biclusters belong to significantly enriched cancer-related Gene Ontology categories. In a simulation study, the S4VD algorithm outperformed the SSVD algorithm and two other SVD-related biclustering methods both in recovering artificial biclusters and in robustness to noisy data.
Availability: R code for the S4VD algorithm and its documentation are available at http://s4vd.r-forge.r-project.org/.
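Illustration: The idea of combining a sparse rank-one SVD with subsampling can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' S4VD implementation (which is distributed as R code): the sparse SVD here uses plain soft-thresholding with fixed penalties `lam_u` and `lam_v`, only samples (columns) are subsampled, only gene (row) selection probabilities are estimated, and no Type I error control is performed. All function names, parameter values, and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink each entry of x toward zero by lam; entries within [-lam, lam] become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def sparse_rank1(X, lam_u, lam_v, n_iter=50):
    """Simplified sparse rank-one SVD via alternating soft-thresholded updates.

    Assumption: fixed penalties and plain soft-thresholding, chosen for brevity.
    """
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    u, v = U[:, 0], Vt[0]  # start from the leading singular vectors
    for _ in range(n_iter):
        v = soft_threshold(X.T @ u, lam_v)
        norm_v = np.linalg.norm(v)
        if norm_v == 0:  # penalty removed every column
            break
        v = v / norm_v
        u = soft_threshold(X @ v, lam_u)
        norm_u = np.linalg.norm(u)
        if norm_u == 0:  # penalty removed every row
            break
        u = u / norm_u
    return u, v


def row_selection_probabilities(X, lam_u, lam_v, n_subsamples=50, frac=0.5, seed=0):
    """Estimate how often each row gets a nonzero loading across column subsamples."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    m = max(2, int(frac * n_cols))
    counts = np.zeros(n_rows)
    for _ in range(n_subsamples):
        cols = rng.choice(n_cols, size=m, replace=False)  # subsample without replacement
        u, _ = sparse_rank1(X[:, cols], lam_u, lam_v)
        counts += (u != 0)
    return counts / n_subsamples


# Hypothetical toy data: 100 "genes" x 40 "samples" of unit-variance noise,
# with a planted bicluster in rows 0-9 and columns 0-19.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 40))
X[:10, :20] += 3.0

probs = row_selection_probabilities(X, lam_u=3.0, lam_v=3.0)
print("mean selection probability, planted rows:", probs[:10].mean())
print("mean selection probability, other rows:  ", probs[10:].mean())
```

In this sketch, rows whose estimated selection probability exceeds a chosen cutoff would be treated as stable members of the bicluster; how that cutoff relates to an error bound is specified in the article itself and is not reproduced here.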