Identifying Multi-layer Gene Regulatory Modules from Multi-dimensional Genomic Data
Motivation: Eukaryotic gene expression (GE) is subject to precisely coordinated multi-layer control across the epigenetic, transcriptional and post-transcriptional levels of regulation. Recently emerging multi-dimensional genomic datasets provide unprecedented opportunities to study this cross-layer regulatory interplay. In these datasets, the same set of samples is profiled on several layers of genomic activity, e.g. copy number variation (CNV), DNA methylation (DM), GE and microRNA expression (ME). However, analysis methods suited to such data are currently scarce.
Results: In this article, we introduce a sparse Multi-Block Partial Least Squares (sMBPLS) regression method to identify multi-dimensional regulatory modules from this new type of data. A multi-dimensional regulatory module contains sets of regulatory factors from different layers that are likely to jointly contribute to a local 'gene expression factory'. We demonstrate the performance of our method on simulated data as well as on The Cancer Genome Atlas ovarian cancer datasets, which include CNV, DM, ME and GE data measured on 230 samples. We show that the majority of the identified modules have significant functional and transcriptional enrichment, higher than that observed in modules identified using only a single type of genomic data. Our network analysis of the modules reveals that CNV, DM and microRNA can have a coupled impact on the expression of important oncogenes and tumor suppressor genes.
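To make the idea of a sparse multi-block PLS component concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' sMBPLS algorithm or their MATLAB code: the function names, the relative soft-thresholding rule, the SVD initialisation and the simulated data are all assumptions chosen for illustration. It alternates between sparse weight vectors for each input block (standing in for layers such as CNV, DM and ME) and a sparse weight vector for the response block (standing in for GE), so that the nonzero weights play the role of one candidate module.

```python
import numpy as np

def soft_threshold_rel(v, frac):
    """Soft-threshold v at frac * max|v| (hypothetical rule; keeps at least one nonzero)."""
    lam = frac * np.max(np.abs(v))
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def unit(v):
    """Scale v to unit Euclidean norm; a zero vector is returned unchanged."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def sparse_multiblock_component(blocks, Y, frac_x=0.5, frac_y=0.5, n_iter=100, tol=1e-8):
    """One sparse multi-block PLS-style component (illustrative sketch only).

    blocks: list of column-centred arrays, each n_samples x p_i
    Y:      column-centred response array, n_samples x q
    Returns per-block weights, block contributions b, and response weights q.
    """
    # Assumed initialisation: leading left singular direction of Y.
    U, s, _ = np.linalg.svd(Y, full_matrices=False)
    u = U[:, 0] * s[0]
    for _ in range(n_iter):
        weights = [unit(soft_threshold_rel(X.T @ u, frac_x)) for X in blocks]
        T = np.column_stack([X @ w for X, w in zip(blocks, weights)])  # block scores
        b = unit(T.T @ u)          # how much each block contributes
        t = T @ b                  # combined (super) score
        q = unit(soft_threshold_rel(Y.T @ t, frac_y))
        u_new = Y @ q              # response score
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    return weights, b, q

# Simulated example: one latent factor z drives a few columns in each block.
rng = np.random.default_rng(0)
n = 100
z = rng.standard_normal(n)

def make_block(p, n_signal):
    X = rng.standard_normal((n, p))
    X[:, :n_signal] = z[:, None] + 0.3 * rng.standard_normal((n, n_signal))
    return X - X.mean(axis=0)      # centre each column

X1, X2, Y = make_block(20, 3), make_block(15, 2), make_block(25, 4)
weights, b, q = sparse_multiblock_component([X1, X2], Y)
for name, w in [("block 1", weights[0]), ("block 2", weights[1]), ("response", q)]:
    print(name, "selected columns:", np.flatnonzero(w).tolist())
```

In this sketch the sparsity level is controlled by `frac_x` and `frac_y`; a real analysis would need a principled way to tune them, and would extract further components after deflating the data.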
Availability and Implementation: The source code, implemented in MATLAB, is freely available at: http://zhoulab.usc.edu/sMBPLS/.
Contact: xjzhou@usc.edu
Supplementary Information: Supplementary materials are available at Bioinformatics online.