Extracting Gene Expression Profiles Common to Colon and Pancreatic Adenocarcinoma Using Simultaneous Nonnegative Matrix Factorization
In this paper we introduce a clustering algorithm, simultaneous nonnegative matrix factorization (siNMF), that factorizes two distinct gene expression datasets at once in order to uncover gene regulatory programs common to the two phenotypes. The siNMF algorithm searches simultaneously for two factorizations that share the same gene expression profiles. Its two key ingredients are the nonnegativity constraint and the offset variables, which together ensure the sparseness of the factorizations. While cancer is a very heterogeneous disease, there is overwhelming recent evidence that the differences between cancer subtypes implicate entire pathways and biological processes involving large numbers of genes, rather than changes in single genes. We applied the simultaneous factorization algorithm to look for gene expression profiles shared between the more homogeneous pancreatic ductal adenocarcinoma (PDAC) and the more heterogeneous colon adenocarcinoma. The PDAC signature is active in a large fraction of colon adenocarcinoma samples, which suggests that the oncogenic mechanisms involved may be similar to those in PDAC, at least in this subset of colon samples. There are many approaches to uncovering common mechanisms involved in different phenotypes, but most are based on comparing gene lists. The approach presented in this paper additionally takes gene expression data into account and can thus be more sensitive.
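The idea of two factorizations that share one set of gene expression profiles can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the update rules or the exact form of the offset variables, so the sketch assumes a model X_i ≈ A_i S + 1 o_i^T, where S holds the shared profiles, A_i the per-dataset sample loadings, and o_i a nonnegative per-gene offset for dataset i, and it swaps in standard Lee-Seung multiplicative updates for squared error. The function name `simultaneous_nmf`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def simultaneous_nmf(X1, X2, k, n_iter=300, eps=1e-9, seed=0):
    """Factorize X1 ~ A1 @ S + offset1 and X2 ~ A2 @ S + offset2 with a shared S.

    X1, X2: nonnegative (samples x genes) matrices over the same genes.
    Assumed model and update rules; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n1, g = X1.shape
    n2, g2 = X2.shape
    if g != g2:
        raise ValueError("both datasets must cover the same genes")

    X = np.vstack([X1, X2])
    # Fixed indicator columns: row i selects the offset of its own dataset.
    E = np.zeros((n1 + n2, 2))
    E[:n1, 0] = 1.0
    E[n1:, 1] = 1.0

    A = rng.random((n1 + n2, k)) + eps   # sample loadings (both datasets stacked)
    S = rng.random((k, g)) + eps         # shared gene expression profiles
    O = rng.random((2, g)) + eps         # one per-gene offset row per dataset

    losses = []
    for _ in range(n_iter):
        # Update profiles and offsets together; E stays fixed.
        W = np.hstack([A, E])
        H = np.vstack([S, O])
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        S, O = H[:k], H[k:]
        # Update loadings against the full reconstruction, offsets included.
        R = A @ S + E @ O
        A *= (X @ S.T) / (R @ S.T + eps)
        losses.append(float(np.linalg.norm(X - A @ S - E @ O) ** 2))
    return A[:n1], A[n1:], S, O, losses

# Synthetic example: two datasets generated from the same three profiles,
# each shifted by a different constant offset.
rng = np.random.default_rng(1)
S_true = rng.random((3, 50))
X1 = rng.random((20, 3)) @ S_true + 0.5
X2 = rng.random((30, 3)) @ S_true + 1.0

A1, A2, S, O, losses = simultaneous_nmf(X1, X2, k=3)
print(f"squared error: first iteration {losses[0]:.3f}, last {losses[-1]:.3f}")
```

Because S is estimated from both datasets at once, a row of S that receives large loadings in both A1 and A2 corresponds to a profile active in both phenotypes. The sketch adds no explicit sparsity penalty; any sparseness it produces comes only from the nonnegativity constraint and the offsets absorbing baseline expression.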