Augmenting Subnetwork Inference with Information Extracted from the Scientific Literature

Overview
Specialty Biology
Date 2019 Jun 28
PMID 31246951
Abstract

Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge in these studies is to explain how the genes identified as relevant in a given experiment are organized into a subnetwork that accounts for the response of interest. Subnetwork inference typically depends on the information in publicly available, structured databases, which are incomplete. However, a wealth of potentially relevant information resides in the scientific literature, such as which genes are associated with certain concepts of interest and which interactions occur among various biological entities. We contend that exploiting this information can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that literature-extracted information can be used to (i) augment the set of entities identified as relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We use this approach to uncover the pathways involved in interactions between a virus and a host cell, and the pathways regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can provide more accurate and more interpretable subnetworks.
Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference.
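
To make augmentation ideas (i) and (ii) from the abstract concrete, the following is a minimal sketch and not the authors' implementation. It swaps in a plain breadth-first shortest-path search for the integer program mentioned above. All node names, edge sets, and function names (`shortest_path`, `connect_hits`) are hypothetical placeholders invented for illustration; none of them come from the paper or its repository.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search over an undirected edge set.

    Returns the list of nodes on one shortest path, or None if no path exists.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def connect_hits(edges, hits, target):
    """Map each hit to a shortest path reaching the target (None if unreachable)."""
    return {hit: shortest_path(edges, hit, target) for hit in sorted(hits)}


# Toy placeholder data (assumed, not taken from the paper).
database_edges = {("hit1", "geneA"), ("geneA", "response")}
literature_edges = {("hit2", "geneB"), ("geneB", "geneA")}  # idea (ii): extra interactions
screen_hits = {"hit1"}
literature_hits = {"hit2"}  # idea (i): extra relevant entities

all_hits = screen_hits | literature_hits
baseline = connect_hits(database_edges, all_hits, "response")
augmented = connect_hits(database_edges | literature_edges, all_hits, "response")

print("database only:", baseline)
print("with literature:", augmented)
```

In this toy setting, the literature-derived hit cannot be connected to the response using database interactions alone, but becomes connected once the literature-derived interactions are added to the background network.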
