Augmenting Subnetwork Inference with Information Extracted from the Scientific Literature

Overview

Journal PLoS Comput Biol

Specialty Biology

Date 2019 Jun 28

PMID 31246951

Authors

Sid Kiblawi

Deborah Chasman

Amanda Henning

Eunju Park

Hoifung Poon

Michael Gould

Paul Ahlquist

Mark Craven

Abstract

Many biological studies involve either (i) manipulating some aspect of a cell or its environment and then simultaneously measuring the effect on thousands of genes, or (ii) systematically manipulating each gene and then measuring the effect on some response of interest. A common challenge in these studies is to explain how the genes identified as relevant in a given experiment are organized into a subnetwork that accounts for the response of interest. Subnetwork inference typically depends on information in publicly available, structured databases, which are incomplete. However, a wealth of potentially relevant information resides in the scientific literature, including information about genes associated with concepts of interest and about interactions among biological entities. We contend that exploiting this information can improve the explanatory power and accuracy of subnetwork inference in multiple applications. Here we propose and investigate several ways in which information extracted from the scientific literature can be used to augment subnetwork inference. We show that literature-extracted information can be used to (i) augment the set of entities identified as relevant in a subnetwork inference task, (ii) augment the set of interactions used in the process, and (iii) support targeted browsing of a large inferred subnetwork by identifying entities and interactions that are closely related to concepts of interest. We use this approach to uncover the pathways involved in interactions between a virus and a host cell, and the pathways regulated by a transcription factor associated with breast cancer. Our experimental results demonstrate that these approaches can provide more accurate and more interpretable subnetworks.
Integer program code, background network data, and pathfinding code are available at https://github.com/Craven-Biostat-Lab/subnetwork_inference.
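
To make the idea of augmenting the interaction set concrete, the following is a minimal sketch, not the authors' method: the abstract mentions integer programming and pathfinding code in the linked repository, whereas this toy uses a plain breadth-first search. All gene names, edge lists, and function names here are hypothetical and chosen only for illustration. It shows how one literature-extracted interaction can connect an experimental hit to a target that the database-only network leaves unreachable.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parent links to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical toy data: interactions from a structured database ...
database_edges = [("GENE_A", "GENE_B"), ("GENE_C", "TARGET")]
# ... and one interaction assumed to come from literature extraction.
literature_edges = [("GENE_B", "GENE_C")]

baseline = build_graph(database_edges)
augmented = build_graph(database_edges + literature_edges)

# GENE_A stands in for an experimental hit, TARGET for the response of interest.
path_before = shortest_path(baseline, "GENE_A", "TARGET")
path_after = shortest_path(augmented, "GENE_A", "TARGET")

print("database only:", path_before)
print("with literature edges:", path_after)
```

In this toy network the hit cannot be linked to the target using database edges alone, while the augmented network yields a connecting path. A real analysis would also need edge directions, confidence scores, and a global objective over many hits, none of which this sketch attempts.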
