Automatic Reconstruction of a Bacterial Regulatory Network Using Natural Language Processing

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2007 Aug 9
PMID: 17683642
Citations: 23
Abstract

Background: Manual curation of biological databases, an expensive and labor-intensive process, is essential for high-quality integrated data. In this paper we report the implementation of a state-of-the-art natural language processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using text-mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12.
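
To make the idea of rule-based extraction concrete, here is a minimal sketch in Python. It is not the system described in the paper: the trigger verbs, the single pattern, and the function name `extract_interactions` are all assumptions chosen for illustration, and a real system would rely on much richer linguistic rules and gene-name dictionaries.

```python
import re

# Hypothetical trigger verbs mapped to an interaction type (not from the paper).
TRIGGERS = {
    "activates": "activation",
    "represses": "repression",
    "regulates": "regulation",
}

# Hypothetical rule: "<Regulator> <verb> [the] [expression|transcription of] [the] <target>".
# Regulators are assumed to start with an uppercase letter (e.g. a protein name);
# targets are assumed to follow the bacterial gene convention of three lowercase
# letters plus an optional uppercase/digit suffix.
PATTERN = re.compile(
    r"\b(?P<regulator>[A-Z][A-Za-z0-9]+)\s+"
    r"(?P<verb>activates|represses|regulates)\s+"
    r"(?:the\s+)?(?:(?:expression|transcription)\s+of\s+)?(?:the\s+)?"
    r"(?P<target>[a-z]{3}[A-Z0-9]*)\b"
)


def extract_interactions(text):
    """Return (regulator, interaction_type, target) triples matched by the rule."""
    return [
        (m["regulator"], TRIGGERS[m["verb"]], m["target"])
        for m in PATTERN.finditer(text)
    ]


def build_network(triples):
    """Group triples into a mapping: regulator -> set of (target, interaction_type)."""
    network = {}
    for regulator, kind, target in triples:
        network.setdefault(regulator, set()).add((target, kind))
    return network
```

Applied to a sentence such as "CRP activates the transcription of lacZ.", the single rule yields one triple linking the regulator to its target, and `build_network` collects such triples into a simple adjacency mapping.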

Results: Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually curated RegulonDB, 45% of which we were able to recreate automatically. Our automated analysis also found some new interactions, either in papers not yet curated or in papers where they were missed during the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners.
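
The kind of comparison behind a figure such as "45% recreated" can be sketched as a recall computation over sets of interactions. The snippet below is an assumption about how such an evaluation might be set up, not the paper's actual procedure; the function names and the placeholder interaction names (`TF1`, `geneA`, and so on) are invented for illustration and do not come from RegulonDB.

```python
def recall(extracted, curated):
    """Fraction of curated interactions that also appear in the extracted set."""
    if not curated:
        raise ValueError("curated set must not be empty")
    return len(extracted & curated) / len(curated)


def novel_candidates(extracted, curated):
    """Extracted interactions absent from the curated set.

    These are only candidates: each one still needs manual review before it
    can be treated as a genuinely new interaction.
    """
    return extracted - curated


# Toy placeholder data (regulator, target); purely illustrative.
curated = {("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC"), ("TF3", "geneD")}
extracted = {("TF1", "geneA"), ("TF2", "geneC"), ("TF4", "geneE")}

print(recall(extracted, curated))            # share of the curated set recovered
print(novel_candidates(extracted, curated))  # candidates for manual review
```

Treating interactions as set elements keeps the comparison simple; a fuller evaluation would also have to normalize gene and protein synonyms before matching.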

Conclusion: Manual curation of the output of automatic text processing is a good way to complement a more detailed review of the literature, either for validating what has already been annotated or for discovering facts and information that might have been overlooked at the triage or curation stages.

Citing Articles

YTLR: Extracting yeast transcription factor-gene associations from the literature using automated literature readers.

Yang T, Wang C, Tsai H, Yang Y, Liu C. Comput Struct Biotechnol J. 2022; 20:4636-4644.

PMID: 36090812 PMC: 9449546. DOI: 10.1016/j.csbj.2022.08.041.


Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts.

Roy S, Yun D, Madahian B, Berry M, Deng L, Goldowitz D. Front Bioeng Biotechnol. 2017; 5:48.

PMID: 28894735 PMC: 5581332. DOI: 10.3389/fbioe.2017.00048.


An integrated text mining framework for metabolic interaction network reconstruction.

Patumcharoenpol P, Doungpan N, Meechai A, Shen B, Chan J, Vongsangnak W. PeerJ. 2016; 4:e1811.

PMID: 27019783 PMC: 4806637. DOI: 10.7717/peerj.1811.


Overview of the gene regulation network and the bacteria biotope tasks in BioNLP'13 shared task.

Bossy R, Golik W, Ratkovic Z, Valsamou D, Bessieres P, Nedellec C. BMC Bioinformatics. 2015; 16 Suppl 10:S1.

PMID: 26202448 PMC: 4511173. DOI: 10.1186/1471-2105-16-S10-S1.


Text mining and network analysis of molecular interaction in non-small cell lung cancer by using natural language processing.

Li J, Bi L, Sun Y, Lu Z, Lin Y, Bai O. Mol Biol Rep. 2014; 41(12):8071-9.

PMID: 25205120 DOI: 10.1007/s11033-014-3705-5.

