Semantically Linking Molecular Entities in Literature Through Entity Relationships

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2012 Jul 5
PMID: 22759460
Citations: 8
Abstract

Background: Text mining tools have become popular for processing the vast number of research articles in the biomedical literature. It is crucial that such tools extract information at a level of detail sufficient for real-life applications. Studies on mining non-causal molecular relations contribute to this goal by formally identifying the relations between genes, promoters, complexes and various other molecular entities found in text. More importantly, these studies help to improve the integration of text mining results with database facts.

Results: We describe, compare and evaluate two frameworks developed for the prediction of non-causal or 'entity' relations (REL) between gene symbols and domain terms. In the corresponding REL challenge of the BioNLP Shared Task of 2011, these systems ranked first (57.7% F-score) and second (41.6% F-score). In this paper, we investigate this performance discrepancy of 16 percentage points by benchmarking on a related and more extensive dataset and by analysing the contribution of both the term detection and the relation extraction modules. We further construct a hybrid system that combines the two frameworks, and experiment with intersection and union combinations, which yield high-precision and high-recall results, respectively. Finally, we highlight the very high performance (F-score > 90%) obtained for the specific subclass of embedded entity relations, which are essential for integrating text mining predictions with database facts.
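
As a rough illustration of why intersecting two systems' predictions favours precision while taking their union favours recall, the following minimal Python sketch scores toy prediction sets against a toy gold standard. It is not the authors' implementation: the `Relation` tuple, the `score` and `combine` helpers, the relation labels and all example data are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import NamedTuple, Set, Tuple


class Relation(NamedTuple):
    """Hypothetical representation of one predicted entity relation."""
    gene: str      # gene symbol
    term: str      # domain term the gene is related to
    label: str     # relation type (illustrative labels only)


def score(predicted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) of a prediction set against gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


def combine(a: Set[Relation], b: Set[Relation], mode: str) -> Set[Relation]:
    """Combine two systems' predictions by set intersection or set union."""
    if mode == "intersection":
        return a & b
    if mode == "union":
        return a | b
    raise ValueError(f"unknown mode: {mode}")


# Toy data (assumed, not taken from the paper).
R1 = Relation("geneA", "complex 1", "Subunit-Complex")
R2 = Relation("geneB", "promoter", "Protein-Component")
R3 = Relation("geneC", "complex 2", "Subunit-Complex")
R4 = Relation("geneD", "domain", "Protein-Component")
X1 = Relation("geneE", "promoter", "Protein-Component")   # false positive of system A
X2 = Relation("geneF", "complex 1", "Subunit-Complex")    # false positive of system B

GOLD = {R1, R2, R3, R4}
SYSTEM_A = {R1, R2, R3, X1}
SYSTEM_B = {R1, R2, R4, X2}

if __name__ == "__main__":
    for name, preds in [
        ("system A", SYSTEM_A),
        ("system B", SYSTEM_B),
        ("intersection", combine(SYSTEM_A, SYSTEM_B, "intersection")),
        ("union", combine(SYSTEM_A, SYSTEM_B, "union")),
    ]:
        p, r, f = score(preds, GOLD)
        print(f"{name:12s} P={p:.3f} R={r:.3f} F={f:.3f}")
```

On this toy data the intersection keeps only the relations both systems agree on, so it contains no false positives but misses two gold relations, whereas the union recovers every gold relation at the cost of carrying both systems' false positives.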

Conclusions: The results from this study will enable us in the near future to annotate semantic relations between molecular entities in the entire scientific literature available through PubMed. The recent release of the EVEX dataset, containing biomolecular event predictions for millions of PubMed articles, is an interesting and exciting opportunity to overlay these entity relations with event predictions on a literature-wide scale.

Citing Articles

Sieve-based relation extraction of gene regulatory networks from biological literature.

Zitnik S, Zitnik M, Zupan B, Bajec M BMC Bioinformatics. 2015; 16 Suppl 16:S1.

PMID: 26551454 PMC: 4642041. DOI: 10.1186/1471-2105-16-S16-S1.


RLIMS-P 2.0: A Generalizable Rule-Based Information Extraction System for Literature Mining of Protein Phosphorylation Information.

Torii M, Arighi C, Li G, Wang Q, Wu C, Vijay-Shanker K IEEE/ACM Trans Comput Biol Bioinform. 2015; 12(1):17-29.

PMID: 26357075 PMC: 4568560. DOI: 10.1109/TCBB.2014.2372765.


Extracting biomedical events from pairs of text entities.

Liu X, Bordes A, Grandvalet Y BMC Bioinformatics. 2015; 16 Suppl 10:S8.

PMID: 26201478 PMC: 4511465. DOI: 10.1186/1471-2105-16-S10-S8.


A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems.

Peng Y, Torii M, Wu C, Vijay-Shanker K BMC Bioinformatics. 2014; 15:285.

PMID: 25149151 PMC: 4262219. DOI: 10.1186/1471-2105-15-285.


Event-based text mining for biology and functional genomics.

Ananiadou S, Thompson P, Nawaz R, McNaught J, Kell D Brief Funct Genomics. 2014; 14(3):213-30.

PMID: 24907365 PMC: 4499874. DOI: 10.1093/bfgp/elu015.

