OntoGene in BioCreative II

Overview
Journal: Genome Biol
Specialties: Biology, Genetics
Date: 2008 Oct 18
PMID: 18834491
Citations: 19
Abstract

Background: Research scientists and companies working in the domains of biomedicine and genomics are increasingly faced with the problem of efficiently locating, within the vast body of published scientific findings, the critical pieces of information that are needed to direct current and future research investment.

Results: In this report we describe approaches taken within the scope of the second BioCreative competition in order to solve two aspects of this problem: detection of novel protein interactions reported in scientific articles, and detection of the experimental method that was used to confirm the interaction. Our approach to the former problem is based on a high-recall protein annotation step, followed by two strict disambiguation steps. The remaining proteins are then combined according to a number of lexico-syntactic filters, which deliver high-precision results while maintaining reasonable recall. The detection of the experimental methods is tackled by a pattern matching approach, which has delivered the best results in the official BioCreative evaluation.
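
As a rough illustration of what a pattern matching approach to experimental method detection can look like, the following minimal sketch scans a passage for surface mentions of a few interaction detection methods. The method labels, the regular expressions, and the function name detect_methods are all assumptions made for this example; the abstract does not give the actual patterns or resources used by OntoGene.

```python
import re

# Hypothetical patterns for illustration only; the real pattern set used in
# the BioCreative II submission is not described in the abstract.
METHOD_PATTERNS = {
    "two hybrid": re.compile(r"\b(?:yeast\s+)?two[-\s]hybrid\b", re.IGNORECASE),
    "coimmunoprecipitation": re.compile(
        r"\bco-?immunoprecipitat(?:ion|ed|es?)\b", re.IGNORECASE
    ),
    "pull down": re.compile(r"\bpull[-\s]?down\b", re.IGNORECASE),
}


def detect_methods(text):
    """Return the sorted labels of all methods whose pattern matches the text."""
    return sorted(
        label
        for label, pattern in METHOD_PATTERNS.items()
        if pattern.search(text)
    )


if __name__ == "__main__":
    sentence = (
        "The interaction was confirmed by yeast two-hybrid "
        "and co-immunoprecipitation assays."
    )
    print(detect_methods(sentence))
```

A sketch like this favors precision on explicit mentions; a real system would presumably need a much larger pattern inventory and some way of ranking competing candidate methods.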

Conclusion: Although the results of BioCreative clearly show that no tool is sufficiently reliable for fully automated annotations, a few of the proposed approaches (including our own) already perform at a competitive level. This makes them interesting either as standalone tools for preliminary document inspection, or as modules within an environment aimed at supporting the process of curation of biomedical literature.

Citing Articles

OGER++: hybrid multi-type entity recognition.

Furrer L, Jancso A, Colic N, Rinaldi F. J Cheminform. 2019; 11(1):7.

PMID: 30666476 PMC: 6689863. DOI: 10.1186/s13321-018-0326-3.


Entity recognition in the biomedical domain using a hybrid approach.

Basaldella M, Furrer L, Tasso C, Rinaldi F. J Biomed Semantics. 2017; 8(1):51.

PMID: 29122011 PMC: 5679148. DOI: 10.1186/s13326-017-0157-6.


Strategies towards digital and semi-automated curation in RegulonDB.

Rinaldi F, Lithgow O, Gama-Castro S, Solano H, Lopez A, Muniz Rascado L. Database (Oxford). 2017; 2017(1).

PMID: 28365731 PMC: 5467564. DOI: 10.1093/database/bax012.


Automatic query generation using word embeddings for retrieving passages describing experimental methods.

Aydin F, Husunbeyi Z, Ozgur A. Database (Oxford). 2017; 2017.

PMID: 28077568 PMC: 5225401. DOI: 10.1093/database/baw166.


Automated detection of discourse segment and experimental types from the text of cancer pathway results sections.

Burns G, Dasigi P, de Waard A, Hovy E. Database (Oxford). 2016; 2016.

PMID: 27580922 PMC: 5006090. DOI: 10.1093/database/baw122.

