
PreBIND and Textomy--mining the Biomedical Literature for Protein-protein Interactions Using a Support Vector Machine

Overview
Publisher Biomed Central
Specialty Biology
Date 2003 Apr 12
PMID 12689350
Citations 88
Abstract

Background: Most experimentally verified molecular interaction and biological pathway data reside in the unstructured text of biomedical journal articles, where they are inaccessible to computational methods. The Biomolecular Interaction Network Database (BIND) seeks to capture these data in a machine-readable format. We hypothesized that the formidable task of backfilling the database could be reduced by using Support Vector Machine technology to first locate interaction information in the literature. We present an information extraction system designed to locate protein-protein interaction data in the literature and present these data to curators and the public for review and entry into BIND.

Results: Cross-validation estimated the support vector machine's test-set precision, accuracy and recall for classifying abstracts that describe interaction information at 92%, 90% and 92%, respectively. We estimated that the system would be able to recall up to 60% of all non-high-throughput interactions present in another yeast protein interaction database. Finally, the system was applied to a real-world curation problem, where its use reduced the task duration by 70%, saving 176 days.
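
For readers unfamiliar with the three figures reported above, the following minimal Python sketch shows how precision, accuracy and recall are each derived from the counts of a binary classifier's confusion matrix. The counts used here are hypothetical, chosen only so that the arithmetic reproduces the reported percentages; they are not the study's data, and the names `ConfusionCounts`, `precision`, `recall` and `accuracy` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (interaction abstract vs. not)."""
    tp: int  # abstracts correctly labelled as describing an interaction
    fp: int  # abstracts wrongly labelled as describing an interaction
    tn: int  # abstracts correctly labelled as not describing an interaction
    fn: int  # interaction abstracts the classifier missed


def precision(c: ConfusionCounts) -> float:
    # Of everything flagged as an interaction abstract, how much was right?
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    # Of all true interaction abstracts, how many were found?
    return c.tp / (c.tp + c.fn)


def accuracy(c: ConfusionCounts) -> float:
    # Of all abstracts, how many were labelled correctly?
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# HYPOTHETICAL counts (an assumption, not taken from the paper), picked so
# that the three metrics match the percentages quoted in the abstract.
example = ConfusionCounts(tp=92, fp=8, tn=52, fn=8)

if __name__ == "__main__":
    print(f"precision = {precision(example):.2f}")
    print(f"accuracy  = {accuracy(example):.2f}")
    print(f"recall    = {recall(example):.2f}")
```

Precision and recall ignore the true negatives, so they can both be high while accuracy is lower, as in this example; that is why the three numbers are reported separately.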

Conclusions: Machine learning methods are useful as tools to direct interaction and pathway database back-filling; however, this potential can only be realized if these techniques are coupled with human review and entry into a factual database such as BIND. The PreBIND system described here is available to the public at http://bind.ca. Current capabilities allow searching for human, mouse and yeast protein-interaction information.

Citing Articles

IL-1β and associated molecules as prognostic biomarkers linked with immune cell infiltration in colorectal cancer: an integrated statistical and machine learning approach.

Sahoo K, Sundararajan V Discov Oncol. 2025; 16(1):252.

PMID: 40019680 PMC: 11871282. DOI: 10.1007/s12672-025-01989-3.


Biomedical Text Classification Using Augmented Word Representation Based on Distributional and Relational Contexts.

Parwez M, Fazil M, Arif M, Nafis M, Auwul M Comput Intell Neurosci. 2024; 2023:2989791.

PMID: 39262497 PMC: 11390191. DOI: 10.1155/2023/2989791.


Recent advances in biomedical literature mining.

Zhao S, Su C, Lu Z, Wang F Brief Bioinform. 2020; 22(3).

PMID: 32422651 PMC: 8138828. DOI: 10.1093/bib/bbaa057.


Multitask learning for biomedical named entity recognition with cross-sharing structure.

Wang X, Lyu J, Dong L, Xu K BMC Bioinformatics. 2019; 20(1):427.

PMID: 31419937 PMC: 6697996. DOI: 10.1186/s12859-019-3000-5.


Triage by ranking to support the curation of protein interactions.

Mottin L, Pasche E, Gobeill J, de Laval V, Gleizes A, Michel P Database (Oxford). 2017; 2017.

PMID: 29220432 PMC: 5502361. DOI: 10.1093/database/bax040.

