Surveying Biomedical Relation Extraction: a Critical Examination of Current Datasets and the Proposal of a New Resource

Overview
Journal: Brief Bioinform
Specialty: Biology
Date: 2024 Apr 12
PMID: 38609331
Abstract

Natural language processing (NLP) has become an essential technique in various fields, offering a wide range of possibilities for analyzing data and developing diverse NLP tasks. In the biomedical domain, understanding the complex relationships between compounds and proteins is critical, especially in the context of signal transduction and biochemical pathways. Among these relationships, protein-protein interactions (PPIs) are of particular interest, given their potential to trigger a variety of biological reactions. To improve the ability to predict PPI events, we propose the protein event detection dataset (PEDD), which comprises 6823 abstracts, 39 488 sentences and 182 937 gene pairs. Our PEDD dataset has been utilized in the AI CUP Biomedical Paper Analysis competition, where systems are challenged to predict 12 different relation types. In this paper, we review the state-of-the-art relation extraction research and provide an overview of the PEDD's compilation process. Furthermore, we present the results of the PPI extraction competition and evaluate several language models' performances on the PEDD. This paper's outcomes will provide a valuable roadmap for future studies on protein event detection in NLP. By addressing this critical challenge, we hope to enable breakthroughs in drug discovery and enhance our understanding of the molecular mechanisms underlying various diseases.

Citing Articles

MeSH2Matrix: combining MeSH keywords and machine learning for biomedical relation classification based on PubMed.

Turki H, Dossou B, Emezue C, Owodunni A, Hadj Taieb M, Ben Aouicha M. J Biomed Semantics. 2024; 15(1):18.

PMID: 39354632 PMC: 11445994. DOI: 10.1186/s13326-024-00319-w.
