
BertSRC: Transformer-based Semantic Relation Classification

Overview
Publisher Biomed Central
Date 2022 Sep 6
PMID 36068535
Abstract

Relationships between biomedical entities are complex, and many have not yet been identified. For many biomedical research areas, including drug discovery, it is of paramount importance to identify the relationships that have already been established through a comprehensive literature survey. However, manually searching the literature is difficult as the number of biomedical publications continues to increase. Therefore, the relation classification task, which automatically mines meaningful relations from the literature, has drawn attention in the field of biomedical text mining. Applying relation classification techniques to the accumulated biomedical literature makes it possible to efficiently capture existing semantic relations between biomedical entities, which can help to infer previously unknown relationships. Developing semantic relation classification models, which are a type of supervised machine learning model, requires a training dataset that is manually annotated by biomedical experts with semantic relations among biomedical entities. Any advanced model must be trained on a dataset of reliable quality and meaningful scale before it can be deployed in the real world and assist biologists in their research. In addition, as the number of such public datasets increases, the performance of machine learning algorithms can be measured and compared more accurately by using those datasets as benchmarks for model development and improvement. In this paper, we aim to build such a dataset. In addition, to validate the usability of the dataset as training data for relation classification models and to improve performance on the relation extraction task, we built a relation classification model based on Bidirectional Encoder Representations from Transformers (BERT), trained on our dataset with our newly proposed fine-tuning methodology.
In experiments comparing several models based on different deep learning algorithms, our model with the proposed fine-tuning methodology showed the best performance. The experimental results show that the constructed training dataset is an important information resource for the development and evaluation of semantic relation extraction models. Furthermore, relation extraction performance can be improved by integrating our proposed fine-tuning methodology. Together, the dataset and the methodology can promote future text mining research in the biomedical field.

Citing Articles

MeSH2Matrix: combining MeSH keywords and machine learning for biomedical relation classification based on PubMed.

Turki H, Dossou B, Emezue C, Owodunni A, Hadj Taieb M, Ben Aouicha M. J Biomed Semantics. 2024; 15(1):18.

PMID: 39354632 PMC: 11445994. DOI: 10.1186/s13326-024-00319-w.


Unsupervised literature mining approaches for extracting relationships pertaining to habitats and reproductive conditions of plant species.

Gabud R, Lapitan P, Mariano V, Mendoza E, Pampolina N, Clarino M. Front Artif Intell. 2024; 7:1371411.

PMID: 38845683 PMC: 11153722. DOI: 10.3389/frai.2024.1371411.


A marker-based neural network system for extracting social determinants of health.

Zhao X, Rios A. J Am Med Inform Assoc. 2023; 30(8):1398-1407.

PMID: 37011635 PMC: 10354756. DOI: 10.1093/jamia/ocad041.


A hybrid algorithm for clinical decision support in precision medicine based on machine learning.

Zhang Z, Lin X, Wu S. BMC Bioinformatics. 2023; 24(1):3.

PMID: 36597033 PMC: 9811720. DOI: 10.1186/s12859-022-05116-9.
