
BioRED: a Rich Biomedical Relation Extraction Dataset

Overview
Journal: Brief Bioinform
Specialty: Biology
Date: 2022 Jul 18
PMID: 35849818
Abstract

Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g. protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then, we present a first-of-its-kind biomedical relation extraction dataset (BioRED) with multiple entity types (e.g. gene/protein, disease, chemical) and relation pairs (e.g. gene-disease; chemical-chemical) at the document level, on a set of 600 PubMed abstracts. Furthermore, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including Bidirectional Encoder Representations from Transformers (BERT)-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement for the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a rich dataset can successfully facilitate the development of more accurate, efficient and robust RE systems for biomedicine. Availability: The BioRED dataset and annotation guidelines are freely available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/.
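The abstract reports RE results as F-scores over document-level relation pairs, with a stricter setting that also requires the novelty label to match. The abstract does not spell out the exact evaluation protocol, so the following is only a minimal sketch under stated assumptions. It assumes each relation reduces to a document ID, an order-insensitive pair of normalized entity identifiers, a relation type and a novel/background flag, and it computes micro-averaged precision, recall and F-score by exact set matching. The `Relation` class, its field names and the `prf` function are hypothetical illustrations. They are not the BioRED file schema or its official evaluation script.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical document-level relation record (illustrative fields only)."""
    doc_id: str    # e.g. a PubMed ID
    arg1: str      # normalized entity identifier
    arg2: str      # normalized entity identifier
    rel_type: str  # e.g. "Association"
    novel: bool    # True = novel finding, False = background knowledge

    def key(self, with_type: bool = True, with_novelty: bool = False) -> Tuple:
        # Assumption: pairs are unordered, so (A, B) and (B, A) are the same relation.
        k: Tuple = (self.doc_id,) + tuple(sorted((self.arg1, self.arg2)))
        if with_type:
            k += (self.rel_type,)
        if with_novelty:
            k += (self.novel,)
        return k


def prf(gold: Iterable[Relation], pred: Iterable[Relation],
        with_type: bool = True, with_novelty: bool = False) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score via exact key matching."""
    g = {r.key(with_type, with_novelty) for r in gold}
    p = {r.key(with_type, with_novelty) for r in pred}
    tp = len(g & p)  # predictions that exactly match a gold relation
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy data with made-up identifiers, for illustration only.
gold = [Relation("1", "D1", "G1", "Association", True),
        Relation("1", "C1", "D1", "Negative_Correlation", False)]
pred = [Relation("1", "G1", "D1", "Association", False),  # right pair and type, wrong novelty
        Relation("1", "C2", "D1", "Association", True)]   # wrong pair

print(prf(gold, pred))                     # (0.5, 0.5, 0.5)
print(prf(gold, pred, with_novelty=True))  # (0.0, 0.0, 0.0)
```

The toy data shows why a novelty-aware score can fall well below the plain relation score. A prediction can find the correct entity pair and relation type yet still count as an error because the novel/background label is wrong.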
