
Toward Information Extraction: Identifying Protein Names from Biological Papers

Overview
Publisher: World Scientific
Specialty: Biology
Date: 1998 Aug 11
PMID: 9697224
Citations: 82
Abstract

To solve the mystery of the life phenomenon, we must clarify when genes are expressed and how their products interact with each other. But since the amount of continuously updated knowledge on these interactions is massive and is only available in the form of published articles, an intelligent information extraction (IE) system is needed. To extract this information directly from articles, the system must first identify the material names. However, medical and biological documents often include proper nouns newly coined by the authors, and conventional methods based on domain-specific dictionaries cannot detect such unknown words or coinages. In this study, we propose a new method of extracting material names, PROPER, using surface clues in character strings. It extracts material names from sentences with 94.70% precision and 98.84% recall, regardless of whether they are already known or newly defined.
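The abstract does not spell out PROPER's actual rules, so the following is only a minimal sketch of what a dictionary-free, surface-clue approach might look like, together with how precision and recall are computed. The clue used here (an uppercase letter after the first character, or any digit), the token pattern, and the names `looks_like_name`, `extract_candidates`, and `precision_recall` are all illustrative assumptions, not the authors' method.

```python
import re

# Assumed token pattern: alphanumeric runs that may contain '-' or '/'.
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9\-/]*")

def looks_like_name(token: str) -> bool:
    """Hypothetical surface clue, not the rule set from the paper."""
    # Ignore the first character so sentence-initial capitals do not count.
    has_upper = any(c.isupper() for c in token[1:])
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    return has_alpha and (has_upper or has_digit)

def extract_candidates(sentence: str) -> list[str]:
    """Return tokens whose character pattern suggests a material name."""
    return [t for t in TOKEN.findall(sentence) if looks_like_name(t)]

def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Precision = TP / |predicted|, recall = TP / |gold|."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

candidates = extract_candidates("The SH2 domain of p53 binds to Grb2 and c-Jun in vitro.")
```

Because the clue depends only on the character string, a newly coined name is treated the same way as a known one, which is the property the abstract emphasizes. A heuristic this crude would also miss lowercase names and multi-word names, so it should not be read as an indication of how the reported figures were obtained.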

Citing Articles

BioBBC: a multi-feature model that enhances the detection of biomedical entities.

Alamro H, Gojobori T, Essack M, Gao X. Sci Rep. 2024; 14(1):7697.

PMID: 38565624 PMC: 10987643. DOI: 10.1038/s41598-024-58334-x.


Advancing entity recognition in biomedicine via instruction tuning of large language models.

Keloth V, Hu Y, Xie Q, Peng X, Wang Y, Zheng A. Bioinformatics. 2024; 40(4).

PMID: 38514400 PMC: 11001490. DOI: 10.1093/bioinformatics/btae163.


A BERT-Span model for Chinese named entity recognition in rehabilitation medicine.

Zhong J, Xuan Z, Wang K, Cheng Z. PeerJ Comput Sci. 2023; 9:e1535.

PMID: 37705622 PMC: 10495977. DOI: 10.7717/peerj-cs.1535.


Biomedical named entity recognition based on fusion multi-features embedding.

Li M, Yang H, Liu Y. Technol Health Care. 2023; 31(S1):111-121.

PMID: 37038786 PMC: 10258877. DOI: 10.3233/THC-236011.


BioByGANS: biomedical named entity recognition by fusing contextual and syntactic features through graph attention network in node classification framework.

Zheng X, Du H, Luo X, Tong F, Song W, Zhao D. BMC Bioinformatics. 2022; 23(1):501.

PMID: 36418937 PMC: 9682683. DOI: 10.1186/s12859-022-05051-9.