Toward Information Extraction: Identifying Protein Names from Biological Papers
Overview
To understand the phenomena of life, we must clarify when genes are expressed and how their products interact with each other. Because knowledge about these interactions is massive, continuously updated, and available only in the form of published articles, an intelligent information extraction (IE) system is needed. To extract this information directly from articles, the system must first identify the names of the materials involved. However, medical and biological documents often include proper nouns newly coined by their authors, and conventional methods based on domain-specific dictionaries cannot detect such unknown words or coinages. In this study, we propose PROPER, a new method of extracting material names that uses surface clues in character strings. It extracts material names from sentences with 94.70% precision and 98.84% recall, regardless of whether the names are already known or newly defined.
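The idea of relying on surface clues rather than a dictionary can be illustrated with a minimal sketch. This is not the PROPER method itself, whose rules are not given in the abstract. The function names and the two clues used here (a digit mixed with letters, or an uppercase letter after the first character) are assumptions chosen only for illustration.

```python
import re

# Tokens are runs of letters/digits, optionally joined by "-" or "/".
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[-/][A-Za-z0-9]+)*")


def has_surface_clue(token: str) -> bool:
    """Return True if the token looks unlike an ordinary English word.

    Hypothetical clues (assumptions, not taken from the paper):
      * letters mixed with digits, e.g. "p53", "Mdm2"
      * an uppercase letter after the first character, e.g. "GAP"
    """
    has_alpha = any(c.isalpha() for c in token)
    has_digit = any(c.isdigit() for c in token)
    inner_upper = any(c.isupper() for c in token[1:])
    return has_alpha and (has_digit or inner_upper)


def extract_candidates(sentence: str) -> list[str]:
    """Collect tokens that carry at least one surface clue, in order."""
    return [t for t in TOKEN_RE.findall(sentence) if has_surface_clue(t)]


if __name__ == "__main__":
    text = "The p53 protein binds Mdm2 and interacts with GAP in vitro."
    print(extract_candidates(text))  # ['p53', 'Mdm2', 'GAP']
```

Because no dictionary is consulted, a newly coined name is picked up as long as it carries one of the clues. The sketch also shows the cost of so simple a rule: it would flag abbreviations such as "DNA" and miss names written like ordinary words, such as "Ras", so a practical system needs richer clues and filtering.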