
OGER++: Hybrid Multi-type Entity Recognition

Overview
Journal: J Cheminform
Publisher: BioMed Central
Specialty: Chemistry
Date: 2019 Jan 23
PMID: 30666476
Citations: 18
Abstract

Background: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step.
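The two-stage design described above can be illustrated with a minimal sketch. This is not the OGER++ implementation: the tokenization rule, the `normalize`, `build_index`, `annotate`, and `postfilter` names, the concept ID `EX:0001`, and the `score` callable standing in for the feed-forward network are all assumptions made for illustration. The sketch only shows the general idea of normalizing both dictionary terms and text so that spelling variants collide, then filtering the candidate matches with a scoring function.

```python
import re
from typing import Callable, NamedTuple


class Annotation(NamedTuple):
    start: int       # character offset of the match start
    end: int         # character offset just past the match
    text: str        # surface form as it appears in the input
    concept_id: str  # identifier of the linked concept


# Assumed normalization: split into alphabetic and numeric runs, lowercase.
# "Interleukin-2", "interleukin 2" and "INTERLEUKIN2" all map to the same key.
TOKEN = re.compile(r"[A-Za-z]+|[0-9]+")


def normalize(text: str) -> tuple[str, ...]:
    return tuple(m.group().lower() for m in TOKEN.finditer(text))


def build_index(terms: dict[str, str]) -> dict[tuple[str, ...], str]:
    """Map each normalized term to its concept ID (hash look-up)."""
    return {normalize(name): cid for name, cid in terms.items()}


def annotate(text: str, index: dict[tuple[str, ...], str],
             max_len: int = 5) -> list[Annotation]:
    """Greedy longest-match dictionary look-up over normalized tokens."""
    tokens = list(TOKEN.finditer(text))
    keys = [t.group().lower() for t in tokens]
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(keys[i:i + n])
            if key in index:
                start, end = tokens[i].start(), tokens[i + n - 1].end()
                found.append(Annotation(start, end, text[start:end], index[key]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return found


def postfilter(candidates: list[Annotation],
               score: Callable[[Annotation], float],
               threshold: float = 0.5) -> list[Annotation]:
    """Keep candidates whose score reaches the threshold.

    `score` is a placeholder for a trained classifier.
    """
    return [a for a in candidates if score(a) >= threshold]


if __name__ == "__main__":
    index = build_index({"Interleukin-2": "EX:0001", "IL 2": "EX:0001"})
    sentence = "Levels of interleukin 2 and IL-2 receptor rose."
    for ann in annotate(sentence, index):
        print(ann)
```

The look-up cost per token position is bounded by `max_len` hash probes, which is one plausible way to keep a dictionary annotator fast; the actual strategy used by OGER++ is described in the full paper.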

Results: We evaluated the system in terms of processing speed and annotation quality. In the speed benchmarks, the OGER++ web service processes 9.7 abstracts or 0.9 full-text documents per second. On the CRAFT corpus, we achieved 71.4% and 56.7% F1 for named entity recognition and concept recognition, respectively.
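For readers unfamiliar with the metric, the F1 scores reported above follow the standard definition as the harmonic mean of precision and recall. The abstract does not state the matching criterion (for example, exact span match versus overlap), so the counts of true positives (TP), false positives (FP), and false negatives (FN) below should be read under whatever criterion the full paper specifies.

```latex
\[
P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad
R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}
\]
```

Concept recognition is typically the stricter task, since a prediction counts as a true positive only if the linked identifier is also correct, which is consistent with the lower score reported for it.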

Conclusions: Combining knowledge-based and data-driven components makes it possible to build a system with competitive performance in biomedical text mining.

Citing Articles

Information extraction from green channel textual records on expressways using hybrid deep learning.

Chen J, Zhang J, Tao W, Jin Y, Fan H. Sci Rep. 2024; 14(1):31269.

PMID: 39732976 PMC: 11682078. DOI: 10.1038/s41598-024-82681-4.


Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study.

Majdik Z, Graham S, Edward J, Rodriguez S, Karnes M, Jensen J. JMIR AI. 2024; 3:e52095.

PMID: 38875593 PMC: 11140272. DOI: 10.2196/52095.


PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies.

Yao X, Ouyang S, Lian Y, Peng Q, Zhou X, Huang F. Genome Med. 2024; 16(1):56.

PMID: 38627848 PMC: 11020195. DOI: 10.1186/s13073-024-01330-7.


Cancer-Alterome: a literature-mined resource for regulatory events caused by genetic alterations in cancer.

Yao X, He Z, Liu Y, Wang Y, Ouyang S, Xia J. Sci Data. 2024; 11(1):265.

PMID: 38431735 PMC: 10908799. DOI: 10.1038/s41597-024-03083-9.


Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.

Caufield J, Hegde H, Emonet V, Harris N, Joachimiak M, Matentzoglu N. Bioinformatics. 2024; 40(3).

PMID: 38383067 PMC: 10924283. DOI: 10.1093/bioinformatics/btae104.


References
1.
Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J . Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000; 25(1):25-9. PMC: 3037419. DOI: 10.1038/75556. View

2.
Lipscomb C . Medical Subject Headings (MeSH). Bull Med Libr Assoc. 2000; 88(3):265-6. PMC: 35238. View

3.
Narayanaswamy M, Ravikumar K, Vijay-Shanker K . A biological named entity recognizer. Pac Symp Biocomput. 2003; :427-38. DOI: 10.1142/9789812776303_0040. View

4.
Eilbeck K, Lewis S, Mungall C, Yandell M, Stein L, Durbin R . The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 2005; 6(5):R44. PMC: 1175956. DOI: 10.1186/gb-2005-6-5-r44. View

5.
Degtyarenko K, de Matos P, Ennis M, Hastings J, Zbinden M, McNaught A . ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Res. 2007; 36(Database issue):D344-50. PMC: 2238832. DOI: 10.1093/nar/gkm791. View