Large-scale Biomedical Concept Recognition: an Evaluation of Current Automatic Annotators and Their Parameters

Overview
Publisher BioMed Central
Specialty Biology
Date 2014 Feb 28
PMID 24571547
Citations 57
Abstract

Background: Ontological concepts are useful for many different biomedical tasks. Concepts are difficult to recognize in text because of a mismatch between how an ontology captures them and how they are expressed in text. Many recognizers exist for specific ontologies, but a general approach to concept recognition remains an open problem.

Results: Three dictionary-based systems (MetaMap, NCBO Annotator, and ConceptMapper) are evaluated on eight biomedical ontologies in the Colorado Richly Annotated Full-Text (CRAFT) Corpus. Over 1,000 parameter combinations are examined, and best-performing parameters for each system-ontology pair are presented.
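To illustrate how a sweep of this kind can reach more than 1,000 combinations, the sketch below enumerates every combination of a small parameter grid. The parameter names and values are hypothetical placeholders chosen for illustration; they are not the actual options of MetaMap, NCBO Annotator, or ConceptMapper.

```python
from itertools import product

# Hypothetical parameter grid: names and values are illustrative only
# and are NOT the real options of any of the three evaluated systems.
PARAM_GRID = {
    "case_sensitive": [True, False],
    "stemmer": ["none", "porter"],
    "stop_words": ["none", "pubmed"],
    "synonyms": ["exact_only", "all"],
}


def parameter_combinations(grid):
    """Yield one dict per combination of parameter values in the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


combos = list(parameter_combinations(PARAM_GRID))
print(len(combos))  # 2 * 2 * 2 * 2 = 16 combinations
```

Because the number of combinations is the product of the option counts, adding a few more parameters or values quickly pushes a grid like this past a thousand runs per system-ontology pair.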

Conclusions: Baselines for concept recognition by three systems on eight biomedical ontologies are established (F-measures range from 0.14 to 0.83). Of the three systems tested, ConceptMapper generally performs best; it produces the highest F-measure on seven of the eight ontologies. Default parameters are not ideal for most systems on most ontologies; changing parameters can increase F-measure by up to 0.4. In addition to the best-performing parameters, suggestions for choosing parameters based on ontology characteristics are presented.
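For readers unfamiliar with the metric, the following is a minimal sketch of how an F-measure can be computed for concept annotations. It assumes the balanced F-measure (F1) and counts a prediction as correct only when both the text span and the concept identifier match a gold annotation exactly; the abstract does not state the matching criterion, so both choices are assumptions. The annotations are toy data, not CRAFT content.

```python
def f_measure(gold, predicted):
    """Return (precision, recall, F1) for two collections of annotations.

    Each annotation is a (start, end, concept_id) tuple. Exact matching of
    span and identifier is an assumption made for this sketch.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy annotations for illustration only.
gold = {(0, 4, "CL:0000000"), (10, 18, "GO:0008150"), (25, 31, "SO:0000704")}
pred = {(0, 4, "CL:0000000"), (10, 18, "GO:0008150"), (40, 45, "GO:0005575")}

precision, recall, f1 = f_measure(gold, pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here two of three predictions are correct and two of three gold annotations are found, so precision, recall, and F1 are all 2/3.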

Citing Articles

Integration of background knowledge for automatic detection of inconsistencies in gene ontology annotation.

Chen J, Goudey B, Geard N, Verspoor K. Bioinformatics. 2024; 40(Suppl 1):i390-i400.

PMID: 38940182 PMC: 11256942. DOI: 10.1093/bioinformatics/btae246.


Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: A use case studying depression as a risk factor for Alzheimer's disease.

Malec S, Taneja S, Albert S, Shaaban C, Karim H, Levine A. J Biomed Inform. 2023; 142:104368.

PMID: 37086959 PMC: 10355339. DOI: 10.1016/j.jbi.2023.104368.


Classifying literature mentions of biological pathogens as experimentally studied using natural language processing.

Jimeno Yepes A, Verspoor K. J Biomed Semantics. 2023; 14(1):1.

PMID: 36721225 PMC: 9889128. DOI: 10.1186/s13326-023-00282-y.


Enhanced neurologic concept recognition using a named entity recognition model based on transformers.

Azizi S, Hier D, Wunsch II D. Front Digit Health. 2022; 4:1065581.

PMID: 36569804 PMC: 9772022. DOI: 10.3389/fdgth.2022.1065581.


A Gated Recurrent Unit based architecture for recognizing ontology concepts from biological literature.

Devkota P, Mohanty S, Manda P. BioData Min. 2022; 15(1):22.

PMID: 36171616 PMC: 9516808. DOI: 10.1186/s13040-022-00310-0.

