
Automatic Concept Recognition Using the Human Phenotype Ontology Reference and Test Suite Corpora

Overview
Specialty Biology
Date 2015 Mar 1
PMID 25725061
Citations 37
Abstract

Concept recognition tools rely on the availability of textual corpora to assess their performance and enable the identification of areas for improvement. Typically, corpora are developed for specific purposes, such as gene name recognition. Gene and protein name identification are longstanding goals of biomedical text mining, and therefore a number of different corpora exist. However, phenotypes only recently became an entity of interest for specialized concept recognition systems, and hardly any annotated text is available for performance testing and training. Here, we present a unique corpus, capturing text spans from 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes. Furthermore, we developed a test suite for standardized concept recognition error analysis, incorporating 32 different types of test cases corresponding to 2164 HPO concepts. Finally, three established phenotype concept recognizers (NCBO Annotator, OBO Annotator and Bio-LarK CR) were comprehensively evaluated, and results are reported against both the text corpus and the test suite. The gold standard corpus and the test suite are available from http://bio-lark.org/hpo_res.html.
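As an illustration of how a concept recognizer might be scored against a reference corpus of this kind, the following minimal sketch computes precision, recall and F1 over annotation sets. It assumes that a prediction counts as correct only when document, text span and HPO concept all match exactly; the matching criteria used in the article may differ. The `Annotation` type, the `precision_recall_f1` function, the document identifier and the character offsets are hypothetical and are not taken from the corpus.

```python
from typing import NamedTuple, Set, Tuple


class Annotation(NamedTuple):
    """One annotated text span (hypothetical layout, not the corpus format)."""
    doc_id: str   # identifier of the abstract
    start: int    # character offset where the span begins
    end: int      # character offset where the span ends
    hpo_id: str   # HPO concept assigned to the span


def precision_recall_f1(
    gold: Set[Annotation], predicted: Set[Annotation]
) -> Tuple[float, float, float]:
    """Score predictions against gold annotations using exact matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative data only: offsets and document id are made up.
gold = {
    Annotation("doc1", 10, 18, "HP:0001250"),
    Annotation("doc1", 40, 52, "HP:0000252"),
}
predicted = {
    Annotation("doc1", 10, 18, "HP:0001250"),  # matches a gold annotation
    Annotation("doc1", 60, 73, "HP:0004322"),  # not in the gold set
}

scores = precision_recall_f1(gold, predicted)
print("precision=%.2f recall=%.2f f1=%.2f" % scores)
```

Exact matching is the strictest option; a more lenient scheme could, for example, accept overlapping spans with the same concept, which would generally raise all three scores.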

Citing Articles

Improving Automated Deep Phenotyping Through Large Language Models Using Retrieval Augmented Generation.

Garcia B, Westerfield L, Yelemali P, Gogate N, Andres Rivera-Munoz E, Du H. medRxiv. 2024.

PMID: 39677442 PMC: 11643181. DOI: 10.1101/2024.12.01.24318253.


FastHPOCR: pragmatic, fast, and accurate concept recognition using the human phenotype ontology.

Groza T, Gration D, Baynam G, Robinson P. Bioinformatics. 2024; 40(7).

PMID: 38913850 PMC: 11227366. DOI: 10.1093/bioinformatics/btae406.


An evaluation of GPT models for phenotype concept recognition.

Groza T, Caufield H, Gration D, Baynam G, Haendel M, Robinson P. BMC Med Inform Decis Mak. 2024; 24(1):30.

PMID: 38297371 PMC: 10829255. DOI: 10.1186/s12911-024-02439-w.


Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT.

Yang J, Liu C, Deng W, Wu D, Weng C, Zhou Y. Patterns (N Y). 2024; 5(1):100887.

PMID: 38264716 PMC: 10801236. DOI: 10.1016/j.patter.2023.100887.


Term-BLAST-like alignment tool for concept recognition in noisy clinical texts.

Groza T, Wu H, Dinger M, Danis D, Hilton C, Bagley A. Bioinformatics. 2023; 39(12).

PMID: 38001031 PMC: 10710372. DOI: 10.1093/bioinformatics/btad716.

