DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records

Overview

Journal Cancer Res

Specialty Oncology

Date 2017 Nov 3

PMID 29092954

Citations 42

Authors

Guergana K Savova

Eugene Tseytlin

Sean Finan

Melissa Castine

Timothy Miller

Olga Medvedeva

David Harris

Harry Hochheiser

Chen Lin

Girish Chavan

Rebecca S Jacobson

Abstract

Precise phenotype information is needed to understand the effects of genetic and epigenetic changes on tumor behavior and responsiveness. Extraction and representation of cancer phenotypes are currently performed mostly by hand, making it difficult to correlate phenotypic data with genomic data. In addition, genomic data are being produced at an ever faster pace, exacerbating the problem. The DeepPhe software enables automated extraction of detailed phenotype information from electronic medical records of cancer patients. The system implements advanced Natural Language Processing and knowledge engineering methods within a flexible modular architecture, and was evaluated using a manually annotated dataset of University of Pittsburgh Medical Center breast cancer patients. The resulting platform provides critical, previously missing methods for computational phenotyping. Working in tandem with advanced analysis of high-throughput sequencing, these approaches will further accelerate the transition to precision cancer treatment.

Citing Articles

Harnessing explainable artificial intelligence for patient-to-clinical-trial matching: A proof-of-concept pilot study using phase I oncology trials.

Ghosh S, Abushukair H, Ganesan A, Pan C, Naqash A, Lu K PLoS One. 2024; 19(10):e0311510.

PMID: 39446771 PMC: 11500892. DOI: 10.1371/journal.pone.0311510.

Artificial intelligence methods available for cancer research.

Murmu A, Gyorffy B Front Med. 2024; 18(5):778-797.

PMID: 39115792 DOI: 10.1007/s11684-024-1085-3.

Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: a performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy's rule-based and machine learning-based methods.

Bhattarai K, Oh I, Sierra J, Tang J, Payne P, Abrams Z JAMIA Open. 2024; 7(3):ooae060.

PMID: 38962662 PMC: 11221943. DOI: 10.1093/jamiaopen/ooae060.

Applications of natural language processing tools in the surgical journey.

Le K, Tay S, Choy K, Verjans J, Sasanelli N, Kong J Front Surg. 2024; 11:1403540.

PMID: 38826809 PMC: 11140056. DOI: 10.3389/fsurg.2024.1403540.

DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction.

Hochheiser H, Finan S, Yuan Z, Durbin E, Jeong J, Hands I JCO Clin Cancer Inform. 2023; 7:e2300156.

PMID: 38113411 PMC: 10752457. DOI: 10.1200/CCI.23.00156.

References

Hripcsak G, Rothschild A. Agreement, the f-measure, and reliability in information retrieval. J Am Med Inform Assoc. 2005; 12(3):296-8. PMC: 1090460. DOI: 10.1197/jamia.M1733.

Savova G, Masanz J, Ogren P, Zheng J, Sohn S, Kipper-Schuler K. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010; 17(5):507-13. PMC: 2995668. DOI: 10.1136/jamia.2009.001560.

Wu S, Kaggal V, Dligach D, Masanz J, Chen P, Becker L. A common type system for clinical natural language processing. J Biomed Semantics. 2013; 4(1):1. PMC: 3575354. DOI: 10.1186/2041-1480-4-1.

Hochheiser H, Castine M, Harris D, Savova G, Jacobson R. An information model for computable cancer phenotypes. BMC Med Inform Decis Mak. 2016; 16(1):121. PMC: 5024416. DOI: 10.1186/s12911-016-0358-4.