DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction

Overview

Journal: JCO Clin Cancer Inform

Specialty: Medical Informatics

Date: 2023 Dec 19

PMID: 38113411

Authors

Harry Hochheiser

Sean Finan

Zhou Yuan

Eric B Durbin

Jong Cheol Jeong

Isaac Hands

David Rust

Ramakanth Kavuluru

Xiao-Cheng Wu

Jeremy L Warner

Guergana Savova

Abstract

Purpose: Manual extraction of case details from patient records for cancer surveillance is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting.

Methods: We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was performed through NLP methods validated using established workflows. A container-based implementation of the NLP methods and the supporting infrastructure was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools.
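As a rough illustration of how a registry abstraction tool might talk to a web-based NLP service of this kind, the sketch below builds (but does not send) two HTTP requests: one that submits a clinical note and one that asks for a case-level summary. The host, URL paths, and JSON field names are all hypothetical; the abstract does not specify the actual DeepPhe-CR endpoints or payload schema.

```python
import json
from urllib.request import Request

# Hypothetical base URL: the abstract does not give the real host or port.
BASE_URL = "http://localhost:8080"


def build_document_request(patient_id: str, doc_id: str, text: str) -> Request:
    """Build a POST request that would submit one clinical note.

    The path and field names are assumptions made for illustration only.
    """
    body = json.dumps(
        {"patientId": patient_id, "docId": doc_id, "text": text}
    ).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/documents",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_summary_request(patient_id: str) -> Request:
    """Build a GET request that would ask for a summary of one patient's case.

    The path is an assumption made for illustration only.
    """
    return Request(url=f"{BASE_URL}/patients/{patient_id}/summary", method="GET")


# Example usage with made-up identifiers and note text.
req = build_document_request(
    "patient-001", "path-report-1", "Invasive ductal carcinoma, left breast."
)
summary_req = build_summary_request("patient-001")
```

Keeping request construction separate from sending makes it straightforward for a client tool to log or inspect what it would transmit before any patient text leaves the registry software.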

Results: API calls support submission of single documents and summarization of cases across one or more documents. The container-based implementation uses a REST router to handle requests and supports a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade with F1 scores of 0.79 to 1.00 across multiple cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain), evaluated on data from two population-based cancer registries. Usability study participants were able to use the tool effectively and expressed interest in it.
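For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes a per-label F1 for one extracted attribute by comparing NLP output against gold-standard abstractions. The example labels are invented, and the abstract does not say which averaging scheme the authors used, so this should be read as a generic illustration rather than the study's evaluation procedure.

```python
from collections import Counter


def f1_per_label(gold, predicted):
    """Return {label: F1} for paired gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for this label
        else:
            fp[p] += 1          # predicted label was wrong
            fn[g] += 1          # gold label was missed

    scores = {}
    for label in set(gold) | set(predicted):
        pred_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Invented laterality values for four cases (illustration only).
gold = ["left", "right", "left", "bilateral"]
pred = ["left", "right", "right", "bilateral"]
scores = f1_per_label(gold, pred)
```

In this toy example one "left" case is mislabeled as "right", which lowers recall for "left" and precision for "right" while leaving "bilateral" unaffected.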

Conclusion: The DeepPhe-CR system provides an architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improved user interactions in client tools may be needed to realize the potential of these approaches.
