A Series of Natural Language Processing for Predicting Tumor Response Evaluation and Survival Curve from Electronic Health Records

Overview

Journal BMC Med Inform Decis Mak

Publisher BioMed Central

Specialty Medical Informatics

Date 2025 Feb 18

PMID 39962486

Authors

Toshiki Takeuchi

Hidehito Horinouchi

Ken Takasawa

Masami Mukai

Ken Masuda

Yuki Shinno

Yusuke Okuma

Tatsuya Yoshida

Yasushi Goto

Noboru Yamamoto

Yuichiro Ohe

Mototaka Miyake

Hirokazu Watanabe

Masahiko Kusumoto

Takashi Aoki

Kunihiro Nishimura

Ryuji Hamamoto


Abstract

Background: The clinical information held in unstructured electronic health records (EHRs) has the potential to advance cancer research. The National Cancer Center Hospital (NCCH) is widely recognized as a leading institution for the treatment of thoracic malignancies in Japan. Each NCCH department has compiled treatment information from EHRs into its own database, in particular the characteristics of patients' malignant tumors, tumor response evaluation, and adverse events. However, there have been few opportunities for integrated analysis of data from both the hospital and the research institute.

Methods: We developed a method that uses natural language processing to predict tumor response evaluation and survival curves for drug therapy from the EHRs of lung cancer patients. First, we developed a rule-based algorithm to predict treatment duration using a dictionary of the anticancer drugs and regimens used for lung cancer treatment. We then applied supervised learning to the radiology reports from each treatment period and constructed a classification model to predict the tumor response evaluation for the anticancer drugs and the date on which progressive disease (PD) was determined. The predicted response and PD date can be used to draw a progression-free survival curve.
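The first and last steps of the pipeline described above can be illustrated with a minimal sketch. This is not the authors' implementation: the drug dictionary, the 42-day gap threshold, and the function names `treatment_periods` and `kaplan_meier` are all assumptions made for illustration, and the supervised classifier for radiology reports is omitted. The sketch merges dated drug mentions into treatment periods, then applies the standard Kaplan-Meier estimator to durations ending in either a PD date or censoring.

```python
from datetime import date

# Hypothetical dictionary mapping surface forms to a canonical drug name.
# The real dictionary of drugs and regimens is not given in the abstract.
DRUGS = {
    "pembrolizumab": "pembrolizumab",
    "keytruda": "pembrolizumab",
    "docetaxel": "docetaxel",
}


def treatment_periods(notes, max_gap_days=42):
    """Merge dated drug mentions into (drug, start, end) periods.

    notes: iterable of (date, text) pairs.
    max_gap_days: assumed threshold; a mention of the same drug within
    this many days of the previous one extends the current period.
    """
    periods = []
    for day, text in sorted(notes):
        lowered = text.lower()
        for alias, drug in DRUGS.items():
            if alias not in lowered:
                continue
            # Find the most recent period recorded for this drug, if any.
            last = next((p for p in reversed(periods) if p[0] == drug), None)
            if last and (day - last[2]).days <= max_gap_days:
                last[2] = day  # extend the existing period
            else:
                periods.append([drug, day, day])  # start a new period
    return [tuple(p) for p in periods]


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of progression-free survival.

    durations: days from treatment start to PD or to censoring.
    events: True where PD was observed, False where censored.
    Returns (time, survival probability) at each time with an event.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        progressed = sum(
            1 for dur, ev in zip(durations, events) if dur == t and ev
        )
        leaving = sum(1 for dur in durations if dur == t)
        if progressed:
            survival *= 1 - progressed / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ends at t leaves the risk set.
        at_risk -= leaving
    return curve
```

In a setting like the one described, the durations would come from predicted treatment start dates and predicted PD dates, and the resulting curve could be compared against one computed from the manually curated data.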

Results: We used the EHRs for 716 lung cancer treatments at the NCCH, with structured data for the same cases serving as labels for training and testing the supervised learning models. The structured data were manually curated by physicians and clinical research coordinators (CRCs). We investigated the results and performance of the proposed method. The accuracy of individual predictions of tumor response evaluation and PD date was not especially high. However, the final predicted survival curves closely resembled the actual survival curves.

Conclusions: Although it is difficult to construct a fully automated system using our method, we believe that it performs well enough to support physicians and CRCs in constructing the database and to provide clinical information that helps researchers identify opportunities for clinical studies.
