PMID: 39962486

A Series of Natural Language Processing for Predicting Tumor Response Evaluation and Survival Curve from Electronic Health Records

Abstract

Background: Clinical information stored in unstructured electronic health records (EHRs) has the potential to advance cancer research. The National Cancer Center Hospital (NCCH) is widely recognized as a leading institution for the treatment of thoracic malignancies in Japan. Information on medical treatment, particularly tumor characteristics, tumor response evaluation, and adverse events, has been compiled from EHRs into the databases of individual NCCH departments. However, there have been few opportunities for integrated analysis of data from both the hospital and the research institute.

Methods: We developed a natural language processing method for predicting tumor response evaluation and survival curves of drug therapy from the EHRs of lung cancer patients. First, we developed a rule-based algorithm to predict treatment duration using a dictionary of anticancer drugs and regimens used for lung cancer treatment. Next, we applied supervised learning to the radiology reports from each treatment period and constructed a classification model to predict the tumor response evaluation for the anticancer drugs and the date on which progressive disease (PD) was determined. The predicted response and PD date can then be used to draw a progression-free survival curve.
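
As an illustration of the first step, a dictionary-driven rule for deriving treatment periods might look like the following minimal sketch. It is not the authors' implementation: the regimen dictionary, the 60-day gap threshold, the input format, and the names REGIMENS, MAX_GAP_DAYS, match_regimen, and treatment_periods are all assumptions made for this example.

```python
from datetime import date

# Hypothetical dictionary: regimen name -> drugs it contains (illustrative only).
REGIMENS = {
    "CBDCA+PEM": {"carboplatin", "pemetrexed"},
    "osimertinib": {"osimertinib"},
}

# Assumed rule: a gap longer than this many days starts a new treatment period.
MAX_GAP_DAYS = 60


def match_regimen(drugs):
    """Return the regimen whose drug set equals the administered drugs, else None."""
    administered = set(drugs)
    for name, members in REGIMENS.items():
        if members == administered:
            return name
    return None


def treatment_periods(administrations):
    """Group dated administrations into (regimen, start, end) periods.

    administrations: list of (date, [drug names]) tuples in any order.
    """
    periods = []
    for day, drugs in sorted(administrations, key=lambda a: a[0]):
        regimen = match_regimen(drugs)
        if regimen is None:
            continue  # drugs not in the dictionary are ignored
        same_course = (
            periods
            and periods[-1][0] == regimen
            and (day - periods[-1][2]).days <= MAX_GAP_DAYS
        )
        if same_course:
            # Extend the current period to this administration date.
            periods[-1] = (regimen, periods[-1][1], day)
        else:
            periods.append((regimen, day, day))
    return periods


records = [
    (date(2020, 1, 6), ["carboplatin", "pemetrexed"]),
    (date(2020, 1, 27), ["pemetrexed", "carboplatin"]),
    (date(2020, 6, 1), ["osimertinib"]),
    (date(2020, 6, 29), ["osimertinib"]),
]
periods = treatment_periods(records)
```

Here the four administrations collapse into two periods, one per regimen. Radiology reports dated within each period would then be the input to the response classifier.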

Results: We used the EHRs of 716 lung cancer treatments at the NCCH, with structured data for the same cases serving as labels for training and testing the supervised learning model. The structured data were manually curated by physicians and clinical research coordinators (CRCs). We evaluated the output and performance of the proposed method. The accuracy of individual predictions of tumor response evaluation and PD date was not especially high. However, the final predicted survival curves closely resembled the actual survival curves.
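
To make the curve comparison concrete, the sketch below computes a progression-free survival curve with the Kaplan-Meier estimator, a standard choice for such curves. The abstract does not state which estimator was used, so this is an assumption, and the function name kaplan_meier and the toy data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: time from treatment start to PD or censoring (e.g. in days)
    events:    1 if PD was observed at that time, 0 if the case was censored
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        progressed = 0
        removed = 0
        # Collect every case that progressed or was censored at time t.
        while i < len(data) and data[i][0] == t:
            progressed += data[i][1]
            removed += 1
            i += 1
        if progressed:
            survival *= 1 - progressed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Toy data only: four treatments, one censored at day 8.
curve = kaplan_meier([5, 8, 8, 12], [1, 1, 0, 1])
```

Running the same function once on predicted PD dates and once on manually curated PD dates gives two step curves that can be overlaid for comparison.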

Conclusions: Although it is difficult to construct a fully automated system using our method, we believe that it performs well enough to support physicians and CRCs in constructing the database and to provide clinical information that helps researchers identify opportunities for clinical studies.
