Deep Learning for Cancer Symptoms Monitoring on the Basis of Electronic Health Record Unstructured Clinical Notes

Overview

Journal JCO Clin Cancer Inform

Specialty Medical Informatics

Date 2022 Jun 17

PMID 35714301

Authors

Charlotta Lindvall

Chih-Ying Deng

Nicole D Agaronnik

Anne Kwok

Soujanya Samineni

Renato Umeton

Warren Mackie-Jenkins

Kenneth L Kehl

James A Tulsky

Andrea C Enzinger


Abstract

Purpose: Symptoms are vital outcomes for cancer clinical trials, observational research, and population-level surveillance. Patient-reported outcomes (PROs) are valuable for monitoring symptoms, yet there are many challenges to collecting PROs at scale. We sought to develop, test, and externally validate a deep learning model to extract symptoms from unstructured clinical notes in the electronic health record.

Methods: We randomly selected 1,225 outpatient progress notes from among patients treated at the Dana-Farber Cancer Institute between January 2016 and December 2019 and used 1,125 notes as our training/validation data set and 100 notes as our test data set. We evaluated the performance of 10 deep learning models for detecting 80 symptoms included in the National Cancer Institute's Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) framework. Model performance as compared with manual chart abstraction was assessed using standard metrics, and the highest performer was externally validated on a sample of 100 physician notes from a different clinical context.
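The abstract does not spell out the "standard metrics" used, but the Results report F1 scores. As a minimal sketch of how a per-symptom, note-level F1 could be computed against manual chart abstraction, assuming each note is reduced to a set of symptom labels (the function name, data layout, and example notes are hypothetical and not taken from the study):

```python
from typing import Dict, List, Set


def note_level_f1(gold: List[Set[str]], pred: List[Set[str]], symptom: str) -> Dict[str, float]:
    """Precision, recall, and F1 for one symptom across paired notes.

    gold[i] holds the symptoms a human abstractor found in note i;
    pred[i] holds the symptoms the model reported for the same note.
    This layout is an illustrative assumption, not the study's format.
    """
    tp = fp = fn = 0
    for gold_set, pred_set in zip(gold, pred):
        in_gold = symptom in gold_set
        in_pred = symptom in pred_set
        if in_gold and in_pred:
            tp += 1  # model and abstractor agree the symptom is present
        elif in_pred:
            fp += 1  # model reports a symptom the abstractor did not
        elif in_gold:
            fn += 1  # model misses a symptom the abstractor recorded

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical four-note example (not study data).
gold_notes = [{"fatigue"}, {"fatigue", "nausea"}, set(), {"fatigue"}]
pred_notes = [{"fatigue"}, {"nausea"}, {"fatigue"}, {"fatigue"}]
scores = note_level_f1(gold_notes, pred_notes, "fatigue")
```

F1 is the harmonic mean of precision and recall, so it penalizes a model that achieves one at the expense of the other.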

Results: In our training and test data sets, 75 of the 80 candidate symptoms were identified. The ELECTRA-small model had the highest performance for symptom identification at the token level (ie, at the individual symptom level), with an F1 of 0.87 and a processing time of 3.95 seconds per note. For the 10 most common symptoms in the test data set, the F1 score ranged from 0.98 for anxious to 0.86 for fatigue. For external validation of the same symptoms, the note-level performance ranged from F1 = 0.97 for diarrhea and dizziness to F1 = 0.73 for swelling.
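The Results mix token-level scores (test set) with note-level scores (external validation). The abstract does not say how token predictions were rolled up to the note level. One plausible rule, shown here purely as an assumption, is to mark a symptom as present in a note if any token carries its label. The `"O"` tag for non-symptom tokens and the function name are hypothetical:

```python
from typing import List, Set, Tuple


def note_symptoms(token_labels: List[Tuple[str, str]], outside: str = "O") -> Set[str]:
    """Collapse per-token labels into the set of symptoms found in a note.

    token_labels pairs each token with its predicted label; tokens that
    are not part of any symptom mention carry the `outside` tag.
    The "any labeled token" rule is an assumed aggregation, not the
    method reported by the authors.
    """
    return {label for _, label in token_labels if label != outside}


# Hypothetical tagged sentence (not study data).
tagged = [
    ("Patient", "O"),
    ("reports", "O"),
    ("fatigue", "fatigue"),
    ("and", "O"),
    ("loose", "diarrhea"),
    ("stools", "diarrhea"),
]
found = note_symptoms(tagged)
```

Under this rule a multi-token mention such as "loose stools" counts once per note, which is why note-level and token-level F1 values are not directly comparable.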

Conclusion: Training a deep learning model to identify a wide range of electronic health record-documented symptoms relevant to cancer care is feasible. This approach could be used at the health system scale to complement electronic PROs.

