PMID: 35714301

Deep Learning for Cancer Symptoms Monitoring on the Basis of Electronic Health Record Unstructured Clinical Notes

Abstract

Purpose: Symptoms are vital outcomes for cancer clinical trials, observational research, and population-level surveillance. Patient-reported outcomes (PROs) are valuable for monitoring symptoms, yet there are many challenges to collecting PROs at scale. We sought to develop, test, and externally validate a deep learning model to extract symptoms from unstructured clinical notes in the electronic health record.

Methods: We randomly selected 1,225 outpatient progress notes from among patients treated at the Dana-Farber Cancer Institute between January 2016 and December 2019 and used 1,125 notes as our training/validation data set and 100 notes as our test data set. We evaluated the performance of 10 deep learning models for detecting 80 symptoms included in the National Cancer Institute's Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) framework. Model performance as compared with manual chart abstraction was assessed using standard metrics, and the highest performer was externally validated on a sample of 100 physician notes from a different clinical context.
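The split described above (1,225 notes divided into 1,125 for training/validation and 100 for testing) can be illustrated with a short sketch. The function name `split_notes`, the seed, and the use of integer note IDs are assumptions made for illustration; the abstract does not describe how the authors performed the random selection.

```python
import random


def split_notes(note_ids, n_test=100, seed=0):
    """Randomly hold out n_test notes; the rest form the training/validation set.

    Hypothetical helper: the seed and the ID scheme are illustrative assumptions.
    """
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(note_ids)
    rng.shuffle(shuffled)
    test_ids = shuffled[:n_test]
    train_val_ids = shuffled[n_test:]
    return train_val_ids, test_ids


# 1,225 placeholder note IDs standing in for the sampled progress notes
train_val_ids, test_ids = split_notes(range(1225))
print(len(train_val_ids), len(test_ids))
```

Holding the test notes out before any model selection keeps the reported test performance independent of the data used to compare the candidate models.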

Results: In our training and test data sets, 75 of the 80 candidate symptoms were identified. The ELECTRA-small model had the highest performance for symptom identification at the token level (ie, at the individual symptom level), with an F1 of 0.87 and a processing time of 3.95 seconds per note. For the 10 most common symptoms in the test data set, the F1 score ranged from 0.98 for anxious to 0.86 for fatigue. For external validation of the same symptoms, the note-level performance ranged from F1 = 0.97 for diarrhea and dizziness to F1 = 0.73 for swelling.
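To make the distinction between token-level and note-level F1 concrete, the sketch below computes both on a toy example. It assumes that a token-level item is a (note, token position, symptom) triple and that a note-level item is a (note, symptom) pair; the abstract does not spell out the authors' exact matching rules, and the data here are invented purely for illustration.

```python
def f1_score(gold, predicted):
    """F1 = harmonic mean of precision and recall over two sets of items."""
    tp = len(gold & predicted)      # items found by both annotator and model
    fp = len(predicted - gold)      # items the model predicted in error
    fn = len(gold - predicted)      # items the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def to_note_level(token_items):
    """Collapse (note, position, symptom) triples to (note, symptom) pairs."""
    return {(note, symptom) for note, _pos, symptom in token_items}


# Toy annotations (illustrative only, not study data)
gold_tokens = {("n1", 4, "fatigue"), ("n1", 9, "fatigue"), ("n2", 2, "nausea")}
pred_tokens = {("n1", 4, "fatigue"), ("n2", 2, "nausea"), ("n2", 7, "swelling")}

token_f1 = f1_score(gold_tokens, pred_tokens)
note_f1 = f1_score(to_note_level(gold_tokens), to_note_level(pred_tokens))
print(f"token-level F1 = {token_f1:.2f}, note-level F1 = {note_f1:.2f}")
```

In this toy case the note-level score is higher because a missed second mention of fatigue in the same note no longer counts as an error once mentions are collapsed per note.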

Conclusion: Training a deep learning model to identify a wide range of electronic health record-documented symptoms relevant to cancer care is feasible. This approach could be used at the health system scale to complement electronic PROs.

Citing Articles

Detection of differences in physical symptoms between depressed and undepressed patients with breast cancer: a study using K-medoids clustering.

Tang J, Guo B, Zhong C, Chi J, Fu J, Lai J BMC Cancer. 2025; 25(1):23.

PMID: 39773474 PMC: 11708193. DOI: 10.1186/s12885-024-13387-z.


AI-Driven Prediction of Symptom Trajectories in Cancer Care: A Deep Learning Approach for Chemotherapy Management.

Finkelstein J, Smiley A, Echeverria C, Mooney K Bioengineering (Basel). 2024; 11(11).

PMID: 39593830 PMC: 11592055. DOI: 10.3390/bioengineering11111172.


CACER: Clinical concept Annotations for Cancer Events and Relations.

Fu Y, Ramachandran G, Halwani A, McInnes B, Xia F, Lybarger K J Am Med Inform Assoc. 2024; 31(11):2583-2594.

PMID: 39225779 PMC: 11491616. DOI: 10.1093/jamia/ocae231.


Assessing Real-World Data From Electronic Health Records for Health Technology Assessment: The SUITABILITY Checklist: A Good Practices Report of an ISPOR Task Force.

Fleurence R, Kent S, Adamson B, Tcheng J, Balicer R, Ross J Value Health. 2024; 27(6):692-701.

PMID: 38871437 PMC: 11182651. DOI: 10.1016/j.jval.2024.01.019.


Extraction of Unstructured Electronic Health Records to Evaluate Glioblastoma Treatment Patterns.

Swaminathan A, Ren A, Wu J, Bhargava-Shah A, Lopez I, Srivastava U JCO Clin Cancer Inform. 2024; 8:e2300091.

PMID: 38857465 PMC: 11371099. DOI: 10.1200/CCI.23.00091.

