Natural Language Processing with Machine Learning Methods to Analyze Unstructured Patient-reported Outcomes Derived from Electronic Health Records: A Systematic Review

Overview

Journal: Artif Intell Med

Specialty: Biomedical Engineering

Date: 2023 Dec 2

PMID: 38042599

Authors

Jin-Ah Sim

Xiaolei Huang

Madeline R Horan

Christopher M Stewart

Leslie L Robison

Melissa M Hudson

Justin N Baker

I-Chan Huang

Abstract

Objective: Natural language processing (NLP) combined with machine learning (ML) techniques is increasingly used to process unstructured/free-text patient-reported outcome (PRO) data available in electronic health records (EHRs). This systematic review summarizes the literature reporting NLP/ML systems/toolkits for analyzing PROs in clinical narratives of EHRs and discusses future directions for applying this modality in clinical care.

Methods: We searched PubMed, Scopus, and Web of Science for studies written in English between 1/1/2000 and 12/31/2020. Seventy-nine studies meeting the eligibility criteria were included. We abstracted and summarized information related to the study purpose, patient population, type/source/amount of unstructured PRO data, linguistic features, and NLP systems/toolkits for processing unstructured PROs in EHRs.

Results: Most of the studies used NLP/ML techniques to extract PROs from clinical narratives (n = 74) and mapped the extracted PROs into specific PRO domains for phenotyping or clustering purposes (n = 26). Some studies used NLP/ML to process PROs for predicting disease progression or onset of adverse events (n = 22) or developing/validating NLP/ML pipelines for analyzing unstructured PROs (n = 19). Studies used different linguistic features, including lexical, syntactic, semantic, and contextual features, to process unstructured PROs. Among the 25 NLP systems/toolkits we identified, 15 used rule-based NLP, 6 used hybrid NLP, and 4 used non-neural ML algorithms embedded in NLP.

Conclusions: This study supports the potential utility of different NLP/ML techniques in processing unstructured PROs available in EHRs for clinical care. Though using annotation rules for NLP/ML to analyze unstructured PROs is dominant, deploying novel neural ML-based methods is warranted.

Citing Articles

Leveraging Natural Language Processing and Machine Learning Methods for Adverse Drug Event Detection in Electronic Health/Medical Records: A Scoping Review.

Golder S, Xu D, O'Connor K, Wang Y, Batra M, Hernandez G. Drug Saf. 2025; 48(4):321-337.

PMID: 39786481 PMC: 11903561. DOI: 10.1007/s40264-024-01505-6.

The Frontiers of Smart Healthcare Systems.

Lin N, Paul R, Guerra S, Liu Y, Doulgeris J, Shi M. Healthcare (Basel). 2024; 12(23).

PMID: 39684952 PMC: 11641075. DOI: 10.3390/healthcare12232330.

Identifying stigmatizing and positive/preferred language in obstetric clinical notes using natural language processing.

Scroggins J, Hulchafo I, Harkins S, Scharp D, Moen H, Davoudi A. J Am Med Inform Assoc. 2024; 32(2):308-317.

PMID: 39569431 PMC: 11756426. DOI: 10.1093/jamia/ocae290.

Nursing Records Regarding Decision-Making in Cancer Supportive Care: A Retrospective Study in Japan.

Kawasaki Y, Nii M, Nishioka E. Healthc Inform Res. 2024; 30(4):364-374.

PMID: 39551923 PMC: 11570663. DOI: 10.4258/hir.2024.30.4.364.

The recent history and near future of digital health in the field of behavioral medicine: an update on progress from 2019 to 2024.

Arigo D, Jake-Schoffman D, Pagoto S. J Behav Med. 2024; 48(1):120-136.

PMID: 39467924 PMC: 11893649. DOI: 10.1007/s10865-024-00526-x.
