Natural Language Processing with Machine Learning Methods to Analyze Unstructured Patient-reported Outcomes Derived from Electronic Health Records: A Systematic Review
Objective: Natural language processing (NLP) combined with machine learning (ML) techniques is increasingly used to process unstructured/free-text patient-reported outcome (PRO) data available in electronic health records (EHRs). This systematic review summarizes the literature reporting NLP/ML systems/toolkits for analyzing PROs in clinical narratives of EHRs and discusses future directions for applying this modality in clinical care.
Methods: We searched PubMed, Scopus, and Web of Science for studies written in English between 1/1/2000 and 12/31/2020. Seventy-nine studies meeting the eligibility criteria were included. We abstracted and summarized information related to the study purpose, patient population, type/source/amount of unstructured PRO data, linguistic features, and NLP systems/toolkits for processing unstructured PROs in EHRs.
Results: Most of the studies used NLP/ML techniques to extract PROs from clinical narratives (n = 74) and mapped the extracted PROs into specific PRO domains for phenotyping or clustering purposes (n = 26). Some studies used NLP/ML to process PROs for predicting disease progression or onset of adverse events (n = 22) or developing/validating NLP/ML pipelines for analyzing unstructured PROs (n = 19). Studies used different linguistic features, including lexical, syntactic, semantic, and contextual features, to process unstructured PROs. Among the 25 NLP systems/toolkits we identified, 15 used rule-based NLP, 6 used hybrid NLP, and 4 used non-neural ML algorithms embedded in NLP.
Conclusions: This study supports the potential utility of different NLP/ML techniques in processing unstructured PROs available in EHRs for clinical care. Although rule-based annotation currently dominates NLP/ML analysis of unstructured PROs, deploying novel neural ML-based methods is warranted.
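To make the rule-based approach mentioned in the Results concrete, the following is a minimal sketch of how a lexical feature (a term lexicon) and a contextual feature (a negation window) might be combined to extract PRO mentions and map them to domains. It is not taken from any of the reviewed systems: the lexicon, the negation cues, the three-token window, and the function name `extract_pros` are all assumptions made for illustration.

```python
import re

# Hypothetical lexicon mapping PRO domains to trigger terms (illustrative only).
PRO_LEXICON = {
    "pain": ["pain", "ache", "soreness"],
    "fatigue": ["fatigue", "tired", "exhausted"],
    "nausea": ["nausea", "nauseous"],
}

# Hypothetical negation cues, checked in a short window before each term.
NEGATION_CUES = ["no", "denies", "without", "not"]


def extract_pros(note):
    """Return (domain, term, negated) tuples found in a free-text note."""
    results = []
    # Crude sentence splitting on periods, semicolons, and newlines.
    for sentence in re.split(r"[.;\n]+", note.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for i, token in enumerate(tokens):
            for domain, terms in PRO_LEXICON.items():
                if token in terms:
                    # Contextual feature: negation cue within 3 preceding tokens.
                    window = tokens[max(0, i - 3):i]
                    negated = any(cue in window for cue in NEGATION_CUES)
                    results.append((domain, token, negated))
    return results


if __name__ == "__main__":
    note = "Patient reports severe pain in the lower back. Denies nausea; feels tired."
    for domain, term, negated in extract_pros(note):
        print(domain, term, "negated" if negated else "affirmed")
```

A sketch like this only matches exact single-word terms and handles negation within one sentence fragment. The hybrid and ML-based systems described in the review would be expected to handle misspellings, multi-word expressions, and wider context.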