Natural Language Processing of Symptoms Documented in Free-text Narratives of Electronic Health Records: a Systematic Review

Overview
Date 2019 Feb 7
PMID 30726935
Citations 143
Abstract

Objective: Natural language processing (NLP) of symptoms from electronic health records (EHRs) could contribute to the advancement of symptom science. We aim to synthesize the literature on the use of NLP to process or analyze symptom information documented in EHR free-text narratives.

Materials and Methods: Our search of PubMed and EMBASE yielded 1964 records, which were narrowed to 27 eligible articles. Data related to the purpose, free-text corpus, patients, symptoms, NLP methodology, evaluation metrics, and quality indicators were extracted for each study.

Results: Symptom-related information was presented as a primary outcome in 14 studies. EHR narratives represented various inpatient and outpatient clinical specialties, with general, cardiology, and mental health occurring most frequently. Studies encompassed a wide variety of symptoms, including shortness of breath, pain, nausea, dizziness, disturbed sleep, constipation, and depressed mood. NLP approaches included previously developed NLP tools, classification methods, and manually curated rule-based processing. Only one-third (n = 9) of studies reported patient demographic characteristics.

Discussion: NLP is used to extract information from EHR free-text narratives written by a variety of healthcare providers on an expansive range of symptoms across diverse clinical specialties. The current focus of this field is on the development of methods to extract symptom information and the use of symptom information for disease classification tasks rather than the examination of symptoms themselves.

Conclusion: Future NLP studies should concentrate on the investigation of symptoms and symptom documentation in EHR free-text narratives. Efforts should be undertaken to examine patient characteristics and make symptom-related NLP algorithms or pipelines and vocabularies openly available.

Citing Articles

Real-World Insights Into Dementia Diagnosis Trajectory and Clinical Practice Patterns Unveiled by Natural Language Processing: Development and Usability Study.

Paek H, Fortinsky R, Lee K, Huang L, Maghaydah Y, Kuchel G. JMIR Aging. 2025; 8:e65221.

PMID: 39999185 PMC: 11878476. DOI: 10.2196/65221.


Leveraging Large Language Models for Infectious Disease Surveillance-Using a Web Service for Monitoring COVID-19 Patterns From Self-Reporting Tweets: Content Analysis.

Xie J, Zhang Z, Zeng S, Hilliard J, An G, Tang X. J Med Internet Res. 2025; 27:e63190.

PMID: 39977859 PMC: 11888100. DOI: 10.2196/63190.


Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters.

Carrillo-Larco R, Castillo-Cara M, Lovon-Melgarejo J. Wellcome Open Res. 2025; 6:177.

PMID: 39931661 PMC: 11809155. DOI: 10.12688/wellcomeopenres.16867.3.


A foundation systematic review of natural language processing applied to gastroenterology & hepatology.

Stammers M, Ramgopal B, Owusu Nimako A, Vyas A, Nouraei R, Metcalf C. BMC Gastroenterol. 2025; 25(1):58.

PMID: 39915703 PMC: 11800601. DOI: 10.1186/s12876-025-03608-5.


Scalable information extraction from free text electronic health records using large language models.

Gu B, Shao V, Liao Z, Carducci V, Brufau S, Yang J. BMC Med Res Methodol. 2025; 25(1):23.

PMID: 39871166 PMC: 11773977. DOI: 10.1186/s12874-025-02470-z.

