Natural Language Generation for Electronic Health Records

Overview
Journal NPJ Digit Med
Date 2019 Jan 29
PMID 30687797
Citations 14
Abstract

One broad goal of biomedical informatics is to generate fully synthetic, faithfully representative electronic health records (EHRs) to facilitate data sharing between healthcare providers and researchers and to promote methodological research. A variety of methods exist for generating synthetic EHRs, but they are not capable of generating unstructured text, such as emergency department (ED) chief complaints, histories of present illness, or progress notes. Here, we use the encoder-decoder model, a deep learning algorithm that features in many contemporary machine translation systems, to generate synthetic chief complaints from discrete variables in EHRs, such as age group, gender, and discharge diagnosis. After being trained end-to-end on authentic records, the model can generate realistic chief complaint text that appears to preserve the epidemiological information encoded in the original record-sentence pairs. As a side effect of the model's optimization goal, these synthetic chief complaints are also free of relatively uncommon abbreviations and misspellings, and they include none of the personally identifiable information (PII) that was in the training data, suggesting that this model may be used to support the de-identification of text in EHRs. When combined with algorithms like generative adversarial networks (GANs), our model could be used to generate fully synthetic EHRs, allowing healthcare providers to share faithful representations of multimodal medical data without compromising patient privacy. This is an important advance that we hope will facilitate the development of machine-learning methods for clinical decision support, disease surveillance, and other data-hungry applications in biomedical informatics.
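
To make the record-to-text setup concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation. It shows how discrete fields such as age group, gender, and discharge diagnosis might be one-hot encoded into a single input vector, and how a decoder could emit a chief complaint one token at a time. The category lists, the field names, and `toy_step` (a hand-written stand-in for a trained decoder) are all assumptions made for illustration; a real system would learn the decoder from record-sentence pairs.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical category vocabularies; the study's actual coding scheme is not given here.
AGE_GROUPS = ["0-17", "18-44", "45-64", "65+"]
GENDERS = ["F", "M"]
DIAGNOSES = ["fall", "asthma", "chest pain"]


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Return a one-hot vector for `value` over `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if c == value else 0.0 for c in categories]


def encode_record(record: Dict[str, str]) -> List[float]:
    """Concatenate one-hot encodings of the discrete fields into one input vector."""
    return (
        one_hot(record["age_group"], AGE_GROUPS)
        + one_hot(record["gender"], GENDERS)
        + one_hot(record["diagnosis"], DIAGNOSES)
    )


def greedy_decode(
    context: List[float],
    step: Callable[[List[float], List[str]], str],
    max_len: int = 10,
    bos: str = "<s>",
    eos: str = "</s>",
) -> List[str]:
    """Generate tokens one at a time, feeding the running prefix back in until EOS."""
    tokens = [bos]
    while len(tokens) <= max_len:
        nxt = step(context, tokens)
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the start-of-sequence marker


def toy_step(context: List[float], tokens: List[str]) -> str:
    """Stand-in for a trained decoder (assumption): returns a canned phrase
    selected by the diagnosis slot of the context vector."""
    phrases = {0: ["fell", "at", "home"], 1: ["wheezing"], 2: ["chest", "pain"]}
    diag_index = context[len(AGE_GROUPS) + len(GENDERS):].index(1.0)
    phrase = phrases[diag_index]
    pos = len(tokens) - 1  # number of tokens generated so far
    return phrase[pos] if pos < len(phrase) else "</s>"


record = {"age_group": "65+", "gender": "F", "diagnosis": "fall"}
vector = encode_record(record)
complaint = greedy_decode(vector, toy_step)
print(" ".join(complaint))
```

In a trained model, the step function would score every vocabulary token given the encoded record and the tokens generated so far. Choosing the highest-scoring token at each step is one plausible reason rare abbreviations and misspellings would seldom be emitted, although the abstract attributes this only to the model's optimization goal.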

Citing Articles

Synthetic data generation methods in healthcare: A review on open-source tools and methods.

Pezoulas V, Zaridis D, Mylona E, Androutsos C, Apostolidis K, Tachos N. Comput Struct Biotechnol J. 2024; 23:2892-2910.

PMID: 39108677 PMC: 11301073. DOI: 10.1016/j.csbj.2024.07.005.


Neural Models for Generating Natural Language Summaries from Temporal Personal Health Data.

Harris J, Zaki M. J Healthc Inform Res. 2024; 8(2):370-399.

PMID: 38681757 PMC: 11052757. DOI: 10.1007/s41666-023-00158-x.


AI-assisted literature exploration of innovative Chinese medicine formulas.

Chung M, Su L, Chen C, Wu L. Front Pharmacol. 2024; 15:1347882.

PMID: 38584602 PMC: 10995307. DOI: 10.3389/fphar.2024.1347882.


Synthesize high-dimensional longitudinal electronic health records via hierarchical autoregressive language model.

Theodorou B, Xiao C, Sun J. Nat Commun. 2023; 14(1):5305.

PMID: 37652934 PMC: 10471716. DOI: 10.1038/s41467-023-41093-0.


ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model.

Huang H, Zheng O, Wang D, Yin J, Wang Z, Ding S. Int J Oral Sci. 2023; 15(1):29.

PMID: 37507396 PMC: 10382494. DOI: 10.1038/s41368-023-00239-y.

