Opportunities and Challenges in Developing Deep Learning Models Using Electronic Health Records Data: a Systematic Review

Overview

Journal: J Am Med Inform Assoc

Publisher: Oxford University Press

Specialty: Medical Informatics

Date: 2018 Jun 13

PMID: 29893864

Citations: 185

Authors

Cao Xiao

Edward Choi

Jimeng Sun

Abstract

Objective: To conduct a systematic review of deep learning models for electronic health record (EHR) data, and illustrate various deep learning architectures for analyzing different data sources and their target applications. We also highlight ongoing research and identify open challenges in building deep learning models of EHRs.

Design/method: We searched PubMed and Google Scholar for papers on deep learning studies using EHR data published between January 1, 2010, and January 31, 2018. We summarize them according to these axes: types of analytics tasks, types of deep learning model architectures, special challenges arising from health data and tasks and their potential solutions, as well as evaluation strategies.

Results: We surveyed and analyzed multiple aspects of the 98 articles we found and identified the following analytics tasks: disease detection/classification, sequential prediction of clinical events, concept embedding, data augmentation, and EHR data privacy. We then studied how deep architectures were applied to these tasks. We also discussed some special challenges arising from modeling EHR data and reviewed a few popular approaches. Finally, we summarized how performance evaluations were conducted for each task.

Discussion: Despite the early success in using deep learning for health analytics applications, there still exist a number of issues to be addressed. We discuss them in detail including data and label availability, the interpretability and transparency of the model, and ease of deployment.

Citing Articles

How AI can help us beat AMR.

Arnold A, McLellan S, Stokes J. NPJ Antimicrob Resist. 2025; 3(1):18.

PMID: 40082590 PMC: 11906734. DOI: 10.1038/s44259-025-00085-4.

Machine learning techniques for predicting neurodevelopmental impairments in premature infants: a systematic review.

Ortega-Leon A, Urda D, Turias I, Lubian-Lopez S, Benavente-Fernandez I. Front Artif Intell. 2025; 8:1481338.

PMID: 39906903 PMC: 11788297. DOI: 10.3389/frai.2025.1481338.

Discovering patient groups in sequential electronic healthcare data using unsupervised representation learning.

Li J, Zakka K, Booth J, Rigny L, Ray S, Cortina-Borja M. BMC Med Inform Decis Mak. 2025; 25(1):45.

PMID: 39875929 PMC: 11776155. DOI: 10.1186/s12911-024-02812-9.

Augmented machine learning for sewage quality assessment with limited data.

Lv J, Yin W, Xu J, Cheng H, Li Z, Yang J. Environ Sci Ecotechnol. 2024; 23:100512.

PMID: 39659704 PMC: 11629219. DOI: 10.1016/j.ese.2024.100512.

Research on Fine-Tuning Optimization Strategies for Large Language Models in Tabular Data Processing.

Zhao X, Leng X, Wang L, Wang N. Biomimetics (Basel). 2024; 9(11).

PMID: 39590280 PMC: 11592316. DOI: 10.3390/biomimetics9110708.

References

Choi E, Bahadori M, Schuetz A, Stewart W, Sun J. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks. JMLR Workshop Conf Proc. 2017; 56:301-318. PMC: 5341604.

Luo Y. Recurrent neural networks for classifying relations in clinical notes. J Biomed Inform. 2017; 72:85-95. PMC: 6657689. DOI: 10.1016/j.jbi.2017.07.006.

LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015; 521(7553):436-44. DOI: 10.1038/nature14539.

Choi E, Bahadori M, Song L, Stewart W, Sun J. GRAM: Graph-based Attention Model for Healthcare Representation Learning. KDD. 2021; 2017:787-795. PMC: 7954122. DOI: 10.1145/3097983.3098126.

Liu Y, Logan B, Liu N, Xu Z, Tang J, Wang Y. Deep Reinforcement Learning for Dynamic Treatment Regimes on Medical Registry Data. Healthc Inform. 2018; 2017:380-385. PMC: 5856473. DOI: 10.1109/ICHI.2017.45.

Glicksberg B, Miotto R, Johnson K, Shameer K, Li L, Chen R. Automated disease cohort selection using word embeddings from Electronic Health Records. Pac Symp Biocomput. 2017; 23:145-156. PMC: 5788312.

Angermueller C, Parnamaa T, Parts L, Stegle O. Deep learning for computational biology. Mol Syst Biol. 2016; 12(7):878. PMC: 4965871. DOI: 10.15252/msb.20156651.

Litjens G, Kooi T, Bejnordi B, Setio A, Ciompi F, Ghafoorian M. A survey on deep learning in medical image analysis. Med Image Anal. 2017; 42:60-88. DOI: 10.1016/j.media.2017.07.005.

Che Z, Purushotham S, Khemani R, Liu Y. Interpretable Deep Models for ICU Outcome Prediction. AMIA Annu Symp Proc. 2017; 2016:371-380. PMC: 5333206.

Johnson A, Pollard T, Shen L, Lehman L, Feng M, Ghassemi M. MIMIC-III, a freely accessible critical care database. Sci Data. 2016; 3:160035. PMC: 4878278. DOI: 10.1038/sdata.2016.35.

Jagannatha A, Yu H. Bidirectional RNN for Medical Event Detection in Electronic Health Records. Proc Conf. 2016; 2016:473-482. PMC: 5119627. DOI: 10.18653/v1/n16-1056.

Tran T, Nguyen T, Phung D, Venkatesh S. Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines (eNRBM). J Biomed Inform. 2015; 54:96-105. DOI: 10.1016/j.jbi.2015.01.012.

Huang Z, Dong W, Duan H, Liu J. A Regularized Deep Learning Approach for Clinical Risk Prediction of Acute Coronary Syndrome Using Electronic Health Records. IEEE Trans Biomed Eng. 2017; 65(5):956-968. DOI: 10.1109/TBME.2017.2731158.

Pham T, Tran T, Phung D, Venkatesh S. Predicting healthcare trajectories from medical records: A deep learning approach. J Biomed Inform. 2017; 69:218-229. DOI: 10.1016/j.jbi.2017.04.001.

Beaulieu-Jones B, Moore J. Missing Data Imputation in the Electronic Health Record Using Deeply Learned Autoencoders. Pac Symp Biocomput. 2016; 22:207-218. PMC: 5144587. DOI: 10.1142/9789813207813_0021.

Gulshan V, Peng L, Coram M, Stumpe M, Wu D, Narayanaswamy A. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA. 2016; 316(22):2402-2410. DOI: 10.1001/jama.2016.17216.

Beaulieu-Jones B, Greene C. Semi-supervised learning of the electronic health record for phenotype stratification. J Biomed Inform. 2016; 64:168-178. DOI: 10.1016/j.jbi.2016.10.007.

Du H, Ghassemi M, Feng M. The effects of deep network topology on mortality prediction. Annu Int Conf IEEE Eng Med Biol Soc. 2017; 2016:2602-2605. DOI: 10.1109/EMBC.2016.7591263.

Ching T, Himmelstein D, Beaulieu-Jones B, Kalinin A, Do B, Way G. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface. 2018; 15(141). PMC: 5938574. DOI: 10.1098/rsif.2017.0387.

Goodwin T, Harabagiu S. Deep Learning from EEG Reports for Inferring Underspecified Information. AMIA Jt Summits Transl Sci Proc. 2017; 2017:112-121. PMC: 5543361.