
Language Models Are an Effective Representation Learning Technique for Electronic Health Record Data

Overview
Journal J Biomed Inform
Publisher Elsevier
Date 2020 Dec 8
PMID 33290879
Citations 35
Abstract

Widespread adoption of electronic health records (EHRs) has fueled the use of machine learning to build prediction models for various clinical outcomes. However, this process is often constrained by the relatively small number of patient records available for training the model. We demonstrate that patient representation schemes inspired by techniques from natural language processing can increase the accuracy of clinical prediction models by transferring information learned from the entire patient population to the task of training a specific model, where only a subset of the population is relevant. Such patient representation schemes enable a 3.5% mean improvement in AUROC on five prediction tasks compared to standard baselines, with the average improvement rising to 19% when only a small number of patient records are available for training the clinical prediction model.
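The abstract reports its gains in AUROC (area under the receiver operating characteristic curve). As a minimal sketch of what that metric measures, the Python below computes AUROC as the probability that a randomly chosen positive patient receives a higher risk score than a randomly chosen negative one. The function name `auroc`, the toy labels, and both score lists are hypothetical illustrations, not data or code from the paper, and the sketch makes no assumption about whether the reported percentages are absolute or relative.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the chance that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not from the paper): outcome labels for six patients
# and risk scores produced by two hypothetical models.
labels = [0, 0, 1, 0, 1, 1]
baseline_scores = [0.2, 0.6, 0.5, 0.3, 0.7, 0.4]
representation_scores = [0.1, 0.4, 0.6, 0.3, 0.9, 0.5]

baseline_auroc = auroc(labels, baseline_scores)
representation_auroc = auroc(labels, representation_scores)
print(f"baseline AUROC: {baseline_auroc:.3f}")
print(f"representation AUROC: {representation_auroc:.3f}")
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; a rank-based formulation would likely be preferable for large cohorts.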

Citing Articles

A machine learning approach to leveraging electronic health records for enhanced omics analysis.

Mataraso S, Espinosa C, Seong D, Reincke S, Berson E, Reiss J. Nat Mach Intell. 2025; 7(2):293-306.

PMID: 40008295 PMC: 11847705. DOI: 10.1038/s42256-024-00974-9.


A roadmap to implementing machine learning in healthcare: from concept to practice.

Yan A, Guo L, Inoue J, Arciniegas S, Vettese E, Wolochacz A. Front Digit Health. 2025; 7:1462751.

PMID: 39906065 PMC: 11788154. DOI: 10.3389/fdgth.2025.1462751.


Developing a Research Center for Artificial Intelligence in Medicine.

Langlotz C, Kim J, Shah N, Lungren M, Larson D, Datta S. Mayo Clin Proc Digit Health. 2025; 2(4):677-686.

PMID: 39802660 PMC: 11720458. DOI: 10.1016/j.mcpdig.2024.07.005.


Unified Clinical Vocabulary Embeddings for Advancing Precision Medicine.

Johnson R, Gottlieb U, Shaham G, Eisen L, Waxman J, Devons-Sberro S. medRxiv. 2024.

PMID: 39677476 PMC: 11643188. DOI: 10.1101/2024.12.03.24318322.


Debiasing large language models: research opportunities.

Yogarajan V, Dobbie G, Keegan T. J R Soc N Z. 2024; 55(2):372-395.

PMID: 39677375 PMC: 11639098. DOI: 10.1080/03036758.2024.2398567.

