Opportunities and Challenges in Developing Risk Prediction Models with Electronic Health Records Data: a Systematic Review

Overview
Date 2016 May 19
PMID 27189013
Citations 329
Abstract

Objective: Electronic health records (EHRs) are an increasingly common data source for clinical risk prediction, presenting both unique analytic opportunities and challenges. We sought to evaluate the current state of EHR-based risk prediction modeling through a systematic review of clinical prediction studies using EHR data.

Methods: We searched PubMed for articles that reported on the use of an EHR to develop a risk prediction model from 2009 to 2014. Articles were extracted by two reviewers, and we abstracted information on study design, use of EHR data, model building, and performance from each publication and supplementary documentation.

Results: We identified 107 articles from 15 different countries. Studies were generally very large (median sample size = 26 100) and utilized a diverse array of predictors. Most used validation techniques (n = 94 of 107) and reported model coefficients for reproducibility (n = 83). However, studies did not fully leverage the breadth of EHR data: few used longitudinal information (n = 37), and they employed relatively few predictor variables (median = 27 variables). Fewer than half of the studies were multicenter (n = 50), and only 26 performed validation across sites. Many studies did not fully address biases of EHR data such as missing data or loss to follow-up. Average c-statistics for different outcomes were: mortality (0.84), clinical prediction (0.83), hospitalization (0.71), and service utilization (0.71).

Conclusions: EHR data present both opportunities and challenges for clinical risk prediction. There is room for improvement in designing such studies.
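
The c-statistics reported in the Results summarize discrimination for a binary outcome: the proportion of (event, non-event) patient pairs in which the patient who had the event was assigned the higher predicted risk. The sketch below illustrates that definition only. It is not the method of any reviewed study, the function name `c_statistic` is hypothetical, and the outcomes and risks are made-up toy values rather than data from the review.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Concordance (c) statistic for a binary outcome.

    Counts the share of (event, non-event) pairs in which the event
    case received the higher predicted risk; ties count as half.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0   # event case ranked higher: concordant pair
        elif e == n:
            concordant += 0.5   # tied predictions count as half
    return concordant / (len(events) * len(non_events))


# Toy illustration (invented values, not from the review)
outcomes = [0, 0, 1, 1, 0, 1]
risks = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]
print(round(c_statistic(outcomes, risks), 3))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which gives some context for the averages of 0.71 to 0.84 quoted above.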

Citing Articles

A decision-analytical perspective on incorporating multiple outcomes in the production of clinical prediction models: defining a taxonomy of risk estimands.

Martin G, Pate A, Bladon S, Sperrin M, Riley R. BMC Med. 2025; 23(1):142.

PMID: 40050803 PMC: 11887178. DOI: 10.1186/s12916-025-03978-3.


Genetics, primary care records and lifestyle factors for short-term dynamic risk prediction of colorectal cancer: prospective study of asymptomatic and symptomatic UK Biobank participants.

Ip S, Harrison H, Usher-Smith J, Barclay M, Tyrer J, Dennis J. BMJ Oncol. 2025; 4(1):e000336.

PMID: 40046831 PMC: 11880779. DOI: 10.1136/bmjonc-2024-000336.


Linking Electronic Health Record Prescribing Data and Pharmacy Dispensing Records to Identify Patient-Level Factors Associated With Psychotropic Medication Receipt: Retrospective Study.

Wu P, Hurst J, French A, Chrestensen M, Goldstein B. JMIR Med Inform. 2025; 13:e63740.

PMID: 40035724 PMC: 11895725. DOI: 10.2196/63740.


Development and validation of a population-based risk algorithm for premature mortality in Canada: the Premature Mortality Population Risk Tool (PreMPoRT).

O'Neill M, Hurst M, Pagalan L, Diemert L, Kornas K, Fisher S. BMJ Public Health. 2025; 2(2):e000377.

PMID: 40018526 PMC: 11816297. DOI: 10.1136/bmjph-2023-000377.


Decentralized Clinical Trials in the Era of Real-World Evidence: A Statistical Perspective.

Chen J, Di J, Daizadeh N, Lu Y, Wang H, Shen Y. Clin Transl Sci. 2025; 18(2):e70117.

PMID: 39972404 PMC: 11839390. DOI: 10.1111/cts.70117.

