Validation of Prediction Models for Critical Care Outcomes Using Natural Language Processing of Electronic Health Record Data

Overview

Journal: JAMA Netw Open

Publisher: American Medical Association

Specialty: General Medicine

Date: 2019 Jan 16

PMID: 30646310

Citations: 40

Authors

Ben J Marafino

Miran Park

Jason M Davies

Robert Thombley

Harold S Luft

David C Sing

Dhruv S Kazi

Colette DeJong

W John Boscardin

Mitzi L Dean

R Adams Dudley

Abstract

Importance: Accurate prediction of outcomes among patients in intensive care units (ICUs) is important for clinical research and monitoring care quality. Most existing prediction models do not take full advantage of the electronic health record, using only the single worst value of laboratory tests and vital signs and largely ignoring information present in free-text notes. Whether capturing more of the available data and applying machine learning and natural language processing (NLP) can improve and automate the prediction of outcomes among patients in the ICU remains unknown.

Objectives: To evaluate the change in predictive power of a mortality prediction model for patients in the ICU achieved by incorporating measures of clinical trajectory together with NLP of clinical text, and to assess the generalizability of this approach.

Design, Setting, and Participants: This retrospective cohort study included 101 196 patients with a first-time admission to the ICU and a length of stay of at least 4 hours. Twenty ICUs at 2 academic medical centers (University of California, San Francisco [UCSF], and Beth Israel Deaconess Medical Center [BIDMC], Boston, Massachusetts) and 1 community hospital (Mills-Peninsula Medical Center [MPMC], Burlingame, California) contributed data from January 1, 2001, through June 1, 2017. Data were analyzed from July 1, 2017, through August 1, 2018.

Main Outcomes and Measures: In-hospital mortality, model discrimination as assessed by the area under the receiver operating characteristic curve (AUC), and model calibration as assessed by the modified Hosmer-Lemeshow statistic.
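The two measures named above can be illustrated with a minimal sketch. The function names `auc` and `hosmer_lemeshow` are hypothetical and do not come from the study, and the calibration function implements the standard grouped Hosmer-Lemeshow statistic, not the modified variant the authors report, whose details are not given in this abstract.

```python
def auc(y_true, y_score):
    """Discrimination: probability that a randomly chosen patient who died
    received a higher score than a randomly chosen survivor (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Calibration: standard Hosmer-Lemeshow statistic over risk-ordered groups.
    Assumption: equal-sized groups; this is NOT the 'modified' variant."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # predicted deaths in the group
        observed = sum(y for _, y in chunk)   # actual deaths in the group
        denom = expected * (1 - expected / len(chunk))
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat
```

A higher AUC indicates better separation of deaths from survivors, while a smaller Hosmer-Lemeshow statistic indicates closer agreement between predicted and observed mortality within risk groups.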

Results: Among 101 196 patients included in the analysis, 51.3% (n = 51 899) were male, and the mean (SD) age was 61.3 (17.1) years; the in-hospital mortality rate was 10.4% (n = 10 505). A baseline model using only the highest and lowest observed values for each laboratory test result or vital sign achieved a cross-validated AUC of 0.831 (95% CI, 0.830-0.832). In contrast, that model augmented with measures of clinical trajectory achieved an AUC of 0.899 (95% CI, 0.896-0.902; P < .001 for AUC difference). Adding NLP-derived terms associated with mortality to this model increased the AUC further to 0.922 (95% CI, 0.916-0.924; P < .001). These NLP-derived terms were associated with improved model performance even when applied across sites (AUC difference for UCSF: 0.077 to 0.021; AUC difference for MPMC: 0.071 to 0.051; AUC difference for BIDMC: 0.035 to 0.043; P < .001) when augmenting with NLP at each site.
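The contrast between the baseline features (highest and lowest observed values) and trajectory-aware features can be sketched as follows. This is only an illustration: the abstract does not specify which trajectory measures the authors used, so the least-squares slope and the last-minus-first change below are assumptions, and `summarize_series` is a hypothetical helper name.

```python
def summarize_series(times_h, values):
    """Summarize one laboratory test or vital sign for one patient.

    times_h: measurement times in hours since ICU admission
    values:  measured values, in the same order as times_h
    """
    if len(times_h) != len(values) or not values:
        raise ValueError("times_h and values must be non-empty and equal length")

    # Baseline-style features: extremes only.
    features = {"min": min(values), "max": max(values)}

    # Assumed trajectory features: ordinary least-squares slope over time
    # and the net change from first to last measurement.
    n = len(values)
    mean_t = sum(times_h) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in times_h)
    if var_t == 0:
        slope = 0.0  # a single time point carries no trend information
    else:
        cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times_h, values))
        slope = cov / var_t
    features["slope_per_hour"] = slope
    features["last_minus_first"] = values[-1] - values[0]
    return features
```

Two patients with the same minimum and maximum value can have opposite slopes, which is the kind of information a model restricted to extreme values cannot use.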

Conclusions and Relevance: Intensive care unit mortality prediction models incorporating measures of clinical trajectory and NLP-derived terms yielded excellent predictive performance and generalized well in this sample of hospitals. The role of these automated algorithms, particularly those using unstructured data from notes and other sources, in clinical research and quality improvement seems to merit additional investigation.

Citing Articles

A systematic review of natural language processing applications in Trauma & Orthopaedics.

Farrow L, Raja A, Zhong M, Anderson L Bone Jt Open. 2025; 6(3):264-274.

PMID: 40037398 PMC: 11879473. DOI: 10.1302/2633-1462.63.BJO-2024-0081.R1.

Using Structured Codes and Free-Text Notes to Measure Information Complementarity in Electronic Health Records: Feasibility and Validation Study.

Seinen T, Kors J, van Mulligen E, Rijnbeek P J Med Internet Res. 2025; 27:e66910.

PMID: 39946687 PMC: 11887999. DOI: 10.2196/66910.

Classifying Unstructured Text in Electronic Health Records for Mental Health Prediction Models: Large Language Model Evaluation Study.

Cardamone N, Olfson M, Schmutte T, Ungar L, Liu T, Cullen S JMIR Med Inform. 2025; 13:e65454.

PMID: 39864953 PMC: 11884378. DOI: 10.2196/65454.

Comparison of six natural language processing approaches to assessing firearm access in Veterans Health Administration electronic health records.

Trujeque J, Dudley R, Mesfin N, Ingraham N, Ortiz I, Bangerter A J Am Med Inform Assoc. 2024; 32(1):113-118.

PMID: 39530748 PMC: 11648724. DOI: 10.1093/jamia/ocae169.

The Growing Impact of Natural Language Processing in Healthcare and Public Health.

Jerfy A, Selden O, Balkrishnan R Inquiry. 2024; 61:469580241290095.

PMID: 39396164 PMC: 11475376. DOI: 10.1177/00469580241290095.
