A Common Longitudinal Intensive Care Unit Data Format (CLIF) to Enable Multi-institutional Federated Critical Illness Research

Overview

Journal medRxiv

Date 2024 Sep 16

PMID 39281737

Authors

Juan C Rojas

Patrick G Lyons

Kaveri Chhikara

Vaishvik Chaudhari

Sivasubramanium V Bhavani

Muna Nour

Kevin G Buell

Kevin D Smith

Catherine A Gao

Saki Amagai

Chengsheng Mao

Yuan Luo

Anna K Barker

Mark Nuppnau

Haley Beck

Rachel Baccile

Michael Hermsen

Zewei Liao

Brenna Park-Egan

Kyle A Carey

Xuan Han

Chad H Hochberg

Nicholas E Ingraham

William F Parker


Abstract

Background: Critical illness, or acute organ failure requiring life support, threatens over five million American lives annually. Electronic health record (EHR) data are a source of granular information that could generate crucial insights into the nature and optimal treatment of critical illness. However, data management, security, and standardization are barriers to large-scale critical illness EHR studies.

Methods: A consortium of critical care physicians and data scientists from eight US healthcare systems developed the Common Longitudinal Intensive Care Unit (ICU) data Format (CLIF), an open-source database format that harmonizes a minimum set of ICU Data Elements for use in critical illness research. We created a pipeline to process adult ICU EHR data at each site. After development and iteration, we conducted two proof-of-concept studies with a federated research architecture: 1) an external validation of an in-hospital mortality prediction model for critically ill patients and 2) an assessment of 72-hour temperature trajectories and their association with mechanical ventilation and in-hospital mortality using group-based trajectory models.

Results: We converted longitudinal data from 94,356 critically ill patients treated in 2020-2021 (mean age 60.6 years [standard deviation 17.2], 30% Black, 7% Hispanic, 45% female) across 8 health systems and 33 hospitals into the CLIF format. The in-hospital mortality prediction model performed well in the health system where it was derived (0.81 AUC, 0.06 Brier score). Performance across CLIF consortium sites varied (AUCs: 0.74-0.83, Brier scores: 0.06-0.01) and showed some degradation in predictive capability. Temperature trajectories were similar across health systems. Hypothermic and hyperthermic-slow-resolver patients consistently had the highest mortality.

Conclusions: CLIF facilitates efficient, rigorous, and reproducible critical care research. Our federated case studies showcase CLIF's potential for disease sub-phenotyping and clinical decision-support evaluation. Future applications include pragmatic EHR-based trials, target trial emulations, foundational multi-modal AI models of critical illness, and real-time critical care quality dashboards.
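
As an illustration of the federated validation summarized in the Results, the sketch below shows how each site might compute discrimination (AUC) and calibration (Brier score) locally and share only those aggregates. It is a minimal sketch, not the consortium's pipeline: the function names, the site labels, and the toy outcome and probability values are all hypothetical, and the AUC is computed by simple pairwise comparison rather than by any particular library.

```python
from typing import Dict, Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one death and one survivor")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def site_summary(y_true: Sequence[int], y_prob: Sequence[float]) -> Dict[str, float]:
    """Runs inside one site; only these aggregates would leave the site."""
    return {
        "n": len(y_true),
        "auc": auc(y_true, y_prob),
        "brier": brier(y_true, y_prob),
    }


# Hypothetical toy data: (in-hospital death indicator, predicted probability).
sites = {
    "site_a": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    "site_b": ([0, 1, 0, 1, 0], [0.2, 0.9, 0.3, 0.6, 0.1]),
}

# The coordinating center sees one summary row per site, never patient rows.
summaries = {name: site_summary(y, p) for name, (y, p) in sites.items()}
for name, s in summaries.items():
    print(f"{name}: n={s['n']} AUC={s['auc']:.2f} Brier={s['brier']:.3f}")
```

Sharing only per-site summaries keeps patient-level data behind each institution's firewall, which is the property a federated design relies on. A real analysis would also report confidence intervals and calibration curves.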
