
Democratizing EHR Analyses with FIDDLE: a Flexible Data-driven Preprocessing Pipeline for Structured Clinical Data

Overview
Date 2020 Oct 11
PMID 33040151
Citations 29
Abstract

Objective: In applying machine learning (ML) to electronic health record (EHR) data, many decisions must be made before any model is trained, and this preprocessing is labor-intensive. As the role of ML in health care grows, there is an increasing need for systematic and reproducible preprocessing techniques for EHR data. Thus, we developed FIDDLE (Flexible Data-Driven Pipeline), an open-source framework that streamlines the preprocessing of data extracted from the EHR.

Materials and Methods: Largely data-driven, FIDDLE systematically transforms structured EHR data into feature vectors, limiting the number of decisions a user must make while incorporating good practices from the literature. To demonstrate its utility and flexibility, we conducted a proof-of-concept experiment in which we applied FIDDLE to 2 publicly available EHR data sets collected from intensive care units: MIMIC-III and the eICU Collaborative Research Database. We trained different ML models to predict 3 clinically important outcomes: in-hospital mortality, acute respiratory failure, and shock. We evaluated the models using the area under the receiver operating characteristic curve (AUROC) and compared them with several baselines.
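To make the idea of turning structured EHR data into feature vectors concrete, here is a minimal sketch in Python. It is not FIDDLE's code or API: the function name `bin_records`, the parameters `T` and `dt`, the row layout `(t, name, value)`, and the choice of bin means with carry-forward and missingness flags are all assumptions made for illustration.

```python
def bin_records(records, T, dt):
    """Turn time-stamped rows for one patient into a flat feature dict.

    records: list of (t, name, value) tuples; t is hours since admission
             (an assumed layout, not taken from the paper).
    T:       length of the observation window in hours (hypothetical name).
    dt:      width of each time bin in hours (hypothetical name).
    """
    n_bins = int(T // dt)
    names = sorted({name for _, name, _ in records})
    sums = {name: [0.0] * n_bins for name in names}
    counts = {name: [0] * n_bins for name in names}

    # Accumulate every measurement into the time bin it falls in.
    for t, name, value in records:
        if 0 <= t < n_bins * dt:
            b = int(t // dt)
            sums[name][b] += value
            counts[name][b] += 1

    features = {}
    for name in names:
        last = None  # most recent bin mean seen so far, for carry-forward
        for b in range(n_bins):
            if counts[name][b]:
                last = sums[name][b] / counts[name][b]
            # Value is the bin mean, or the previous bin's value if nothing
            # was measured; None if the variable has not been seen yet.
            features[f"{name}_bin{b}_value"] = last
            # Explicit indicator so a model can tell imputed from observed.
            features[f"{name}_bin{b}_missing"] = int(counts[name][b] == 0)
    return features


example = [(0.5, "hr", 80), (1.5, "hr", 90), (3.0, "hr", 100), (2.5, "sbp", 120)]
features = bin_records(example, T=4, dt=2)
```

With a 4-hour window and 2-hour bins, each variable contributes two value features and two missingness flags, so the user chooses only the window and the resolution while the rest follows from the data.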

Results: Across tasks, FIDDLE extracted 2,528 to 7,403 features from MIMIC-III and eICU, respectively. On all tasks, FIDDLE-based models achieved good discriminative performance, with AUROCs of 0.757-0.886, comparable to the performance of MIMIC-Extract, a preprocessing pipeline designed specifically for MIMIC-III. Furthermore, our results showed that FIDDLE is generalizable across different prediction times, ML algorithms, and data sets, while being relatively robust to different settings of user-defined arguments.
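For readers unfamiliar with the metric, the AUROC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; the function name `auroc` and the toy labels and scores are illustrative assumptions, not data or code from the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 outcomes.
    scores: iterable of predicted risks, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; rank-based implementations give the same value more efficiently on large cohorts.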

Conclusions: FIDDLE, an open-source preprocessing pipeline, facilitates applying ML to structured EHR data. By accelerating and standardizing labor-intensive preprocessing, FIDDLE can help stimulate progress in building clinically useful ML tools for EHR data.

Citing Articles

Learning and diSentangling patient static information from time-series Electronic hEalth Records (STEER).

Liao W, Voldman J. PLOS Digit Health. 2024; 3(10):e0000640.

PMID: 39432484 PMC: 11493250. DOI: 10.1371/journal.pdig.0000640.


Automated Fusion of Multimodal Electronic Health Records for Better Medical Predictions.

Cui S, Wang J, Zhong Y, Liu H, Wang T, Ma F. Proc SIAM Int Conf Data Min. 2024; 2024:361-369.

PMID: 39399238 PMC: 11469647. DOI: 10.1137/1.9781611978032.41.


An open-source framework for end-to-end analysis of electronic health record data.

Heumos L, Ehmele P, Treis T, Upmeier Zu Belzen J, Roellin E, May L. Nat Med. 2024; 30(11):3369-3380.

PMID: 39266748 PMC: 11564094. DOI: 10.1038/s41591-024-03214-0.


A scalable and transparent data pipeline for AI-enabled health data ecosystems.

Namli T, Sinaci A, Gonul S, Herguido C, Garcia-Canadilla P, Munoz A. Front Med (Lausanne). 2024; 11:1393123.

PMID: 39139784 PMC: 11321077. DOI: 10.3389/fmed.2024.1393123.


Affordable and real-time antimicrobial resistance prediction from multimodal electronic health records.

Hardan S, Shaaban M, Abdalla J, Yaqub M. Sci Rep. 2024; 14(1):16464.

PMID: 39013934 PMC: 11252127. DOI: 10.1038/s41598-024-66812-5.

