PARAMO: a PARAllel Predictive MOdeling Platform for Healthcare Analytic Research Using Electronic Health Records

Overview
Journal J Biomed Inform
Publisher Elsevier
Date 2013 Dec 28
PMID 24370496
Citations 25
Abstract

Objective: Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: (1) cohort construction, (2) feature construction, (3) cross-validation, (4) feature selection, and (5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data.

Methods: To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which (1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, (2) schedules the tasks in a topological ordering of the graph, and (3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported.
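The three steps above (build a dependency graph, order it topologically, run independent tasks in parallel) can be illustrated with a minimal sketch. This is not the PARAMO implementation: the abstract says the platform uses Map-Reduce on a cluster, whereas this sketch swaps in a local thread pool from the Python standard library. The task names, the `run_pipeline` function, and the two-fold layout are hypothetical and chosen only for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(dependencies, actions, max_workers=4):
    """Run each task after its prerequisites; independent tasks run concurrently.

    dependencies: maps task name -> set of prerequisite task names.
    actions: maps task name -> zero-argument callable (hypothetical task body).
    Returns a dict mapping task name -> the callable's return value.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    in_flight = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose prerequisites have all finished.
            for task in sorter.get_ready():
                in_flight[pool.submit(actions[task])] = task
            # Block until at least one running task completes.
            done, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for future in done:
                task = in_flight.pop(future)
                results[task] = future.result()
                sorter.done(task)  # unlocks tasks that depend on this one
    return results


# Hypothetical pipeline: the five task types named in the abstract,
# with feature selection and classification repeated per fold.
deps = {
    "cohort": set(),
    "features": {"cohort"},
    "cv_split": {"features"},
    "select_fold1": {"cv_split"},
    "select_fold2": {"cv_split"},
    "classify_fold1": {"select_fold1"},
    "classify_fold2": {"select_fold2"},
}
log = []  # records the order in which tasks actually start
actions = {name: (lambda n=name: log.append(n) or n) for name in deps}
results = run_pipeline(deps, actions)
```

In this sketch the two folds become eligible at the same time once `cv_split` finishes, so their selection and classification tasks can overlap; that is the kind of independence a parallel scheduler can exploit.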

Results: We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000-patient data set in 3 hours when running in parallel, compared to 9 days when running sequentially.
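As a rough reading of the reported figures, the implied speedup can be worked out directly. The inputs are the rounded values from the abstract (9 days sequential, 3 hours parallel, 800 models); the abstract does not state the number of workers, so no parallel efficiency can be derived from it.

```latex
\[
\text{speedup} \approx \frac{T_{\text{sequential}}}{T_{\text{parallel}}}
  = \frac{9 \times 24\ \text{h}}{3\ \text{h}} = 72,
\qquad
\frac{T_{\text{sequential}}}{800\ \text{models}}
  = \frac{216\ \text{h}}{800} = 0.27\ \text{h} \approx 16\ \text{min per model}
\]
```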

Conclusion: This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed up the research workflow and the reuse of health information. This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines that are specialized for health data researchers.

Citing Articles

eHealth implementation: a scoping review on legal, ethical, financial, and technological aspects.

Bente B, Van Dongen A, Verdaasdonk R, van Gemert-Pijnen L. Front Digit Health. 2024; 6:1332707.

PMID: 38524249 PMC: 10957613. DOI: 10.3389/fdgth.2024.1332707.


Big Data Analytics to Reduce Preventable Hospitalizations-Using Real-World Data to Predict Ambulatory Care-Sensitive Conditions.

Schulte T, Wurz T, Groene O, Bohnet-Joschko S. Int J Environ Res Public Health. 2023; 20(6).

PMID: 36981600 PMC: 10049041. DOI: 10.3390/ijerph20064693.


Disease-specific data processing: An intelligent digital platform for diabetes based on model prediction and data analysis utilizing big data technology.

Kong X, Peng R, Dai H, Li Y, Lu Y, Sun X. Front Public Health. 2022; 10:1053269.

PMID: 36579056 PMC: 9791221. DOI: 10.3389/fpubh.2022.1053269.


How can Big Data Analytics Support People-Centred and Integrated Health Services: A Scoping Review.

Schulte T, Bohnet-Joschko S. Int J Integr Care. 2022; 22(2):23.

PMID: 35756337 PMC: 9205381. DOI: 10.5334/ijic.5543.


System Architecture of a European Platform for Health Policy Decision Making: MIDAS.

Shi X, Nikolic G, Fischaber S, Black M, Rankin D, Epelde G. Front Public Health. 2022; 10:838438.

PMID: 35433572 PMC: 9008448. DOI: 10.3389/fpubh.2022.838438.

