
Lessons and Tips for Designing a Machine Learning Study Using EHR Data

Overview
Date 2021 May 5
PMID 33948244
Citations 19
Abstract

Machine learning (ML) provides the ability to examine massive datasets and uncover patterns within data without relying on assumptions such as specific variable associations, linearity in relationships, or prespecified statistical interactions. However, the application of ML to healthcare data has met with mixed results, especially when using administrative datasets such as the electronic health record. The black-box nature of many ML algorithms contributes to the erroneous assumption that these algorithms can overcome the major data issues inherent in large administrative healthcare datasets. As with other research endeavors, good data and analytic design are crucial to ML-based studies. In this paper, we provide an overview of common misconceptions about ML, the corresponding truths, and suggestions for incorporating these methods into healthcare research while maintaining a sound study design.

Citing Articles

Bias in Prediction Models to Identify Patients With Colorectal Cancer at High Risk for Readmission After Resection.

Lucas M, Schootman M, Laryea J, Orcutt S, Li C, Ying J. JCO Clin Cancer Inform. 2025; 8.

PMID: 39831110 PMC: 11741203. DOI: 10.1200/CCI.23.00194.


Development of a novel calculator to predict gonadotropin dose and oocyte yield in oocyte cryopreservation cycles.

Yilmaz B, Bakkensen J, Yeh C, Muhammad L, Feinberg E. J Assist Reprod Genet. 2025; 42(2):423-432.

PMID: 39775731 PMC: 11871197. DOI: 10.1007/s10815-024-03372-7.


Clinical and socioeconomic predictors of hospital use and emergency department visits among children with medical complexity: A machine learning approach using administrative data.

Sidra M, Pietrosanu M, Zwicker J, Johnson D, Round J, Ohinmaa A. PLoS One. 2024; 19(10):e0312195.

PMID: 39471234 PMC: 11521260. DOI: 10.1371/journal.pone.0312195.


Oncologic Applications of Artificial Intelligence and Deep Learning Methods in CT Spine Imaging-A Systematic Review.

Ong W, Lee A, Tan W, Fong K, Lai D, Tan Y. Cancers (Basel). 2024; 16(17).

PMID: 39272846 PMC: 11394591. DOI: 10.3390/cancers16172988.


Machine Learning for prediction of violent behaviors in schizophrenia spectrum disorders: a systematic review.

Parsaei M, Arvin A, Taebi M, Seyedmirzaei H, Cattarinussi G, Sambataro F. Front Psychiatry. 2024; 15:1384828.

PMID: 38577400 PMC: 10991827. DOI: 10.3389/fpsyt.2024.1384828.

