Sick Patients Have More Data: the Non-random Completeness of Electronic Health Records

Overview
Date 2014 Feb 20
PMID 24551421
Citations 68
Abstract

As interest in the reuse of electronic health record (EHR) data for research purposes grows, so too does awareness of the significant data quality problems in these non-traditional datasets. In the past, however, little attention has been paid to whether poor data quality merely introduces noise into EHR-derived datasets, or if there is potential for the creation of spurious signals and bias. In this study we use EHR data to demonstrate a statistically significant relationship between EHR completeness and patient health status, indicating that records with more data are likely to be more representative of sick patients than healthy ones, and therefore may not reflect the broader population found within the EHR.
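The selection effect described in the abstract can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the study's method: the function names, the 20% sickness rate, the per-group data-point means, and the 25-point completeness cutoff are all assumptions chosen for illustration. It shows that when sicker patients generate more data, keeping only "sufficiently complete" records yields a cohort that over-represents sick patients relative to the full population.

```python
import random


def simulate_records(n_patients, sick_rate, seed=0):
    """Return (is_sick, n_data_points) tuples for hypothetical patients."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_patients):
        is_sick = rng.random() < sick_rate
        # Assumption: sick patients accumulate more data points on average.
        mean_points = 40 if is_sick else 10
        n_points = max(0, round(rng.gauss(mean_points, mean_points / 2)))
        records.append((is_sick, n_points))
    return records


def sick_fraction(records):
    """Share of records that belong to sick patients."""
    if not records:
        return 0.0
    return sum(1 for is_sick, _ in records if is_sick) / len(records)


def complete_enough(records, min_points):
    """Keep only records with at least min_points data points."""
    return [r for r in records if r[1] >= min_points]


records = simulate_records(10_000, sick_rate=0.2)
cohort = complete_enough(records, min_points=25)  # hypothetical cutoff

overall = sick_fraction(records)
selected = sick_fraction(cohort)
print(f"sick fraction, all records:      {overall:.2f}")
print(f"sick fraction, complete records: {selected:.2f}")
```

Under these assumptions the completeness filter changes who is in the cohort, not just how much is known about each patient, which is the distinction between added noise and introduced bias that the abstract draws.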

Citing Articles

Eight quick tips for biologically and medically informed machine learning.

Oneto L, Chicco D PLoS Comput Biol. 2025; 21(1):e1012711.

PMID: 39787089 PMC: 11717244. DOI: 10.1371/journal.pcbi.1012711.


With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de-identified electronic health record data for research.

Olaker V, Fry S, Terebuh P, Davis P, Tisch D, Xu R Clin Transl Sci. 2024; 18(1):e70093.

PMID: 39740190 PMC: 11685181. DOI: 10.1111/cts.70093.


Reducing Information and Selection Bias in EHR-Linked Biobanks via Genetics-Informed Multiple Imputation and Sample Weighting.

Salvatore M, Kundu R, Du J, Friese C, Mondul A, Hanauer D medRxiv. 2024.

PMID: 39574876 PMC: 11581092. DOI: 10.1101/2024.10.28.24316286.


Feasibility of structuring electronic health record data to facilitate real-world data research: ICAREdata methods applied to multicenter cancer clinical trials.

George S, Campbell N, Hillman S, Harlos E, Stein D, Chan M Cancer. 2024; 131(1):e35528.

PMID: 39192753 PMC: 11693928. DOI: 10.1002/cncr.35528.


The effect of comorbidities on diagnostic interval for lung cancer in England: a cohort study using electronic health record data.

Rogers I, Cooper M, Memon A, Forbes L, van Marwijk H, Ford E Br J Cancer. 2024; 131(7):1147-1157.

PMID: 39179794 PMC: 11442666. DOI: 10.1038/s41416-024-02824-2.

