
Healthcare Utilization is a Collider: an Introduction to Collider Bias in EHR Data Reuse

Overview
Date 2023 Feb 8
PMID 36752649
Abstract

Objectives: Collider bias is a common threat to internal validity in clinical research but is rarely mentioned in informatics education or literature. Conditioning on a collider, which is a variable that is the shared causal descendant of an exposure and outcome, may result in spurious associations between the exposure and outcome. Our objective is to introduce readers to collider bias and its corollaries in the retrospective analysis of electronic health record (EHR) data.
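The mechanism described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: the variable names, the 50% prevalences, and the visit probabilities are all assumptions chosen for illustration. The exposure and outcome are generated independently, both raise the chance of a healthcare visit (the collider), and the association is then measured in the full population and among visitors only.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / (var_x * var_y) ** 0.5


def simulate(n=100_000, seed=0):
    """Return (exposure, outcome, has_visit) tuples.

    Assumed data-generating process (hypothetical numbers):
    exposure and outcome are independent coin flips, and each one
    raises the probability of a healthcare visit by 0.4.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        exposure = int(rng.random() < 0.5)
        outcome = int(rng.random() < 0.5)
        p_visit = 0.1 + 0.4 * exposure + 0.4 * outcome
        has_visit = rng.random() < p_visit
        rows.append((exposure, outcome, has_visit))
    return rows


rows = simulate()

# Whole population: no conditioning on the collider.
r_all = pearson([e for e, o, v in rows], [o for e, o, v in rows])

# EHR-like sample: only people with a visit are observed.
r_ehr = pearson([e for e, o, v in rows if v], [o for e, o, v in rows if v])

print(f"correlation, full population: {r_all:.3f}")
print(f"correlation, visitors only:   {r_ehr:.3f}")
```

Under these assumed parameters the full-population correlation should sit near zero, while the visitors-only correlation should be clearly negative, even though the exposure has no effect on the outcome in the simulated data.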

Target Audience: Collider bias is likely to arise in the reuse of EHR data, due to data-generating mechanisms and the nature of healthcare access and utilization in the United States. Therefore, this tutorial is aimed at informaticians and other EHR data consumers without a background in epidemiological methods or causal inference.

Scope: We focus specifically on problems that may arise from conditioning on forms of healthcare utilization, a common collider that is an implicit selection criterion when one reuses EHR data. Directed acyclic graphs (DAGs) are introduced as a tool for identifying potential sources of bias during study design and planning. References for additional resources on causal inference and DAG construction are provided.
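As a rough sketch of how a DAG can support this kind of check during study planning, the snippet below encodes a small hypothetical graph as a dictionary of parent-to-child edges and lists the variables that descend from both the exposure and the outcome. The node names and edges are assumptions for illustration only and are not taken from the article; a real study DAG would come from domain knowledge.

```python
def descendants(dag, node):
    """Return every node reachable from `node` by following directed edges."""
    seen = set()
    stack = list(dag.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag.get(current, ()))
    return seen


def shared_descendants(dag, exposure, outcome):
    """Nodes that are causal descendants of both the exposure and the outcome."""
    return descendants(dag, exposure) & descendants(dag, outcome)


# Hypothetical DAG: each key points to its direct causal children.
dag = {
    "exposure": ["utilization"],
    "outcome": ["utilization"],
    "utilization": ["in_ehr_cohort"],
    "in_ehr_cohort": [],
}

# Variables the analysis conditions on, explicitly or through selection.
conditioned_on = {"in_ehr_cohort"}

risky = shared_descendants(dag, "exposure", "outcome")
flagged = sorted(risky & conditioned_on)

print("shared descendants:", sorted(risky))
print("conditioned on a shared descendant:", flagged)
```

In this toy graph, restricting the analysis to patients who appear in the EHR cohort is flagged because cohort membership is downstream of utilization, which is itself affected by both the exposure and the outcome.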

Citing Articles

Step-by-step causal analysis of EHRs to ground decision-making.

Doutreligne M, Struja T, Abecassis J, Morgand C, Celi L, Varoquaux G. PLOS Digit Health. 2025; 4(2):e0000721.

PMID: 39899627 PMC: 11790099. DOI: 10.1371/journal.pdig.0000721.


With big data comes big responsibility: Strategies for utilizing aggregated, standardized, de-identified electronic health record data for research.

Olaker V, Fry S, Terebuh P, Davis P, Tisch D, Xu R. Clin Transl Sci. 2024; 18(1):e70093.

PMID: 39740190 PMC: 11685181. DOI: 10.1111/cts.70093.


A Common Longitudinal Intensive Care Unit data Format (CLIF) to enable multi-institutional federated critical illness research.

Rojas J, Lyons P, Chhikara K, Chaudhari V, Bhavani S, Nour M. medRxiv. 2024.

PMID: 39281737 PMC: 11398431. DOI: 10.1101/2024.09.04.24313058.


Understanding enterprise data warehouses to support clinical and translational research: impact, sustainability, demand management, and accessibility.

Campion Jr T, Craven C, Dorr D, Bernstam E, Knosp B. J Am Med Inform Assoc. 2024; 31(7):1522-1528.

PMID: 38777803 PMC: 11187432. DOI: 10.1093/jamia/ocae111.


Differential Participation, a Potential Cause of Spurious Associations in Observational Cohorts in Environmental Epidemiology.

Chen C, Chen H, Kaufman J, Benmarhnia T. Epidemiology. 2024; 35(2):174-184.

PMID: 38290140 PMC: 10826917. DOI: 10.1097/EDE.0000000000001711.

