Bias Due to Missing Exposure Data Using Complete-case Analysis in the Proportional Hazards Regression Model

Overview
Journal Stat Med
Publisher Wiley
Specialty Public Health
Date 2003 Feb 19
PMID 12590413
Citations 28
Abstract

We studied bias due to missing exposure data in the proportional hazards regression model when using complete-case analysis (CCA). Eleven missing data scenarios were considered: one with missing completely at random (MCAR), four missing at random (MAR), and six non-ignorable missingness scenarios, with a variety of hazard ratios, censoring fractions, missingness fractions and sample sizes. When missingness was MCAR or dependent only on the exposure, there was negligible bias (2-3 per cent) that was similar to the difference between the estimate in the full data set with no missing data and the true parameter. In contrast, substantial bias occurred when missingness was dependent on outcome or both outcome and exposure. For models with hazard ratio of 3.5, a sample size of 400, 20 per cent censoring and 40 per cent missing data, the relative bias for the hazard ratio ranged between 7 per cent and 64 per cent. We observed important differences in the direction and magnitude of biases under the various missing data mechanisms. For example, in scenarios where missingness was associated with longer or shorter follow-up, the biases were notably different, although both mechanisms are MAR. The hazard ratio was underestimated (with larger bias) when missingness was associated with longer follow-up and overestimated (with smaller bias) when associated with shorter follow-up. If it is known that missingness is associated with a less frequently observed outcome or with both the outcome and exposure, CCA may result in an invalid inference and other methods for handling missing data should be considered.
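The contrast the abstract draws between MCAR and outcome-dependent missingness can be reproduced with a small simulation. The sketch below is illustrative only and is not the authors' design: the binary exposure with 50 per cent prevalence, the exponential event and censoring times, the censoring rate of 0.4 (giving roughly 20 per cent censoring) and the missingness probabilities of 0.7 and 0.15 are all assumptions, chosen so that about 40 per cent of exposures go missing in a cohort of 400 with a true hazard ratio of 3.5. The functions `cox_log_hr`, `simulate_cohort` and `run_simulation` are hypothetical helpers written for this example, with the Cox model fitted by a hand-written Newton-Raphson routine for a single covariate.

```python
import math
import random

TRUE_HR = 3.5  # hazard ratio quoted in the abstract


def cox_log_hr(times, events, x, max_iter=50):
    """Maximise the Cox partial likelihood for one covariate (Newton-Raphson)."""
    # Visit subjects from longest to shortest time so the running sums always
    # describe the current risk set (times are continuous, so ties are ignored).
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    beta = 0.0
    for _ in range(max_iter):
        s0 = s1 = s2 = 0.0
        score = info = 0.0
        for i in order:
            w = math.exp(beta * x[i])
            s0 += w
            s1 += w * x[i]
            s2 += w * x[i] * x[i]
            if events[i]:
                mean = s1 / s0
                score += x[i] - mean
                info += s2 / s0 - mean * mean
        if info <= 0.0:
            break
        step = max(-1.0, min(1.0, score / info))  # damped Newton step
        beta += step
        if abs(step) < 1e-8:
            break
    return beta


def simulate_cohort(rng, n=400, censor_rate=0.4):
    """Binary exposure, exponential event times, independent exponential censoring."""
    times, events, x = [], [], []
    for _ in range(n):
        xi = 1 if rng.random() < 0.5 else 0          # assumed 50% exposure prevalence
        t = rng.expovariate(TRUE_HR if xi else 1.0)  # baseline hazard assumed to be 1
        c = rng.expovariate(censor_rate)
        times.append(min(t, c))
        events.append(1 if t <= c else 0)
        x.append(xi)
    return times, events, x


def fit_subset(times, events, x, keep):
    """Complete-case fit: drop every subject whose exposure is flagged missing."""
    idx = [i for i, k in enumerate(keep) if k]
    return cox_log_hr([times[i] for i in idx],
                      [events[i] for i in idx],
                      [x[i] for i in idx])


def run_simulation(reps=200, seed=1):
    """Return the geometric-mean hazard ratio estimate for each scenario."""
    rng = random.Random(seed)
    sums = {"full": 0.0, "mcar": 0.0, "outcome_and_exposure": 0.0}
    for _ in range(reps):
        times, events, x = simulate_cohort(rng)
        n = len(times)
        # MCAR: every exposure value is missing with probability 0.4.
        mcar_keep = [rng.random() >= 0.4 for _ in range(n)]
        # Assumed non-MCAR mechanism: exposed subjects with an event lose their
        # exposure value with probability 0.7, everyone else with probability 0.15.
        dep_keep = [rng.random() >= (0.7 if events[i] and x[i] else 0.15)
                    for i in range(n)]
        sums["full"] += cox_log_hr(times, events, x)
        sums["mcar"] += fit_subset(times, events, x, mcar_keep)
        sums["outcome_and_exposure"] += fit_subset(times, events, x, dep_keep)
    return {name: math.exp(total / reps) for name, total in sums.items()}


if __name__ == "__main__":
    for name, hr in run_simulation().items():
        print(f"{name}: mean HR estimate {hr:.2f} (true {TRUE_HR})")
```

Under these assumptions the full-data and MCAR estimates should stay close to the true hazard ratio, while the complete-case estimate under the outcome-and-exposure-dependent mechanism should fall noticeably below it. The size and direction of the bias depend entirely on the assumed mechanism, so the numbers this sketch prints are not comparable to the ranges reported in the abstract.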
