A Two-step Method for Variable Selection in the Analysis of a Case-cohort Study
Background: Accurate detection and estimation of true exposure-outcome associations are important in aetiological analysis. When there are multiple potential exposure variables of interest, methods are required for detecting the subset of variables most likely to have true associations with the outcome. Case-cohort studies often collect data on a large number of variables that have not been measured in the entire cohort (e.g. panels of biomarkers). There is a lack of guidance on methods for variable selection in case-cohort studies.
Methods: We describe and explore the application of three variable selection methods to data from a case-cohort study. These are: (i) selecting variables based on their level of significance in univariable (i.e. one-at-a-time) Prentice-weighted Cox regression models; (ii) stepwise selection applied to Prentice-weighted Cox regression; and (iii) a two-step method which applies a Bayesian variable selection algorithm to obtain posterior probabilities of selection for each variable using multivariable logistic regression followed by effect estimation using Prentice-weighted Cox regression.
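As an illustration of the Prentice weighting that all three methods rely on, the sketch below prepares case-cohort data in counting-process form: subcohort members contribute to risk sets over their whole follow-up, while cases outside the subcohort enter only just before their own event time. This is a minimal sketch of one common way to set this up, not the authors' implementation; the function name `prentice_rows`, the field names, the time origin of 0, and the offset `eps` are all assumptions made for illustration.

```python
def prentice_rows(subjects, eps=1e-6):
    """Build (id, start, stop, event) rows for a Prentice-weighted Cox fit.

    `subjects` is a list of dicts with hypothetical keys:
    id, time (end of follow-up), case (bool), in_subcohort (bool).
    """
    rows = []
    for s in subjects:
        if s["in_subcohort"]:
            # Subcohort members are at risk from the time origin onwards.
            rows.append((s["id"], 0.0, s["time"], int(s["case"])))
        elif s["case"]:
            # Non-subcohort cases join the risk set only at their event time.
            rows.append((s["id"], s["time"] - eps, s["time"], 1))
        # Non-subcohort non-cases carry no covariate data and are skipped.
    return rows


example = [
    {"id": "a", "time": 4.0, "case": False, "in_subcohort": True},
    {"id": "b", "time": 2.5, "case": True, "in_subcohort": True},
    {"id": "c", "time": 3.0, "case": True, "in_subcohort": False},
    {"id": "d", "time": 6.0, "case": False, "in_subcohort": False},
]
rows = prentice_rows(example)
```

The resulting rows could then be passed to any Cox regression routine that accepts delayed entry (start, stop) intervals, with robust standard errors typically requested.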
Results: Across nine different simulation scenarios, the two-step method demonstrated higher sensitivity and lower false discovery rate than the one-at-a-time and stepwise methods. In an application of the methods to data from the EPIC-InterAct case-cohort study, the two-step method identified an additional two fatty acids as being associated with incident type 2 diabetes, compared with the one-at-a-time and stepwise methods.
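For readers unfamiliar with the two performance measures, the sketch below shows how sensitivity and false discovery rate can be computed for a single simulated dataset, given the set of variables a method selected and the set with true associations. The function name `selection_metrics` and the variable labels are hypothetical, and this is not taken from the authors' simulation code.

```python
def selection_metrics(selected, truth):
    """Return (sensitivity, false discovery rate) for one selection result."""
    selected, truth = set(selected), set(truth)
    true_positives = len(selected & truth)
    # Sensitivity: share of truly associated variables that were selected.
    sensitivity = true_positives / len(truth) if truth else float("nan")
    # FDR: share of selected variables with no true association
    # (taken as 0 here when nothing is selected).
    fdr = (len(selected) - true_positives) / len(selected) if selected else 0.0
    return sensitivity, fdr


sens, fdr = selection_metrics(
    selected={"x1", "x2", "x5"},
    truth={"x1", "x2", "x3", "x4"},
)
```

In this toy example two of the four true variables are recovered, and one of the three selected variables is a false discovery.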
Conclusions: The two-step method enables more powerful and accurate detection of exposure-outcome associations in case-cohort studies. An R package is available to enable researchers to apply this method.