
Empirical Evaluation of Sub-cohort Sampling Designs for Risk Prediction Modeling

Overview
Journal J Appl Stat
Specialty Public Health
Date 2022 Jun 16
PMID 35706464
Abstract

Sub-cohort sampling designs, such as nested case-control (NCC) and case-cohort (CC) studies, have been widely used to estimate biomarker-disease associations because of their cost-effectiveness. These designs have been well studied and shown to maintain relatively high efficiency compared with full-cohort designs, but their performance in building risk prediction models has received less attention. Moreover, sub-cohort sampling designs often use matching (or stratification) to further control for confounders or to reduce measurement error, so their predictive performance depends on both the design and the matching procedure. Using a dataset from the NYU Women's Health Study (NYUWHS), we performed Monte Carlo simulations to systematically evaluate risk prediction performance under NCC, CC, and full-cohort designs. Our simulations demonstrate that sub-cohort sampling designs can achieve predictive accuracy (i.e., discrimination and calibration) similar to that of the full-cohort design, but can be sensitive to the matching procedure used. These results suggest that researchers can opt for NCC and CC studies, with substantial potential savings in cost and resources, but need to pay particular attention to the matching procedure when developing a risk prediction model in biomarker studies.
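
To make the two sampling designs named in the abstract concrete, the following is a minimal sketch of how a case-cohort sample and a nested case-control sample might be drawn from a cohort. It is not the authors' simulation code: the cohort is synthetic (not NYUWHS data), no matching is applied, and every name and parameter (`simulate_cohort`, `case_cohort_sample`, `nested_case_control_sample`, `subcohort_fraction`, `m`) is a hypothetical chosen for illustration.

```python
import random


def simulate_cohort(n, event_rate, seed=0):
    """Build a synthetic cohort (assumption: uniform follow-up times, random events)."""
    rng = random.Random(seed)
    cohort = []
    for i in range(n):
        time = rng.uniform(0.0, 10.0)      # hypothetical follow-up time in years
        event = rng.random() < event_rate  # True means the subject is a case at `time`
        cohort.append({"id": i, "time": time, "event": event})
    return cohort


def case_cohort_sample(cohort, subcohort_fraction, rng):
    """Case-cohort: a random sub-cohort drawn at baseline, plus all cases."""
    k = round(subcohort_fraction * len(cohort))
    subcohort_ids = {s["id"] for s in rng.sample(cohort, k)}
    return [s for s in cohort if s["event"] or s["id"] in subcohort_ids]


def nested_case_control_sample(cohort, m, rng):
    """NCC: for each case, draw up to m controls from those still at risk at its event time."""
    sampled_sets = []
    for case in (s for s in cohort if s["event"]):
        # Risk set: subjects whose follow-up extends beyond the case's event time.
        # Subjects who become cases later are still eligible as controls here.
        risk_set = [s for s in cohort if s["time"] > case["time"]]
        controls = rng.sample(risk_set, min(m, len(risk_set)))
        sampled_sets.append((case, controls))
    return sampled_sets


cohort = simulate_cohort(n=500, event_rate=0.1, seed=1)
rng = random.Random(2)
cc_sample = case_cohort_sample(cohort, subcohort_fraction=0.2, rng=rng)
ncc_sets = nested_case_control_sample(cohort, m=2, rng=rng)
```

In a matched design, the risk set in `nested_case_control_sample` would be further restricted to subjects sharing the case's matching variables, which is the step the abstract flags as affecting predictive performance.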

Citing Articles

Weighted metrics are required when evaluating the performance of prediction models in nested case-control studies.
Rentroia-Pacheco B, Bellomo D, Lakeman I, Wakkee M, Hollestein L, van Klaveren D. BMC Med Res Methodol. 2024; 24(1):115.
PMID: 38760688 PMC: 11533296. DOI: 10.1186/s12874-024-02213-6.

Goodness-of-fit two-phase sampling designs for time-to-event outcomes: a simulation study based on New York University Women's Health Study for breast cancer.
Lee M, Chen J, Zeleniuch-Jacquotte A, Liu M. BMC Med Res Methodol. 2023; 23(1):119.
PMID: 37208600 PMC: 10199513. DOI: 10.1186/s12874-023-01950-4.

Editorial to special issue Frontiers of Data Analysis.
Jin Z, Sun J. J Appl Stat. 2022; 48(8):1349-1351.
PMID: 35706465 PMC: 9042080. DOI: 10.1080/02664763.2021.1922853.
