
STATISTICS. The Reusable Holdout: Preserving Validity in Adaptive Data Analysis

Overview
Journal: Science
Specialty: Science
Date: 2015 Aug 8
PMID: 26250683
Citations: 33
Abstract

Misapplication of statistical data analysis is a common cause of spurious discoveries in scientific research. Existing approaches to ensuring the validity of inferences drawn from data assume a fixed procedure to be performed, selected before the data are examined. In common practice, however, data analysis is an intrinsically adaptive process, with new analyses generated on the basis of data exploration, as well as the results of previous analyses on the same data. We demonstrate a new approach for addressing the challenges of adaptivity based on insights from privacy-preserving data analysis. As an application, we show how to safely reuse a holdout data set many times to validate the results of adaptively chosen analyses.
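
As an illustration of the holdout-reuse idea, the following minimal Python sketch follows the general shape of the Thresholdout mechanism described in the full paper; it is not the authors' code. The names `reusable_holdout_answer` and `laplace`, the default values of `threshold` and `sigma`, and the exact noise scales are assumptions chosen for illustration. The sketch also omits the budget on holdout accesses that a complete treatment would track.

```python
import random
from statistics import fmean


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def reusable_holdout_answer(train_vals, holdout_vals,
                            threshold=0.04, sigma=0.01, rng=None):
    """Answer one adaptively chosen query (hypothetical helper).

    train_vals and holdout_vals hold the per-example values of the
    analyst's statistic on the training set and on the holdout set.
    """
    if rng is None:
        rng = random.Random()
    train_mean = fmean(train_vals)
    holdout_mean = fmean(holdout_vals)
    # Noisy comparison: only consult the holdout when the training
    # estimate appears to disagree with it (a sign of overfitting).
    if abs(train_mean - holdout_mean) > threshold + laplace(2 * sigma, rng):
        # Release a noised holdout estimate, limiting what leaks.
        return holdout_mean + laplace(sigma, rng)
    # Otherwise the training estimate is returned, so the answer
    # reveals almost nothing about the holdout set.
    return train_mean


# Usage: a statistic that looks perfect on training data but not on holdout.
estimate = reusable_holdout_answer([1.0] * 50, [0.5] * 50)
```

The design choice worth noting is that the analyst never sees the raw holdout value: either the training estimate is returned, or a noised holdout estimate is, which is where the connection to privacy-preserving data analysis comes in.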

Citing Articles

Preliminary Results: Comparison of Convolutional Neural Network Architectures as an Auxiliary Clinical Tool Applied to Screening Mammography in Mexican Women.

Acosta-Jimenez S, Gonzalez-Chavez S, Camarillo-Cisneros J, Pacheco-Tena C, Barcenas-Lopez M, Gonzalez-Lozada L. J Med Biol Eng. 2025; 44(3):390-400.

PMID: 40027073 PMC: 11870662. DOI: 10.1007/s40846-024-00868-6.


Machine learning based on alcohol drinking-gut microbiota-liver axis in predicting the occurrence of early-stage hepatocellular carcinoma.

Yang Y, Bo Z, Wang J, Chen B, Su Q, Lian Y. BMC Cancer. 2024; 24(1):1468.

PMID: 39609660 PMC: 11606210. DOI: 10.1186/s12885-024-13161-1.


Identification of Psychological Treatment Dropout Predictors Using Machine Learning Models on Italian Patients Living with Overweight and Obesity Ineligible for Bariatric Surgery.

Marchitelli S, Mazza C, Ricci E, Faia V, Biondi S, Colasanti M. Nutrients. 2024; 16(16).

PMID: 39203742 PMC: 11357013. DOI: 10.3390/nu16162605.


Screening p-hackers: Dissemination noise as bait.

Echenique F, He K. Proc Natl Acad Sci U S A. 2024; 121(21):e2400787121.

PMID: 38758697 PMC: 11126912. DOI: 10.1073/pnas.2400787121.


Will we ever be able to accurately predict solubility?

Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, Varnek A. Sci Data. 2024; 11(1):303.

PMID: 38499581 PMC: 10948805. DOI: 10.1038/s41597-024-03105-6.