Internal Validation Inferences of Significant Genomic Features in Genome-wide Screening
Although validation of classification and prediction models has been a long-standing topic in statistics and machine learning, the concept of statistical validation in genome-wide screening studies has remained vague. Internal validation generally refers to validation procedures based solely on the study dataset. A popular approach to the internal validation of identified genomic features has been split-dataset validation. In contrast to this approach, internal validation in genome-wide association screening studies is here precisely defined through the concepts of association profile and profile significance. A general procedure and two specific profile significance measures are developed and compared with the split-dataset validation approach in a simulation study. The simulation results clearly demonstrate the strengths and limitations of the profile significance approach to internal validation, especially its enormous gain in sensitivity (power) and stability over split-dataset validation. The proposed methodology is illustrated with an example of genome-wide SNP association analysis in genetic epidemiology.
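To make the split-dataset baseline concrete, here is a minimal Python sketch of a generic version of it. It is not the paper's profile significance procedure, which is not reproduced here. The sketch assumes a binary case/control outcome, continuous features, a large-sample two-sample z-test (normal approximation), a discovery threshold `alpha_discovery`, and a Bonferroni-corrected validation threshold. All helper names (`simulate`, `feature_pvalues`, `split_dataset_validation`) and parameter values are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist, fmean, variance


def z_test_pvalue(x, y):
    """Two-sided large-sample z-test for a difference in means (normal approximation)."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    z = (fmean(x) - fmean(y)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def feature_pvalues(data, labels, idx):
    """One p-value per feature, computed only on the samples listed in idx."""
    pvals = []
    for feature in data:
        cases = [feature[i] for i in idx if labels[i] == 1]
        controls = [feature[i] for i in idx if labels[i] == 0]
        pvals.append(z_test_pvalue(cases, controls))
    return pvals


def split_dataset_validation(data, labels, alpha_discovery, alpha_validation, seed=0):
    """Generic split-dataset validation (hypothetical helper, not the paper's method).

    1. Randomly split the samples into a discovery half and a validation half.
    2. Select features with p < alpha_discovery in the discovery half.
    3. Call a selected feature "validated" if its validation-half p-value is below
       alpha_validation / (number of selected features) (Bonferroni, an assumption).
    """
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    half = len(idx) // 2
    discovery, validation = idx[:half], idx[half:]

    p_disc = feature_pvalues(data, labels, discovery)
    selected = [j for j, p in enumerate(p_disc) if p < alpha_discovery]
    if not selected:
        return [], []

    p_val = feature_pvalues(data, labels, validation)
    threshold = alpha_validation / len(selected)
    validated = [j for j in selected if p_val[j] < threshold]
    return selected, validated


def simulate(n, m, n_true, effect, seed=0):
    """Hypothetical data: m features x n samples; the first n_true features are shifted in cases."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]  # balanced cases (1) and controls (0)
    data = [
        [rng.gauss(0.0, 1.0) + (effect if j < n_true and labels[i] == 1 else 0.0)
         for i in range(n)]
        for j in range(m)
    ]
    return data, labels


if __name__ == "__main__":
    data, labels = simulate(n=400, m=200, n_true=5, effect=1.5, seed=1)
    selected, validated = split_dataset_validation(data, labels, 0.001, 0.05, seed=2)
    print("selected in discovery half:", selected)
    print("confirmed in validation half:", validated)
```

Each stage uses only about half of the samples, so both discovery and confirmation have less power than a test on the full dataset. The validated set can also change with the random split (the `seed` argument). This illustrates the sensitivity and stability costs that the abstract attributes to split-dataset validation.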