
Asymptotic Optimality of Likelihood-based Cross-validation

Overview
Date 2006 May 2
PMID 16646820
Citations 25
Abstract

Likelihood-based cross-validation is a statistical tool for selecting a density estimate, based on n i.i.d. observations from the true density, among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we establish a finite sample result for a general class of likelihood-based cross-validation procedures (as indexed by the type of sample splitting used, e.g. V-fold cross-validation). This result implies that the cross-validation selector performs asymptotically as well (with respect to the Kullback-Leibler distance to the true density) as a benchmark model selector which is optimal for each given dataset and depends on the true density. Crucial conditions of our theorem are that the size of the validation sample converges to infinity, which excludes leave-one-out cross-validation, and that the candidate density estimates are bounded away from zero and infinity. We illustrate these asymptotic results and the practical performance of likelihood-based cross-validation for the purpose of bandwidth selection with a simulation study. Moreover, we use likelihood-based cross-validation in the context of regulatory motif detection in DNA sequences.
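
As an illustration of the bandwidth-selection use case described in the abstract, the following is a minimal sketch of V-fold likelihood-based cross-validation for a Gaussian kernel density estimator. It is not the authors' code: the function names (`gaussian_kde_logpdf`, `cv_select_bandwidth`), the bandwidth grid, the simulated standard normal sample, and the choice of 5 folds are all assumptions made for this example. The sketch also does not enforce the theorem's condition that candidate estimates are bounded away from zero and infinity.

```python
import numpy as np


def gaussian_kde_logpdf(train, points, bandwidth):
    """Log density of a Gaussian kernel estimate fit on `train`, evaluated at `points`."""
    z = (points[:, None] - train[None, :]) / bandwidth
    log_kernel = -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi) - np.log(bandwidth)
    # Log of the average kernel value over training points (log-sum-exp for stability).
    m = log_kernel.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_kernel - m).mean(axis=1, keepdims=True))).ravel()


def cv_select_bandwidth(x, bandwidths, n_folds=5, seed=0):
    """Pick the bandwidth maximizing the V-fold cross-validated log-likelihood.

    Hypothetical helper for illustration: each fold serves once as the
    validation sample, and the estimator is fit on the remaining folds.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    scores = []
    for h in bandwidths:
        total = 0.0
        for k in range(n_folds):
            val_idx = folds[k]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            total += gaussian_kde_logpdf(x[train_idx], x[val_idx], h).sum()
        # Average validation log-likelihood per observation.
        scores.append(total / len(x))
    scores = np.array(scores)
    return bandwidths[int(np.argmax(scores))], scores


# Usage on simulated data (assumed setup: 400 draws from a standard normal).
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=400)
grid = np.array([0.05, 0.1, 0.2, 0.4, 0.8, 1.6])
best_h, scores = cv_select_bandwidth(x, grid)
print("selected bandwidth:", best_h)
```

Maximizing the average validation log-likelihood is equivalent to minimizing an empirical estimate of the Kullback-Leibler distance to the true density up to a constant that does not depend on the bandwidth, which is why the log-likelihood serves as the selection criterion here. With V fixed, the validation sample size grows with n, in line with the condition stated in the abstract; leave-one-out splitting would not satisfy it.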

Citing Articles

A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology.

Hejazi N, Boileau P, van der Laan M, Hubbard A. Stat Methods Med Res. 2022; 32(3):539-554.

PMID: 36573044 PMC: 11078029. DOI: 10.1177/09622802221146313.


The Association of Teamlets and Teams with Physician Burnout and Patient Outcomes.

Casalino L, Jung H, Bodenheimer T, Diaz I, Chen M, Willard-Grace R. J Gen Intern Med. 2022; 38(6):1384-1392.

PMID: 36441365 PMC: 10160282. DOI: 10.1007/s11606-022-07894-7.


Assessing trends in vaccine efficacy by pathogen genetic distance.

Benkeser D, Juraska M, Gilbert PB. J Soc Fr Statistique (2009). 2020; 161(1):164-175.

PMID: 33244440 PMC: 7685316.


Estimating and Testing Vaccine Sieve Effects Using Machine Learning.

Benkeser D, Gilbert PB, Carone M. J Am Stat Assoc. 2019; 114(527):1038-1049.

PMID: 31649413 PMC: 6812562. DOI: 10.1080/01621459.2018.1529594.


Generalized Score Functions for Causal Discovery.

Huang B, Zhang K, Lin Y, Scholkopf B, Glymour C. KDD. 2018; 2018:1551-1560.

PMID: 30191079 PMC: 6123020. DOI: 10.1145/3219819.3220104.