
Unbiased Bootstrap Error Estimation for Linear Discriminant Analysis

Overview
Publisher Springer
Specialty Biology
Date 2017 Feb 15
PMID 28194165
Citations 1
Abstract

Convex bootstrap error estimation is a popular tool for classifier error estimation in gene expression studies. A basic question is how to determine the weight for the convex combination between the basic bootstrap estimator and the resubstitution estimator such that the resulting estimator is unbiased at finite sample sizes. The well-known 0.632 bootstrap error estimator uses asymptotic arguments to propose a fixed 0.632 weight, whereas the more recent 0.632+ bootstrap error estimator attempts to set the weight adaptively. In this paper, we study the finite sample problem in the case of linear discriminant analysis under Gaussian populations. We derive exact expressions for the weight that guarantee unbiasedness of the convex bootstrap error estimator in the univariate and multivariate cases, without making asymptotic simplifications. Using exact computation in the univariate case and an accurate approximation in the multivariate case, we obtain the required weight and show that it can deviate significantly from the constant 0.632 weight, depending on the sample size and Bayes error for the problem. The methodology is illustrated by application to data from a well-known cancer classification study.
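
To make the convex combination described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the resubstitution error, a basic (out-of-bag) bootstrap error, and their weighted combination (1 - w) * resubstitution + w * bootstrap for a two-class LDA rule. The function names (`fit_lda`, `error_rate`, `convex_bootstrap_error`), the default of 200 resamples, and the choice to average per-resample out-of-bag errors are assumptions made for illustration; the paper's exact weight expressions are not reproduced here.

```python
import numpy as np


def fit_lda(X, y):
    """Fit a two-class LDA rule with pooled covariance.

    Returns (a, b) such that a point x is assigned to class 1 when a @ x + b > 0.
    """
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled sample covariance matrix of the two classes.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    # Least squares instead of a direct solve, so a singular S does not raise.
    a = np.linalg.lstsq(S, m1 - m0, rcond=None)[0]
    b = -0.5 * a @ (m0 + m1)
    return a, b


def error_rate(a, b, X, y):
    """Fraction of points in (X, y) misclassified by the rule (a, b)."""
    pred = (X @ a + b > 0).astype(int)
    return float(np.mean(pred != y))


def convex_bootstrap_error(X, y, w=0.632, n_boot=200, seed=0):
    """Return (convex estimate, resubstitution error, bootstrap error).

    Illustrative assumption: the bootstrap error is the mean, over resamples,
    of the error on the points left out of each resample.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Resubstitution: train and test on the same full sample.
    resub = error_rate(*fit_lda(X, y), X, y)
    oob_errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # sample n points with replacement
        oob = np.setdiff1d(np.arange(n), idx)      # points not drawn in this resample
        yb = y[idx]
        # Skip degenerate resamples (no test points, or a class with < 2 points).
        if oob.size == 0 or min(np.sum(yb == 0), np.sum(yb == 1)) < 2:
            continue
        a, b = fit_lda(X[idx], yb)
        oob_errors.append(error_rate(a, b, X[oob], y[oob]))
    boot = float(np.mean(oob_errors))
    return (1 - w) * resub + w * boot, resub, boot
```

Setting `w=0.632` gives the fixed-weight estimator mentioned above. If the expected true error and the expected values of the two component estimators were known, requiring the combination to be unbiased would give w = (E[true error] - E[resubstitution]) / (E[bootstrap] - E[resubstitution]), which is why the required weight can depend on the sample size and on the difficulty of the problem.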

Citing Articles

The Molecular Mechanism of Human Voltage-Dependent Anion Channel 1 Blockade by the Metallofullerenol Gd@C(OH): An In Silico Study.

Wang X, Yang N, Su J, Wu C, Liu S, Chang L. Biomolecules. 2022; 12(1).

PMID: 35053271 PMC: 8773804. DOI: 10.3390/biom12010123.

References
1.
Hwang T, Sun C, Yun T, Yi G. FiGS: a filter-based gene selection workbench for microarray data. BMC Bioinformatics. 2010; 11:50. PMC: 3098082. DOI: 10.1186/1471-2105-11-50.

2.
Jain A, Dubes R, Chen C. Bootstrap techniques for error estimation. IEEE Trans Pattern Anal Mach Intell. 1987; 9(5):628-33. DOI: 10.1109/tpami.1987.4767957.

3.
van 't Veer L, Dai H, van de Vijver M, He Y, Hart A, Mao M. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002; 415(6871):530-6. DOI: 10.1038/415530a.

4.
Toussaint G, Sharpe P. An efficient method for estimating the probability of misclassification applied to a problem in medical diagnosis. Comput Biol Med. 1975; 4(3-4):269-78. DOI: 10.1016/0010-4825(75)90038-4.

5.
Braga-Neto U, Zollanvari A, Dougherty E. Cross-validation under separate sampling: strong bias and how to correct it. Bioinformatics. 2014; 30(23):3349-55. PMC: 4296143. DOI: 10.1093/bioinformatics/btu527.