Unbiased Bootstrap Error Estimation for Linear Discriminant Analysis

Overview

Journal EURASIP J Bioinform Syst Biol

Publisher Springer

Specialty Biology

Date 2017 Feb 15

PMID 28194165

Citations 1

Authors

Thang Vu

Chao Sima

Ulisses M Braga-Neto

Edward R Dougherty

Abstract

Convex bootstrap error estimation is a popular tool for classifier error estimation in gene expression studies. A basic question is how to determine the weight for the convex combination between the basic bootstrap estimator and the resubstitution estimator such that the resulting estimator is unbiased at finite sample sizes. The well-known 0.632 bootstrap error estimator uses asymptotic arguments to propose a fixed 0.632 weight, whereas the more recent 0.632+ bootstrap error estimator attempts to set the weight adaptively. In this paper, we study the finite sample problem in the case of linear discriminant analysis under Gaussian populations. We derive exact expressions for the weight that guarantee unbiasedness of the convex bootstrap error estimator in the univariate and multivariate cases, without making asymptotic simplifications. Using exact computation in the univariate case and an accurate approximation in the multivariate case, we obtain the required weight and show that it can deviate significantly from the constant 0.632 weight, depending on the sample size and Bayes error for the problem. The methodology is illustrated by application on data from a well-known cancer classification study.
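The convex combination described in the abstract can be sketched in a few lines. The example below is a minimal illustration, not the authors' implementation: it computes the resubstitution error and the basic (zero) bootstrap error for a two-class LDA rule and blends them as (1 - weight) * resubstitution + weight * bootstrap. The function names (`fit_lda`, `error_rate`, `convex_bootstrap`), the equal-prior LDA rule, the use of a pseudo-inverse, and the synthetic Gaussian data are all assumptions made for illustration.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class LDA with pooled covariance and equal priors (an assumption).
    Returns (w, b); a point x is assigned to class 1 when x @ w + b > 0."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled sample covariance of the two classes.
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(y) - 2)
    # Pseudo-inverse keeps the sketch usable if S is (near) singular.
    w = np.linalg.pinv(np.atleast_2d(S)) @ (m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b

def error_rate(w, b, X, y):
    """Fraction of points in (X, y) misclassified by the rule (w, b)."""
    pred = (X @ w + b > 0).astype(int)
    return float(np.mean(pred != y))

def convex_bootstrap(X, y, weight=0.632, n_boot=200, seed=0):
    """Return (convex estimate, resubstitution error, zero bootstrap error)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Resubstitution: train and test on the full sample.
    resub = error_rate(*fit_lda(X, y), X, y)
    wrong, total = 0, 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample n points with replacement
        oob = np.setdiff1d(np.arange(n), idx)     # points left out of this resample
        if len(oob) == 0 or len(np.unique(y[idx])) < 2:
            continue                              # skip degenerate resamples
        wb, bb = fit_lda(X[idx], y[idx])
        pred = (X[oob] @ wb + bb > 0).astype(int)
        wrong += int(np.sum(pred != y[oob]))
        total += len(oob)
    boot = wrong / total
    return (1 - weight) * resub + weight * boot, resub, boot

# Synthetic two-class Gaussian data (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(1.0, 1.0, (20, 2))])
y = np.repeat([0, 1], 20)
est, resub, boot = convex_bootstrap(X, y, weight=0.632)
print(f"resubstitution={resub:.3f}  zero bootstrap={boot:.3f}  convex={est:.3f}")
```

Setting the expectation of the combination equal to the expected true error gives the unbiased weight as (E[true error] - E[resubstitution]) / (E[bootstrap] - E[resubstitution]). The sketch above takes the weight as an input and does not attempt to compute those expectations, which is the problem the paper addresses for LDA under Gaussian populations.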

Citing Articles

The Molecular Mechanism of Human Voltage-Dependent Anion Channel 1 Blockade by the Metallofullerenol Gd@C82(OH)22: An In Silico Study.

Wang X, Yang N, Su J, Wu C, Liu S, Chang L. Biomolecules. 2022; 12(1).

PMID: 35053271 PMC: 8773804. DOI: 10.3390/biom12010123.

References

Hwang T, Sun C, Yun T, Yi G. FiGS: a filter-based gene selection workbench for microarray data. BMC Bioinformatics. 2010; 11:50. PMC: 3098082. DOI: 10.1186/1471-2105-11-50.

Jain A, Dubes R, Chen C. Bootstrap techniques for error estimation. IEEE Trans Pattern Anal Mach Intell. 1987; 9(5):628-33. DOI: 10.1109/tpami.1987.4767957.

van 't Veer L, Dai H, van de Vijver M, He Y, Hart A, Mao M. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002; 415(6871):530-6. DOI: 10.1038/415530a.

Toussaint G, Sharpe P. An efficient method for estimating the probability of misclassification applied to a problem in medical diagnosis. Comput Biol Med. 1975; 4(3-4):269-78. DOI: 10.1016/0010-4825(75)90038-4.

Braga-Neto U, Zollanvari A, Dougherty E. Cross-validation under separate sampling: strong bias and how to correct it. Bioinformatics. 2014; 30(23):3349-55. PMC: 4296143. DOI: 10.1093/bioinformatics/btu527.

Vu T, Braga-Neto U. Is bagging effective in the classification of small-sample genomic and proteomic data? EURASIP J Bioinform Syst Biol. 2009:158368. PMC: 3171418. DOI: 10.1155/2009/158368.

Braga-Neto U, Hashimoto R, Dougherty E, Nguyen D, Carroll R. Is cross-validation better than resubstitution for ranking genes? Bioinformatics. 2004; 20(2):253-8. DOI: 10.1093/bioinformatics/btg399.

Student S, Fujarewicz K. Stable feature selection and classification algorithms for multiclass microarray data. Biol Direct. 2012; 7:33. PMC: 3599581. DOI: 10.1186/1745-6150-7-33.

Paul S, Maji P. μHEM for identification of differentially expressed miRNAs using hypercuboid equivalence partition matrix. BMC Bioinformatics. 2013; 14:266. PMC: 3844490. DOI: 10.1186/1471-2105-14-266.

Pils D, Tong D, Hager G, Obermayr E, Aust S, Heinze G. A combined blood based gene expression and plasma protein abundance signature for diagnosis of epithelial ovarian cancer--a study of the OVCAD consortium. BMC Cancer. 2013; 13:178. PMC: 3639192. DOI: 10.1186/1471-2407-13-178.

van de Vijver M, He Y, van 't Veer L, Dai H, Hart A, Voskuil D. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002; 347(25):1999-2009. DOI: 10.1056/NEJMoa021967.

Braga-Neto U, Dougherty E. Is cross-validation valid for small-sample microarray classification? Bioinformatics. 2004; 20(3):374-80. DOI: 10.1093/bioinformatics/btg419.