A Bayesian Approach to Efficient Differential Allocation for Resampling-based Significance Testing
Background: Large-scale statistical analyses have become hallmarks of post-genomic-era biological research, owing to advances in high-throughput assays and the integration of large biological databases. One accompanying issue is the simultaneous estimation of p-values for a large number of hypothesis tests. In many applications, a parametric assumption about the null distribution, such as normality, may be unreasonable, and resampling-based p-values are the preferred means of establishing statistical significance. Resampling-based procedures for multiple testing are computationally intensive, however, and typically require large numbers of resamples.
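To make the cost concrete, a resampling-based p-value for a single test can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `permutation_p_value`, the difference-in-means statistic, and the add-one correction are all assumptions chosen for the example. Every test receives the same `n_resamples`, which corresponds to the uniform allocation referred to in this abstract.

```python
import random


def permutation_p_value(x, y, n_resamples=1000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration; the statistic and the
    add-one correction are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_resamples):
        # Shuffle the group labels by permuting the pooled values.
        rng.shuffle(pooled)
        px, py = pooled[:n_x], pooled[n_x:]
        stat = abs(sum(px) / len(px) - sum(py) / len(py))
        if stat >= observed:
            exceed += 1
    # The add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_resamples + 1)


if __name__ == "__main__":
    treated = [10, 11, 12, 13, 14, 15, 16, 17]
    control = [0, 1, 2, 3, 4, 5, 6, 7]
    print(permutation_p_value(treated, control))
```

With thousands of tests, running this loop with the same large `n_resamples` for every test is what makes the uniform scheme expensive.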
Results: We present a new approach that assigns resamples (such as bootstrap samples or permutations) more efficiently within a nonparametric multiple testing framework. We formulated a Bayesian-inspired approach to this problem and devised an algorithm that adapts the assignment of resamples iteratively with negligible space and running-time overhead. In two experimental studies, on a breast cancer microarray dataset and a genome-wide association study dataset for Parkinson's disease, we demonstrated that our differential allocation procedure is substantially more accurate than traditional uniform resample allocation.
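The abstract does not spell out the allocation rule, so the following is only a hypothetical heuristic in the same spirit, not the authors' algorithm. It assumes a uniform prior on each test's p-value, so that after `n` resamples with `k` exceedances the posterior is Beta(k + 1, n - k + 1), and it gives a larger share of the next batch to tests whose posterior is still uncertain relative to its distance from a significance threshold `alpha`. The names `posterior_mean_sd` and `allocate_batch`, the weighting formula, and the default values are all assumptions made for illustration.

```python
import math


def posterior_mean_sd(k, n):
    """Mean and SD of Beta(k + 1, n - k + 1), the posterior under a uniform prior."""
    a, b = k + 1, n - k + 1
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


def allocate_batch(exceed, total, alpha=0.05, batch=1000):
    """Split `batch` new resamples across tests (hypothetical heuristic).

    exceed[i] is the number of resampled statistics at least as extreme
    as the observed one for test i; total[i] is the resamples used so far.
    """
    weights = []
    for k, n in zip(exceed, total):
        mean, sd = posterior_mean_sd(k, n)
        # Weight is near 1 when the posterior straddles alpha and
        # near 0 when the posterior mean is many SDs away from it.
        weights.append(sd / (abs(mean - alpha) + sd))
    scale = batch / sum(weights)
    raw = [w * scale for w in weights]
    alloc = [int(r) for r in raw]
    # Distribute the rounding remainder to the largest fractional parts.
    leftover = batch - sum(alloc)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Three tests after 100 resamples each: clearly null, borderline, clearly small.
    print(allocate_batch(exceed=[50, 5, 0], total=[100, 100, 100]))
```

In this sketch the borderline test (5 exceedances out of 100) receives most of the next batch, while the test that is clearly far from the threshold receives the least. The bookkeeping per test is just two counters, which is consistent with the small overhead described above, although the actual procedure may differ.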
Conclusion: Our experiments demonstrate that a more sophisticated allocation strategy can improve inference for hypothesis testing without a drastic increase in the amount of computation on randomized data. Moreover, the gain in efficiency is greater when the number of tests is large. R code for our algorithm and the shortcut method is available at http://people.pcbi.upenn.edu/~lswang/pub/bmc2009/.