
A Bayesian Approach to Efficient Differential Allocation for Resampling-based Significance Testing

Overview
Publisher: BioMed Central
Specialty: Biology
Date: 2009 Jun 30
PMID: 19558706
Citations: 3
Abstract

Background: Large-scale statistical analyses have become hallmarks of post-genomic era biological research, driven by advances in high-throughput assays and the integration of large biological databases. One accompanying issue is the simultaneous estimation of p-values for a large number of hypothesis tests. In many applications, a parametric assumption about the null distribution, such as normality, may be unreasonable, and resampling-based p-values are the preferred way to establish statistical significance. Using resampling-based procedures for multiple testing is computationally intensive and typically requires large numbers of resamples.
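
To illustrate what a resampling-based p-value involves, here is a minimal Python sketch of a permutation test for a difference in group means. The function name `permutation_p_value`, its parameters, and the choice of test statistic are assumptions made for this example; none of them come from the article.

```python
import random


def permutation_p_value(x, y, n_resamples=1000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Hypothetical helper for illustration only; not taken from the article.
    """
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    exceed = 0
    for _ in range(n_resamples):
        # Shuffle the pooled data to simulate the null of exchangeable labels.
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        stat = abs(sum(px) / len(px) - sum(py) / len(py))
        if stat >= observed:
            exceed += 1
    # Add-one correction: the estimate can never be exactly zero.
    return (exceed + 1) / (n_resamples + 1)
```

The smallest p-value this estimator can report is 1 / (n_resamples + 1), so resolving very small p-values across thousands of tests requires a very large total number of resamples when every test receives the same amount.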

Results: We present a new approach that assigns resamples (such as bootstrap samples or permutations) more efficiently within a nonparametric multiple testing framework. We formulated a Bayesian-inspired approach to this problem and devised an algorithm that adapts the assignment of resamples iteratively with negligible space and running-time overhead. In two experimental studies, a breast cancer microarray dataset and a genome-wide association study dataset for Parkinson's disease, we demonstrated that our differential allocation procedure is substantially more accurate than the traditional uniform resample allocation.
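
The general idea of differential allocation can be sketched in Python as follows. This is not the authors' algorithm, whose details the abstract does not give; it is one plausible Bayesian-flavoured scheme under stated assumptions: each test keeps a Beta posterior over its p-value, and each batch of resamples goes to the test whose posterior is most ambiguous about falling below a threshold `alpha`. The names `adaptive_allocation` and `draw_exceeds`, and all parameters, are hypothetical.

```python
import random


def adaptive_allocation(draw_exceeds, n_tests, total_budget,
                        initial=20, batch=50, alpha=0.05,
                        posterior_draws=200, seed=0):
    """Illustrative differential allocation of resamples (hypothetical sketch).

    draw_exceeds(i, rng) performs one resample for test i and returns 1 if
    the resampled statistic is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    exceed = [0] * n_tests
    used = [0] * n_tests
    # Start with a small uniform allocation so every posterior is informed.
    for i in range(n_tests):
        for _ in range(initial):
            exceed[i] += draw_exceeds(i, rng)
        used[i] = initial
    remaining = total_budget - initial * n_tests
    while remaining > 0:
        scores = []
        for i in range(n_tests):
            # Beta(1 + exceedances, 1 + non-exceedances) posterior, flat prior.
            a = 1 + exceed[i]
            b = 1 + used[i] - exceed[i]
            below = sum(rng.betavariate(a, b) < alpha
                        for _ in range(posterior_draws)) / posterior_draws
            # Ambiguity is highest when P(p < alpha) is near one half.
            scores.append(min(below, 1 - below))
        # Most ambiguous test wins; ties go to the least-sampled test.
        target = max(range(n_tests), key=lambda i: (scores[i], -used[i]))
        step = min(batch, remaining)
        for _ in range(step):
            exceed[target] += draw_exceeds(target, rng)
        used[target] += step
        remaining -= step
    p_hat = [(exceed[i] + 1) / (used[i] + 1) for i in range(n_tests)]
    return p_hat, used
```

Under this scheme, tests that are clearly non-significant stop consuming resamples early, while tests near the decision threshold receive more. The bookkeeping is two counters per test, which is in the spirit of the small space overhead described in the abstract.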

Conclusion: Our experiments demonstrate that a more sophisticated allocation strategy can improve inference for hypothesis testing without a drastic increase in the amount of computation on randomized data. Moreover, the gain in efficiency is greater when the number of tests is large. R code for our algorithm and the shortcut method are available at http://people.pcbi.upenn.edu/~lswang/pub/bmc2009/.

Citing Articles

Assessing differential expression in two-color microarrays: a resampling-based empirical Bayes approach.

Li D, Le Pape M, Parikh N, Chen W, Dye T. PLoS One. 2013; 8(11):e80099.

PMID: 24312198 PMC: 3842292. DOI: 10.1371/journal.pone.0080099.


Analysis of Correlated Gene Expression Data on Ordered Categories.

Peddada S, Harris S, Davidov O. J Indian Soc Agric Stat. 2011; 64(1):45-60.

PMID: 21998487 PMC: 3190572.


FastPval: a fast and memory efficient program to calculate very low P-values from empirical distribution.

Li M, Sham P, Wang J. Bioinformatics. 2010; 26(22):2897-9.

PMID: 20861029 PMC: 2971576. DOI: 10.1093/bioinformatics/btq540.
