Article: Assumption Adequacy Averaging as a Concept to Develop More Robust Methods for Differential Gene Expression Analysis

Assumption adequacy averaging is introduced as a technique for developing more robust methods that incorporate assessments of assumption adequacy into the analysis. The concept is illustrated by developing a method that averages results from the t-test and the nonparametric rank-sum test, with weights obtained by applying the Shapiro-Wilk test to the assumption of normality. Through this averaging, the proposed method relies more heavily on whichever statistical test the data suggest is superior for each individual gene. In a series of traditional and bootstrap-based simulation studies, the method developed by assumption adequacy averaging outperformed its two component methods (the t-test and the rank-sum test). It also showed greater concordance in gene selection across two studies of gene expression in acute myeloid leukemia than did either the t-test or the rank-sum test. An R routine implementing the method is available upon request.
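The averaging idea can be made concrete with a minimal Python sketch for a single gene. The abstract does not specify how the Shapiro-Wilk result is converted into a weight or which quantities are averaged, so the choices below are assumptions made only for illustration: the Shapiro-Wilk p-value, computed on group-centered residuals, is used directly as the weight on the t-test p-value, and the remainder goes to the rank-sum p-value. The function name `aaa_pvalue` is hypothetical, and this is not the authors' R routine.

```python
import numpy as np
from scipy import stats


def aaa_pvalue(x, y):
    """Illustrative assumption-adequacy-averaged p-value for one gene.

    x, y: expression values for the two groups (at least 3 values in total
    are needed for the Shapiro-Wilk test).
    Returns (averaged p-value, weight, t-test p-value, rank-sum p-value).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Center each group so that a real difference in means is not
    # mistaken for a departure from normality.
    residuals = np.concatenate([x - x.mean(), y - y.mean()])
    _, p_normal = stats.shapiro(residuals)

    # ASSUMPTION: the Shapiro-Wilk p-value serves directly as the weight.
    # The published method may derive its weights differently.
    w = p_normal

    # Component tests: two-sample t-test (equal variances, the SciPy
    # default) and the two-sided rank-sum (Mann-Whitney) test.
    _, p_t = stats.ttest_ind(x, y)
    _, p_rank = stats.mannwhitneyu(x, y, alternative="two-sided")

    # Weighted average: leans on the t-test when normality looks adequate,
    # and on the rank-sum test when it does not.
    p_avg = w * p_t + (1.0 - w) * p_rank
    return p_avg, w, p_t, p_rank


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    group_a = rng.normal(loc=0.0, scale=1.0, size=20)
    group_b = rng.normal(loc=1.0, scale=1.0, size=20)
    p_avg, w, p_t, p_rank = aaa_pvalue(group_a, group_b)
    print(f"weight={w:.3f}  t={p_t:.4f}  rank-sum={p_rank:.4f}  averaged={p_avg:.4f}")
```

Because the weight lies between 0 and 1, the averaged value always falls between the two component p-values. In a genome-wide analysis the function would be applied gene by gene, followed by a multiple-testing adjustment.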

Citing Articles

Statistical Issues and Group Classification in Plasma MicroRNA Studies With Data Application.

Rai S, Qian C, Pan J, McClain M, Eichenberger M, McClain C. Evol Bioinform Online. 2020; 16:1176934320913338.

PMID: 32313420 PMC: 7157974. DOI: 10.1177/1176934320913338.

Identifying reproducible cancer-associated highly expressed genes with important functional significances using multiple datasets.

Huang H, Li X, Guo Y, Zhang Y, Deng X, Chen L. Sci Rep. 2016; 6:36227.

PMID: 27796338 PMC: 5086981. DOI: 10.1038/srep36227.

Statistical Analysis of Repeated MicroRNA High-Throughput Data with Application to Human Heart Failure: A Review of Methodology.

Rai S, Ray H, Yuan X, Pan J, Hamid T, Prabhu S. Open Access Med Stat. 2014; 2012(2):21-31.

PMID: 24738042 PMC: 3984897. DOI: 10.2147/OAMS.S27907.

Empirical evaluation of consistency and accuracy of methods to detect differentially expressed genes based on microarray data.

Yang D, Parrish R, Brock G. Comput Biol Med. 2014; 46:1-10.

PMID: 24529200 PMC: 3993975. DOI: 10.1016/j.compbiomed.2013.12.002.

The most informative spacing test effectively discovers biologically relevant outliers or multiple modes in expression.

Pawlikowska I, Wu G, Edmonson M, Liu Z, Gruber T, Zhang J. Bioinformatics. 2014; 30(10):1400-8.

PMID: 24458951 PMC: 4068004. DOI: 10.1093/bioinformatics/btu039.

