Power and Sample Size Estimation in Microarray Studies

Overview

Journal: BMC Bioinformatics

Publisher: BioMed Central

Specialty: Biology

Date: 2010 Jan 27

PMID: 20100337

Citations: 25

Authors

Wei-Jiun Lin

Huey-Miin Hsueh

James J Chen


Abstract

Background: Before conducting a microarray experiment, one must determine how many arrays are required to have adequate power to identify differentially expressed genes. This paper discusses crucial issues in the problem formulation, the parameter specifications, and the approaches commonly proposed for sample size estimation in microarray experiments. Common methods formulate the problem as finding the minimum sample size needed to achieve a specified sensitivity (the proportion of truly differentially expressed genes that are detected) on average, at a specified false discovery rate (FDR) level and a specified expected proportion (pi1) of truly differentially expressed genes in the array. Unfortunately, under such a formulation the probability of actually achieving the specified sensitivity can be low. We instead formulate the sample size problem as the number of arrays needed to achieve a specified sensitivity with 95% probability at the specified significance level. We propose a permutation method that uses a small pilot dataset to estimate the sample size. This method accounts for correlation and effect size heterogeneity among genes.

Results: A sample size estimate based on the common formulation, which achieves the desired sensitivity on average, can be calculated using a univariate method without taking the correlation among genes into consideration. This formulation of the sample size problem is inadequate because the probability of achieving the specified sensitivity can be lower than 50%. In contrast, the sample size calculated by the proposed permutation method ensures that at least the desired sensitivity is achieved with 95% probability. The method is shown to perform well on a real example dataset using a small pilot dataset with 4-6 samples per group.
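The gap between the two formulations can be illustrated with a small simulation. The sketch below is not the permutation method described in the abstract. It is a hypothetical Monte Carlo example that assumes z-statistics with unit variance, a single common effect size (`delta`), an equicorrelated gene structure (`rho`), and a fixed per-gene significance level (`alpha`) rather than an FDR procedure. All function names and parameter values are illustrative assumptions. It compares the smallest number of arrays per group that reaches a target sensitivity on average with the smallest number that reaches it in at least 95% of simulated experiments.

```python
import numpy as np
from statistics import NormalDist


def simulate_sensitivity(n, m1=100, delta=1.0, rho=0.3, alpha=0.001,
                         reps=2000, seed=0):
    """Return the sensitivity observed in each of `reps` simulated experiments.

    Assumptions (illustrative only): m1 truly differentially expressed genes,
    n arrays per group, unit variance, common standardized effect `delta`,
    and z-statistics with pairwise correlation `rho` across genes.
    """
    rng = np.random.default_rng(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = delta * np.sqrt(n / 2)                 # mean of each z-statistic
    shared = rng.standard_normal((reps, 1))        # factor common to all genes
    own = rng.standard_normal((reps, m1))          # gene-specific noise
    z = shift + np.sqrt(rho) * shared + np.sqrt(1 - rho) * own
    # Sensitivity = fraction of truly DE genes declared significant.
    return (np.abs(z) > z_crit).mean(axis=1)


def smallest_n(criterion, n_max=200, **kwargs):
    """Smallest n per group (2..n_max) whose simulated sensitivities pass."""
    for n in range(2, n_max + 1):
        if criterion(simulate_sensitivity(n, **kwargs)):
            return n
    return None


target = 0.8

# Common formulation: the target sensitivity is reached on average.
n_avg = smallest_n(lambda s: s.mean() >= target)

# Alternative formulation: the target is reached in at least 95% of experiments.
n_prob = smallest_n(lambda s: (s >= target).mean() >= 0.95)

# How often the target is actually met at the "on average" sample size.
p_at_n_avg = (simulate_sensitivity(n_avg) >= target).mean()

print(f"n per group (on average):            {n_avg}")
print(f"n per group (95% probability):       {n_prob}")
print(f"P(sensitivity >= target) at n_avg:   {p_at_n_avg:.2f}")
```

Under these assumptions, the probability-based criterion calls for more arrays than the average-based one, and the simulated probability of reaching the target at the average-based sample size falls well short of 95%. Larger values of `rho` widen the gap, which is consistent with the point that correlation among genes matters for the probability-based formulation.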

Conclusions: We recommend formulating the sample size problem as detecting a specified proportion of differentially expressed genes with 95% probability. This formulation ensures that the desired proportion of true positives is found with high probability. The proposed permutation method takes the correlation structure and effect size heterogeneity into account and works well using only a small pilot dataset.

Citing Articles

Dupilumab Therapy Modulates Circulating Inflammatory Mediators in Patients with Prurigo Nodularis.

Bao A, Ma E, Cornman H, Kambala A, Manjunath J, Kollhoff A. JID Innov. 2024; 4(4):100281.

PMID: 38947360 PMC: 11214504. DOI: 10.1016/j.xjidi.2024.100281.

Transcriptomic characterization of Trichoderma harzianum T34 primed tomato plants: assessment of biocontrol agent induced host specific gene expression and plant growth promotion.

Aamir M, Shanmugam V, Dubey M, Husain F, Adil M, Ansari W. BMC Plant Biol. 2023; 23(1):552.

PMID: 37940862 PMC: 10631224. DOI: 10.1186/s12870-023-04502-6.

The systematic comparison between Gaussian mirror and Model-X knockoff models.

Chen S, Li Z, Liu L, Wen Y. Sci Rep. 2023; 13(1):5478.

PMID: 37015993 PMC: 10073103. DOI: 10.1038/s41598-023-32605-5.

Evaluation of a decided sample size in machine learning applications.

Rajput D, Wang W, Chen C. BMC Bioinformatics. 2023; 24(1):48.

PMID: 36788550 PMC: 9926644. DOI: 10.1186/s12859-023-05156-9.

Determination of miRNA expression profile in patients with prostate cancer and benign prostate hyperplasia.

Sancer O, Kosar P, Tefebasi M, Ergun O, Demir M, Kosar A. Turk J Med Sci. 2022; 52(3):788-795.

PMID: 36326314 PMC: 10390105. DOI: 10.55730/1300-0144.5374.
