
Preferred Analysis Methods for Affymetrix GeneChips Revealed by a Wholly Defined Control Dataset

Overview
Journal: Genome Biol
Specialties: Biology, Genetics
Date: 2005 Feb 8
PMID: 15693945
Citations: 188
Abstract

Background: As more methods are developed to analyze RNA-profiling data, assessing their performance using control datasets becomes increasingly important.

Results: We present a 'spike-in' experiment for Affymetrix GeneChips that provides a defined dataset of 3,860 RNA species, which we use to evaluate analysis options for identifying differentially expressed genes. The experimental design incorporates two novel features. First, to obtain accurate estimates of false-positive and false-negative rates, 100-200 RNAs are spiked in at each fold-change level of interest, ranging from 1.2 to 4-fold. Second, instead of using an uncharacterized background RNA sample, a set of 2,551 RNA species is used as the constant (1x) set, allowing us to know whether any given probe set is truly present or absent. Application of a large number of analysis methods to this dataset reveals clear variation in their ability to identify differentially expressed genes. False-negative and false-positive rates are minimized when the following options are chosen: subtracting nonspecific signal from the PM probe intensities; performing an intensity-dependent normalization at the probe set level; and incorporating a signal intensity-dependent standard deviation in the test statistic.
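The intensity-dependent normalization mentioned above can be illustrated with a small sketch. This is not the procedure used in the paper: it assumes log2 expression summaries for two conditions and replaces a loess fit with a running median over probe sets ordered by average intensity. The function name, the `window` parameter, and the input layout are all hypothetical.

```python
from statistics import median


def intensity_dependent_normalize(x_log2, y_log2, window=51):
    """Remove an intensity-dependent trend from log ratios.

    x_log2, y_log2: log2 signals for the same probe sets in two conditions.
    Returns M = y - x with a running-median trend (taken over neighbours
    in average intensity A) subtracted. A running median is a simplified
    stand-in for a loess fit; it is an assumption of this sketch.
    """
    n = len(x_log2)
    m = [y - x for x, y in zip(x_log2, y_log2)]        # log ratio per probe set
    a = [(x + y) / 2 for x, y in zip(x_log2, y_log2)]  # average log intensity
    order = sorted(range(n), key=lambda i: a[i])       # probe sets sorted by A
    half = window // 2
    corrected = [0.0] * n
    for rank, i in enumerate(order):
        lo = max(0, rank - half)
        hi = min(n, rank + half + 1)
        # Local trend: median log ratio among intensity neighbours.
        trend = median(m[j] for j in order[lo:hi])
        corrected[i] = m[i] - trend
    return corrected
```

Because most probe sets in a window are expected to be unchanged, the local median tracks the systematic bias at that intensity, while a genuinely changed probe set keeps most of its log ratio after correction.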

Conclusions: A best-route combination of analysis methods is presented that allows detection of approximately 70% of true positives before reaching a 10% false-discovery rate. We highlight areas in need of improvement, including better estimates of false-discovery rates and lower false-negative rates.
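Because every RNA in a spike-in dataset has a known status, a figure such as "fraction of true positives detected at a 10% false-discovery rate" can be computed directly from a ranked list. The sketch below is one way to do that, not the authors' code. It assumes a hypothetical per-probe-set score (higher means stronger evidence of change) and a matching list of truth labels, and it ignores tied scores.

```python
def sensitivity_at_fdr(scores, is_true_change, max_fdr=0.10):
    """Fraction of truly changed probe sets recovered at a given observed FDR.

    scores: test statistic per probe set (higher = more evidence of change).
    is_true_change: booleans giving the known spike-in status.
    Returns the sensitivity at the deepest cutoff in the ranked list where
    the observed FDR (false positives / all calls) is still <= max_fdr.
    """
    total_true = sum(1 for t in is_true_change if t)
    ranked = sorted(zip(scores, is_true_change), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best_tp = 0
    for _, truth in ranked:
        if truth:
            tp += 1
        else:
            fp += 1
        # Observed FDR if every probe set down to this rank is called.
        if fp / (tp + fp) <= max_fdr:
            best_tp = tp
    return best_tp / total_true if total_true else 0.0
```

Sweeping `max_fdr` over a range of values traces the trade-off between false positives and false negatives for a given analysis route, so alternative routes can be compared on the same dataset.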

Citing Articles

A Transcriptomics-Based Machine Learning Model Discriminating Mild Cognitive Impairment and the Prediction of Conversion to Alzheimer's Disease.

Park M, Ahn J, Lim J, Han M, Lee J, Lee J. Cells. 2024; 13(22).

PMID: 39594668 PMC: 11593234. DOI: 10.3390/cells13221920.


Revisiting Fold-Change Calculation: Preference for Median or Geometric Mean over Arithmetic Mean-Based Methods.

Lotsch J, Kringel D, Ultsch A. Biomedicines. 2024; 12(8).

PMID: 39200104 PMC: 11352044. DOI: 10.3390/biomedicines12081639.


The history and conceptual framework of assays and screens.

Giacoletto C, Schiller M. Bioessays. 2023; 45(4):e2200191.

PMID: 36789580 PMC: 10024921. DOI: 10.1002/bies.202200191.


Comprehensive expression analysis with cell-type-specific transcriptome in ALS-linked mutant SOD1 mice: Revisiting the active role of glial cells in disease.

Yamashita H, Komine O, Fujimori-Tonou N, Yamanaka K. Front Cell Neurosci. 2023; 16:1045647.

PMID: 36687517 PMC: 9846815. DOI: 10.3389/fncel.2022.1045647.


High BRCA1 gene expression increases the risk of early distant metastasis in ER+ breast cancers.

Chang H, Yang U, Lai M, Chen C, Fann Y. Sci Rep. 2022; 12(1):77.

PMID: 34996912 PMC: 8741892. DOI: 10.1038/s41598-021-03471-w.

