Biomarker Discovery Using Statistically Significant Gene Sets

Overview
Journal J Comput Biol
Date 2011 Apr 5
PMID 21457009
Citations 2
Abstract

Analysis of large gene expression data sets in the presence and absence of a phenotype can lead to the selection of a group of genes serving as biomarkers that jointly predict the phenotype. Among gene selection methods, filter methods based on ranking individual genes have been widely used in existing diagnostic and prognostic products. Univariate filter approaches, which select genes individually, are computationally efficient but often ignore the gene interactions inherent in biological data. Multivariate approaches that select gene subsets, on the other hand, are known to carry a higher risk of selecting spurious subsets, because evaluating a vast number of candidate subsets invites overfitting. Here we propose a framework of statistical significance tests for multivariate feature selection that can reduce the risk of selecting spurious gene subsets. Using three existing data sets, we show that the proposed approach is an essential step in identifying a gene set that results from a significant interaction among its members, and that it even improves classification performance compared with established approaches. This technique can be applied to the discovery of robust biomarkers for medical diagnosis.
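
The abstract does not specify the test statistic, classifier, or search procedure, so the sketch below is a generic stand-in and not the authors' framework. It illustrates the two ideas the abstract contrasts. First, on hypothetical XOR-style toy data, genes 0 and 1 are each independent of the phenotype on their own, so a univariate filter would not single them out, yet the pair is jointly predictive. Second, picking the best of many subsets inflates apparent performance, so the p-value is computed against a label-permutation null in which the entire subset search is rerun. The scoring rule (leave-one-out 1-nearest-neighbour accuracy) and all function names are assumptions made for illustration.

```python
import itertools
import random


def make_xor_data(n=40, n_genes=6, seed=1):
    """Hypothetical toy data: phenotype = XOR of the signs of genes 0 and 1.

    Each of genes 0 and 1 is independent of the phenotype on its own;
    the remaining genes are pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        s0, s1 = rng.choice((-1, 1)), rng.choice((-1, 1))
        # Keep the interacting genes away from zero so the four sign quadrants separate.
        row[0] = s0 * (1.0 + abs(rng.gauss(0.0, 0.3)))
        row[1] = s1 * (1.0 + abs(rng.gauss(0.0, 0.3)))
        X.append(row)
        y.append(int(s0 != s1))
    return X, y


def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only
    the genes (columns) in `subset`. Assumed scoring rule, not the paper's."""
    n = len(y)
    correct = 0
    for i in range(n):
        best_j, best_d = None, float("inf")
        for j in range(n):
            if j == i:
                continue
            d = sum((X[i][g] - X[j][g]) ** 2 for g in subset)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / n


def best_subset(X, y, k):
    """Exhaustively score every k-gene subset; return (score, subset) of the best."""
    n_genes = len(X[0])
    return max((loo_1nn_accuracy(X, y, s), s)
               for s in itertools.combinations(range(n_genes), k))


def selection_permutation_test(X, y, k, n_perm=99, seed=0):
    """Permutation p-value for the best k-gene subset that accounts for the search:
    under each label permutation the whole subset search is repeated, so the null
    distribution is that of the maximum score over all candidate subsets."""
    rng = random.Random(seed)
    observed, subset = best_subset(X, y, k)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        null_best, _ = best_subset(X, y_perm, k)
        exceed += null_best >= observed
    p_value = (exceed + 1) / (n_perm + 1)
    return subset, observed, p_value


X, y = make_xor_data()
subset, acc, p = selection_permutation_test(X, y, k=2, n_perm=49)
print(f"best pair {subset}: LOO 1-NN accuracy {acc:.2f}, permutation p = {p:.3f}")
```

Rerunning the full search inside every permutation is what makes the comparison fair: a subset that is only the luckiest of many noise combinations will rarely beat the best noise combination found under shuffled labels. Scoring each gene in isolation (the univariate filter) is cheaper but, by construction here, blind to the gene 0 and gene 1 interaction.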

Citing Articles

Feature selection with interactions in logistic regression models using multivariate synergies for a GWAS application.

Xu E, Qian X, Yu Q, Zhang H, Cui S. BMC Genomics. 2018; 19(Suppl 4):170.

PMID: 29589561 PMC: 5872388. DOI: 10.1186/s12864-018-4552-x.


Detecting Pairwise Interactive Effects of Continuous Random Variables for Biomarker Identification with Small Sample Size.

Adl A, Lee H, Qian X. IEEE/ACM Trans Comput Biol Bioinform. 2016; 14(6):1265-1275.

PMID: 27362985 PMC: 5775817. DOI: 10.1109/TCBB.2016.2586042.
