Biomarker Discovery Using Statistically Significant Gene Sets

Overview

Journal J Comput Biol

Specialties Biology, Molecular Biology

Date 2011 Apr 5

PMID 21457009

Citations 2

Authors

Hoon Kim

John Watkinson

Dimitris Anastassiou

Abstract

Analyzing large gene expression data sets collected in the presence and absence of a phenotype can identify a group of genes that serve as biomarkers and jointly predict the phenotype. Among gene selection methods, filter methods derived from ranked individual genes have been widely used in existing products for diagnosis and prognosis. Univariate filter approaches, which select genes individually, are computationally efficient but often ignore gene interactions inherent in the biological data. Multivariate approaches, which select gene subsets, are known to carry a higher risk of selecting spurious subsets because evaluating a vast number of candidate subsets invites overfitting. Here we propose a framework of statistical significance tests for multivariate feature selection that can reduce the risk of selecting spurious gene subsets. Using three existing data sets, we show that the proposed approach is an essential step in identifying a gene set that results from a significant interaction among its members, and that it can even improve classification performance compared with established approaches. This technique can be applied to the discovery of robust biomarkers for medical diagnosis.
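The abstract does not spell out the significance tests themselves, so the following is only a minimal sketch of one common way to guard against spurious subsets: a label-permutation test on the best score found across all candidate gene pairs. Everything in it is an assumption made for illustration, not the authors' method: the scoring function (joint mutual information between a binarized gene pair and the phenotype), the toy data, and the function names `mutual_information`, `best_pair`, and `max_stat_permutation_test`.

```python
import itertools
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def best_pair(genes, labels):
    """Return (score, (i, j)) for the pair with the highest joint MI with labels."""
    best_score, best_idx = -1.0, None
    for i, j in itertools.combinations(range(len(genes)), 2):
        joint_state = list(zip(genes[i], genes[j]))
        score = mutual_information(joint_state, labels)
        if score > best_score:
            best_score, best_idx = score, (i, j)
    return best_score, best_idx


def max_stat_permutation_test(genes, labels, n_perm=200, seed=0):
    """Permutation p-value for the best pair, accounting for the search itself.

    Each permutation shuffles the phenotype labels and repeats the full
    search, so the null distribution reflects the best score obtainable
    by chance over all pairs (assumed scheme, not taken from the paper).
    """
    rng = random.Random(seed)
    observed, pair = best_pair(genes, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_pair(genes, shuffled)[0] >= observed:
            exceed += 1
    p_value = (1 + exceed) / (1 + n_perm)
    return pair, observed, p_value


# Toy data (hypothetical): the phenotype is the XOR of genes 0 and 1, so
# neither gene is informative alone but the pair is. Genes 2-5 are noise.
n_samples = 40
data_rng = random.Random(1)
g0 = [k & 1 for k in range(n_samples)]
g1 = [(k >> 1) & 1 for k in range(n_samples)]
noise = [[data_rng.randint(0, 1) for _ in range(n_samples)] for _ in range(4)]
genes = [g0, g1] + noise
labels = [a ^ b for a, b in zip(g0, g1)]

pair, observed, p_value = max_stat_permutation_test(genes, labels)
print(pair, round(observed, 3), round(p_value, 4))
```

Because the null distribution is built from the maximum over all pairs, a small p-value says the selected pair scores higher than the best pair typically found under shuffled labels, which is the kind of protection against overfitting the abstract calls for. A univariate filter would miss this toy pair entirely, since each gene alone carries no information about the phenotype.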

Citing Articles

Feature selection with interactions in logistic regression models using multivariate synergies for a GWAS application.

Xu E, Qian X, Yu Q, Zhang H, Cui S. BMC Genomics. 2018; 19(Suppl 4):170.

PMID: 29589561 PMC: 5872388. DOI: 10.1186/s12864-018-4552-x.

Detecting Pairwise Interactive Effects of Continuous Random Variables for Biomarker Identification with Small Sample Size.

Adl A, Lee H, Qian X. IEEE/ACM Trans Comput Biol Bioinform. 2016; 14(6):1265-1275.

PMID: 27362985 PMC: 5775817. DOI: 10.1109/TCBB.2016.2586042.
