
GeneSrF and varSelRF: a Web-based Tool and R Package for Gene Selection and Classification Using Random Forest

Overview
Publisher BioMed Central
Specialty Biology
Date 2007 Sep 5
PMID 17767709
Citations 81
Abstract

Background: Microarray data are often used for patient classification and gene selection. An appropriate tool for end users and biomedical researchers should combine user-friendliness with statistical rigor, including carefully avoiding selection biases and allowing analysis of multiple solutions, together with access to additional functional information about the selected genes. Methodologically, such a tool would be of greater use if it incorporated state-of-the-art computational approaches and made its source code available.
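Selection bias arises when genes are chosen using all samples and the error rate is then estimated on those same samples: the held-out samples have already influenced which genes were picked. The sketch below illustrates the effect on pure-noise data, where no classifier can truly beat 50% error. It is an illustration only, with several assumptions not taken from this abstract: genes are ranked by the absolute difference in class means, a nearest-centroid classifier stands in for random forest, and the sizes (`n`, `p`, `k`) and helper names (`top_genes`, `nearest_centroid`, `loocv_error`) are hypothetical.

```python
import random

def col_mean(rows, g):
    """Mean of gene g over a list of samples."""
    return sum(r[g] for r in rows) / len(rows)

def top_genes(xs, ys, k):
    """Rank genes by |class-0 mean - class-1 mean| and keep the top k."""
    a = [x for x, y in zip(xs, ys) if y == 0]
    b = [x for x, y in zip(xs, ys) if y == 1]
    score = [abs(col_mean(a, g) - col_mean(b, g)) for g in range(len(xs[0]))]
    return sorted(range(len(score)), key=lambda g: score[g], reverse=True)[:k]

def nearest_centroid(train_x, train_y, x, genes):
    """Assign x to the class whose mean profile over `genes` is closest."""
    def dist(label):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        return sum((x[g] - col_mean(rows, g)) ** 2 for g in genes)
    return min(sorted(set(train_y)), key=dist)

def loocv_error(xs, ys, k, select_inside):
    """Leave-one-out error rate. Genes are chosen either once from all samples,
    including each held-out one (biased), or afresh inside every fold (honest)."""
    fixed = None if select_inside else top_genes(xs, ys, k)
    wrong = 0
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        genes = top_genes(train_x, train_y, k) if select_inside else fixed
        wrong += nearest_centroid(train_x, train_y, xs[i], genes) != ys[i]
    return wrong / len(xs)

# Pure noise: the labels carry no information about any gene.
rng = random.Random(0)
n, p, k = 40, 500, 10
xs = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
ys = [i % 2 for i in range(n)]

biased = loocv_error(xs, ys, k, select_inside=False)
honest = loocv_error(xs, ys, k, select_inside=True)
print(f"selection outside CV: {biased:.2f}  selection inside CV: {honest:.2f}")
```

On a run like this, the "outside" estimate typically comes out far below 50% even though the data are noise, while the "inside" estimate stays near (or above) 50%. Redoing selection within every resampling step is what keeps the error estimate honest.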

Results: We have developed GeneSrF, a web-based tool, and varSelRF, an R package, that implement, in the context of patient classification, a validated method for selecting very small sets of genes while preserving classification accuracy. Computation is parallelized, making it possible to take advantage of multicore CPUs and clusters of workstations. Output includes bootstrap estimates of the prediction error rate and assessments of the stability of the solutions. Clickable tables link to additional information for each gene (GO terms, PubMed citations, KEGG pathways), and output can be sent to PaLS for examination of PubMed references, GO terms, and KEGG and Reactome pathways characteristic of sets of genes selected for class prediction. The full source code is available, allowing others to extend the software. The web-based application is available from http://genesrf2.bioinfo.cnio.es. All source code is available from Bioinformatics.org or The Launchpad. The R package is also available from CRAN.
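The abstract does not spell out how the "very small sets of genes" are found. Random-forest gene selection of this kind is commonly described as backward elimination: fit a model, rank genes by importance, drop a fixed fraction of the least important ones, repeat, and finally keep the smallest gene set whose error is close to the best error seen. The sketch below assumes that scheme. `importance_fn` and `error_fn` are hypothetical callbacks standing in for forest variable importance and out-of-bag error; the drop fraction of 0.2 and the "within `c_sd` binomial standard errors of the best" rule are assumptions to check against the varSelRF documentation, not facts taken from this abstract.

```python
import math
from typing import Callable, Dict, List, Sequence

def backward_eliminate(
    genes: Sequence[int],
    importance_fn: Callable[[List[int]], Dict[int, float]],
    error_fn: Callable[[List[int]], float],
    n_samples: int,
    drop_frac: float = 0.2,   # assumed fraction of genes dropped per iteration
    c_sd: float = 1.0,        # assumed tolerance, in standard errors
    min_genes: int = 2,
) -> List[int]:
    """Iteratively drop the least important genes, then return the smallest
    gene set whose error is within c_sd standard errors of the best error."""
    current = list(genes)
    history = []  # (gene set, error) recorded at every iteration
    while True:
        history.append((current, error_fn(current)))
        if len(current) <= min_genes:
            break
        importance = importance_fn(current)
        n_drop = max(1, int(len(current) * drop_frac))
        n_keep = max(min_genes, len(current) - n_drop)
        # Rank by importance (highest first) and keep the top n_keep genes.
        current = sorted(current, key=lambda g: importance[g], reverse=True)[:n_keep]
    best = min(err for _, err in history)
    se = math.sqrt(best * (1.0 - best) / n_samples)  # binomial standard error
    within = [s for s, err in history if err <= best + c_sd * se]
    return min(within, key=len)

# Hypothetical toy scorers: genes 0 and 1 carry most of the signal, gene 2 adds a little.
def toy_importance(gs):
    return {g: 10.0 - g for g in gs}  # lower index = more important

def toy_error(gs):
    s = set(gs)
    if {0, 1, 2} <= s:
        return 0.10 + 0.01 * (len(s) - 3)  # each extra noise gene costs a bit
    if {0, 1} <= s:
        return 0.12
    return 0.40

selected = backward_eliminate(range(10), toy_importance, toy_error, n_samples=100)
print(selected)  # -> [0, 1]
```

Here the best error (0.10) is reached with genes `[0, 1, 2]`, but `[0, 1]` scores 0.12, which is within one standard error (about 0.03 for 100 samples), so the tolerance rule prefers the smaller set. With `c_sd=0.0` the same call returns `[0, 1, 2]`. Trading a small, bounded increase in estimated error for a shorter gene list is one way such a procedure can favor very small solutions.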

Conclusion: varSelRF and GeneSrF implement a validated method for gene selection, including bootstrap estimates of the classification error rate. They are valuable tools for applied biomedical researchers, especially for exploratory work with microarray data. Because of the underlying technology used (the combination of parallelization with a web-based application), they are also of methodological interest to bioinformaticians and biostatisticians.
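One standard way to form a bootstrap estimate of classification error is Efron's .632 estimator, which blends the apparent (resubstitution) error with the error on samples left out of each bootstrap replicate: 0.368 × apparent + 0.632 × leave-one-out bootstrap error. This abstract does not say which bootstrap variant the package uses (for example .632 or .632+), so the sketch below is illustrative only. It assumes binary labels and a caller-supplied `fit` function; `toy_fit` is a hypothetical one-gene classifier standing in for the gene-selection-plus-random-forest pipeline. Because `fit` is re-run on every replicate, gene selection happens inside the resampling loop, and counting how often each gene is chosen gives a crude measure of solution stability.

```python
import random
from collections import Counter

def toy_fit(xs, ys):
    """Hypothetical stand-in for selection + random forest on binary labels:
    keep the single gene with the largest class-mean gap, then predict the
    class whose mean on that gene is nearer."""
    classes = sorted(set(ys))
    if len(classes) == 1:  # degenerate bootstrap sample with one class only
        return (lambda x: classes[0]), []
    mean = {}
    for c in classes:
        rows = [x for x, y in zip(xs, ys) if y == c]
        mean[c] = [sum(col) / len(rows) for col in zip(*rows)]
    a, b = classes[0], classes[1]
    gene = max(range(len(xs[0])), key=lambda g: abs(mean[a][g] - mean[b][g]))

    def predict(x):
        return min(classes, key=lambda c: abs(x[gene] - mean[c][gene]))

    return predict, [gene]

def bootstrap_632(xs, ys, fit, n_boot=50, seed=0):
    """Efron's .632 error estimate plus per-gene selection frequencies.
    `fit` is re-run on every bootstrap sample, so selection happens inside
    the resampling loop."""
    rng = random.Random(seed)
    n = len(xs)
    predict, _ = fit(xs, ys)
    apparent = sum(predict(x) != y for x, y in zip(xs, ys)) / n  # resubstitution

    oob_wrong = [0] * n  # errors on sample i while it was out of bag
    oob_seen = [0] * n   # number of replicates in which sample i was out of bag
    chosen = Counter()   # how often each gene was selected
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        predict, genes = fit([xs[i] for i in idx], [ys[i] for i in idx])
        chosen.update(genes)
        for i in range(n):
            if i not in in_bag:
                oob_seen[i] += 1
                oob_wrong[i] += predict(xs[i]) != ys[i]

    per_sample = [w / s for w, s in zip(oob_wrong, oob_seen) if s > 0]
    loo_boot = sum(per_sample) / len(per_sample)  # leave-one-out bootstrap error
    err_632 = 0.368 * apparent + 0.632 * loo_boot
    freq = {g: c / n_boot for g, c in chosen.items()}
    return err_632, freq

# Toy data: gene 0 separates the classes, genes 1-4 are noise.
rng = random.Random(1)
ys = [i % 2 for i in range(30)]
xs = [[rng.gauss(3.0 * y, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(4)] for y in ys]
err, freq = bootstrap_632(xs, ys, toy_fit)
print(f".632 error: {err:.3f}; gene 0 selected in {freq.get(0, 0.0):.0%} of replicates")
```

The selection frequencies are the simplest stability summary: a gene picked in nearly every replicate is a more trustworthy member of the solution than one that appears only occasionally.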

Citing Articles

Evaluating Ovarian Cancer Chemotherapy Response Using Gene Expression Data and Machine Learning.

Amniouel S, Yalamanchili K, Sankararaman S, Jafri M. BioMedInformatics. 2024; 4(2):1396-1424.

PMID: 39149564 PMC: 11326537. DOI: 10.3390/biomedinformatics4020077.


Robust and consistent biomarker candidates identification by a machine learning approach applied to pancreatic ductal adenocarcinoma metastasis.

Mahawan T, Luckett T, Mielgo Iza A, Pornputtapong N, Caamano Gutierrez E. BMC Med Inform Decis Mak. 2024; 24(Suppl 4):175.

PMID: 38902676 PMC: 11191155. DOI: 10.1186/s12911-024-02578-0.


Development of a multivariate prediction model for antidepressant resistant depression using reward-related predictors.

Liu X, Read S. Front Psychiatry. 2024; 15:1349576.

PMID: 38590792 PMC: 10999634. DOI: 10.3389/fpsyt.2024.1349576.


High hypoxia status in pancreatic cancer is associated with multiple hallmarks of an immunosuppressive tumor microenvironment.

Sadozai H, Acharjee A, Kayani H, Gruber T, Gorczynski R, Burke B. Front Immunol. 2024; 15:1360629.

PMID: 38510243 PMC: 10951397. DOI: 10.3389/fimmu.2024.1360629.


High-accuracy prediction of colorectal cancer chemotherapy efficacy using machine learning applied to gene expression data.

Amniouel S, Jafri M. Front Physiol. 2024; 14:1272206.

PMID: 38304289 PMC: 10830836. DOI: 10.3389/fphys.2023.1272206.

