
Comparing Artificial Neural Networks, General Linear Models and Support Vector Machines in Building Predictive Models for Small Interfering RNAs

Overview
Journal PLoS One
Date 2009 Oct 23
PMID 19847297
Citations 12
Abstract

Background: Exogenous short interfering RNAs (siRNAs) induce a gene knockdown effect in cells by interacting with naturally occurring RNA processing machinery. However, not all siRNAs induce this effect equally. A heterogeneous range of machine learning techniques and feature sets has been applied to modeling siRNAs and their ability to induce knockdown. There is growing agreement on which techniques produce maximally predictive models, yet little consensus on methods for comparing predictive models. There are also few comparative studies that address how the choice of learning technique, feature set or cross-validation approach affects finding and discriminating among predictive models.

Principal Findings: Three learning techniques were used to develop predictive models for effective siRNA sequences: Artificial Neural Networks (ANNs), General Linear Models (GLMs) and Support Vector Machines (SVMs). Five feature mapping methods were also used to generate models of siRNA activities. The two factors, learning technique and feature mapping, were evaluated by a complete 3x5 factorial ANOVA. Overall, both learning technique and feature mapping contributed significantly to the observed variance in predictive models, but to differing degrees for precision and accuracy, and across different kinds and levels of model cross-validation.
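
As an illustration of the 3x5 factorial design described above, the following is a minimal sketch of a balanced two-way ANOVA with replication, written with the Python standard library only. It is not the authors' implementation. The scores are synthetic and invented purely for the example, the feature mapping names `F1` to `F5` are placeholders, and the assumption that each technique and mapping combination has three replicate scores (for instance from cross-validation folds) is ours. The sketch reports sums of squares, degrees of freedom and F ratios, and leaves out p-values.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells, a_levels, b_levels):
    """Balanced two-way factorial ANOVA with replication.

    cells maps (a, b) -> list of replicate scores; every cell must hold
    the same number of replicates (at least two).
    Returns sums of squares, degrees of freedom and F ratios.
    """
    n = len(next(iter(cells.values())))  # replicates per cell
    a, b = len(a_levels), len(b_levels)

    grand = mean(x for v in cells.values() for x in v)
    mean_a = {i: mean(x for j in b_levels for x in cells[(i, j)]) for i in a_levels}
    mean_b = {j: mean(x for i in a_levels for x in cells[(i, j)]) for j in b_levels}
    mean_ab = {k: mean(v) for k, v in cells.items()}

    # Main effects, interaction and within-cell (error) sums of squares.
    ss_a = b * n * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = a * n * sum((m - grand) ** 2 for m in mean_b.values())
    ss_ab = n * sum(
        (mean_ab[(i, j)] - mean_a[i] - mean_b[j] + grand) ** 2
        for i, j in product(a_levels, b_levels)
    )
    ss_err = sum((x - mean_ab[k]) ** 2 for k, v in cells.items() for x in v)

    ss = {"A": ss_a, "B": ss_b, "AxB": ss_ab, "error": ss_err}
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ms_err = ss_err / df["error"]
    f_ratio = {k: (ss[k] / df[k]) / ms_err for k in ("A", "B", "AxB")}
    return {"ss": ss, "df": df, "F": f_ratio}


# Factor A: learning technique (3 levels, as in the abstract).
techniques = ["ANN", "GLM", "SVM"]
# Factor B: feature mapping (5 levels); the names are placeholders.
mappings = ["F1", "F2", "F3", "F4", "F5"]

# Synthetic scores, NOT data from the study: an additive technique effect,
# an additive mapping effect, and three replicates per cell.
cells = {}
for ti, t in enumerate(techniques):
    for mi, m in enumerate(mappings):
        base = 0.60 + 0.02 * ti + 0.01 * mi
        cells[(t, m)] = [base + d for d in (-0.01, 0.0, 0.01)]

result = two_way_anova(cells, techniques, mappings)
for term in ("A", "B", "AxB"):
    print(term, "df =", result["df"][term], "F =", round(result["F"][term], 3))
```

Because the synthetic scores are built additively, the interaction term comes out near zero here. With real model scores, a large interaction F ratio would indicate that the best feature mapping depends on the learning technique, which is the kind of dependence the abstract cautions about.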

Conclusions: The methods presented here provide a robust statistical framework to compare among models developed under distinct learning techniques and feature sets for siRNAs. Further comparisons among current or future modeling approaches should apply these or other suitable statistically equivalent methods to critically evaluate the performance of proposed models. ANN and GLM techniques tend to be more sensitive to the inclusion of noisy features, but the SVM technique is more robust under large numbers of features for measures of model precision and accuracy. Features found to result in maximally predictive models are not consistent across learning techniques, suggesting care should be taken in the interpretation of feature relevance. In the models developed here, there are statistically differentiable combinations of learning techniques and feature mapping methods where the SVM technique under a specific combination of features significantly outperforms all the best combinations of features within the ANN and GLM techniques.

Citing Articles

Predicting oil accumulation by fruit image processing and linear models in traditional and super high-density olive cultivars.

Montanaro G, Carlomagno A, Petrozza A, Cellini F, Manolikaki I, Koubouris G Front Plant Sci. 2024; 15:1456800.

PMID: 39600892 PMC: 11589486. DOI: 10.3389/fpls.2024.1456800.


Predicting epiglottic collapse in patients with obstructive sleep apnoea.

Azarbarzin A, Marques M, Sands S, Op de Beeck S, Genta P, Taranto-Montemurro L Eur Respir J. 2017; 50(3).

PMID: 28931660 PMC: 5915305. DOI: 10.1183/13993003.00345-2017.


ASPsiRNA: A Resource of ASP-siRNAs Having Therapeutic Potential for Human Genetic Disorders and Algorithm for Prediction of Their Inhibitory Efficacy.

Monga I, Qureshi A, Thakur N, Gupta A, Kumar M G3 (Bethesda). 2017; 7(9):2931-2943.

PMID: 28696921 PMC: 5592921. DOI: 10.1534/g3.117.044024.


Determining Cutoff Point of Ensemble Trees Based on Sample Size in Predicting Clinical Dose with DNA Microarray Data.

Isikhan S, Karabulut E, Alpar C Comput Math Methods Med. 2017; 2016:6794916.

PMID: 28096893 PMC: 5206477. DOI: 10.1155/2016/6794916.


Support Vector Machines Model of Computed Tomography for Assessing Lymph Node Metastasis in Esophageal Cancer with Neoadjuvant Chemotherapy.

Wang Z, Zhou Z, Chen Y, Li X, Sun Y J Comput Assist Tomogr. 2016; 41(3):455-460.

PMID: 27879527 PMC: 5457826. DOI: 10.1097/RCT.0000000000000555.

