
Generalising Better: Applying Deep Learning to Integrate Deleteriousness Prediction Scores for Whole-exome SNV Studies

Overview
Journal PLoS One
Date 2018 Mar 15
PMID 29538399
Citations 8
Abstract

Many automatic classifiers have been introduced to aid inference of the phenotypic effects of uncategorised nonsynonymous single-nucleotide variants (nsSNVs) in theoretical and medical applications. Lately, several meta-estimators have been proposed that combine different predictors, such as PolyPhen and SIFT, to integrate more information into a single score. Although many advances have been made in feature design and in the machine learning algorithms used, the shortage of high-quality reference data, along with the bias towards intensively studied in vitro models, calls for improved generalisation ability in order to further increase classification accuracy and handle records with insufficient data. Since a meta-estimator essentially combines different scoring systems with highly complicated nonlinear relationships, we investigated how deep learning (supervised and unsupervised), which is particularly efficient at discovering hierarchies of features, can improve classification performance. While it is commonly believed that deep learning should be reserved for high-dimensional input spaces, with other models (logistic regression, support vector machines, Bayesian classifiers, etc.) preferred for simpler inputs, we still believe that the ability of neural networks to discover intricate structure in highly heterogeneous datasets can aid a meta-estimator. We compare the performance of our models with that of various popular predictors, many of which are recommended by the American College of Medical Genetics and Genomics (ACMG), as well as with available deep learning-based predictors. Thanks to hardware acceleration, we were able to use a computationally expensive genetic algorithm to stochastically optimise hyper-parameters over many generations. Overfitting was mitigated by noise injection and dropout, which limit coadaptation of hidden units.
Although we stress that this work was conceived not as a tool comparison but as an exploration of the possibilities of applying deep learning to ensemble scores, our results show that even relatively simple modern neural networks can significantly improve both prediction accuracy and coverage. We provide open access to our best model via the website: http://score.generesearch.ru/services/badmut/.
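
The abstract does not describe how the genetic algorithm was set up, so the following is only a minimal sketch of the general technique (tournament selection, uniform crossover, Gaussian mutation, elitism), not the authors' implementation. The hyper-parameter names, their ranges, and the `toy_fitness` function are all assumptions made for illustration; in a real run the fitness would be a validation metric obtained by training a network with the candidate settings.

```python
import random

# Hypothetical search space: (low, high) bounds per hyper-parameter.
# Names and ranges are illustrative assumptions, not taken from the paper.
BOUNDS = {
    "dropout_rate": (0.0, 0.8),
    "input_noise_sd": (0.0, 0.5),
    "log10_learning_rate": (-5.0, -1.0),
}


def toy_fitness(params):
    """Stand-in for validation performance (higher is better).

    A real run would train a model with `params` and return a validation
    metric; here a smooth function with an arbitrary optimum is used.
    """
    target = {"dropout_rate": 0.3, "input_noise_sd": 0.1,
              "log10_learning_rate": -3.0}
    return -sum((params[k] - target[k]) ** 2 for k in BOUNDS)


def random_individual(rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind, rng, rate=0.3, scale=0.1):
    # Gaussian mutation on a copy, clipped back into the allowed range.
    out = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            step = rng.gauss(0.0, scale * (hi - lo))
            out[k] = min(hi, max(lo, out[k] + step))
    return out


def tournament(pop, fitness, rng, k=3):
    # Return the fittest of k individuals drawn at random.
    return max(rng.sample(pop, k), key=fitness)


def evolve(fitness, generations=40, pop_size=30, n_elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        # Elitism: the best individuals survive unchanged.
        children = pop[:n_elite]
        while len(children) < pop_size:
            parent_a = tournament(pop, fitness, rng)
            parent_b = tournament(pop, fitness, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve(toy_fitness)
print(best, history[-1])
```

Because the elite individuals are carried over unchanged, the best fitness never decreases from one generation to the next. Each fitness evaluation in a real search means training a network, which is presumably why the abstract credits hardware acceleration for making such a search feasible.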

Citing Articles

Bioinformatics of germline variant discovery for rare disease diagnostics: current approaches and remaining challenges.

Barbitoff Y, Ushakov M, Lazareva T, Nasykhova Y, Glotov A, Predeus A Brief Bioinform. 2024; 25(2).

PMID: 38271481 PMC: 10810331. DOI: 10.1093/bib/bbad508.


Case Report: Phenotype-Driven Diagnosis of Atypical Dravet-Like Syndrome Caused by a Novel Splicing Variant in the Gene.

Sharkov A, Sparber P, Stepanova A, Pyankov D, Korostelev S, Skoblov M Front Genet. 2022; 13:888481.

PMID: 35711923 PMC: 9194094. DOI: 10.3389/fgene.2022.888481.


Case Report: Functional Investigation of an Undescribed Missense Variant Affecting Splicing in a Patient With Dravet Syndrome.

Sparber P, Mikhaylova S, Galkina V, Itkis Y, Skoblov M Front Neurol. 2021; 12:761892.

PMID: 34938262 PMC: 8686832. DOI: 10.3389/fneur.2021.761892.


Exploring Neuronal Vulnerability to Head Trauma Using a Whole Exome Approach.

Ibrahim O, Sutherland H, Maksemous N, Smith R, Haupt L, Griffiths L J Neurotrauma. 2020; 37(17):1870-1879.

PMID: 32233732 PMC: 7462038. DOI: 10.1089/neu.2019.6962.


Variation benchmark datasets: update, criteria, quality and applications.

Sarkar A, Yang Y, Vihinen M Database (Oxford). 2020; 2020.

PMID: 32016318 PMC: 6997940. DOI: 10.1093/database/baz117.

