
The Evaluation of Tools Used to Predict the Impact of Missense Variants is Hindered by Two Types of Circularity

Overview
Journal Hum Mutat
Specialty Genetics
Date 2015 Feb 17
PMID 25684150
Citations 182
Abstract

Prioritizing missense variants for further experimental investigation is a key challenge in current sequencing studies of complex and Mendelian diseases. A large number of in silico tools have been employed for the task of pathogenicity prediction, including PolyPhen-2, SIFT, FatHMM, MutationTaster-2, MutationAssessor, Combined Annotation Dependent Depletion, LRT, phyloP, and GERP++, as well as optimized methods of combining tool scores, such as Condel and Logit. Given the wealth of these methods, an important practical question is which of these tools generalize best, that is, correctly predict the pathogenic character of new variants. Here we demonstrate, in a study of 10 tools on five datasets, that such a comparative evaluation is hindered by two types of circularity. They arise when (1) the same variants or (2) different variants from the same protein occur both in the datasets used for training and in those used for evaluation of these tools, which may lead to overly optimistic results. We show that comparative evaluations of predictors that do not address these types of circularity may erroneously conclude that circularity-confounded tools are the most accurate among all tools, and may even outperform optimized combinations of tools.
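
The two types of circularity described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' procedure: the `Variant` record, its fields, and the function `split_by_circularity` are hypothetical names introduced here for illustration only. The sketch assumes each variant is identified by a protein identifier, a position, and the reference and alternate amino acids, and it sorts an evaluation set into variants that also appear in the training set (type 1), variants from a protein that appears in the training set (type 2), and variants affected by neither.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """Hypothetical variant record; the field names are assumptions."""
    protein: str   # protein identifier
    position: int  # residue position
    ref: str       # reference amino acid
    alt: str       # alternate amino acid


def split_by_circularity(
    train: Iterable[Variant], evaluation: Iterable[Variant]
) -> Tuple[List[Variant], List[Variant], List[Variant]]:
    """Sort evaluation variants into (type 1, type 2, unaffected)."""
    train_variants = set(train)
    train_proteins = {v.protein for v in train_variants}

    type1, type2, clean = [], [], []
    for v in evaluation:
        if v in train_variants:
            # Same variant seen in training and evaluation.
            type1.append(v)
        elif v.protein in train_proteins:
            # Different variant, but its protein was seen in training.
            type2.append(v)
        else:
            clean.append(v)
    return type1, type2, clean


# Toy data, invented for illustration only.
train = [Variant("P1", 10, "A", "V"), Variant("P2", 5, "G", "D")]
evaluation = [
    Variant("P1", 10, "A", "V"),  # also in the training set
    Variant("P1", 20, "L", "P"),  # same protein as a training variant
    Variant("P3", 7, "R", "C"),   # protein absent from the training set
]
type1, type2, clean = split_by_circularity(train, evaluation)
```

Under these assumptions, only the variants in `clean` come from proteins the training data never covered, so an evaluation restricted to them would avoid both kinds of overlap in this toy setting.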

Citing Articles

The specification game: rethinking the evaluation of drug response prediction for precision oncology.

Codice F, Pancotti C, Rollo C, Moreau Y, Fariselli P, Raimondi D J Cheminform. 2025; 17(1):33.

PMID: 40087708 DOI: 10.1186/s13321-025-00972-y.


AFFIPred: AlphaFold2 structure-based Functional Impact Prediction of missense variations.

Pir M, Timucin E Protein Sci. 2025; 34(2):e70030.

PMID: 39840793 PMC: 11751861. DOI: 10.1002/pro.70030.


Meta-EA: a gene-specific combination of available computational tools for predicting missense variant effects.

Katsonis P, Lichtarge O Nat Commun. 2025; 16(1):159.

PMID: 39746940 PMC: 11696468. DOI: 10.1038/s41467-024-55066-4.


Estimating the proportion of beneficial mutations that are not adaptive in mammals.

Latrille T, Joseph J, Hartasanchez D, Salamin N PLoS Genet. 2024; 20(12):e1011536.

PMID: 39724093 PMC: 11709321. DOI: 10.1371/journal.pgen.1011536.


Development and validation of animal variant classification guidelines to objectively evaluate genetic variant pathogenicity in domestic animals.

Boeykens F, Abitbol M, Anderson H, Casselman I, de Citres C, Hayward J Front Vet Sci. 2024; 11:1497817.

PMID: 39703406 PMC: 11656590. DOI: 10.3389/fvets.2024.1497817.

