
Comparative Analysis of Models in Predicting the Effects of SNPs on TF-DNA Binding Using Large-scale in Vitro and in Vivo Data

Overview
Journal Brief Bioinform
Specialty Biology
Date 2024 Mar 22
PMID 38517697
Abstract

Non-coding variants associated with complex traits can alter the motifs of transcription factor (TF)-deoxyribonucleic acid binding. Although many computational models have been developed to predict the effects of non-coding variants on TF binding, their predictive power lacks systematic evaluation. Here we have evaluated 14 different models built on position weight matrices (PWMs), support vector machines, ordinary least squares and deep neural networks (DNNs), using large-scale in vitro (i.e. SNP-SELEX) and in vivo (i.e. allele-specific binding, ASB) TF binding data. Our results show that the accuracy of each model in predicting SNP effects in vitro significantly exceeds that achieved in vivo. For in vitro variant impact prediction, kmer/gkm-based machine learning methods (deltaSVM_HT-SELEX, QBiC-Pred) trained on in vitro datasets exhibit the best performance. For in vivo ASB variant prediction, DNN-based multitask models (DeepSEA, Sei, Enformer) trained on the ChIP-seq dataset exhibit relatively superior performance. Among the PWM-based methods, tRap demonstrates better performance in both in vitro and in vivo evaluations. In addition, we find that TF classes such as basic leucine zipper factors could be predicted more accurately, whereas those such as C2H2 zinc finger factors are predicted less accurately, aligning with the evolutionary conservation of these TF classes. We also underscore the significance of non-sequence factors such as cis-regulatory element type, TF expression, interactions and post-translational modifications in influencing the in vivo predictive performance of TFs. Our research provides valuable insights into selecting prioritization methods for non-coding variants and further optimizing such models.
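To illustrate the general idea behind the PWM-based predictors mentioned in the abstract, the following is a minimal sketch of scoring a SNP as the change in the best PWM log-odds match between the reference and alternate alleles. It is not the implementation of tRap or of any other evaluated tool. The motif counts, the sequence, the uniform background, the pseudocount, and the function names are all hypothetical assumptions chosen for illustration, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background frequency (an illustrative simplification).
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def best_window_score(pwm, seq):
    """Return the highest PWM score over all forward-strand windows of seq."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def snp_delta_score(pwm, ref_seq, snp_index, alt_base):
    """Score a SNP as best-match(alt) minus best-match(ref).

    A negative value suggests the alternate allele weakens the predicted site.
    """
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    return best_window_score(pwm, alt_seq) - best_window_score(pwm, ref_seq)


# Hypothetical 4-bp motif with consensus TGAC, built from made-up counts.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 20},
    {"A": 0, "C": 0, "G": 20, "T": 0},
    {"A": 20, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 20, "G": 0, "T": 0},
]
pwm = log_odds_matrix(toy_counts)

# Hypothetical reference sequence; the SNP at index 3 changes G to C
# inside the consensus match.
delta = snp_delta_score(pwm, "AATGACTT", 3, "C")
print(f"delta score: {delta:.3f}")
```

In this toy case the alternate allele breaks the consensus match, so the delta score is negative. Real tools differ in how they handle both strands, background models, and statistical significance, which this sketch leaves out.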

Citing Articles

Noncoding variants and sulcal patterns in congenital heart disease: Machine learning to predict functional impact.

Mondragon-Estrada E, Newburger J, DePalma S, Brueckner M, Cleveland J, Chung W. iScience. 2025; 28(2):111707.

PMID: 39877905 PMC: 11772982. DOI: 10.1016/j.isci.2024.111707.


: extended capability and database integration.

Coetzee S, Hazelett D ArXiv. 2024; .

PMID: 39010878 PMC: 11247919.
