
BEESEM: Estimation of Binding Energy Models Using HT-SELEX Data

Overview
Journal Bioinformatics
Specialty Biology
Date 2017 Apr 6
PMID 28379348
Citations 22
Abstract

Motivation: Characterizing the binding specificities of transcription factors (TFs) is crucial to the study of gene expression regulation. Recently developed high-throughput experimental methods, including protein binding microarrays (PBM) and high-throughput SELEX (HT-SELEX), have enabled rapid measurement of the specificities of hundreds of TFs. However, few studies have developed efficient algorithms for estimating binding motifs from HT-SELEX data. In addition, the simple method of constructing a position weight matrix (PWM) by comparing the frequency of the preferred sequence with the frequencies of its single-nucleotide variants risks generating motifs with higher information content than the true binding specificity.
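To make the information-content concern concrete, the sketch below assumes an idealized HT-SELEX experiment: a uniform starting library and non-saturating selection in which each round multiplies a variant's frequency by its relative affinity. Under that assumption (which the abstract does not state, and which is not BEESEM's method), the count ratio of a single-nucleotide variant to the preferred sequence after n rounds is roughly the affinity ratio raised to the nth power, so a PWM built directly from those ratios looks sharper than the true specificity. The affinities, the round number, and the helper names are hypothetical.

```python
import math


def normalize(weights):
    """Scale a dict of non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def information_content(column):
    """Information content (bits) of one PWM column against a uniform background."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)


def naive_column(variant_counts, consensus_base):
    """Simple method: ratio of each single-nucleotide variant's count to the consensus count."""
    ref = variant_counts[consensus_base]
    return normalize({b: c / ref for b, c in variant_counts.items()})


def round_corrected_column(variant_counts, consensus_base, rounds):
    """Undo the assumed per-round compounding by taking the rounds-th root of each ratio."""
    ref = variant_counts[consensus_base]
    return normalize({b: (c / ref) ** (1.0 / rounds) for b, c in variant_counts.items()})


# Hypothetical true relative affinities at one motif position (consensus base A).
TRUE_AFFINITY = {"A": 1.0, "C": 0.5, "G": 0.3, "T": 0.2}
ROUNDS = 4

# Idealized counts after ROUNDS of non-saturating selection from a uniform library:
# each round multiplies a variant's frequency by its relative affinity.
counts = {b: 10000 * a ** ROUNDS for b, a in TRUE_AFFINITY.items()}

naive = naive_column(counts, "A")
corrected = round_corrected_column(counts, "A", ROUNDS)
true_col = normalize(TRUE_AFFINITY)

print(information_content(naive), information_content(corrected))
```

With these toy numbers the naive column carries about 1.6 bits, while the corrected column (about 0.26 bits) matches the hypothetical true affinities. With real data, saturation and nonspecific background mean a simple root correction like this is at best approximate.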

Results: We developed an algorithm called BEESEM that builds on a comprehensive biophysical model of protein-DNA interactions, trained using the expectation-maximization (EM) algorithm. BEESEM can select the optimal motif length and calculate confidence intervals for the estimated parameters. By comparing BEESEM with published motifs estimated from the same HT-SELEX data, we demonstrate that BEESEM provides significant improvements. We also evaluate several motif discovery algorithms on independent PBM and ChIP-seq data. BEESEM provides significantly better fits to in vitro data, but on in vivo data its performance is similar to that of some other methods under the criterion of the area under the receiver operating characteristic curve (AUROC). This highlights the limitations of the purely rank-based AUROC criterion. Assessing models with quantitative binding data, however, demonstrates that BEESEM improves on prior models.
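The abstract does not spell out BEESEM's likelihood, so the sketch below assumes a generic additive-energy model instead: each site's energy is a sum of per-position terms from a hypothetical energy matrix, a sequence's statistical weight sums Boltzmann factors over every window on both strands, and occupancy is a saturating function of that weight with a chemical-potential-like parameter `mu`. It does not implement EM training or motif-length selection. It shows why a purely rank-based AUROC cannot separate models that order sequences identically: changing `mu` alters every predicted occupancy but preserves the ranking.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical 3-bp energy matrix in kT units (rows = positions, columns = A, C, G, T).
# Lower energy means tighter binding; the preferred site here is "CAT".
ENERGY = [
    [1.5, 0.0, 2.0, 1.0],
    [0.0, 1.8, 1.2, 2.2],
    [2.0, 1.5, 1.0, 0.0],
]


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def site_energy(site):
    """Additive energy of one site: each position contributes independently."""
    return sum(ENERGY[i][BASES.index(b)] for i, b in enumerate(site))


def statistical_weight(seq):
    """Sum of Boltzmann factors exp(-E) over every window on both strands."""
    k = len(ENERGY)
    total = 0.0
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - k + 1):
            total += math.exp(-site_energy(strand[start:start + k]))
    return total


def occupancy(seq, mu):
    """Bound fraction w*e^mu / (1 + w*e^mu), treating the whole sequence as one binding unit."""
    x = statistical_weight(seq) * math.exp(mu)
    return x / (1.0 + x)


def auroc(scores, labels):
    """Rank-based AUROC: fraction of (positive, negative) pairs ordered correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy sequences: the first three contain the preferred site "CAT".
seqs = ["GCATG", "ACATT", "CCATC", "GGGGG", "TTTAA", "AGCGA"]
labels = [True, True, True, False, False, False]

low = [occupancy(s, mu=-2.0) for s in seqs]
high = [occupancy(s, mu=2.0) for s in seqs]

print(auroc(low, labels), auroc(high, labels))  # 1.0 1.0
print(max(abs(a - b) for a, b in zip(low, high)))  # predicted occupancies differ substantially
```

Both parameter settings receive the same AUROC, yet their predicted occupancies for the same sequence differ by more than 0.5, a difference that only a comparison against quantitative binding measurements would reveal.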

Availability and Implementation: Freely available on the web at http://stormo.wustl.edu/resources.html.

Contact: stormo@wustl.edu.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Citing Articles

ShapeME: A tool and web front-end for de novo discovery of structural motifs underpinning protein-DNA interactions.

Schroeder J, Wolfe M, Freddolino L. bioRxiv. 2025.

PMID: 39975017 PMC: 11838363. DOI: 10.1101/2025.01.28.635290.


Experimental approaches to investigate biophysical interactions between homeodomain transcription factors and DNA.

Mekkaoui F, Drewell R, Dresch J, Spratt D. Biochim Biophys Acta Gene Regul Mech. 2024; 1868(1):195074.

PMID: 39644990 PMC: 11832328. DOI: 10.1016/j.bbagrm.2024.195074.


Perspectives on Codebook: sequence specificity of uncharacterized human transcription factors.

Jolma A, Laverty K, Fathi A, Yang A, Yellan I, Vorontsov I. bioRxiv. 2024.

PMID: 39605729 PMC: 11601247. DOI: 10.1101/2024.11.11.622097.


Predicting the DNA binding specificity of mutated transcription factors using family-level biophysically interpretable machine learning.

Liu S, Gomez-Alcala P, Leemans C, Glassford W, Mann R, Bussemaker H. bioRxiv. 2024.

PMID: 38352411 PMC: 10862739. DOI: 10.1101/2024.01.24.577115.


Predicting the molecular functions of regulatory genetic variants associated with cancer.

Song J, Manjunath M. Oncotarget. 2023; 14:775-777.

PMID: 37646780 PMC: 10467629. DOI: 10.18632/oncotarget.28451.

