
Disentangling Transcription Factor Binding Site Complexity

Overview
Specialty: Biochemistry
Date: 2018 Aug 8
PMID: 30085218
Citations: 8
Abstract

The binding motifs of many transcription factors (TFs) are more complex than a single position weight matrix model can capture. This additional complexity is typically modeled either as intra-motif dependencies, via more sophisticated probabilistic models, or as heterogeneities, via multiple weight matrices. However, both orthogonal approaches have limitations when learning from in vivo data, where binding sites of other factors in close proximity can interfere with motif discovery for the protein of interest. In this work, we demonstrate how intra-motif complexity can be distinguished, purely by analyzing the statistical properties of a given set of TF-binding sites, from complexity arising from an intermix with motifs of co-binding TFs or other artifacts. In addition, we study the related question of whether intra-motif complexity is represented more effectively by dependencies, heterogeneities or variants in between. Benchmarks demonstrate the effectiveness of both methods for their respective tasks, and applications to motif discovery output from recent tools detect and correct many undesirable artifacts. These results further suggest that the prevalence of intra-motif dependencies may have been overestimated in previous studies on in vivo data and should thus be reassessed.
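To make the distinction in the abstract concrete, the following is a minimal sketch, not the method of the paper. It builds a position weight matrix from a toy set of aligned sites and computes the empirical mutual information between two positions as one simple indicator of an intra-motif dependency. The function names, the pseudocount and the toy sites are assumptions chosen for illustration only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def position_weight_matrix(sites, pseudocount=1.0):
    """Per-position nucleotide probabilities for equal-length aligned sites.

    Each position is modeled independently, which is the defining
    assumption of a position weight matrix.
    """
    length = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return pwm

def mutual_information(sites, i, j):
    """Empirical mutual information (in bits) between positions i and j."""
    n = len(sites)
    count_i = Counter(s[i] for s in sites)
    count_j = Counter(s[j] for s in sites)
    count_ij = Counter((s[i], s[j]) for s in sites)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi

# Hypothetical toy sites: the first and last positions always co-vary.
sites = ["ACGT", "ACGT", "TCGA", "TCGA", "ACGT", "TCGA"]

pwm = position_weight_matrix(sites)
mi_dep = mutual_information(sites, 0, 3)    # coupled positions
mi_indep = mutual_information(sites, 1, 2)  # invariant positions

print(pwm[0])
print(mi_dep, mi_indep)
```

In this toy set the coupling between the first and last positions could equally be described as a dependency within one motif or as a mixture of two submotifs (ACGT and TCGA). A pairwise statistic like this one cannot tell those explanations apart, nor can it tell them from contamination by sites of a co-binding factor, which is the kind of ambiguity the abstract addresses.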

Citing Articles

Construction of the genetic switches in response to mannitol based on artificial MtlR box.

Xiao F, Zhang Y, Zhang L, Ding Z, Shi G, Li Y. Bioresour Bioprocess. 2024; 10(1):9.

PMID: 38647829 PMC: 10992428. DOI: 10.1186/s40643-023-00634-7.


Harnessing regulatory networks in Actinobacteria for natural product discovery.

Augustijn H, Roseboom A, Medema M, van Wezel G. J Ind Microbiol Biotechnol. 2024; 51.

PMID: 38569653 PMC: 10996143. DOI: 10.1093/jimb/kuae011.


Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites.

Lavezzo G, de Souza Lauretto M, Andrioli L, Machado-Lima A. Genet Mol Biol. 2024; 46(4):e20230048.

PMID: 38285430 PMC: 10945726. DOI: 10.1590/1678-4685-GMB-2023-0048.


Motif models proposing independent and interdependent impacts of nucleotides are related to high and low affinity transcription factor binding sites in Arabidopsis.

Tsukanov A, Mironova V, Levitsky V. Front Plant Sci. 2022; 13:938545.

PMID: 35968123 PMC: 9373801. DOI: 10.3389/fpls.2022.938545.


Bayesian Markov models improve the prediction of binding motifs beyond first order.

Ge W, Meier M, Roth C, Soding J. NAR Genom Bioinform. 2021; 3(2):lqab026.

PMID: 33928244 PMC: 8057495. DOI: 10.1093/nargab/lqab026.

