
Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which Model to Use? A Decision Rule Inferred for the Prediction of Transcription Factor Binding Sites

Overview
Journal Genet Mol Biol
Specialty Genetics
Date 2024 Jan 29
PMID 38285430
Abstract

Prediction of transcription factor binding sites (TFBSs) is an application of bioinformatics in which DNA molecules are represented as sequences of the symbols A, C, G and T. The most widely used model for this problem is the Position Weight Matrix (PWM). Although PWMs have the advantage of being simple, they cannot capture dependencies between nucleotide positions, which may affect prediction performance. The Acyclic Probabilistic Finite Automaton (APFA) is an alternative model that can accommodate position dependencies. However, an APFA is a more complex model, which means that more parameters have to be learned. In this paper, we propose an innovative method to identify when position dependencies influence the preference for PWMs or APFAs. This involved using position dependency features extracted from 1106 sets of TFBSs to infer a decision tree able to predict which model (PWM or APFA) is best for a given set of TFBSs. According to our results, as few as three pinpointed features are enough to choose the best model, providing a balance between performance (average precision) and model simplicity.
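To make the limitation of PWMs concrete, the following is a minimal Python sketch, not the authors' implementation. It builds a log-odds PWM from a toy set of aligned sites and computes the mutual information between two positions. Everything here is an assumption for illustration: the function names, the pseudocount, the uniform background, the toy sites, and the choice of mutual information as a dependency measure (the abstract does not say which position dependency features were used).

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (one dict per position) from aligned sites.

    Assumes equal-length sites and a uniform background (illustrative only).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pwm.append({
            b: math.log2((counts[b] + pseudocount) / total / background)
            for b in ALPHABET
        })
    return pwm

def score(pwm, seq):
    """Sum the per-position weights; positions are treated as independent."""
    return sum(col[b] for col, b in zip(pwm, seq))

def mutual_information(sites, i, j):
    """Mutual information (in bits) between positions i and j of the sites."""
    n = len(sites)
    pair = Counter((s[i], s[j]) for s in sites)
    pi = Counter(s[i] for s in sites)
    pj = Counter(s[j] for s in sites)
    return sum(
        (c / n) * math.log2(c * n / (pi[a] * pj[b]))
        for (a, b), c in pair.items()
    )

# Toy data (hypothetical): position 0 fully determines position 1.
sites = ["ACGT", "ACGT", "TGGA", "TGGA"]
pwm = build_pwm(sites)

print(mutual_information(sites, 0, 1))  # dependent pair of positions
print(mutual_information(sites, 0, 2))  # position 2 is constant
print(score(pwm, "ACGT"), score(pwm, "AGGT"))
```

In this toy set, "AGGT" never occurs among the sites, yet the PWM scores it exactly like the observed site "ACGT", because each column is scored independently. A model that conditions on earlier positions, such as an APFA, could in principle tell the two apart, at the cost of more parameters.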
