
Binding Site Discovery from Nucleic Acid Sequences by Discriminative Learning of Hidden Markov Models

Overview
Specialty Biochemistry
Date 2014 Nov 13
PMID 25389269
Citations 14
Abstract

We present a discriminative learning method, based on hidden Markov models, for discovering binding-site patterns in nucleic acid sequences. Sets of positive and negative example sequences are mined for sequence motifs whose occurrence frequency differs between the sets. The method offers several objective functions, but we concentrate on the mutual information between condition and motif occurrence. We perform a systematic comparison of our method with numerous published motif-finding tools. In this comparison, our method achieves the highest motif discovery performance while being faster than most published methods. We present case studies of data from various technologies, including ChIP-Seq, RIP-Chip and PAR-CLIP, of embryonic stem cell transcription factors and of RNA-binding proteins, demonstrating the practicality and utility of the method. For the alternative splicing factor RBM10, our analysis finds motifs known to be relevant to splicing. The motif discovery method is implemented in the free software package Discrover. It is applicable to genome- and transcriptome-scale data, makes use of available repeat experiments, and can exploit more complex data configurations in addition to binary contrasts.
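The mutual-information objective described in the abstract can be illustrated with a minimal sketch. It assumes hard presence/absence counts (how many sequences in each set contain the motif at least once); Discrover itself scores motif occurrence through its hidden Markov models, which this sketch does not model. The function name `mutual_information` and its input layout are hypothetical. Because the input is a mapping from condition to counts, it also accepts more than two conditions, not only a positive/negative contrast.

```python
import math


def mutual_information(counts):
    """Plug-in mutual information (in bits) between condition and motif occurrence.

    Hypothetical helper, not Discrover's implementation. `counts` maps each
    condition name to a pair (n_with_motif, n_without_motif): the number of
    sequences in that set that do / do not contain the motif.
    """
    total = sum(a + b for a, b in counts.values())
    # Marginal probability that a sequence contains / lacks the motif.
    n_with = sum(a for a, _ in counts.values())
    p_occ = {True: n_with / total, False: (total - n_with) / total}

    mi = 0.0
    for with_motif, without_motif in counts.values():
        # Marginal probability of this condition (sequence set).
        p_cond = (with_motif + without_motif) / total
        for occurs, n in ((True, with_motif), (False, without_motif)):
            if n == 0:
                continue  # the term 0 * log 0 is taken as 0
            p_joint = n / total
            mi += p_joint * math.log2(p_joint / (p_cond * p_occ[occurs]))
    return mi


# A motif found in 80 of 100 positive and 20 of 100 negative sequences.
print(mutual_information({"positive": (80, 20), "negative": (20, 80)}))
```

For the example call, the result is about 0.278 bits. A motif that occurs equally often in both sets scores 0, and a motif present in every positive and no negative sequence scores 1 bit, the maximum for two equally sized sets.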

Citing Articles

Harnessing regulatory networks in Actinobacteria for natural product discovery.

Augustijn H, Roseboom A, Medema M, van Wezel G. J Ind Microbiol Biotechnol. 2024; 51.

PMID: 38569653 PMC: 10996143. DOI: 10.1093/jimb/kuae011.


A survey on algorithms to characterize transcription factor binding sites.

Tognon M, Giugno R, Pinello L. Brief Bioinform. 2023; 24(3).

PMID: 37099664 PMC: 10422928. DOI: 10.1093/bib/bbad156.


RBM10: Structure, functions, and associated diseases.

Inoue A. Gene. 2021; 783:145463.

PMID: 33515724 PMC: 10445532. DOI: 10.1016/j.gene.2021.145463.


RNA binding motif protein 10 suppresses lung cancer progression by controlling alternative splicing of eukaryotic translation initiation factor 4H.

Zhang S, Bao Y, Shen X, Pan Y, Sun Y, Xiao M. EBioMedicine. 2020; 61:103067.

PMID: 33130397 PMC: 7585942. DOI: 10.1016/j.ebiom.2020.103067.


Seq-ing answers: Current data integration approaches to uncover mechanisms of transcriptional regulation.

Hollbacher B, Balazs K, Heinig M, Uhlenhaut N. Comput Struct Biotechnol J. 2020; 18:1330-1341.

PMID: 32612756 PMC: 7306512. DOI: 10.1016/j.csbj.2020.05.018.

