SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps

Overview
Specialty: Biology
Date: 2015 May 29
PMID: 26016777
Citations: 38
Abstract

Genome-wide maps of transcription factor (TF) occupancy and of open chromatin regions implicitly contain DNA sequence signals for multiple factors. We present SeqGL, a novel de novo motif discovery algorithm that identifies multiple TF sequence signals from ChIP-seq, DNase-seq, and ATAC-seq profiles. SeqGL trains a discriminative model that combines a k-mer feature representation with group lasso regularization to extract a collection of sequence signals distinguishing peak sequences from flanking regions. Benchmarked on over 100 ChIP-seq experiments, SeqGL outperformed traditional motif discovery tools in discriminative accuracy. Furthermore, SeqGL can be used naturally with multitask learning to identify genomic and cell-type context determinants of TF binding. SeqGL also scales to the large multiplicity of sequence signals in DNase-seq or ATAC-seq maps. In particular, SeqGL identified a number of ChIP-seq-validated sequence signals that traditional motif discovery algorithms did not find. Thus, compared with widely used motif discovery algorithms, SeqGL demonstrates both greater discriminative accuracy and higher sensitivity for detecting the DNA sequence signals underlying regulatory element maps. SeqGL is available at http://cbio.mskcc.org/public/Leslie/SeqGL/.
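
The pairing of a k-mer feature representation with group lasso regularization can be illustrated with a minimal sketch. This is not the SeqGL implementation. The function names, the toy sequences, the grouping of k-mers by their first base, and the proximal gradient solver are all assumptions chosen for illustration; the abstract does not say how SeqGL forms its groups or solves the optimization.

```python
from itertools import product

import numpy as np


def kmer_counts(seqs, k):
    """Count every DNA k-mer in each sequence; returns (count matrix, k-mer list)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: j for j, km in enumerate(kmers)}
    X = np.zeros((len(seqs), len(kmers)))
    for i, s in enumerate(seqs):
        for pos in range(len(s) - k + 1):
            j = index.get(s[pos:pos + k])
            if j is not None:  # skip windows containing N or other symbols
                X[i, j] += 1
    return X, kmers


def group_soft_threshold(w, groups, t):
    """Proximal operator of t * sum_g ||w_g||_2: shrink each group's norm by t."""
    out = np.zeros_like(w)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > t:  # groups with norm <= t are set to zero as a whole
            out[g] = w[g] * (1 - t / norm)
    return out


def fit_group_lasso_logistic(X, y, groups, lam=0.1, n_iter=500):
    """Proximal gradient descent on mean logistic loss + lam * group lasso penalty."""
    n, d = X.shape
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the curvature of the loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + 1e-12)
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probability of "peak"
        grad = X.T @ (p - y) / n           # gradient of the mean logistic loss
        w = group_soft_threshold(w - step * grad, groups, step * lam)
    return w


# Toy data (hypothetical): "peak" sequences versus "flank" sequences.
peaks = ["GAGAGAGA", "AGAGAGAG"]
flanks = ["CTCTCTCT", "TCTCTCTC"]
X, kmers = kmer_counts(peaks + flanks, k=2)
y = np.array([1.0] * len(peaks) + [0.0] * len(flanks))

# Assumed grouping for illustration only: k-mers that share a first base.
groups = [[j for j, km in enumerate(kmers) if km[0] == b] for b in "ACGT"]

w = fit_group_lasso_logistic(X, y, groups, lam=0.01)        # light penalty
w_heavy = fit_group_lasso_logistic(X, y, groups, lam=10.0)  # heavy penalty
```

In this sketch, k-mers enriched in the peak sequences receive positive weights and those enriched in the flanks receive negative weights, while a sufficiently large penalty removes whole groups at once. That group-level sparsity is what distinguishes the group lasso from a per-feature lasso.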

Citing Articles

ShapeME: A tool and web front-end for de novo discovery of structural motifs underpinning protein-DNA interactions.

Schroeder J, Wolfe M, Freddolino L. bioRxiv. 2025.

PMID: 39975017 PMC: 11838363. DOI: 10.1101/2025.01.28.635290.


Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models.

Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W. Int J Mol Sci. 2023; 24(21).

PMID: 37958843 PMC: 10649223. DOI: 10.3390/ijms242115858.


maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks.

Cazares T, Rizvi F, Iyer B, Chen X, Kotliar M, Bejjani A. PLoS Comput Biol. 2023; 19(1):e1010863.

PMID: 36719906 PMC: 9917285. DOI: 10.1371/journal.pcbi.1010863.


BindVAE: Dirichlet variational autoencoders for de novo motif discovery from accessible chromatin.

Kshirsagar M, Yuan H, Ferres J, Leslie C. Genome Biol. 2022; 23(1):174.

PMID: 35971180 PMC: 9380350. DOI: 10.1186/s13059-022-02723-w.


Dynamic regulatory module networks for inference of cell type-specific transcriptional networks.

Fotuhi Siahpirani A, Knaack S, Chasman D, Seirup M, Sridharan R, Stewart R. Genome Res. 2022; 32(7):1367-1384.

PMID: 35705328 PMC: 9341506. DOI: 10.1101/gr.276542.121.

