
Statistical Significance of Cis-regulatory Modules

Overview
Publisher Biomed Central
Specialty Biology
Date 2007 Jan 24
PMID 17241466
Citations 52
Abstract

Background: Researchers increasingly need to scan large genomic regions for transcription factor binding sites, or for clusters of binding sites that form cis-regulatory modules. Correspondingly, there has been a push to develop algorithms for the rapid detection and assessment of cis-regulatory modules. While various algorithms for this purpose have been introduced, most are not well suited to rapid, genome-scale scanning.

Results: We introduce methods for detecting and statistically evaluating cis-regulatory modules, modeled either as clusters of individual binding sites or as combinations of sites with constrained organization. Determining the statistical significance of a module first requires a method for determining the statistical significance of single transcription factor binding site matches. We introduce a straightforward method for this: a database of known promoters is used to produce data structures from which p-values for binding site matches can be estimated. We then introduce a technique that uses a max-gap model to calculate the statistical significance of the arrangement of binding sites within a module. If the module being scanned for has defined organizational parameters, the probability of the module is corrected to account for these constraints. The significance of the single site matches and of the architecture of sites within the module can then be combined into an overall estimate of the statistical significance of a cis-regulatory module site.
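
As a rough illustration of the single-site step, the sketch below estimates an empirical p-value for a match score from a set of background scores. It is not the STORM implementation: the abstract does not specify the data structures, so the sorted list, the function names (`build_background`, `empirical_p_value`), the +1 pseudocount, and the toy scores are all assumptions made for this example. In practice the background scores would come from scanning a set of known promoters with the binding site model.

```python
from bisect import bisect_left


def build_background(scores):
    """Sort background match scores once so lookups are O(log n).

    Hypothetical stand-in for the promoter-derived data structures
    mentioned in the abstract.
    """
    return sorted(scores)


def empirical_p_value(sorted_scores, score):
    """Estimate P(background score >= score).

    A +1 pseudocount (an assumption of this sketch) keeps the
    estimate from ever being exactly zero.
    """
    n = len(sorted_scores)
    # Number of background scores that are >= the observed score.
    at_least = n - bisect_left(sorted_scores, score)
    return (at_least + 1) / (n + 1)


# Toy background scores, invented for illustration only.
background = build_background([1.0, 2.0, 2.0, 3.5, 5.0, 7.25, 8.0, 9.0, 10.0])
p = empirical_p_value(background, 8.0)
print(p)
```

Sorting the background once and using binary search keeps each lookup cheap, which matters when every candidate site in a large genomic region needs its own p-value.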

Conclusion: The methods introduced in this paper allow for the detection and statistical evaluation of single transcription factor binding sites and cis-regulatory modules. The features described are implemented in the Search Tool for Occurrences of Regulatory Motifs (STORM) and MODSTORM software.
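
The detection half of the max-gap model described in the Results can be sketched as follows: binding site hits are grouped into a candidate module whenever consecutive hits lie within a maximum gap of each other. This is a minimal sketch under stated assumptions, not the MODSTORM algorithm; the function name `max_gap_clusters`, the `min_sites` parameter, and the hit positions are hypothetical, and the paper's significance calculation for the resulting arrangement is not reproduced here.

```python
def max_gap_clusters(positions, max_gap, min_sites=2):
    """Group site positions into clusters under a max-gap rule.

    Consecutive sites (in sorted order) belong to the same cluster
    when the distance between them is at most `max_gap`. Clusters
    with fewer than `min_sites` members are discarded.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_sites:
                clusters.append(current)
            current = []
        current.append(pos)
    # Flush the final cluster.
    if len(current) >= min_sites:
        clusters.append(current)
    return clusters


# Hypothetical hit start positions along a sequence.
hits = [100, 130, 170, 400, 900, 920]
clusters = max_gap_clusters(hits, max_gap=50)
print(clusters)
```

A single pass over sorted positions is enough because cluster membership under a max-gap rule depends only on the distance to the previous site.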

Citing Articles

Comprehensive analysis of computational approaches in plant transcription factors binding regions discovery.

Jyoti, Ritu, Gupta S, Shankar R Heliyon. 2024; 10(20):e39140.

PMID: 39640721 PMC: 11620080. DOI: 10.1016/j.heliyon.2024.e39140.


SETDB1 regulates short interspersed nuclear elements and chromatin loop organization in mouse neural precursor cells.

Sun D, Zhu Y, Peng W, Zheng S, Weng J, Dong S Genome Biol. 2024; 25(1):175.

PMID: 38961490 PMC: 11221086. DOI: 10.1186/s13059-024-03327-2.


Chromosomal-level reference genome of a wild North American mallard (Anas platyrhynchos).

Lavretsky P, Hernandez F, Swale T, Mohl J G3 (Bethesda). 2023; 13(10).

PMID: 37523777 PMC: 10542157. DOI: 10.1093/g3journal/jkad171.


Prediction of CTCF loop anchor based on machine learning.

Zhang X, Zhu W, Sun H, Ding Y, Liu L Front Genet. 2023; 14:1181956.

PMID: 37077544 PMC: 10106609. DOI: 10.3389/fgene.2023.1181956.


Interplay between the Chd4/NuRD Complex and the Transcription Factor Znf219 Controls Cardiac Cell Identity.

El Abdellaoui-Soussi F, Yunes-Leites P, Lopez-Maderuelo D, Garcia-Marques F, Vazquez J, Redondo J Int J Mol Sci. 2022; 23(17).

PMID: 36076959 PMC: 9455175. DOI: 10.3390/ijms23179565.

