
Integrating Genome Sequence and Structural Data for Statistical Learning to Predict Transcription Factor Binding Sites

Overview
Specialty Biochemistry
Date 2020 Dec 2
PMID 33264415
Citations 6
Abstract

We report an approach to predict the DNA specificity of tetracycline repressor (TetR) family transcription regulators (TFRs). First, a genome sequence-based method was streamlined, with quantitative P-values defined to select reliable predictions. Then, a framework was introduced to incorporate structural data and to train a statistical energy function that scores the pairing between a TFR and a TFR binding site (TFBS) based on their sequences. When the predictions were benchmarked against experiments, TFBSs for 29 out of 30 TFRs were correctly predicted by either the genome sequence-based or the statistical energy-based method. Using P-values or Z-scores as indicators, we estimate that 59.6% of TFRs are covered with relatively reliable predictions by at least one of the two methods, while only 28.7% are covered by the genome sequence-based method alone. Our approach predicts a large number of new TFBSs that cannot be correctly retrieved from public databases such as FootprintDB. High-throughput experimental assays suggest that the statistical energy can reliably model the TFBSs of a significant number of TFRs. Thus, the energy function may be applied to search for new TFBSs in the respective genomes. It is possible to extend our approach to other transcription factor families with sufficient structural information.
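
To make the role of a Z-score as a reliability indicator more concrete, the sketch below scores a candidate binding site against random background sequences. It is only an illustration under stated assumptions: the contact table `toy_table`, the one-residue-per-base alignment, and the functions `pair_energy` and `z_score` are hypothetical and are not the energy function or scoring procedure trained in the paper.

```python
import random
import statistics


def pair_energy(protein_residues, dna_site, table):
    """Sum toy residue-base contact energies over aligned positions.

    Assumption: one DNA-contacting residue per base position; pairs that
    are missing from the table contribute 0.0.
    """
    return sum(
        table.get((aa, base), 0.0)
        for aa, base in zip(protein_residues, dna_site)
    )


def z_score(protein_residues, dna_site, table, n_background=1000, seed=0):
    """Z-score of the site's energy relative to random DNA of equal length.

    A strongly negative value means the candidate site scores much lower
    (more favorable) than typical background sequences.
    """
    rng = random.Random(seed)
    observed = pair_energy(protein_residues, dna_site, table)
    background = [
        pair_energy(
            protein_residues,
            "".join(rng.choice("ACGT") for _ in dna_site),
            table,
        )
        for _ in range(n_background)
    ]
    mu = statistics.mean(background)
    sigma = statistics.pstdev(background)
    if sigma == 0:
        return 0.0  # degenerate background: no basis for ranking
    return (observed - mu) / sigma


# Hypothetical contact energies (arbitrary units), for illustration only.
toy_table = {("R", "G"): -1.0, ("K", "A"): -1.0}

energy = pair_energy("RK", "GA", toy_table)
z = z_score("RK", "GA", toy_table)
print(f"energy={energy:.2f}, Z={z:.2f}")
```

In a sketch like this, a threshold on the Z-score would play the part of the reliability filter; the threshold itself would have to be calibrated against experimentally known sites.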

Citing Articles

Decoding and reengineering the promoter specificity of T7-like RNA polymerases based on phage genome sequences.

Zhu J, Liu Z, Lou C, Chen Q, Liu H Nucleic Acids Res. 2025; 53(5).

PMID: 40042813 PMC: 11880802. DOI: 10.1093/nar/gkaf140.


Identification of differentially expressed MiRNA clusters in cervical cancer.

Sriharikrishnaa S, Jishnu P, Varghese V, Shukla V, Mallya S, Chakrabarty S Discov Oncol. 2025; 16(1):172.

PMID: 39946028 PMC: 11825440. DOI: 10.1007/s12672-025-01946-0.


Systematic investigation of TetR-family transcriptional regulators and their roles on lignocellulosic inhibitor acetate tolerance in .

Xiao Y, Qin T, He S, Chen Y, Li H, He Q Front Bioeng Biotechnol. 2024; 12:1385519.

PMID: 38585710 PMC: 10998469. DOI: 10.3389/fbioe.2024.1385519.


Immune-related long noncoding RNA zinc finger protein 710-AS1-201 promotes the metastasis and invasion of gastric cancer cells.

Ding W, Chen W, Wang Y, Xu X, Wang Y, Yan Y World J Gastrointest Oncol. 2024; 16(2):458-474.

PMID: 38425400 PMC: 10900153. DOI: 10.4251/wjgo.v16.i2.458.


Snowprint: a predictive tool for genetic biosensor discovery.

d'Oelsnitz S, Stofel S, Love J, Ellington A Commun Biol. 2024; 7(1):163.

PMID: 38336860 PMC: 10858194. DOI: 10.1038/s42003-024-05849-8.

