IPro-WAEL: a Comprehensive and Robust Framework for Identifying Promoters in Multiple Species

Overview
Specialty Biochemistry
Date 2022 Sep 26
PMID 36161334
Abstract

Promoters are consensus DNA sequences located near transcription start sites that play an important role in transcription initiation. Because of this role, identifying promoters is essential for characterizing gene expression. Numerous computational methods have been proposed to predict promoters, but they struggle to achieve satisfactory performance across multiple species. In this study, we propose a novel weighted average ensemble learning model, termed iPro-WAEL, for identifying promoters in multiple species, including human, mouse, E. coli, Arabidopsis, B. amyloliquefaciens, B. subtilis and R. capsulatus. Extensive benchmarking experiments show that iPro-WAEL outperforms current methods in promoter prediction. The experimental results also demonstrate satisfactory prediction ability of iPro-WAEL across cell lines, on promoters annotated by other methods, and in distinguishing promoters from enhancers. Moreover, we identify the most important transcription factor binding site (TFBS) motif in promoter regions, which may facilitate further study of important promoter motifs. The source code of iPro-WAEL is freely available at https://github.com/HaoWuLab-Bioinformatics/iPro-WAEL.
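As a rough illustration of the general idea behind a weighted average ensemble, the sketch below combines the promoter probabilities of several base classifiers into one score per sequence. It is not taken from the iPro-WAEL code base: the function name, the two hypothetical base models, their scores, the weights and the 0.5 decision threshold are all assumptions made for this example, and the abstract does not say how iPro-WAEL chooses its weights.

```python
import numpy as np

def weighted_average_ensemble(probs, weights):
    """Combine per-model promoter probabilities with a weighted average.

    probs:   shape (n_models, n_sequences), values in [0, 1]
    weights: shape (n_models,), non-negative and not all zero
    (Hypothetical helper; not part of the iPro-WAEL source code.)
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if probs.ndim != 2 or weights.shape != (probs.shape[0],):
        raise ValueError("expected probs (n_models, n_sequences) and weights (n_models,)")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    # Normalize so the weights sum to 1, then average across models.
    weights = weights / weights.sum()
    return weights @ probs

# Made-up scores from two hypothetical base models for three sequences.
model_a = [0.9, 0.2, 0.8]
model_b = [0.7, 0.4, 0.4]

# Assumed weights: model_a is trusted three times as much as model_b.
combined = weighted_average_ensemble([model_a, model_b], [3, 1])

# Assumed decision threshold of 0.5: 1 = promoter, 0 = non-promoter.
labels = (combined >= 0.5).astype(int)
```

In practice the weights would typically be tuned on held-out validation data rather than fixed by hand as they are here.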

Citing Articles

Analyzing scRNA-seq data by CCP-assisted UMAP and tSNE.

Hozumi Y, Wei G PLoS One. 2024; 19(12):e0311791.

PMID: 39671349 PMC: 11642954. DOI: 10.1371/journal.pone.0311791.


DeepPD: A Deep Learning Method for Predicting Peptide Detectability Based on Multi-feature Representation and Information Bottleneck.

Li F, Bin Y, Zhao J, Zheng C Interdiscip Sci. 2024; 17(1):200-214.

PMID: 39661307 DOI: 10.1007/s12539-024-00665-4.


MFPSP: Identification of fungal species-specific phosphorylation site using offspring competition-based genetic algorithm.

Wang C, Zou Q PLoS Comput Biol. 2024; 20(11):e1012607.

PMID: 39556608 PMC: 11611262. DOI: 10.1371/journal.pcbi.1012607.


IMI-driver: Integrating multi-level gene networks and multi-omics for cancer driver gene identification.

Shi P, Han J, Zhang Y, Li G, Zhou X PLoS Comput Biol. 2024; 20(8):e1012389.

PMID: 39186807 PMC: 11379397. DOI: 10.1371/journal.pcbi.1012389.


Benchmarking DNA Foundation Models for Genomic Sequence Classification.

Feng H, Wu L, Zhao B, Huff C, Zhang J, Wu J bioRxiv. 2024.

PMID: 39185205 PMC: 11343214. DOI: 10.1101/2024.08.16.608288.

