iPro-WAEL: A Comprehensive and Robust Framework for Identifying Promoters in Multiple Species
Promoters are consensus DNA sequences located near transcription start sites, and they play an important role in transcription initiation. Because of this role, identifying promoters is essential for characterizing gene expression. Numerous computational methods have been proposed to predict promoters, but they struggle to achieve satisfactory performance across multiple species. In this study, we propose a novel weighted average ensemble learning model, termed iPro-WAEL, for identifying promoters in multiple species, including human, mouse, E. coli, Arabidopsis, B. amyloliquefaciens, B. subtilis and R. capsulatus. Extensive benchmarking experiments show that iPro-WAEL performs strongly and is superior to current methods in promoter prediction. The experimental results also demonstrate satisfactory prediction ability across cell lines, on promoters annotated by other methods, and in distinguishing promoters from enhancers. Moreover, we identify the most important transcription factor binding site (TFBS) motif in promoter regions to facilitate further study of important motifs in these regions. The source code of iPro-WAEL is freely available at https://github.com/HaoWuLab-Bioinformatics/iPro-WAEL.
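As a rough illustration of the general idea behind weighted average ensemble learning, the sketch below combines per-sequence promoter probabilities from several base models into one score. It is a minimal sketch under stated assumptions, not the iPro-WAEL implementation: the abstract does not specify the base learners or how the weights are chosen, so the function name `weighted_average_ensemble`, the example probabilities, the weights, and the 0.5 decision threshold are all hypothetical.

```python
import numpy as np


def weighted_average_ensemble(probs, weights):
    """Combine base-model probabilities with a weighted average.

    probs   : array-like, shape (n_models, n_samples), each row holding one
              base model's predicted promoter probabilities.
    weights : array-like, shape (n_models,), non-negative model weights.
              (Hypothetical: how iPro-WAEL sets its weights is not stated
              in the abstract.)
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if probs.ndim != 2 or weights.shape != (probs.shape[0],):
        raise ValueError("expected probs of shape (n_models, n_samples) "
                         "and one weight per model")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return weights @ probs             # weighted average for each sample


# Made-up probabilities from two hypothetical base models on three sequences.
base_probs = [
    [0.9, 0.2, 0.6],  # model A
    [0.7, 0.4, 0.3],  # model B
]
scores = weighted_average_ensemble(base_probs, weights=[3, 1])
labels = (scores >= 0.5).astype(int)  # assumed threshold: 1 = promoter
print(scores, labels)
```

Normalizing the weights keeps the combined score in the same [0, 1] range as the base-model probabilities, so a single threshold can still be applied to the ensemble output.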