
Assessing Deep Learning Methods in Cis-regulatory Motif Finding Based on Genomic Sequencing Data

Overview
Journal Brief Bioinform
Specialty Biology
Date 2021 Oct 4
PMID 34607350
Citations 12
Abstract

Identifying cis-regulatory motifs from genomic sequencing data (e.g. ChIP-seq and CLIP-seq) is crucial for locating transcription factor (TF) binding sites and inferring gene regulatory mechanisms in any organism. Since 2015, deep learning (DL) methods have been widely applied to identify TF binding sites and predict motif patterns, offering a scalable, flexible and unified computational approach for highly accurate predictions. As far as we know, 20 DL methods have been developed. However, without a clear and systematic assessment, users will struggle to choose the most appropriate tool for their specific studies. In this manuscript, we evaluated 20 DL methods for cis-regulatory motif prediction using 690 ENCODE ChIP-seq, 126 cancer ChIP-seq and 55 RNA CLIP-seq datasets. Four metrics were investigated: the accuracy of motif finding, the performance of DNA/RNA sequence classification, algorithm scalability and tool usability. The assessment results demonstrated the high complementarity of the existing DL methods. The most suitable model was found to depend primarily on the data size and type and on the method's outputs.

Citing Articles

The evaluation of transcription factor binding site prediction tools in human and Arabidopsis genomes.

Wanniarachchi D, Viswakula S, Wickramasuriya A. BMC Bioinformatics. 2024; 25(1):371.

PMID: 39623329 PMC: 11613939. DOI: 10.1186/s12859-024-05995-0.


Identifying transcription factors with cell-type specific DNA binding signatures.

Awdeh A, Turcotte M, Perkins T. BMC Genomics. 2024; 25(1):957.

PMID: 39402535 PMC: 11472444. DOI: 10.1186/s12864-024-10859-1.


The role of structure in regulatory RNA elements.

Tants J, Schlundt A. Biosci Rep. 2024; 44(10).

PMID: 39364891 PMC: 11499389. DOI: 10.1042/BSR20240139.


Deep learning with a small dataset predicts chromatin remodelling contribution to winter dormancy of apple axillary buds.

Saito T, Wang S, Ohkawa K, Ohara H, Kondo S. Tree Physiol. 2024; 44(7).

PMID: 38905284 PMC: 11285188. DOI: 10.1093/treephys/tpae072.


MMGAT: a graph attention network framework for ATAC-seq motifs finding.

Wu X, Hou W, Zhao Z, Huang L, Sheng N, Yang Q. BMC Bioinformatics. 2024; 25(1):158.

PMID: 38643066 PMC: 11031952. DOI: 10.1186/s12859-024-05774-x.

