DeepG4: A Deep Learning Approach to Predict Cell-type Specific Active G-quadruplex Regions

Overview
Specialty Biology
Date 2021 Aug 12
PMID 34383754
Citations 14
Abstract

DNA is a complex molecule carrying the instructions an organism needs to develop, live and reproduce. In 1953, Watson and Crick discovered that DNA is composed of two chains forming a double helix. Later on, other DNA structures were discovered and shown to play important roles in the cell, in particular the G-quadruplex (G4). Following genome sequencing, several bioinformatic algorithms were developed to predict G4s forming in vitro. These algorithms were based on a canonical sequence motif, on G-richness and G-skewness, or alternatively on sequence features including k-mers, and more recently on machine/deep learning. New sequencing techniques were also developed to map G4s in vitro (G4-seq) and in vivo (G4 ChIP-seq) at a resolution of a few hundred bases. Here, we propose a novel convolutional neural network (DeepG4) to map cell-type specific active G4 regions (i.e. regions within which G4s form both in vitro and in vivo). DeepG4 predicts active G4 regions accurately in different cell types. Moreover, DeepG4 identifies key DNA motifs that are predictive of G4 region activity. We found that these motifs do not follow the very flexible sequence pattern that current algorithms seek. Instead, active G4 regions are determined by numerous specific motifs. Among those motifs, we identified motifs of known transcription factors (TFs), which could play important roles in G4 activity either by contributing directly to the G4 structures themselves or indirectly by participating in G4 formation in the vicinity. In addition, we used DeepG4 to predict active G4 regions in a large number of tissues and cancers, thereby providing a comprehensive resource for researchers. Availability: https://github.com/morphos30/DeepG4.
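The abstract contrasts the single flexible pattern used by rule-based tools with the many specific motifs a convolutional network can learn. As an illustration only (this is not DeepG4's code or architecture), the sketch below assumes the commonly used canonical pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides. It then shows how one convolutional filter, scanned over a one-hot encoded sequence with ReLU and global max pooling, behaves as a motif detector. The function names `has_canonical_g4`, `one_hot`, `motif_filter` and `conv_max` and the hand-set filter weights are hypothetical; in a trained network the weights are learned from data.

```python
import re

# Assumed canonical G4 motif (quadparser-style): four runs of >= 3 G
# separated by loops of 1-7 nucleotides. Loops may themselves contain G.
CANONICAL_G4 = re.compile(r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")

BASES = "ACGT"


def has_canonical_g4(seq):
    """Return True if the sequence (forward strand only) contains the canonical motif."""
    return CANONICAL_G4.search(seq.upper()) is not None


def one_hot(seq):
    """Encode DNA as one row per base with columns (A, C, G, T); N becomes all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_filter(motif):
    """Hypothetical hand-set filter: +1 for the motif base, -1 for the others.
    In a trained CNN these weights would be learned, not written by hand."""
    return [[1.0 if base == b else -1.0 for b in BASES] for base in motif.upper()]


def conv_max(x, weights, bias=0.0):
    """Slide one filter over a one-hot sequence, apply ReLU and return the
    global max activation (the best motif match anywhere in the sequence)."""
    width = len(weights)
    best = 0.0  # starting at 0.0 folds the ReLU into the max
    for i in range(len(x) - width + 1):
        act = bias + sum(
            weights[j][k] * x[i + j][k] for j in range(width) for k in range(4)
        )
        best = max(best, act)
    return best


if __name__ == "__main__":
    print(has_canonical_g4("TTGGGAGGGAGGGAGGGTT"))  # True
    print(has_canonical_g4("GGGAGGGAGGG"))          # False: only three G-runs
    gg = motif_filter("GGGA")
    print(conv_max(one_hot("TTGGGATT"), gg))        # 4.0: perfect match
    print(conv_max(one_hot("TTTTTTTT"), gg))        # 0.0: ReLU clips -4.0
```

In a trained network, many such filters would run in parallel and their pooled activations would feed further layers that output an activity probability. Reading off the sequences that maximally activate each filter is one common way to extract motifs such as those described in the abstract. The regular expression, by contrast, accepts any sequence that fits one loose pattern.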

Citing Articles

Genomic 8-oxoguanine modulates gene transcription independent of its repair by DNA glycosylases OGG1 and MUTYH.

Obermann T, Sakshaug T, Kanagaraj V, Abentung A, de Sousa M, Hagen L. Redox Biol. 2024; 79:103461.

PMID: 39662289 PMC: 11697278. DOI: 10.1016/j.redox.2024.103461.


Machine learning-based prediction of DNA G-quadruplex folding topology with G4ShapePredictor.

Liew D, Lim Z, Yong E. Sci Rep. 2024; 14(1):24238.

PMID: 39414858 PMC: 11484705. DOI: 10.1038/s41598-024-74826-2.


Prediction of DNA i-motifs via machine learning.

Yang B, Guneri D, Yu H, Wright E, Chen W, Waller Z. Nucleic Acids Res. 2024; 52(5):2188-2197.

PMID: 38364855 PMC: 10954440. DOI: 10.1093/nar/gkae092.


Toward a Better Understanding of G4 Evolution in the 3 Living Kingdoms.

Vannutelli A, Ouangraoua A, Perreault J. Evol Bioinform Online. 2023; 19:11769343231212075.

PMID: 38046653 PMC: 10693206. DOI: 10.1177/11769343231212075.


EndoQuad: a comprehensive genome-wide experimentally validated endogenous G-quadruplex database.

Qian S, Shi M, Xiong Y, Zhang Y, Zhang Z, Song X. Nucleic Acids Res. 2023; 52(D1):D72-D80.

PMID: 37904589 PMC: 10767823. DOI: 10.1093/nar/gkad966.

